Commit graph

34404 commits

Joey Hess
68227154fb
switch HTTP P2P protocol to base64url
Base64 can include '/', and with UUIDs and keys both used in routes,
the encoding needs to avoid that. Use base64url everywhere in the HTTP
protocol for consistency.
2024-07-11 12:31:41 -04:00
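
A small illustration of the difference, assuming the base64-bytestring package (git-annex's own encoding helpers are not shown in this log):

import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Base64 as B64
import qualified Data.ByteString.Base64.URL as B64URL

main :: IO ()
main = do
    -- plain base64 uses '+' and '/', which collide with URL syntax
    B.putStrLn (B64.encode (B.pack "\xfb\xff\xfe"))    -- prints "+//+"
    -- base64url substitutes '-' and '_', safe inside a route segment
    B.putStrLn (B64URL.encode (B.pack "\xfb\xff\xfe")) -- prints "-__-"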
Joey Hess
14e0f778b7
simplify 2024-07-11 11:50:44 -04:00
Joey Hess
2228d56db3
serveGet invalidation 2024-07-11 11:42:32 -04:00
Joey Hess
a7383b5c59
move serveruuid into routes
In particular the generic get route needs it, so that when a single http
server is serving multiple repositories, it knows what repository to
use.
2024-07-11 11:19:20 -04:00
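
A sketch of the resulting route shape in servant types; the names and payload type here are illustrative, not the real API:

{-# LANGUAGE DataKinds, TypeOperators #-}

import Servant.API
import Data.ByteString (ByteString)
import Data.Text (Text)

-- with the server uuid captured in the route itself, one server
-- can dispatch each request to the right repository
type GetAPI = Capture "serveruuid" Text
    :> "key" :> Capture "key" Text
    :> Get '[OctetStream] ByteString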
Joey Hess
3b37b9e53f
fix serveGet hang
This came down to SendBytes waiting on the waitv. Nothing ever filled
it.

Only Annex.Proxy needs the waitv, and it handles filling it. So make it
optional.
2024-07-11 07:46:52 -04:00
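
A minimal sketch of the shape of the fix, with illustrative names rather than the actual git-annex code:

import Control.Concurrent.MVar

-- the signal MVar becomes optional: Annex.Proxy passes one in
-- and handles filling it, while serveGet passes Nothing and so
-- never blocks waiting on it
sendBytesThenWait :: Maybe (MVar ()) -> IO () -> IO ()
sendBytesThenWait mwaitv send = do
    send
    maybe (return ()) takeMVar mwaitv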
Joey Hess
8cb1332407
update 2024-07-10 16:10:08 -04:00
Joey Hess
f9b7ce7224
add Annex worker pool to P2PHttp
This will be needed for get and store, since those need to run Annex
actions.

withLocalP2PConnections will also probably use it.
2024-07-10 12:19:47 -04:00
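
A sketch of such a pool; AnnexState is a stand-in for the real Annex state type, and this is not the actual implementation:

import Control.Concurrent.STM
import Control.Exception (bracket)

data AnnexState = AnnexState  -- stand-in

newtype WorkerPool = WorkerPool (TVar [AnnexState])

-- check a state out of the pool, run an Annex action with it,
-- and return it to the pool even if the action throws
withWorker :: WorkerPool -> (AnnexState -> IO a) -> IO a
withWorker (WorkerPool pool) = bracket acquire release
  where
    acquire = atomically $ do
        sts <- readTVar pool
        case sts of
            [] -> retry  -- block until a worker is returned
            (st:rest) -> writeTVar pool rest >> return st
    release st = atomically $ modifyTVar' pool (st:)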
Joey Hess
7c588a5791
implement remove-before
The reason to use removeBeforeRemoteEndTime is twofold.

First, removeBefore sends two protocol commands. Currently, the HTTP
protocol runner only supports sending a single command per invocation.

Second, the http server gets a monotonic timestamp from the client. So
translating back to a POSIXTime would be annoying.

The timestamp flow with a proxy will be:

- client gets timestamp, which gets the monotonic timestamp from the
  proxied remote via the proxy. The timestamp is currently not
  proxied when there is a single proxy.
- client calls remove-before
- http server calls removeBeforeRemoteEndTime which sends REMOVE-BEFORE
  to the proxied remote.
2024-07-10 10:03:26 -04:00
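
A self-contained sketch of the split, with stand-in types in place of the real protocol plumbing:

newtype MonotonicTimestamp = MonotonicTimestamp Integer

-- stand-ins for the two protocol commands
getTimestamp :: IO MonotonicTimestamp
getTimestamp = return (MonotonicTimestamp 1000)

removeBeforeRemoteEndTime :: MonotonicTimestamp -> String -> IO Bool
removeBeforeRemoteEndTime _ _ = return True

-- removeBefore needs two round trips (GETTIMESTAMP, then
-- REMOVE-BEFORE), which is why the single-command HTTP runner
-- exposes only the second half
removeBefore :: Integer -> String -> IO Bool
removeBefore retention key = do
    MonotonicTimestamp t <- getTimestamp
    removeBeforeRemoteEndTime (MonotonicTimestamp (t - retention)) key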
Joey Hess
48f76cb3e8
implement serveRemove and send WWW-Authenticate header on auth failure 2024-07-10 09:13:01 -04:00
Joey Hess
97d0fc9b65
git-annex p2phttp options 2024-07-10 00:01:55 -04:00
Joey Hess
6a8a4d1775
authentication is implemented
just need to make Command.P2PHttp generate a GetServerMode from options
2024-07-09 20:54:47 -04:00
Joey Hess
08371c3745
started on auth 2024-07-09 17:30:55 -04:00
Joey Hess
b5b3d8cde2
update 2024-07-09 14:30:50 -04:00
Joey Hess
a3dd8b4bcb
capture API version in routes
Needed so the client can send it.
2024-07-09 12:04:29 -04:00
Joey Hess
751b8e0baf
implemented serveCheckPresent
Still need a way to run Proto though
2024-07-09 09:08:42 -04:00
Joey Hess
3f402a20a8
implement Locker 2024-07-08 21:00:10 -04:00
Joey Hess
b758b01692
add lockids to http p2p protocol 2024-07-08 20:18:55 -04:00
Joey Hess
69c4f07ab0
finish get API 2024-07-08 13:27:50 -04:00
Joey Hess
82d66ede5e
convert lockcontent api to http long polling
Websockets would work, but the problem with using them for this is that
each lockcontent call is a separate websocket connection. And that's an
actual TCP connection. One TCP connection per file dropped would be too
expensive. With http long polling, regular http pipelining can be used,
so it will reuse a TCP connection.

Unfortunately, at least with servant, bi-directional streams with long
polling don't result in true bidirectional full duplex communication.
Servant processes the whole client body stream before generating the server
body stream. I think it's entirely possible to do full bi-directional
communication over http, but it would need changes to servant.

And, there's no way for the client to tell if the server successfully
locked the content, since the server will keep processing the client
stream no matter what.

So, added a new api endpoint, keeplocked. lockcontent will lock the key
for 10 minutes with retention lock, and then a call to keeplocked will
keep it locked for as long as needed. This does mean that there will
need to be a Map of locks by key, and I will probably want to add
some kind of lock identifier that lockcontent returns.
2024-07-08 12:57:46 -04:00
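
A sketch of the anticipated lock table; the names are hypothetical, with LockID standing in for the lock identifier that lockcontent would return:

import qualified Data.Map as M
import Control.Concurrent.STM

newtype LockID = LockID Integer deriving (Eq, Ord)

-- map each held lock's identifier to its release action
type LockTable = TVar (M.Map LockID (IO ()))

registerLock :: LockTable -> LockID -> IO () -> IO ()
registerLock tbl i release =
    atomically $ modifyTVar' tbl (M.insert i release)

-- when keeplocked ends (or times out), release via the table
dropLock :: LockTable -> LockID -> IO ()
dropLock tbl i = do
    mrelease <- atomically $ do
        m <- readTVar tbl
        writeTVar tbl (M.delete i m)
        return (M.lookup i m)
    sequence_ mrelease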
Joey Hess
838169ee86
status 2024-07-07 16:16:11 -04:00
Joey Hess
1dbb5ec70d
servant API type is complete 2024-07-07 12:59:12 -04:00
Joey Hess
4133063ab1
Merge branch 'master' into httpproto 2024-07-07 12:08:24 -04:00
Joey Hess
86ce3bf1e4
started servant implementation of HTTP P2P protocol 2024-07-07 12:08:10 -04:00
Joey Hess
9595f77584
Merge branch 'master' of ssh://git-annex.branchable.com 2024-07-05 15:37:43 -04:00
Joey Hess
40306d3fcf
finalizing HTTP P2P protocol some more
Added v2-v0 endpoints. These are tedious, but will be needed in order to
use the HTTP protocol to proxy to repositories with older git-annex,
where git-annex-shell will be speaking an older version of the protocol.

Changed GET to use 422 when the content is not present. 404 is needed to
detect when a protocol version is not supported.
2024-07-05 15:34:58 -04:00
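
A small illustration of why the two status codes must stay distinct; probe stands in for any function that returns the HTTP status of a version-specific endpoint:

-- try versions newest-first; 404 means this protocol version has
-- no endpoint here, so fall back, leaving 422 free to mean
-- "content not present" on GET
pickVersion :: [Int] -> (Int -> IO Int) -> IO (Maybe Int)
pickVersion [] _ = return Nothing
pickVersion (v:vs) probe = do
    status <- probe v
    if status == 404
        then pickVersion vs probe
        else return (Just v)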
Joey Hess
2fb3ef4d41
finalizing HTTP P2P protocol
Managed to avoid netstrings. Actually, using netstrings while streaming
lazy ByteString turns out to be very difficult. So instead, have a
header that specifies the expected amount of data, and then it can just
arrange to send a different amount of data if it needs to indicate
INVALID.

Also improved the interface for GET of a key.
2024-07-05 15:03:51 -04:00
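
A sketch of the receiver's validity check under that framing (illustrative, not the protocol's actual parser):

import qualified Data.ByteString.Lazy as L

-- the header promises a length; a body of any other length
-- signals INVALID without needing a trailing marker
validBody :: Integer -> L.ByteString -> Bool
validBody expectedlen body = L.length body == fromIntegral expectedlen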
Joey Hess
5e564947d7
use netstrings for framing binary data with json at the end
This will be easy to implement with servant. It's also very efficient,
and fairly future-proof. Eg, could add another frame with other data.

This does make it a bit harder to use this protocol, but netstrings
probably take about 5 minutes to implement? Let's see...

import Text.Read

-- "<length>:<payload>,"
toNetString :: String -> String
toNetString s = show (length s) ++ ":" ++ s ++ ","

-- parse one netstring, returning the payload and the rest of
-- the input
nextNetString :: String -> Maybe (String, String)
nextNetString s = case break (== ':') s of
        ([], _) -> Nothing
        (sn, rest) -> do
                n <- readMaybe sn
                let (v, rest') = splitAt n (drop 1 rest)
                return (v, drop 1 rest')

Ok, well, that took about 10 minutes ;-)
2024-07-05 11:53:03 -04:00
Joey Hess
95ba4d4480
thoughts on CGI, and use json 2024-07-05 10:08:43 -04:00
git-annex@4a0625db6ced1ac00744697d5bac41393bcde646
81c9808cfa Added a comment 2024-07-05 10:22:46 +00:00
Joey Hess
3f9569e27f
update 2024-07-04 15:26:05 -04:00
Joey Hess
2ca51fe947
Merge branch 'master' of ssh://git-annex.branchable.com 2024-07-04 15:18:17 -04:00
Joey Hess
0bfdc57d25
update 2024-07-04 15:18:06 -04:00
Joey Hess
f452bd448a
REMOVE-BEFORE and GETTIMESTAMP proxying
For clusters, the timestamps have to be translated, since each node can
have its own idea about what time it is. To translate a timestamp, the
proxy remembers what time it asked the node for a timestamp in
GETTIMESTAMP, and applies the delta as an offset in REMOVE-BEFORE.

This does mean that a remove from a cluster has to call GETTIMESTAMP on
every node before dropping from nodes. Not very efficient. Although
currently it tries to drop from every single node anyway, which is also
not very efficient.

I thought about caching the GETTIMESTAMP from the nodes on the first
call. That would improve efficiency. But, since monotonic clocks on
non-Linux systems don't advance when the computer is suspended, consider what might
happen if one node was suspended for a while, then came back. Its
monotonic timestamp would end up behind where the proxying expects it to
be. Would that result in removing when it shouldn't, or refusing to
remove when it should? Have not thought it through. Either way, a
cluster behaving strangely for an extended period of time because one
of its nodes was briefly asleep doesn't seem like good behavior.
2024-07-04 15:09:34 -04:00
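
A self-contained sketch of the delta arithmetic, with stand-in types:

newtype ProxyTime = ProxyTime Integer  -- proxy's monotonic clock
newtype NodeTime = NodeTime Integer    -- a node's monotonic clock
newtype NodeOffset = NodeOffset Integer

-- remembered when GETTIMESTAMP is proxied: node clock minus
-- proxy clock
nodeOffset :: ProxyTime -> NodeTime -> NodeOffset
nodeOffset (ProxyTime p) (NodeTime n) = NodeOffset (n - p)

-- applied when REMOVE-BEFORE is proxied, translating into the
-- node's clock
toNodeTime :: NodeOffset -> ProxyTime -> NodeTime
toNodeTime (NodeOffset d) (ProxyTime p) = NodeTime (p + d)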
Joey Hess
99b7a0cfe9
use REMOVE-BEFORE in P2P protocol
Only clusters still need to be fixed to close this todo.
2024-07-04 13:47:38 -04:00
Joey Hess
1243af4a18
toward SafeDropProof expiry checking
Added Maybe POSIXTime to SafeDropProof, which gets set when the proof is
based on a LockedCopy. If there are several LockedCopies, it uses the
closest expiry time. That is not optimal; it may be that the proof
expires based on one LockedCopy while another has not expired. But
that seems unlikely to really happen, and anyway the user can just
re-run a drop if it fails due to expiry.

Pass the SafeDropProof to removeKey, which is responsible for checking
it for expiry in situations where that could be a problem. Which really
only means in Remote.Git.

Made Remote.Git check expiry when dropping from a local remote.

Checking expiry when dropping from a P2P remote is not yet implemented.
P2P.Protocol.remove has SafeDropProof plumbed through to it for that
purpose.

Fixing the remaining 2 build warnings should complete this work.

Note that the use of a POSIXTime here means that if the clock gets set
forward while git-annex is in the middle of a drop, it may say that
dropping took too long. That seems ok. Less ok is that if the clock gets
turned back a sufficient amount (eg 5 minutes), proof expiry won't be
noticed. It might be better to use the Monotonic clock, but that doesn't
advance when a laptop is suspended, and while there is the linux
Boottime clock, that is not available on other systems. Perhaps a
combination of POSIXTime and the Monotonic clock could detect laptop
suspension and also detect clock being turned back?

There is a potential future flag day where
p2pDefaultLockContentRetentionDuration is not assumed, but is probed
using the P2P protocol, and peers that don't support it can no longer
produce a LockedCopy. Until that happens, when git-annex is
communicating with older peers there is a risk of data loss when
a ssh connection closes during LOCKCONTENT.
2024-07-04 12:39:06 -04:00
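
A sketch of the expiry selection and check described above, using POSIXTime as the commit does; the function names are illustrative:

import Data.Time.Clock.POSIX (POSIXTime)

-- with several LockedCopies, the proof carries the soonest expiry
proofExpiry :: [POSIXTime] -> Maybe POSIXTime
proofExpiry [] = Nothing
proofExpiry ts = Just (minimum ts)

-- Nothing means the proof is not time-limited
proofExpired :: POSIXTime -> Maybe POSIXTime -> Bool
proofExpired _ Nothing = False
proofExpired now (Just expiry) = now >= expiry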
Joey Hess
98dbfb6bbd
Merge branch 'master' into p2p_locking 2024-07-04 09:52:02 -04:00
Joey Hess
f69661ab65
status 2024-07-03 17:04:12 -04:00
Joey Hess
543c610a31
REMOVE-BEFORE and GETTIMESTAMP
Only implemented server side, not used client side yet.

And not yet implemented for proxies/clusters, for which there's a build
warning about unhandled cases.

This is P2P protocol version 3. It will probably be the only change in
that version.

Added a dependency on clock to access a monotonic clock.
On i386-ancient, that is at version 0.2.0.0.
2024-07-03 17:01:58 -04:00
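
Reading a monotonic timestamp through that dependency is minimal (this shows the clock package's API; git-annex's wrapper is not shown here):

import System.Clock (Clock(Monotonic), TimeSpec(sec), getTime)

-- whole seconds from a clock that only moves forward
monotonicSeconds :: IO Integer
monotonicSeconds = do
    ts <- getTime Monotonic
    return (fromIntegral (sec ts))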
yarikoptic
933254b4fe Added a comment 2024-07-03 20:42:11 +00:00
Joey Hess
665d3d66a5
Merge branch 'master' into p2p_locking 2024-07-03 15:54:14 -04:00
Joey Hess
44b3136fdf
update 2024-07-03 15:53:25 -04:00
Joey Hess
6a95eb08ce
status 2024-07-03 15:01:34 -04:00
Joey Hess
d2b27ca136
add content retention files
This allows lockContentShared to lock content for eg, 10 minutes and
if the process then gets terminated before it can unlock, the content
will remain locked for that amount of time.

The Windows implementation is not yet tested.

In P2P.Annex, a duration of 10 minutes is used. This way, when p2pstdio
or remotedaemon is serving the P2P protocol, and is asked to
LOCKCONTENT, and that process gets killed, the content will not be
subject to deletion. This is not a perfect solution to
doc/todo/P2P_locking_connection_drop_safety.mdwn yet, but it gets most
of the way there, without needing any P2P protocol changes.

This is only done in v10 and higher repositories (or on Windows). It
might be possible to backport it to v8 or earlier, but it would
complicate locking even further, and without a separate lock file, might
be hard. I think that by the time this fix reaches a given user, they
will probably have been running git-annex 10.x long enough that their v8
repositories will have upgraded to v10 after the 1 year wait. And it's
not as if git-annex hasn't already been subject to this problem (though
I have not heard of any data loss caused by it) for 6 years already, so
waiting another fraction of a year on top of however long it takes this
fix to reach users is unlikely to be a problem.
2024-07-03 14:58:39 -04:00
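
A sketch of the core idea only, since the real retention file format is not shown in this message: the expiry timestamp lives in the filesystem, so it survives the locking process being killed.

import Data.Time.Clock.POSIX (getPOSIXTime)

writeRetention :: FilePath -> Integer -> IO ()
writeRetention f retentionseconds = do
    now <- getPOSIXTime
    writeFile f (show (truncate now + retentionseconds))

-- True while the recorded expiry is still in the future
retentionHolds :: FilePath -> IO Bool
retentionHolds f = do
    expiry <- read <$> readFile f
    now <- getPOSIXTime
    return (truncate now < (expiry :: Integer))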
Joey Hess
badcb502a4
todo 2024-07-03 13:15:09 -04:00
Joey Hess
487a11a4af
Merge branch 'assistantpointerrace' 2024-07-02 18:04:40 -04:00
Joey Hess
24d63e8c8e
update 2024-07-02 18:04:29 -04:00
Joey Hess
b2a24a1669
update 2024-07-02 16:16:37 -04:00
Joey Hess
069b976698
drafting P2P protocol over http 2024-07-02 16:14:45 -04:00
Joey Hess
623f483a68
add news item for git-annex 10.20240701 2024-07-02 12:31:23 -04:00
Joey Hess
12a0ca9656
assistant: Fix a race condition that could cause a pointer file to get ingested into the annex
This was caused by commit fb8ab2469d putting
an isPointerFile check in the wrong place. So if the file was not a pointer
file at that point, but got replaced by one before the file got locked
down, the pointer file would be ingested into the annex.

The fix is simply to move the isPointerFile check to after safeToAdd locks
down the file. Now if the file changes to a pointer file after the
isPointerFile check, ingestion will see that it changed after lockdown,
and will refuse to add it to the annex.

Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
2024-07-02 12:25:30 -04:00
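
A self-contained sketch of the reordering; safeToAdd and isPointerFile here are stand-ins for the real functions:

safeToAdd :: FilePath -> IO Bool
safeToAdd _ = return True       -- stand-in: locks down the file

isPointerFile :: FilePath -> IO Bool
isPointerFile _ = return False  -- stand-in

-- the pointer check now happens after lockdown, so a file
-- swapped for a pointer after the check can no longer slip in
ingest :: FilePath -> IO ()
ingest f = do
    lockedDown <- safeToAdd f
    ispointer <- isPointerFile f
    if lockedDown && not ispointer
        then putStrLn (f ++ ": adding to annex")
        else putStrLn (f ++ ": skipped")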