Both are only at a bare proof of concept stage. Still need to deal with
signaling validity and invalidity, and checking it.
And there's a bad bug: after N*2 requests (when running with -JN), the
next request hangs! So I think it's failing to free up the Annex worker
at the end of the request lifetime.
Perhaps I need to use this:
https://docs.servant.dev/en/stable/cookbook/managed-resource/ManagedResource.html
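If that is the problem, some kind of per-request bracket is probably the
shape of the fix. A minimal sketch of the general pattern, with
placeholder types and functions rather than git-annex's actual worker
pool API:

    import Control.Exception (bracket)

    -- Placeholder standing in for git-annex's Annex worker state.
    data Worker = Worker

    acquireWorker :: IO Worker
    acquireWorker = return Worker   -- placeholder: take a worker from the pool

    releaseWorker :: Worker -> IO ()
    releaseWorker _ = return ()     -- placeholder: return it to the pool

    -- Release the worker even if the request handler throws, so a bounded
    -- pool can't be exhausted by requests that fail or are interrupted.
    withWorker :: (Worker -> IO a) -> IO a
    withWorker = bracket acquireWorker releaseWorker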
The reason to use removeBeforeRemoteEndTime is twofold.
First, removeBefore sends two protocol commands. Currently, the HTTP
protocol runner only supports sending a single command per invocation.
Second, the HTTP server gets a monotonic timestamp from the client, so
translating it back to a POSIXTime would be annoying.
The timestamp flow with a proxy will be:
- The client requests a timestamp, which gets the monotonic timestamp
  from the proxied remote via the proxy. (The timestamp is currently
  not proxied when there is a single proxy.)
- The client calls remove-before.
- The HTTP server calls removeBeforeRemoteEndTime, which sends
  REMOVE-BEFORE to the proxied remote.
This is less efficient, but it guarantees that the timestamps that the
client is sending are local timestamps, which turns out to be necessary
for the HTTP P2P protocol server.
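A minimal sketch of that server side, with stand-in types and a stubbed
removeBeforeRemoteEndTime (the real signatures in git-annex will differ):

    -- Hypothetical types, standing in for git-annex's real ones.
    newtype MonotonicTimestamp = MonotonicTimestamp Integer
    newtype Key = Key String
    data P2PConnection = P2PConnection

    -- Stand-in for removeBeforeRemoteEndTime, which sends REMOVE-BEFORE
    -- to the proxied remote over the P2P connection.
    removeBeforeRemoteEndTime :: P2PConnection -> MonotonicTimestamp -> Key -> IO Bool
    removeBeforeRemoteEndTime _ _ _ = return True

    -- The timestamp the client sends is already in the proxied remote's
    -- monotonic timebase, so the server can pass it straight along rather
    -- than translating it back to a POSIXTime.
    serveRemoveBefore :: P2PConnection -> Key -> MonotonicTimestamp -> IO Bool
    serveRemoveBefore conn key endtime =
        removeBeforeRemoteEndTime conn endtime key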
Websockets would work, but the problem with using them for this is that
each lockcontent call is a separate websocket connection. And that's an
actual TCP connection. One TCP connection per file dropped would be too
expensive. With http long polling, regular http pipelining can be used,
so it will reuse a TCP connection.
Unfortunately, at least with servant, bi-directional streams with long
polling don't result in true bidirectional full duplex communication.
Servant processes the whole client body stream before generating the server
body stream. I think it's entirely possible to do full bi-directional
communication over http, but it would need changes to servant.
And, there's no way for the client to tell if the server successfully
locked the content, since the server will keep processing the client
stream no matter what.
So, added a new API endpoint, keeplocked. lockcontent will lock the key
for 10 minutes with a retention lock, and then a call to keeplocked will
keep it locked for as long as needed. This does mean that there will
need to be a Map of locks by key, and I will probably want to add
some kind of lock identifier that lockcontent returns.
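One possible shape for that map, with hypothetical types (the actual
representation in git-annex will likely differ):

    import qualified Data.Map.Strict as M
    import Control.Concurrent.STM (TVar)

    -- Hypothetical lock identifier that lockcontent would return and
    -- keeplocked would use to find the lock to keep alive.
    newtype LockID = LockID Integer
        deriving (Eq, Ord, Show)

    -- Stand-ins for git-annex's Key type and for whatever holds the
    -- retention lock open.
    type Key = String
    data HeldLock = HeldLock

    -- Map of currently held locks, shared between request handlers.
    type HeldLocks = TVar (M.Map LockID (Key, HeldLock))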
Enough to let lockcontent routes be included and servant-client be used.
But not enough to use servant-client with those routes. May need to
implement a separate runner for that part of the protocol?
Also some misc other stuff needed to use servant-client.
And fixed the exposing of UUID in the JSON types. UUID does actually have
aeson instances, but they're used elsewhere (metadata --batch, although
only included to get it to compile, not actually used in there) and are
not suitable for use here, since this must work with every possible UUID.
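For illustration, a minimal sketch of a wrapper that accepts any UUID
string (the name and details are hypothetical, not git-annex's actual
type):

    import Data.Aeson
    import qualified Data.Text as T

    -- Treat a git-annex UUID as an arbitrary string in JSON, rather than
    -- relying on aeson's Data.UUID instances, which only accept RFC 4122
    -- formatted UUIDs.
    newtype PlainUUID = PlainUUID T.Text

    instance ToJSON PlainUUID where
        toJSON (PlainUUID u) = String u

    instance FromJSON PlainUUID where
        parseJSON = withText "UUID" (return . PlainUUID)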
lockcontent had to be disabled until I can implement HasClient ClientM
WebSocket. And in clientGet, it's not clear how to use the v1 and v0
versions, which don't have a DataLengthHeader.
Added v2-v0 endpoints. These are tedious, but will be needed in order to
use the HTTP protocol to proxy to repositories with older git-annex,
where git-annex-shell will be speaking an older version of the protocol.
Changed GET to use 422 when the content is not present. 404 is needed to
detect when a protocol version is not supported.
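A minimal sketch of that status code choice in a servant handler (not
the actual handler; it just shows the errors used):

    {-# LANGUAGE OverloadedStrings #-}
    import Servant
    import Control.Monad.Error.Class (throwError)

    -- Respond with 422 when the key's content is not present, so that a
    -- plain 404 can still be relied on to mean that this protocol version
    -- endpoint does not exist on the server.
    serveGet :: Bool -> Handler ()
    serveGet contentPresent
        | contentPresent = return ()
        | otherwise = throwError err422 { errBody = "content not present" }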
Managed to avoid netstrings. Actually, using netstrings while streaming
lazy ByteString turns out to be very difficult. So instead, have a
header that specifies the expected amount of data, and then it can just
arrange to send a different amount of data if it needs to indicate
INVALID.
Also improved the interface for GET of a key.
This will be easy to implement with servant. It's also very efficient,
and fairly future-proof. Eg, could add another frame with other data.
This does make it a bit harder to use this protocol, but netstrings
probably take about 5 minutes to implement? Let's see...
    import Text.Read

    toNetString :: String -> String
    toNetString s = show (length s) ++ ":" ++ s ++ ","

    nextNetString :: String -> Maybe (String, String)
    nextNetString s = case break (== ':') s of
        ([], _) -> Nothing
        (sn, rest) -> do
            -- parse the length prefix before the ":"
            n <- readMaybe sn
            -- skip the ":", take the payload, and drop the trailing ","
            let (v, rest') = splitAt n (drop 1 rest)
            return (v, drop 1 rest')
Ok, well, that took about 10 minutes ;-)
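A quick roundtrip check of the above:

    main :: IO ()
    main = do
        print (toNetString "hello")            -- "5:hello,"
        print (nextNetString "5:hello,extra")  -- Just ("hello","extra")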
For clusters, the timestamps have to be translated, since each node can
have its own idea about what time it is. To translate a timestamp, the
proxy remembers what time it asked the node for a timestamp in
GETTIMESTAMP, and applies the delta as an offset in REMOVE-BEFORE.
This does mean that a remove from a cluster has to call GETTIMESTAMP on
every node before dropping from nodes. Not very efficient. Although
currently it tries to drop from every single node anyway, which is also
not very efficient.
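A hedged sketch of the delta idea, with a hypothetical timestamp type
(git-annex's real MonotonicTimestamp and proxy plumbing will differ):

    -- Hypothetical monotonic timestamp, in seconds.
    newtype MonotonicTimestamp = MonotonicTimestamp Integer
        deriving (Show)

    -- Computed when the proxy sends GETTIMESTAMP to a node, from the
    -- proxy's own timestamp at that moment and the node's reply.
    nodeOffset :: MonotonicTimestamp -> MonotonicTimestamp -> Integer
    nodeOffset (MonotonicTimestamp proxyNow) (MonotonicTimestamp nodeNow) =
        nodeNow - proxyNow

    -- Applied when relaying REMOVE-BEFORE: translate a timestamp in the
    -- proxy's timebase into the node's timebase.
    toNodeTime :: Integer -> MonotonicTimestamp -> MonotonicTimestamp
    toNodeTime offset (MonotonicTimestamp proxyTime) =
        MonotonicTimestamp (proxyTime + offset)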
I thought about caching the GETTIMESTAMP from the nodes on the first
call. That would improve efficiency. But, since monotonic clocks on
non-Linux systems don't advance when the computer is suspended, consider
what might
happen if one node was suspended for a while, then came back. Its
monotonic timestamp would end up behind where the proxying expects it to
be. Would that result in removing when it shouldn't, or refusing to
remove when it should? Have not thought it through. Either way, a
cluster behaving strangely for an extended period of time because one
of its nodes was briefly asleep doesn't seem like good behavior.
Added Maybe POSIXTime to SafeDropProof, which gets set when the proof is
based on a LockedCopy. If there are several LockedCopies, it uses the
closest expiry time. That is not optimal; it may be that the proof
expires based on one LockedCopy while another one has not expired. But
that seems unlikely to really happen, and anyway the user can just
re-run a drop if it fails due to expiry.
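For illustration, one way the closest expiry could be picked (a
hypothetical helper, not the actual code):

    import Data.Maybe (catMaybes)
    import Data.Time.Clock.POSIX (POSIXTime)

    -- Pick the soonest expiry among the LockedCopies that carry one;
    -- Nothing means none of them had an expiry time.
    closestExpiry :: [Maybe POSIXTime] -> Maybe POSIXTime
    closestExpiry es = case catMaybes es of
        [] -> Nothing
        ts -> Just (minimum ts)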
Pass the SafeDropProof to removeKey, which is responsible for checking
it for expiry in situations where that could be a problem. That really
only means in Remote.Git.
Made Remote.Git check expiry when dropping from a local remote.
Checking expiry when dropping from a P2P remote is not yet implemented.
P2P.Protocol.remove has SafeDropProof plumbed through to it for that
purpose.
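The expiry check itself is simple; a minimal sketch with illustrative
field and function names (not git-annex's actual API):

    import Data.Time.Clock.POSIX (POSIXTime, getPOSIXTime)

    -- Stand-in for the real SafeDropProof, which carries more than this.
    data SafeDropProof = SafeDropProof
        { proofExpiry :: Maybe POSIXTime
        }

    -- True while the proof can still be relied on; a proof without an
    -- expiry never expires.
    proofStillValid :: SafeDropProof -> IO Bool
    proofStillValid proof = case proofExpiry proof of
        Nothing -> return True
        Just expiry -> do
            now <- getPOSIXTime
            return (now < expiry)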
Fixing the remaining 2 build warnings should complete this work.
Note that the use of a POSIXTime here means that if the clock gets set
forward while git-annex is in the middle of a drop, it may say that
dropping took too long. That seems ok. Less ok is that if the clock gets
turned back a sufficient amount (eg 5 minutes), proof expiry won't be
noticed. It might be better to use the Monotonic clock, but that doesn't
advance when a laptop is suspended, and while there is the Linux
Boottime clock, that is not available on other systems. Perhaps a
combination of POSIXTime and the Monotonic clock could detect laptop
suspension and also detect clock being turned back?
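A minimal sketch of that combination, assuming the clock package's
System.Clock (the threshold and names are illustrative):

    import Data.Time.Clock.POSIX (getPOSIXTime)
    import System.Clock (Clock(Monotonic), getTime, toNanoSecs)

    -- A paired sample of the wall clock and the monotonic clock, in seconds.
    data ClockSample = ClockSample
        { wallSecs :: Double
        , monoSecs :: Double
        }

    sampleClocks :: IO ClockSample
    sampleClocks = do
        w <- realToFrac <$> getPOSIXTime
        m <- (/ 1e9) . fromIntegral . toNanoSecs <$> getTime Monotonic
        return (ClockSample w m)

    -- If the wall clock advanced much less than the monotonic clock, it
    -- was probably set back; much more, and the machine was probably
    -- suspended.
    clocksDiverged :: Double -> ClockSample -> ClockSample -> Bool
    clocksDiverged slop start end =
        abs ((wallSecs end - wallSecs start) - (monoSecs end - monoSecs start)) > slop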
There is a potential future flag day where
p2pDefaultLockContentRetentionDuration is not assumed, but is probed
using the P2P protocol, and peers that don't support it can no longer
produce a LockedCopy. Until that happens, when git-annex is
communicating with older peers there is a risk of data loss when an
ssh connection closes during LOCKCONTENT.
Better for the clock to advance while suspended; that way a stale
content retention lock will expire while suspended.
The fallback to Monotonic to support older kernels assumes that there
are not systems where Boottime sometimes succeeds and sometimes fails.
If that ever happened, the clock would probably not monotonically
advance, as it would read from different clocks on different calls! I
don't see any indication in the clock_gettime(2) errno list that it can
fail intermittently, so this is probably ok.
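For reference, a sketch of that fallback using the clock package (the
Boottime constructor is, I believe, Linux-only there, so real code would
need some conditional compilation; this only shows the runtime fallback
and the assumption that a failed Boottime call stays failed):

    import Control.Exception (SomeException, try)
    import System.Clock (Clock(Boottime, Monotonic), TimeSpec, getTime)

    -- Prefer Boottime, which advances across suspend; fall back to
    -- Monotonic on older kernels where clock_gettime rejects it. This
    -- assumes Boottime does not fail only intermittently, since mixing
    -- the two clocks would break monotonicity.
    getElapsedTime :: IO TimeSpec
    getElapsedTime = do
        r <- try (getTime Boottime) :: IO (Either SomeException TimeSpec)
        case r of
            Right t -> return t
            Left _ -> getTime Monotonic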