Added Maybe POSIXTime to SafeDropProof, which gets set when the proof is
based on a LockedCopy. If there are several LockedCopies, it uses the
closest expiry time. That is not optimal, it may be that the proof
expires based on one LockedCopy but another one has not expired. But
that seems unlikely to really happen, and anyway the user can just
re-run a drop if it fails due to expiry.
Pass the SafeDropProof to removeKey, which is responsible for checking
it for expiry in situations where that could be a problem. Which really
only means in Remote.Git.
Made Remote.Git check expiry when dropping from a local remote.
Checking expiry when dropping from a P2P remote is not yet implemented.
P2P.Protocol.remove has SafeDropProof plumbed through to it for that
purpose.
Fixing the remaining 2 build warnings should complete this work.
Note that the use of a POSIXTime here means that if the clock gets set
forward while git-annex is in the middle of a drop, it may say that
dropping took too long. That seems ok. Less ok is that if the clock gets
turned back a sufficient amount (eg 5 minutes), proof expiry won't be
noticed. It might be better to use the Monotonic clock, but that doesn't
advance when a laptop is suspended, and while there is the linux
Boottime clock, that is not available on other systems. Perhaps a
combination of POSIXTime and the Monotonic clock could detect laptop
suspension and also detect clock being turned back?
There is a potential future flag day where
p2pDefaultLockContentRetentionDuration is not assumed, but is probed
using the P2P protocol, and peers that don't support it can no longer
produce a LockedCopy. Until that happens, when git-annex is
communicating with older peers there is a risk of data loss when
a ssh connection closes during LOCKCONTENT.
The error message is not displayed to the use, but this mirrors the
behavior when a regular get from a special remote fails. At least now
there is not a protocol error.
Now that storeKey can have a different object file passed to it, this
complication is not needed. This avoids a lot of strange situations,
and will also be needed if streaming is eventually supported.
Still needs some work.
The reason that the waitv is necessary is because without it,
runNet loops back around and reads the next protocol message. But it's
not finished reading the whole bytestring yet, and so it reads some part
of it.
Working, but lots of room for improvement...
Without streaming, so there is a delay before download begins as the
file is retreived from the special remote.
And when resuming it retrieves the whole file from the special remote
*again*.
Also, if the special remote throws an exception, currently it
shows as "protocol error".
This allows an error message from a proxied special remote to be
displayed to the client.
In the case where removal from several nodes of a cluster fails,
there can be several errors. What to do? I decided to only show
the first error to the user. Probably in this case the user is not in a
position to do anything about an error message, so best keep it simple.
If the problem with the first node is fixed, they'll see the error from
the next node.
That error is now rethrown on the client, so it will be displayed.
For example:
$ git-annex fsck x --fast --from AMS-dir
fsck x (special remote reports: directory /home/joey/tmp/bench2/dir is not accessible) failed
No protocol version check is needed. Because in order to talk to a
proxied special remote, the client has to be running the upcoming
git-annex release. Which has this fix in it.
This is early, but already working for CHECKPRESENT.
However, when the special remote throws an exception on checkPresent,
this happens:
[2024-06-28 13:22:18.520884287] (P2P.IO) [ThreadId 4] P2P > ERROR directory /home/joey/tmp/bench2/dir is not accessible
[2024-06-28 13:22:18.521053135] (P2P.IO) [ThreadId 4] P2P < ERROR expected SUCCESS or FAILURE
git-annex: client error: expected SUCCESS or FAILURE
(fixing location log) p2pstdio: 1 failed
** Based on the location log, x
** was expected to be present, but its content is missing.
failed
Before it was using a node that might have had a higher cost.
Also threw in a random selection from amoung the low cost nodes. Of
course this is a poor excuse for load balancing, but it's better than
nothing. Most of the time...