Since the annex-tracking-branch is pushed first, git-annex has already
updated the export database when the DATA-PRESENT arrives. Which means
that just using checkPresent is enough to verify that there is some file
on the special remote in the export location for the key.
So, the simplest possible implementation of this happened to work!
(I also tested it with chunked specialremotes, which also works, as long
as the chunk size used is the same as the configured chunk size. In that
case, the lack of a chunk log is not a problem. Doubtful this will ever
make sense to use with a chunked special remote though, that gets pretty
deep into re-implementing git-annex.)
Updated the client side upload tip with a missing step, and reorged for clarity.
Only indicate that we're done with the bytestring once it all gets written.
Otherwise, the end of it may get garbage collected before we can process
it, leading to a hang.
This seems to have been introduced in commit
cdc4bd7443. Which oddly was trying to fix a
very similar problem, but specific to a cluster node. In that commit,
things got out of order, with it signaling it was done with the bytestring
before it has written all of it to the file.
My test case for this bug is a directory special remote
with a file being sent to it via a proxy accessed via ssh or http.
The file was 10 mb, and it hung on the last few kb of it not being
received.
I've also tested this fix in the case of proxying to a cluster node
directory special remote over http, which was the case
cdc4bd7443 was dealing with.
There is a TODO left in the code for exporttree special remotes. If
possible, it should check if one of the export locations contains the
content of the key.
I anticipate lots of external special remote programs will neglect
implementing this. Still, it's the right thing to do to assume that some
of them may write files out of order. Probably most external special
remotes will not be used with a proxy. When someone is using one with a
proxy, they can always get it fixed to send ORDERED.
A one second delay made it seem really choppy and slow when the special
remote was sending content fairly steadily but was bottlenecked on
running gpg on 10 mb chunks.
This does not appreciably increase CPU, although of course if the
special remote is very slow it will add up over time.
It would perhaps be better to use inotify, like tailVerify does.
Currently works for special remotes that don't use fileRetriever. Ones that
do will download to another filename and rename it into place, defeating
the streaming.
This actually benchmarks slightly slower when getting a large file from
a fast proxied special remote. However, when the proxied special remote
is slow, it will be a big win.
Without actually simulating cluster implementation at all. Instead, only
the essential fact that cluster gateways know what changes they have
made to each node of a cluster. That is enough for sims like
sizebalanced_cluster.
Probably a good idea for freezing, but especially I hope this fixes a
problem with git-annex sim run that caused it to sometimes crash in
removeDirectoryRecursive with directory not empty, presumably because a
thread was writing there at the same time.
Without any connections, the step command will not try to do any actions
on a special remote.
But even without any connections, it's still possible for a drop action
explicitly run "on" the special remote to do, when numcopies = 0 or
there is a trusted repo. So guard all actions against running on a
special remote too.
Once it has a list of actions, it can perform them all.
A disappointing optimisation at least in my test case, which it sped up
by less than 1 second out of 12. But still it did make it faster.
Noticed that it was quite slow compared with things like action
sendwanted. Guessed that the slowdown is largely due to every step
doing a simulated git pull/push.
So, rather than always doing a pull/push, only do those when no actions
are found without doing a pull/push.
This does mean that step will sometimes experience a split brain
situation, but that seems like a good thing? Because step ought to
explore as many possible scenarios as it reasonably can.
When getting from a remote, have to check that the repo doing the
getting thinks the remote contains the key, but also that the remote
actually does. Before this bug fix, it would get from a repo that used
to have the key, but that had dropped it since the last git pull.
Had to add Read instances to Key and NumCopies and some other similar
types. I only expect to use those in serializing a sim. Of course, this
risks that implementation changes break reading old data. For a sim,
that would not be a big problem.
Not tested yet. This emulates the same checking that is done when
dropping. Note that when dropping from a special remote it is not able
to make a locked copy.