This allows lockContentShared to lock content for, eg, 10 minutes, and
if the process then gets terminated before it can unlock, the content
will remain locked for only that amount of time.
The Windows implementation is not yet tested.
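To illustrate the idea, here is a minimal standalone sketch of an
expiring lock; this is not git-annex's implementation, and every name
in it is made up:

    import Data.Time.Clock.POSIX (getPOSIXTime)

    -- Sketch: a shared lock carries an expiry time, so if the locking
    -- process is killed before unlocking, the lock simply lapses once
    -- the duration has passed.
    newtype TimedLock = TimedLock { expiresAt :: Double }

    takeTimedLock :: Double -> IO TimedLock
    takeTimedLock seconds = do
      now <- realToFrac <$> getPOSIXTime
      return (TimedLock (now + seconds))

    -- Anyone checking the lock treats it as released after expiry.
    stillHeld :: TimedLock -> IO Bool
    stillHeld l = do
      now <- realToFrac <$> getPOSIXTime
      return (now < expiresAt l)

    main :: IO ()
    main = do
      l <- takeTimedLock 600  -- eg, 10 minutes
      stillHeld l >>= print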
In P2P.Annex, a duration of 10 minutes is used. This way, when p2pstdio
or remotedaemon is serving the P2P protocol, and is asked to
LOCKCONTENT, and that process gets killed, the content will not be
subject to deletion. This is not a perfect solution to
doc/todo/P2P_locking_connection_drop_safety.mdwn yet, but it gets most
of the way there, without needing any P2P protocol changes.
This is only done in v10 and higher repositories (or on Windows). It
might be possible to backport it to v8 or earlier, but it would
complicate locking even further, and without a separate lock file, might
be hard. I think that by the time this fix reaches a given user, they
will probably have been running git-annex 10.x long enough that their v8
repositories will have upgraded to v10 after the 1 year wait. And
git-annex has already been subject to this problem for 6 years (though
I have not heard of any data loss caused by it), so waiting another
fraction of a year on top of however long it takes this fix to reach
users is unlikely to be a problem.
This was caused by commit fb8ab2469d putting
an isPointerFile check in the wrong place. So if the file was not a pointer
file at that point, but got replaced by one before the file got locked
down, the pointer file would be ingested into the annex.
The fix is simply to move the isPointerFile check to after safeToAdd locks
down the file. Now if the file changes to a pointer file after the
isPointerFile check, ingestion will see that it changed after lockdown,
and will refuse to add it to the annex.
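A compilable sketch of the ordering, where safeToAdd, isPointerFile,
and ingest are simplified stand-ins for the real functions:

    -- The fix: check for a pointer file only after lockdown succeeds,
    -- so a file replaced by a pointer file in the race window cannot
    -- slip through.
    addFile :: FilePath -> IO ()
    addFile f = do
      lockeddown <- safeToAdd f  -- locks down the file
      if not lockeddown
        then putStrLn "file changed during lockdown; refusing to add"
        else do
          p <- isPointerFile f   -- checked only after lockdown
          if p
            then putStrLn "pointer file; not ingesting"
            else ingest f

    -- Stand-ins for the real implementations:
    safeToAdd :: FilePath -> IO Bool
    safeToAdd _ = return True

    isPointerFile :: FilePath -> IO Bool
    isPointerFile _ = return False

    ingest :: FilePath -> IO ()
    ingest f = putStrLn ("ingesting " ++ f)

    main :: IO ()
    main = addFile "somefile"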
Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
The error message is not displayed to the user, but this mirrors the
behavior when a regular get from a special remote fails. At least now
there is not a protocol error.
Now that storeKey can have a different object file passed to it, this
complication is not needed. This avoids a lot of strange situations,
and will also be needed if streaming is eventually supported.
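As a rough illustration of the shape of the change (names and types
are assumptions, not git-annex's exact API):

    -- Sketch: storeKey can be handed an alternate object file to
    -- send, instead of always reading the key's standard object
    -- file in the local annex.
    newtype Key = Key String

    type MeterUpdate = Integer -> IO ()

    data Remote = Remote
      { storeKey :: Key -> Maybe FilePath -> MeterUpdate -> IO ()
        -- ^ When given Just a file, upload that file's content
        -- for the key, rather than the default object file.
      }

    main :: IO ()
    main = do
      let r = Remote { storeKey = \_ mf _ ->
            putStrLn ("sending " ++ maybe "default object file" id mf) }
      storeKey r (Key "k") (Just "/tmp/altfile") (\_ -> return ())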
Wanted to also list a cluster's nodes when showing info for the cluster,
but that's hard because it needs getting the name of the proxying
remote, which is some prefix of the cluster's name. But if the names
contain dashes, there's no good way to know which prefix it is: eg, for
a proxied cluster remote "foo-bar-mycluster", the proxying remote could
be "foo" (cluster "bar-mycluster") or "foo-bar" (cluster "mycluster").
Still needs some work.
The reason that the waitv is necessary is that without it, runNet loops
back around and reads the next protocol message before the whole
bytestring has been consumed, and so it reads some part of the
bytestring's data as if it were protocol input.
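A toy standalone example of the technique, with the MVar playing the
waitv role (everything here is illustrative, not the real P2P code):

    import Control.Concurrent (forkIO)
    import Control.Concurrent.MVar
    import qualified Data.ByteString.Lazy as L

    main :: IO ()
    main = do
      waitv <- newEmptyMVar
      -- stands in for a lazy ByteString being read off the connection
      let b = L.pack (replicate 100 0)
      _ <- forkIO $ do
        print (L.length b)  -- forces the whole ByteString
        putMVar waitv ()    -- signal: fully consumed
      -- runNet blocks here, and only goes on to read the next
      -- protocol message once the consumer is done
      takeMVar waitv
      putStrLn "safe to read the next protocol message"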
Working, but lots of room for improvement...
Without streaming, there is a delay before download begins while the
file is retrieved from the special remote.
And when resuming, it retrieves the whole file from the special remote
*again*.
Also, if the special remote throws an exception, currently it
shows as "protocol error".
This makes eg git-annex get default to using the cluster rather than an
arbitrary node, which is better UI.
The actual cost of accessing a proxied node vs using the cluster is
basically the same. But using the cluster allows smarter load-balancing
to be done on the cluster.
Before, it was using a node that might have had a higher cost.
Also threw in a random selection from among the low cost nodes. Of
course this is a poor excuse for load balancing, but it's better than
nothing. Most of the time...
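A sketch of that selection (the types are assumed, not git-annex's):

    import System.Random (randomRIO)

    -- Pick at random among the nodes tied for the lowest cost.
    pickLowCostNode :: [(String, Int)] -> IO (Maybe String)
    pickLowCostNode [] = return Nothing
    pickLowCostNode nodes = do
      let lowest = minimum (map snd nodes)
          candidates = [n | (n, c) <- nodes, c == lowest]
      i <- randomRIO (0, length candidates - 1)
      return (Just (candidates !! i))

    main :: IO ()
    main = do
      n <- pickLowCostNode [("node1", 1), ("node2", 1), ("node3", 2)]
      print n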
The VIA extension is still needed to avoid some extra work and ugly
messages, but this is enough that it actually works.
This filters out the RemoteSides that are a proxied connection via a
remote gateway to the cluster.
The VIA extension will not filter those out, but will send VIA to them
on connect, which will cause the ones that are accessed via the listed
gateways to be filtered out.
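Roughly, the filtering amounts to this sketch (the types are
assumptions, not the real RemoteSide):

    import Data.Maybe (isNothing)

    -- A side records whether its connection is proxied via a remote
    -- gateway; those sides are the ones filtered out.
    data RemoteSide = RemoteSide
      { sideName :: String
      , viaGateway :: Maybe String  -- Just gw when proxied via gw
      }

    directSides :: [RemoteSide] -> [RemoteSide]
    directSides = filter (isNothing . viaGateway)

    main :: IO ()
    main = mapM_ (putStrLn . sideName) (directSides sides)
      where
        sides =
          [ RemoteSide "node1" Nothing
          , RemoteSide "amsterdam-node1" (Just "amsterdam")
          ]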
Walking a tightrope between security and convenience here, because
git-annex-shell needs to only proxy for things when there has been
an explicit, local action to configure them.
In this case, the user has to have run `git-annex extendcluster`,
which now sets annex-cluster-gateway on the remote.
Note that any repositories that the gateway is recorded to
proxy for will be proxied onward. This is not limited to cluster nodes,
because checking the node log would not add any security; someone could
add any uuid to it. The gateway of course then does its own
checking to determine if it will allow proxying for the remote.
When there are multiple gateways to a cluster, this sets up proxying
for nodes that are accessed via a remote gateway.
Eg, when running in nyc and amsterdam is the remote gateway,
and it has node1 and node2, this sets up proxying for
amsterdam-node1 and amsterdam-node2. A client that has nyc as a remote
will see proxied remotes nyc-amsterdam-node1 and nyc-amsterdam-node2.
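The name composition can be illustrated with a one-liner:

    -- Sketch: each gateway prefixes its own remote name.
    proxiedName :: String -> String -> String
    proxiedName gateway node = gateway ++ "-" ++ node

    main :: IO ()
    main = putStrLn (proxiedName "nyc" (proxiedName "amsterdam" "node1"))
    -- prints: nyc-amsterdam-node1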