alternative solution

Joey Hess 2015-10-07 11:23:27 -04:00
parent 4bfaaf184c
commit 23b8f6c1fe


@ -2,6 +2,10 @@ Concurrent dropping of a file has problems when drop --from is
used. (Also when the assistant or sync --content decided to drop from a
remote.)
[[!toc]]
# refresher
First, let's remember how it works in the case where we're just dropping
from 2 repos concurrently. git-annex uses locking to detect and prevent
data loss:
@ -43,6 +47,8 @@ Yay, still ok.
Locking works in those cases to prevent concurrent dropping of a file.
# the bug
But, when drop --from is used, the locking doesn't work:
<pre>
@ -67,6 +73,8 @@ as part of its check of numcopies, and keep it locked
while it's asking B to drop it. Then when B tells A to drop it,
it'll be locked and that'll fail (and vice-versa).
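
The fix described above can be sketched in a few lines of Python. This is only an illustrative model, not git-annex code: the `Repo` class, `drop_local`, and `drop_from` are hypothetical names, numcopies is assumed to be 1, and a non-blocking `threading.Lock` stands in for whatever content lock the real implementation uses.

```python
import threading


class Repo:
    """Hypothetical stand-in for a repository holding one annexed object."""

    def __init__(self, name):
        self.name = name
        self.has_content = True
        self.content_lock = threading.Lock()

    def drop_local(self):
        """Drop our copy, failing if the content is currently locked."""
        if not self.content_lock.acquire(blocking=False):
            return False
        try:
            self.has_content = False
            return True
        finally:
            self.content_lock.release()


def drop_from(local, remote):
    """Ask remote to drop, keeping the local copy locked the whole time."""
    if not local.content_lock.acquire(blocking=False):
        return False
    try:
        if not local.has_content:
            return False  # nothing local to count toward numcopies
        return remote.drop_local()
    finally:
        local.content_lock.release()
```

While A holds its own content lock during `drop_from(a, b)`, a concurrent request from B asking A to drop fails in `drop_local`, so both copies cannot vanish at once.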
# the bug part 2
<pre>
Three repos; C might be a special remote, so w/o its own locking:
@ -108,6 +116,8 @@ Note that this is analogous to the fix above; in both cases
the change is from checking if content is in a location, to locking it in
that location while performing a drop from another location.
# the bug part 3 (where it gets really nasty)
<pre>
4 repos; C and D might be special remotes, so w/o their own locking:
@ -126,14 +136,19 @@ How do we get locking in this case?
Adding locking to C and D is not a general option, because special remotes
are dumb key/value stores; they may have no locking operations.
## a solution: require locking
What could be done is, change from checking if the remote has content, to
trying to lock it there. If the remote doesn't support locking, it can't
be guaranteed to have a copy. Require N locked copies for a drop to
succeed.
So, drop --from would no longer be supported in these configurations.
To drop the content from C, B would have to --force the drop, or move the
content from C to B, and then drop it from B.
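
As a rough illustration of the "require N locked copies" rule, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Remote` class, its `supports_locking` flag, and `drop_requiring_locks` are hypothetical and do not correspond to git-annex's actual code.

```python
import threading
from contextlib import ExitStack


class Remote:
    """Hypothetical model of a repository or special remote."""

    def __init__(self, name, has_content=True, supports_locking=True):
        self.name = name
        self.has_content = has_content
        self.supports_locking = supports_locking
        self._lock = threading.Lock()

    def try_lock_content(self, stack):
        """Lock our copy in place until `stack` closes; True on success."""
        if not self.supports_locking:
            return False
        if not self._lock.acquire(blocking=False):
            return False
        if not self.has_content:
            self._lock.release()
            return False
        stack.callback(self._lock.release)
        return True

    def drop(self):
        """Remove our copy, refusing if someone holds it locked."""
        if self.supports_locking and not self._lock.acquire(blocking=False):
            return False
        self.has_content = False
        if self.supports_locking:
            self._lock.release()
        return True


def drop_requiring_locks(target, others, numcopies):
    """Drop from target only if numcopies other copies can be locked."""
    with ExitStack() as stack:
        locked = 0
        for remote in others:
            if locked >= numcopies:
                break
            if remote.try_lock_content(stack):
                locked += 1
        if locked < numcopies:
            return False  # not enough copies guaranteed to stay put
        return target.drop()
```

In this model, dropping from a special remote C succeeds when a lockable repo B also holds the content, but is refused when the only other copy is on another special remote D, which is the kind of configuration where drop --from stops working.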
### impact when using assistant/sync --content
Need to consider whether this might cause currently working topologies
with the assistant/sync --content to no longer work. Eg, might content
pile up in a transfer remote?
@ -162,3 +177,26 @@ pile up in a transfer remote?
> and then later C, and only then be removed from A.
> If moves were used, the object moves from A to B, and so there's only
> 1 copy instead of the 2 as before, in the interim until C gets connected.
## a solution: require (minimal) locking
Instead of requiring N locked copies of content when dropping,
require only 1 locked copy. Check that content is on the other N-1
remotes w/o requiring locking (but use it if the remote supports locking).
This seems likely to behave similarly to using moves to work around the
limitations of the earlier solution, and should be easier to implement in
the assistant/sync --content, as well as less impactful on the manual user.
Unlike using moves, it does not decrease robustness, most of the time;
barring the kind of race this bug is about, numcopies behaves as desired.
When there is a race, some of the non-locked copies might be removed,
dipping below numcopies, but the 1 locked copy remains, so the data is not
entirely lost.
Dipping below the desired numcopies in an unusual race condition, and then
doing extra work later to recover, may be good enough.
Note that this solution will still result in drop --from failing in some
situations where it works now; manual users still need to switch their
workflows to using moves in such situations.
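
The minimal-locking rule can be sketched the same way. This is a hedged Python illustration, not git-annex's implementation: `Remote`, `supports_locking`, and `drop_with_one_lock` are hypothetical names, and numcopies is assumed to be at least 1.

```python
import threading
from contextlib import ExitStack


class Remote:
    """Hypothetical model of a repository or special remote."""

    def __init__(self, name, has_content=True, supports_locking=True):
        self.name = name
        self.has_content = has_content
        self.supports_locking = supports_locking
        self._lock = threading.Lock()

    def try_lock_content(self, stack):
        """Lock our copy in place until `stack` closes; True on success."""
        if not self.supports_locking:
            return False
        if not self._lock.acquire(blocking=False):
            return False
        if not self.has_content:
            self._lock.release()
            return False
        stack.callback(self._lock.release)
        return True

    def drop(self):
        """Remove our copy, refusing if someone holds it locked."""
        if self.supports_locking and not self._lock.acquire(blocking=False):
            return False
        self.has_content = False
        if self.supports_locking:
            self._lock.release()
        return True


def drop_with_one_lock(target, others, numcopies):
    """Require 1 locked copy; the remaining copies only need to be present."""
    with ExitStack() as stack:
        locked = 0
        present_unlocked = 0
        for remote in others:
            if remote.try_lock_content(stack):
                locked += 1  # use locking wherever the remote supports it
            elif remote.has_content:
                present_unlocked += 1  # checked, but could still race away
        if locked < 1 or locked + present_unlocked < numcopies:
            return False
        return target.drop()
```

With numcopies of 2, a drop from special remote C is allowed here when lockable repo B and special remote D both hold the content, whereas requiring 2 locked copies would refuse it. A drop where no other copy can be locked still fails.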