0133b7e5a8
In cases where numcopies checks prevented the resumed move from dropping the object from the source repository, it now relies on a log of recent moves to replicate the behavior of the interrupted command. Performance: Probably a noticeable impact, since it has to add to the log, check the log, and remove from the log. Seems worth it to avoid this annoying edge case. The log functions are pretty well optimised to avoid unnecessary work. A performance improvement to make later would be to have cleanup do nothing when it has not written to the log file and has confirmed that the log file does not contain the log line. This commit was sponsored by Jake Vosloo on Patreon.
When a `git annex move` is interrupted at a point where the content has
been transferred, but not yet dropped from the remote, resuming the move
will often refuse to drop the content, because it would violate numcopies.

For example, if numcopies is 2 and there is only 1 extant copy, on a remote,
`git-annex move --from remote` will normally ignore numcopies (since the
situation is not getting any worse) and remove the content from the remote
after transferring it. But on resume, git-annex sees that there are 2 copies
and numcopies is 2. Dropping one would leave only 1, so it won't drop the
copy from the remote.
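
The refusal comes down to simple arithmetic, sketched below as a simplified Python model. It is not git-annex's actual (Haskell) code. The function `drop_allowed` and its parameters are hypothetical, and trust levels and required-content settings are ignored. It assumes the behavior described above: a move whose destination started without a copy may drop the source copy regardless of numcopies.

```python
def drop_allowed(copies_now: int, numcopies: int, dest_started_with_copy: bool) -> bool:
    """Hypothetical, simplified model of the numcopies check made before a
    move drops the source copy (trust levels and required content ignored).

    copies_now counts every copy currently known, including the one that is
    about to be dropped.
    """
    if not dest_started_with_copy:
        # The move only relocates the content: the number of copies ends up
        # no lower than when the move started, so numcopies is not enforced.
        return True
    # Otherwise, dropping must leave at least numcopies copies behind.
    return copies_now - 1 >= numcopies


# numcopies is 2 and the only copy is on the remote.
# First run: after the transfer there are 2 copies, but the local repository
# (the destination) did not have one when the move started.
print(drop_allowed(copies_now=2, numcopies=2, dest_started_with_copy=False))  # True

# Resumed run: the local copy already exists when the command starts, so the
# drop would leave 1 copy, below numcopies.
print(drop_allowed(copies_now=2, numcopies=2, dest_started_with_copy=True))  # False
```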

This happens to me often enough to be annoying. Note that an interruption
during checksum verification triggers it, so the window is relatively
wide.

I think it can also happen with `move --to`, although I can't remember
seeing that.

Perhaps some local state could avoid this problem?

--[[Joey]]

> One simple way would be to drop the content from the remote before moving
> it to `annex/objects/`. Then if the move were interrupted before the drop,
> it could resume the interrupted transfer, and numcopies would work the
> same as it did when the move started.
>
> > After an interrupted move, whereis would say the content is present,
> > but e.g. an annex link to it would be broken. That seems surprising,
> > and if the user doesn't think to resume the move, fsck would have to be
> > made to deal with it. I don't much like this approach. It seems to
> > break the usual invariant that the existence of a copy on disk is ground
> > truth, which location tracking tries to reflect. With this, location
> > tracking would be correct, but only because the content is in an
> > unusual place on disk that it can be recovered from.
>
> Or: Move to `annex/objects/` without updating the local location log.
> Then do the drop, updating the remote's location log as now.
> Then update the local location log.
> >
> > If interrupted, and the move is then resumed, it will see that
> > there's a local copy, and drop from the remote again. Either that
> > finishes the interrupted drop, or the drop already happened and it's a
> > no-op. Either way, the local location log then gets updated.
> > That should clean things up.
> >
> > But if a sync is done with the remote first, and the move is then
> > resumed, it will no longer think the remote has a copy. This is
> > where the only copy can appear missing (in whereis). So a fsck
> > will be needed to recover. Or, move could be made to recover from
> > this too, noticing the local copy and updating the location log to
> > reflect it.
> >
> > Still, if the move is interrupted and never resumed, after a sync
> > with the remote, the only copy appears missing, which does seem
> > potentially confusing.
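
The ordering in this second idea can be modelled as a small state machine. This is a hypothetical Python sketch, not git-annex code. `KeyState` and `move_from` are invented names, one boolean stands in for each location log entry, and syncing is not modelled. It illustrates two claims from the discussion. First, resuming after an interruption at either step converges on the same final state. Second, an interruption right after the drop leaves no log entry recording the remaining copy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyState:
    """Hypothetical toy model of one key during `move --from`."""
    local_has: bool = False   # content is in the local annex/objects/
    remote_has: bool = True   # content is on the remote
    log_local: bool = False   # location log says the local repo has it
    log_remote: bool = True   # location log says the remote has it


def move_from(s: KeyState, interrupt_after: Optional[int] = None) -> None:
    """Run (or resume) the proposed ordering. Stop early after step
    `interrupt_after` to simulate an interruption."""
    # Step 1: get the content into annex/objects/ without updating the local
    # location log. A resumed run sees the local copy and skips the transfer.
    if not s.local_has:
        s.local_has = True
    if interrupt_after == 1:
        return
    # Step 2: drop from the remote and update its location log. On resume this
    # either finishes the interrupted drop or changes nothing.
    s.remote_has = False
    s.log_remote = False
    if interrupt_after == 2:
        return
    # Step 3: record the local copy in the location log.
    s.log_local = True


for point in (1, 2):
    s = KeyState()
    move_from(s, interrupt_after=point)
    # Is the surviving copy recorded anywhere? After an interruption at step 2
    # it is not, which is how the only copy can appear missing in whereis.
    print(point, s.log_local or s.log_remote)
    move_from(s)  # resume
    assert s == KeyState(local_has=True, remote_has=False, log_local=True, log_remote=False)
```

Running it prints `1 True`, then `2 False`, and both resumed runs pass the final assertion.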

> Local state could be a file listing keys that have had a move started
> but not finished. When the same move is run again, it should be allowed
> to succeed even if numcopies would prevent it. More accurately, it
> should disregard the local copy when checking numcopies for a
> `move --from`. And for a `move --to`, it should disregard the remote copy.
> May need 2 separate lists for the two kinds of moves.
>
> > This is complex to implement, but it avoids the gotchas in the earlier
> > ideas, so I think it is best. --[[Joey]]
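
The sketch below shows one such local list, assuming a plain text file with one line per unfinished move. The `MoveLog` class, the file name, and the `<direction> <remote> <key>` line format are all hypothetical, not git-annex's actual log format. Recording the direction keeps the two kinds of move apart without needing two files. A move would be added before it starts, checked when the same move runs again, and removed once it completes.

```python
import os
import tempfile
from typing import List


class MoveLog:
    """Hypothetical on-disk list of moves that were started but not finished.

    Each line is "<direction> <remote> <key>", with direction "from" or "to",
    so `move --from` and `move --to` entries are kept apart. The line format
    is an assumption made for this sketch.
    """

    def __init__(self, path: str) -> None:
        self.path = path

    def _read(self) -> List[str]:
        try:
            with open(self.path) as f:
                return [line.rstrip("\n") for line in f]
        except FileNotFoundError:
            return []

    def add(self, direction: str, remote: str, key: str) -> None:
        """Record a move before it starts. Recording it twice is harmless."""
        entry = f"{direction} {remote} {key}"
        if entry not in self._read():
            with open(self.path, "a") as f:
                f.write(entry + "\n")

    def contains(self, direction: str, remote: str, key: str) -> bool:
        """Check whether an identical move was started and not finished."""
        return f"{direction} {remote} {key}" in self._read()

    def remove(self, direction: str, remote: str, key: str) -> None:
        """Forget a move once it has finished."""
        entry = f"{direction} {remote} {key}"
        lines = self._read()
        if entry not in lines:
            return  # nothing to clean up, so leave the file untouched
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            f.writelines(line + "\n" for line in lines if line != entry)
        os.replace(tmp, self.path)  # rename over the old log (atomic on POSIX)


with tempfile.TemporaryDirectory() as d:
    log = MoveLog(os.path.join(d, "movelog"))
    log.add("from", "myremote", "KEY1")
    print(log.contains("from", "myremote", "KEY1"))  # True
    print(log.contains("to", "myremote", "KEY1"))    # False
    log.remove("from", "myremote", "KEY1")
    print(log.contains("from", "myremote", "KEY1"))  # False
```

Here `remove` returns without rewriting the file when the entry is absent, although it still has to read the file to find that out.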

> > > Implementation will involve `willDropMakeItWorse`,
> > > which is passed a `deststartedwithcopy` that currently comes from
> > > `inAnnex`/`checkPresent`. Check the log, and if
> > > the interrupted move started with the move destination
> > > not having a copy, pass False.
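
The sketch below shows, in Python, how the log could feed into the existing check. `willDropMakeItWorse`, `deststartedwithcopy`, `inAnnex`, and `checkPresent` are the git-annex names mentioned above, but this is not their implementation. `dest_started_with_copy`, `drop_allowed`, and the dictionary standing in for the log are hypothetical, and the numcopies check is simplified. The assumption is that the log remembers, for each interrupted move, whether its destination already had a copy when it first started.

```python
from typing import Dict, Tuple

# Hypothetical record of interrupted moves: (direction, remote, key) mapped to
# whether the destination already had a copy when that move first started.
MoveRecord = Dict[Tuple[str, str, str], bool]


def dest_started_with_copy(dest_present: bool, log: MoveRecord,
                           direction: str, remote: str, key: str) -> bool:
    """Value to pass as deststartedwithcopy.

    dest_present is the result of the presence check (inAnnex for the local
    repository, checkPresent for a remote). If the log shows that an earlier,
    interrupted run of the same move began without a copy at the
    destination, report False so the resumed move behaves like that run.
    """
    if log.get((direction, remote, key)) is False:
        return False
    return dest_present


def drop_allowed(copies_now: int, numcopies: int, started_with_copy: bool) -> bool:
    # Simplified numcopies check (trust levels and required content ignored).
    return not started_with_copy or copies_now - 1 >= numcopies


log: MoveRecord = {("from", "myremote", "KEY1"): False}

# Resumed `move --from myremote`: the local copy is already present.
flag = dest_started_with_copy(True, log, "from", "myremote", "KEY1")
print(flag, drop_allowed(2, 2, flag))  # False True

# Same situation with no log entry: the drop is refused, as before the fix.
flag = dest_started_with_copy(True, {}, "from", "myremote", "KEY1")
print(flag, drop_allowed(2, 2, flag))  # True False
```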

Are there any situations where this would be surprising? For example, if
`git-annex move` were interrupted, then run again a year later, and
proceeded to apparently violate numcopies?

Maybe. On the other hand, I've run into this problem probably weeks after
the first move got interrupted. For example, if files are always moved from
repo A to repo B, leaving repo A empty, this problem can cause stuff to build
up on repo A unexpectedly. And in such a case, the timing of the resumed
move does not matter; the user expected files to always eventually get moved
from A.

[[fixed|done]] --[[Joey]]