d032b0885d
This fixes a bug where a file that was not preferred content could be transferred to a remote. This happened when the file got deleted after the sync started running. The only time checkMatcher is run without a Key is in calls to checkFileMatcher, which are only done by add, addurl, import, and smudge --clean. Those won't be affected by this kind of race. Anything else that might be precaching and have a similar race as sync will also be fixed, but I don't know if it actually affected anything other than sync. As well as fixing a bug, this also probably makes sync and --auto faster by avoiding the redundant key lookup. This commit was sponsored by Graham Spencer on Patreon.
42 lines
2 KiB
Markdown
42 lines
2 KiB
Markdown
A couple of times I have seen a git-annex sync -C . upload files to the
|
|
remote that are not part of the remote's preferred content. In the most
|
|
recent case, I had moved the file to another directory while the sync was
|
|
downloading a previous file. I suspect that the file being removed causes
|
|
preferred content checks to mess up. --[[Joey]]
|
|
|
|
> Reproduced reliably as follows: Have a bigfile in the remote
|
|
> and a smallfile in the local repo. Have the remote's preferred content
|
|
> be "not (copies=1)". Have the local repo's preferred content
|
|
> `include=*`. Run `git-annex sync -C.` while that's running, `git rm
|
|
> smallfile`. (bigfile has to be big enough to give time to run that
|
|
> command)
|
|
>
|
|
> smallfile gets sent to the remote unexpectedly. If it's not deleted
|
|
> first, that does not happen.
|
|
>
|
|
> --[[Joey]]
|
|
|
|
> > Hmm, so limitCopies uses checkKey, which for MatchingFile, uses
|
|
> > lookupKey. And with a deleted file, lookupKey falls
|
|
> > into a case where it uses catKeyFile, but since the file has been
|
|
> > removed from the index, that also fails. And when it fails,
|
|
> > that means it assumes it does not have 1 copy, and so the
|
|
> > "not (copies=1)" evalulates to true, so it thinks it's matched as
|
|
> > preferred content.
|
|
> >
|
|
> > The preferred content is being checked via wantSend, which already knows
|
|
> > the key in this case.
|
|
> >
|
|
> > It knows the key already because sync uses seekFilteredKeys and so it's
|
|
> > already streamed the file though and looked up the key before
|
|
> > it's deleted. If the file got deleted before that could look up the
|
|
> > key, it would skip it. It may be that recent changes to add this
|
|
> > streaming for performance led to this bug.
|
|
> >
|
|
> > So one fix might be to change it to use MatchingKey,
|
|
> > and so avoid the later lookup? Investigating the git history
|
|
> > and the code I see no reason not to do this. It didn't used to be that
|
|
> > MatchingKey included an AssociatedFile, which is probably why it was
|
|
> > not used in this case originally.
|
|
|
|
[[fixed|done]]
|