git-annex/doc/bugs/sync_content_copies_unwanted_removed_files.mdwn
Joey Hess d032b0885d
use MatchingKey when a Key is known
This fixes a bug where a file that was not preferred content could be
transferred to a remote. This happened when the file got deleted after
the sync started running.

The only time checkMatcher is run without a Key is in calls to
checkFileMatcher, which are only done by add, addurl, import, and
smudge --clean. Those won't be affected by this kind of race. Anything
else that might be precaching and have a similar race as sync will also
be fixed, but I don't know if it actually affected anything other than
sync.

As well as fixing a bug, this also probably makes sync and --auto faster
by avoiding the redundant key lookup.

This commit was sponsored by Graham Spencer on Patreon.
2020-11-09 15:17:22 -04:00

42 lines
2 KiB
Markdown

A couple of times I have seen a git-annex sync -C . upload files to the
remote that are not part of the remote's preferred content. In the most
recent case, I had moved the file to another directory while the sync was
downloading a previous file. I suspect that the file being removed causes
preferred content checks to mess up. --[[Joey]]
> Reproduced reliably as follows: Have a bigfile in the remote
> and a smallfile in the local repo. Have the remote's preferred content
> be "not (copies=1)". Have the local repo's preferred content
> `include=*`. Run `git-annex sync -C.` while that's running, `git rm
> smallfile`. (bigfile has to be big enough to give time to run that
> command)
>
> smallfile gets sent to the remote unexpectedly. If it's not deleted
> first, that does not happen.
>
> --[[Joey]]
> > Hmm, so limitCopies uses checkKey, which for MatchingFile, uses
> > lookupKey. And with a deleted file, lookupKey falls
> > into a case where it uses catKeyFile, but since the file has been
> > removed from the index, that also fails. And when it fails,
> > that means it assumes it does not have 1 copy, and so the
> > "not (copies=1)" evalulates to true, so it thinks it's matched as
> > preferred content.
> >
> > The preferred content is being checked via wantSend, which already knows
> > the key in this case.
> >
> > It knows the key already because sync uses seekFilteredKeys and so it's
> > already streamed the file though and looked up the key before
> > it's deleted. If the file got deleted before that could look up the
> > key, it would skip it. It may be that recent changes to add this
> > streaming for performance led to this bug.
> >
> > So one fix might be to change it to use MatchingKey,
> > and so avoid the later lookup? Investigating the git history
> > and the code I see no reason not to do this. It didn't used to be that
> > MatchingKey included an AssociatedFile, which is probably why it was
> > not used in this case originally.
[[fixed|done]]