use MatchingKey when a Key is known

This fixes a bug where a file that was not preferred content could be
transferred to a remote. This happened when the file got deleted after
the sync started running.

The only time checkMatcher is run without a Key is in calls to
checkFileMatcher, which are only done by add, addurl, import, and
smudge --clean. Those won't be affected by this kind of race. Anything
else that might be precaching and have a similar race as sync will also
be fixed, but I don't know if it actually affected anything other than
sync.

As well as fixing a bug, this also probably makes sync and --auto faster
by avoiding the redundant key lookup.

This commit was sponsored by Graham Spencer on Patreon.
This commit is contained in:
Joey Hess 2020-11-09 15:12:08 -04:00
parent d84371ff73
commit d032b0885d
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
3 changed files with 36 additions and 3 deletions

View file

@ -72,7 +72,7 @@ checkMatcher :: FileMatcher Annex -> Maybe Key -> AssociatedFile -> AssumeNotPre
checkMatcher matcher mkey afile notpresent notconfigured d
| isEmpty matcher = notconfigured
| otherwise = case (mkey, afile) of
(_, AssociatedFile (Just file)) -> go =<< fileMatchInfo file
(Nothing, AssociatedFile (Just file)) -> go =<< fileMatchInfo file
(Just key, _) -> go (MatchingKey key afile)
_ -> d
where

View file

@ -1,3 +1,11 @@
git-annex (8.20201104) UNRELEASED; urgency=medium
* sync --content: Fix a bug where a file that was not preferred content
could be transferred to a remote. This happened when the file got deleted
after the sync started running.
-- Joey Hess <id@joeyh.name> Mon, 09 Nov 2020 15:15:20 -0400
git-annex (8.20201103) upstream; urgency=medium
* move: Improve resuming a move that was interrupted after the object

View file

@ -5,8 +5,8 @@ downloading a previous file. I suspect that the file being removed causes
preferred content checks to mess up. --[[Joey]]
> Reproduced reliably as follows: Have a bigfile in the remote
> and a smallfile in the local repo. Have the remote in group source,
> preferred content standard. Have the local repo's preferred content
> and a smallfile in the local repo. Have the remote's preferred content
> be "not (copies=1)". Have the local repo's preferred content
> `include=*`. Run `git-annex sync -C.` while that's running, `git rm
> smallfile`. (bigfile has to be big enough to give time to run that
> command)
@ -15,3 +15,28 @@ preferred content checks to mess up. --[[Joey]]
> first, that does not happen.
>
> --[[Joey]]
> > Hmm, so limitCopies uses checkKey, which for MatchingFile, uses
> > lookupKey. And with a deleted file, lookupKey falls
> > into a case where it uses catKeyFile, but since the file has been
> > removed from the index, that also fails. And when it fails,
> > that means it assumes it does not have 1 copy, and so the
> > "not (copies=1)" evalulates to true, so it thinks it's matched as
> > preferred content.
> >
> > The preferred content is being checked via wantSend, which already knows
> > the key in this case.
> >
> > It knows the key already because sync uses seekFilteredKeys and so it's
> > already streamed the file though and looked up the key before
> > it's deleted. If the file got deleted before that could look up the
> > key, it would skip it. It may be that recent changes to add this
> > streaming for performance led to this bug.
> >
> > So one fix might be to change it to use MatchingKey,
> > and so avoid the later lookup? Investigating the git history
> > and the code I see no reason not to do this. It didn't used to be that
> > MatchingKey included an AssociatedFile, which is probably why it was
> > not used in this case originally.
[[fixed|done]]