avoid unnecessary call to inAnnex

sync --content: Avoid a redundant checksum of a file that was
incrementally verified, when used on NTFS and perhaps other filesystems.

When sync has just gotten the content, it does not need to check inAnnex a
second time. On NTFS, for some reason the write of the inode cache after
it gets the content is not immediately able to be read, and with an
empty/non-matching inode cache due to that stale data, inAnnex falls back
to hashing the whole object to determine if it's present.
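
The fallback described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not git-annex's actual code: `Store`, `Outcome`, `toyHash` and `checkPresent` are hypothetical names, an `Int` stands in for a real inode cache entry, and the "hash" is a trivial sum of character codes.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical stand-ins, not git-annex's real types.
type Key = String
type InodeCache = Int -- stands in for (inode, size, mtime)

data Store = Store
    { cached :: M.Map Key InodeCache           -- what the cache database returns
    , onDisk :: M.Map Key (InodeCache, String) -- current stat and content of each object
    }

-- How the presence check was answered.
data Outcome = Missing | CacheHit | FullHash Bool
    deriving (Eq, Show)

-- Toy hash: the key of an object is the sum of its character codes.
toyHash :: String -> Key
toyHash = show . sum . map fromEnum

-- Cheap path when the cached entry matches the file's current stat,
-- otherwise fall back to hashing the whole content.
checkPresent :: Store -> Key -> Outcome
checkPresent st k = case M.lookup k (onDisk st) of
    Nothing -> Missing
    Just (stat, content)
        | M.lookup k (cached st) == Just stat -> CacheHit
        | otherwise -> FullHash (toyHash content == k)
```

In this model, a cache write that cannot yet be read back looks like an empty `cached` map: the object is still reported present, but only after the expensive `FullHash` path has run.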

Sponsored-by: Brock Spratlen on Patreon
Joey Hess 2021-10-01 12:02:35 -04:00
parent 17a31f8e1b
commit b9a1cc512d
4 changed files with 33 additions and 10 deletions

@@ -11,6 +11,8 @@ git-annex (8.20210904) UNRELEASED; urgency=medium
   * Sped up git-annex smudge --clean by 25%.
   * Resume where it left off when copying a file to/from a local git remote
     was interrupted.
+  * sync --content: Avoid a redundant checksum of a file that was
+    incrementally verified, when used on NTFS and perhaps other filesystems.
  -- Joey Hess <id@joeyh.name> Fri, 03 Sep 2021 12:02:55 -0400

@@ -809,10 +809,11 @@ syncFile ebloom rs af k = do
     let (have, lack) = partition (\r -> Remote.uuid r `elem` locs) rs
     got <- anyM id =<< handleget have inhere
-    putrs <- handleput lack
+    let inhere' = inhere || got
+    putrs <- handleput lack inhere'
     u <- getUUID
-    let locs' = concat [if inhere || got then [u] else [], putrs, locs]
+    let locs' = concat [if inhere' then [u] else [], putrs, locs]
     -- To handle --all, a bloom filter is populated with all the keys
     -- of files in the working tree in the first pass. On the second
@@ -855,14 +856,15 @@ syncFile ebloom rs af k = do
         | Remote.readonly r || remoteAnnexReadOnly (Remote.gitconfig r) = return False
         | isThirdPartyPopulated r = return False
         | otherwise = wantSend True (Just k) af (Remote.uuid r)
-    handleput lack = catMaybes <$> ifM (inAnnex k)
-        ( forM lack $ \r ->
-            ifM (wantput r <&&> put r)
-                ( return (Just (Remote.uuid r))
-                , return Nothing
-                )
-        , return []
-        )
+    handleput lack inhere
+        | inhere = catMaybes <$>
+            ( forM lack $ \r ->
+                ifM (wantput r <&&> put r)
+                    ( return (Just (Remote.uuid r))
+                    , return Nothing
+                    )
+            )
+        | otherwise = return []
     put dest = includeCommandAction $
         Command.Move.toStart' dest Command.Move.RemoveNever af k ai si
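
The shape of the change in the hunk above can be sketched in isolation. This is a simplified model in plain `IO`, not the real `Annex` monad: `handleputOld`, `handleputNew`, `sendAll` and `syncWith` are hypothetical names, remotes are plain strings, and the presence check is replaced by a counter so the saved call is visible.

```haskell
import Data.IORef
import Data.Maybe (catMaybes)

-- Hypothetical stand-in for a remote.
type Remote = String

-- Old shape: re-run the (possibly expensive) presence check before sending.
handleputOld :: IO Bool -> (Remote -> IO Bool) -> [Remote] -> IO [Remote]
handleputOld checkPresent put lack = do
    here <- checkPresent
    if here then sendAll put lack else return []

-- New shape: the caller passes in what it already knows.
handleputNew :: Bool -> (Remote -> IO Bool) -> [Remote] -> IO [Remote]
handleputNew inhere put lack
    | inhere = sendAll put lack
    | otherwise = return []

-- Try each remote, keeping the ones where the send succeeded.
sendAll :: (Remote -> IO Bool) -> [Remote] -> IO [Remote]
sendAll put lack = catMaybes <$> mapM try lack
  where
    try r = do
        ok <- put r
        return (if ok then Just r else Nothing)

-- One sync of a key that is not present at first: check, get, then put.
-- Returns the remotes sent to and the number of presence checks made.
syncWith :: Bool -> IO ([Remote], Int)
syncWith useNew = do
    checks <- newIORef (0 :: Int)
    have <- newIORef False
    let checkPresent = modifyIORef' checks (+ 1) >> readIORef have
        get = writeIORef have True >> return True
        put _ = return True
    inhere <- checkPresent
    got <- if inhere then return False else get
    let inhere' = inhere || got
    sent <-
        if useNew
            then handleputNew inhere' put ["r1", "r2"]
            else handleputOld checkPresent put ["r1", "r2"]
    n <- readIORef checks
    return (sent, n)
```

In this model `syncWith False` makes two presence checks and `syncWith True` makes one, while both send to the same remotes. The design choice is the same as in the commit: prefer a value already in hand over asking a store that may answer slowly.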

@@ -724,3 +724,5 @@ from 159 to 296985).
 Git Annex is great. It works quite nicely with my multi-gigabyte backup files (largest around 180GB) via the BLAKE2B160E backend :)
 [[!meta title="windows: sync -C takes longer than get, apparently extra checksum"]]
+> [[fixed|done]] --[[Joey]]

@@ -0,0 +1,17 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 11"""
+ date="2021-10-01T15:57:12Z"
+ content="""
+Fixed by avoiding sync calling inAnnex when it knows it has the content,
+because it just got it.
+
+This does leave open the possibility that there are similar situations
+elsewhere, that lead to either extra work like this, or to incorrect
+behavior. Since sqlite write followed by a read is generally something
+git-annex is careful of, and also since it is generally careful to have
+reasonable behavior if sqlite somehow loses data, I'm not too worried about
+incorrect behavior. I feel comfortable closing this bug with just this fix,
+despite not getting to the bottom of the issue of why sqlite writes are
+not immediately able to be read back on NTFS.
+"""]]