fsck: avoid redundant checksum when transfer is Verified

When downloading content from a remote, if the content is able to be
verified during the transfer, skip checksumming it a second time.

Note that in this case, the fsck output does not include "(checksum)"
which it does when the checksumming is done separately from the download.

This commit was sponsored by Brock Spratlen on Patreon.
Joey Hess 2021-04-14 13:22:54 -04:00
parent 5ee14db037
commit 5783a8d081
GPG key ID: DB12DB0FF05F8F38
3 changed files with 41 additions and 8 deletions

@@ -7,6 +7,8 @@ git-annex (8.20210331) UNRELEASED; urgency=medium
   * diffdriver: Support unlocked files.
   * forget: Preserve currently exported trees, avoiding problems with
     exporttree remotes in some unusual circumstances.
+  * fsck: When downloading content from a remote, if the content is able
+    to be verified during the transfer, skip checksumming it a second time.
 
  -- Joey Hess <id@joeyh.name> Thu, 01 Apr 2021 12:17:26 -0400

@@ -156,17 +156,20 @@ performRemote key afile backend numcopies remote =
     dispatch (Right True) = withtmp $ \tmpfile ->
         getfile tmpfile >>= \case
             Nothing -> go True Nothing
-            Just True -> go True (Just tmpfile)
-            Just False -> do
+            Just (Right verification) -> go True (Just (tmpfile, verification))
+            Just (Left _) -> do
                 warning "failed to download file from remote"
                 void $ go True Nothing
                 return False
     dispatch (Right False) = go False Nothing
-    go present localcopy = check
+    go present lv = check
         [ verifyLocationLogRemote key ai remote present
         , verifyRequiredContent key ai
-        , withLocalCopy localcopy $ checkKeySizeRemote key remote ai
-        , withLocalCopy localcopy $ checkBackendRemote backend key remote ai
+        , withLocalCopy (fmap fst lv) $ checkKeySizeRemote key remote ai
+        , case fmap snd lv of
+            Just Verified -> return True
+            _ -> withLocalCopy (fmap fst lv) $
+                checkBackendRemote backend key remote ai
         , checkKeyNumCopies key afile numcopies
         ]
     ai = mkActionItem (key, afile)
@@ -185,13 +188,13 @@ performRemote key afile backend numcopies remote =
         cleanup `after` a tmp
     getfile tmp = ifM (checkDiskSpace (Just (P.takeDirectory tmp)) key 0 True)
         ( ifM (getcheap tmp)
-            ( return (Just True)
+            ( return (Just (Right UnVerified))
             , ifM (Annex.getState Annex.fast)
                 ( return Nothing
-                , Just . isRight <$> tryNonAsync (getfile' tmp)
+                , Just <$> tryNonAsync (getfile' tmp)
                 )
             )
-        , return (Just False)
+        , return Nothing
         )
     getfile' tmp = Remote.retrieveKeyFile remote key (AssociatedFile Nothing) (fromRawFilePath tmp) dummymeter
     dummymeter _ = noop
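
Reviewer's note: the two hunks above change `getfile` from returning a `Maybe Bool` to returning a `Maybe (Either e Verification)`, and `dispatch`/`go` then branch on that richer value. The following is a minimal standalone sketch of that decision table, not the code in the commit. `Verification` here is a simplified stand-in that models only the two constructors visible in the diff, `String` stands in for the exception type, and the `Plan` type and `plan` function are hypothetical names introduced purely for illustration.

```haskell
module Main where

-- Simplified stand-in for git-annex's Verification type; only the two
-- constructors that appear in the diff are modelled (assumption).
data Verification = UnVerified | Verified
    deriving (Eq, Show)

-- Shaped like the new result of getfile. String is a stand-in for the
-- exception carried by tryNonAsync (assumption).
type Download = Maybe (Either String Verification)

-- Hypothetical summary of what fsck goes on to do with the content.
data Plan
    = NoLocalCopy                -- nothing was downloaded to check
    | DownloadFailed             -- warn, and report the fsck as failed
    | ChecksumLocalCopy FilePath -- downloaded, still needs a checksum
    | AlreadyVerified FilePath   -- verified in transfer, skip the checksum
    deriving (Eq, Show)

-- Mirrors the case analysis in dispatch and go after this change.
plan :: FilePath -> Download -> Plan
plan _ Nothing = NoLocalCopy
plan _ (Just (Left _)) = DownloadFailed
plan tmp (Just (Right Verified)) = AlreadyVerified tmp
plan tmp (Just (Right UnVerified)) = ChecksumLocalCopy tmp

main :: IO ()
main = mapM_ (print . plan "tmpfile")
    [ Nothing
    , Just (Left "transfer failed")
    , Just (Right Verified)
    , Just (Right UnVerified)
    ]
```

In the diff, the size check (`checkKeySizeRemote`) still runs whenever a local copy exists; only the backend checksum is skipped in the `Verified` case. A cheap retrieval (`getcheap`) is always reported as `UnVerified`, so it still gets checksummed.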

@@ -0,0 +1,28 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2021-04-14T17:07:50Z"
content="""
Only some remotes support checksums in-flight; this recently includes
downloads from other git-annex repositories over ssh. Progress
on that front is being tracked at
<https://git-annex.branchable.com/todo/OPT__58_____34__bundle__34___get_+_check___40__of_checksum__41___in_a_single_operation/>
Most special remotes can't yet, but that should change eventually
for at least some of them.

I've made fsck notice when content was able to be verified as part of a
transfer, and avoid a redundant checksum of it.

What I've not done, and don't think I will be able to, is make the file
not be written to disk by fsck in that case. Since the `retrieveKeyFile`
interface is explicitly about writing to a file on disk, it would take either
a whole separate interface being implemented for all remotes that avoids
writing to the file when they can checksum in flight, or it would need
some change to the `retrieveKeyFile` interface to do the same.

Neither seems worth the complication to implement just to reduce disk IO in
this particular case. And it seems likely that, for files that fit in
memory, the content never actually reaches disk before the file is deleted.
Also, if this is a concern for you, you could, I guess, avoid fscking remotes
too frequently, or use a less fragile medium.
"""]]