fsck: avoid redundant checksum when transfer is Verified
When downloading content from a remote, if the content can be verified during the transfer, skip checksumming it a second time.

Note that in this case the fsck output does not include "(checksum)", which it does include when the checksumming is done separately from the download.

This commit was sponsored by Brock Spratlen on Patreon.
parent 5ee14db037
commit 5783a8d081
3 changed files with 41 additions and 8 deletions
@@ -7,6 +7,8 @@ git-annex (8.20210331) UNRELEASED; urgency=medium
   * diffdriver: Support unlocked files.
   * forget: Preserve currently exported trees, avoiding problems with
     exporttree remotes in some unusual circumstances.
+  * fsck: When downloading content from a remote, if the content is able
+    to be verified during the transfer, skip checksumming it a second time.

  -- Joey Hess <id@joeyh.name> Thu, 01 Apr 2021 12:17:26 -0400

@@ -156,17 +156,20 @@ performRemote key afile backend numcopies remote =
     dispatch (Right True) = withtmp $ \tmpfile ->
         getfile tmpfile >>= \case
             Nothing -> go True Nothing
-            Just True -> go True (Just tmpfile)
-            Just False -> do
+            Just (Right verification) -> go True (Just (tmpfile, verification))
+            Just (Left _) -> do
                 warning "failed to download file from remote"
                 void $ go True Nothing
                 return False
     dispatch (Right False) = go False Nothing
-    go present localcopy = check
+    go present lv = check
         [ verifyLocationLogRemote key ai remote present
         , verifyRequiredContent key ai
-        , withLocalCopy localcopy $ checkKeySizeRemote key remote ai
-        , withLocalCopy localcopy $ checkBackendRemote backend key remote ai
+        , withLocalCopy (fmap fst lv) $ checkKeySizeRemote key remote ai
+        , case fmap snd lv of
+            Just Verified -> return True
+            _ -> withLocalCopy (fmap fst lv) $
+                checkBackendRemote backend key remote ai
         , checkKeyNumCopies key afile numcopies
         ]
     ai = mkActionItem (key, afile)
@@ -185,13 +188,13 @@ performRemote key afile backend numcopies remote =
         cleanup `after` a tmp
     getfile tmp = ifM (checkDiskSpace (Just (P.takeDirectory tmp)) key 0 True)
         ( ifM (getcheap tmp)
-            ( return (Just True)
+            ( return (Just (Right UnVerified))
             , ifM (Annex.getState Annex.fast)
                 ( return Nothing
-                , Just . isRight <$> tryNonAsync (getfile' tmp)
+                , Just <$> tryNonAsync (getfile' tmp)
                 )
             )
-        , return (Just False)
+        , return Nothing
         )
     getfile' tmp = Remote.retrieveKeyFile remote key (AssociatedFile Nothing) (fromRawFilePath tmp) dummymeter
     dummymeter _ = noop
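The change to `performRemote` above can be read as a small decision table: the download step now reports a verification status instead of a plain `Bool`, and the checksum step is skipped only when that status is `Verified`. The following is a minimal standalone sketch of that decision, not git-annex's actual code. The two-constructor `Verification` type, the `Fetched` alias, and the `ChecksumStep`, `localCopy`, and `checksumStep` names are all simplified or hypothetical stand-ins, and it assumes that `withLocalCopy` passes trivially when there is no local copy.

```haskell
-- Simplified stand-in for illustration; not git-annex's real definition.
data Verification = UnVerified | Verified
    deriving (Eq, Show)

-- What the download step reports in the new code:
--   Nothing        : no local copy was made
--   Just (Left e)  : the download was attempted and failed
--   Just (Right v) : a local copy exists, with verification status v
-- (String is a hypothetical placeholder for the exception type.)
type Fetched = Maybe (Either String Verification)

-- Hypothetical summary of what happens to the checksum step.
data ChecksumStep = RunChecksum | SkipAlreadyVerified | SkipNoLocalCopy
    deriving (Eq, Show)

-- Pair the temp file with its verification status, mirroring the
-- (tmpfile, verification) tuple that `go` receives in the diff.
localCopy :: FilePath -> Fetched -> Maybe (FilePath, Verification)
localCopy tmp (Just (Right v)) = Just (tmp, v)
localCopy _ _ = Nothing

-- Mirror of the `case fmap snd lv of` branch in the diff.
-- Assumption: with no local copy, the check is skipped and counts as passing.
checksumStep :: Maybe (FilePath, Verification) -> ChecksumStep
checksumStep lv = case fmap snd lv of
    Just Verified -> SkipAlreadyVerified
    Just UnVerified -> RunChecksum
    Nothing -> SkipNoLocalCopy
```

In this sketch a cheap local retrieval corresponds to `Just (Right UnVerified)`, so it still gets checksummed, which matches the `getcheap` branch of the diff.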
@@ -0,0 +1,28 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2021-04-14T17:07:50Z"
+ content="""
+Only some remotes support checksums in-flight; this recently includes
+downloads from other git-annex repositories over ssh. Progress
+on that front is being tracked at
+<https://git-annex.branchable.com/todo/OPT__58_____34__bundle__34___get_+_check___40__of_checksum__41___in_a_single_operation/>
+Most special remotes can't yet, but that should change eventually
+for at least some of them.
+
+I've made fsck notice when content was able to be verified as part of a
+transfer, and avoid a redundant checksum of them.
+
+What I've not done, and don't think I will be able to, is make the file
+not be written to disk by fsck in that case. Since the `retrieveKeyFile`
+interface is explicitly about writing to a file on disk, it would take ether
+a whole separate interface being implemented for all remotes that avoids
+writing to the file when they can checksum in flight, or it would need
+some change to the `retrieveKeyFile` interface to do the same.
+
+Neither seems worth the complication to implement just to reduce disk IO in
+this particular case. And it seems likely that, for files that fit in
+memory, it never actually reaches disk before it's deleted. Also if this is
+a concern for you, you can I guess avoid fscking remotes too frequently or
+use a less fragile medium?
+"""]]
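To make "checksum in flight" concrete, here is a minimal sketch of verifying content chunk by chunk as it arrives, without ever storing it. It is only an illustration of the idea discussed in the comment, not how git-annex or any of its remotes implement it. Adler-32 is used purely as a dependency-free stand-in for a real key hash such as SHA-256, and `verifyInFlight`, `toBytes`, and the `Adler` type are hypothetical names.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word32, Word8)

-- Running Adler-32 state: a toy stand-in for an incremental hash context.
data Adler = Adler !Word32 !Word32

adlerInit :: Adler
adlerInit = Adler 1 0

-- Feed one chunk of bytes into the running state.
adlerUpdate :: Adler -> [Word8] -> Adler
adlerUpdate = foldl' step
  where
    step (Adler a b) w =
        let a' = (a + fromIntegral w) `mod` 65521
            b' = (b + a') `mod` 65521
        in Adler a' b'

-- Combine the two running sums into the final 32-bit checksum.
adlerFinal :: Adler -> Word32
adlerFinal (Adler a b) = (b `shiftL` 16) .|. a

-- Hypothetical: consume chunks in arrival order, keeping only the hash
-- state, and compare the result with the expected checksum.
verifyInFlight :: Word32 -> [[Word8]] -> Bool
verifyInFlight expected chunks =
    adlerFinal (foldl' adlerUpdate adlerInit chunks) == expected

-- Helper for examples; assumes ASCII input.
toBytes :: String -> [Word8]
toBytes = map (fromIntegral . ord)
```

For example, `verifyInFlight 0x11E60398 (map toBytes ["Wiki", "pedia"])` evaluates to `True`, since the chunked result equals the Adler-32 of the whole string "Wikipedia". An interface like `retrieveKeyFile`, which is defined in terms of a file on disk, would write each chunk to the file as well as hashing it, which is the extra disk IO the comment refers to.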