Joey Hess 2022-01-07 12:19:43 -04:00
parent 022e63cdde
commit 21c0d5be6e
GPG key ID: DB12DB0FF05F8F38
3 changed files with 29 additions and 0 deletions


@@ -222,6 +222,7 @@ getViaTmpFromDisk rsp v key af action = checkallowed $ do
tmpfile <- prepTmp key
resuming <- liftIO $ R.doesPathExist tmpfile
(ok, verification) <- action tmpfile
liftIO $ print ok
-- When the temp file already had content, we don't know if
-- that content is good or not, so only trust if it the action
-- Verified it in passing. Otherwise, force verification even


@@ -5,6 +5,10 @@ git-annex (8.20211232) UNRELEASED; urgency=medium
preserve it in the imported tree so it does not get deleted.
* enableremote, renameremote: Better handling of the unusual case where
multiple special remotes have been initialized with the same name.
* Recover from over-the-wire errors when downloading from remotes,
by deleting the object file when verification of it fails. This allows
the next attempt at a download to succeed, rather than using the same
content and failing again.
-- Joey Hess <id@joeyh.name> Mon, 03 Jan 2022 14:01:14 -0400


@@ -0,0 +1,24 @@
[[!comment format=mdwn
username="joey"
subject="""comment 6"""
date="2022-01-07T16:12:20Z"
content="""
Current thinking on deleting corrupted tmp files: If a download succeeds,
and verification then fails, the whole file content has been downloaded,
and is corrupt. So it would be ok to always delete it then, as far as p2p
transfers go.

For other remotes, the same is often true. The only exceptions are remotes
like rsync and bittorrent, which can recover from corruption on retry. But,
I don't think either rsync or bittorrent will usually write corrupt data
to a file anyway. They would catch over-the-wire corruption with rolling
checksums etc. So, it seems like a verification should never fail after
a successful rsync or bittorrent download. Unless the disk corrupted the
data in the meantime. Which is an unlikely situation, and not one that it's
really necessary for git-annex to recover from with optimal efficiency.

... Oh interesting.. It already is supposed to do that, in
getViaTmpFromDisk. It seems what is happening is that the transfer fails
when all the file content is present, and so it never gets to the point of
verifying it, let alone deleting it.
"""]]
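
The behaviour described in the changelog entry and the comment above can be sketched as follows. This is a minimal illustration, not git-annex's implementation: `Outcome` and `downloadVerified` are hypothetical names, and the download and verification steps are passed in as plain `IO` actions rather than using git-annex's own types. The assumption is that a completed transfer that fails verification leaves a wholly corrupt file, so deleting it lets the next attempt start from scratch, while a failed transfer keeps its partial file so a retry can resume.

```haskell
import Control.Monad (when)
import System.Directory (doesFileExist, removeFile)

-- | Hypothetical outcome of one download attempt (not a git-annex type).
data Outcome = Verified | TransferFailed | Corrupt
  deriving (Eq, Show)

-- | Run a download into a temp file, then verify the result.
-- Assumption for this sketch: both steps report success as a Bool.
downloadVerified
  :: (FilePath -> IO Bool)  -- ^ download action; True when the transfer completed
  -> (FilePath -> IO Bool)  -- ^ verifier, for example a checksum comparison
  -> FilePath               -- ^ temp file the download writes to
  -> IO Outcome
downloadVerified download verify tmp = do
  ok <- download tmp
  if not ok
    -- The transfer itself failed: keep any partial content for a later resume.
    then return TransferFailed
    else do
      good <- verify tmp
      if good
        then return Verified
        else do
          -- The whole file arrived but is corrupt: delete it so the
          -- next attempt does not reuse the same bad content.
          exists <- doesFileExist tmp
          when exists (removeFile tmp)
          return Corrupt
```

In this sketch, the situation the comment ends on (the transfer reporting failure even though all the content is present) would fall into the `TransferFailed` branch, so verification and deletion are never reached.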