force verification when resuming download
When resuming a download and not using a rolling checksummer like rsync, the partial file we start with might contain garbage, in the case where a file changed as it was being downloaded. So, disabling verification on resumes risked a bad object being put into the annex. Even downloads with rsync are currently affected. It didn't seem worth the added complexity to special case those to prevent verification, especially since git-annex is using rsync less often now. This commit was sponsored by Brock Spratlen on Patreon.
This commit is contained in:
parent
31e1adc005
commit
4015c5679a
5 changed files with 41 additions and 19 deletions
|
@ -303,9 +303,19 @@ getViaTmp v key action = checkDiskSpaceToGet key False $
|
||||||
getViaTmpFromDisk :: VerifyConfig -> Key -> (FilePath -> Annex (Bool, Verification)) -> Annex Bool
|
getViaTmpFromDisk :: VerifyConfig -> Key -> (FilePath -> Annex (Bool, Verification)) -> Annex Bool
|
||||||
getViaTmpFromDisk v key action = do
|
getViaTmpFromDisk v key action = do
|
||||||
tmpfile <- prepTmp key
|
tmpfile <- prepTmp key
|
||||||
|
resuming <- liftIO $ doesFileExist tmpfile
|
||||||
(ok, verification) <- action tmpfile
|
(ok, verification) <- action tmpfile
|
||||||
|
-- When the temp file already had content, we don't know if
|
||||||
|
-- that content is good or not, so only trust if it the action
|
||||||
|
-- Verified it in passing. Otherwise, force verification even
|
||||||
|
-- if the VerifyConfig normally disables it.
|
||||||
|
let verification' = if resuming
|
||||||
|
then case verification of
|
||||||
|
Verified -> Verified
|
||||||
|
_ -> MustVerify
|
||||||
|
else verification
|
||||||
if ok
|
if ok
|
||||||
then ifM (verifyKeyContent v verification key tmpfile)
|
then ifM (verifyKeyContent v verification' key tmpfile)
|
||||||
( ifM (pruneTmpWorkDirBefore tmpfile (moveAnnex key))
|
( ifM (pruneTmpWorkDirBefore tmpfile (moveAnnex key))
|
||||||
( do
|
( do
|
||||||
logStatus key InfoPresent
|
logStatus key InfoPresent
|
||||||
|
|
|
@ -11,8 +11,9 @@ git-annex (6.20180228) UNRELEASED; urgency=medium
|
||||||
will no longer be preserved when copying them to and from ssh remotes.
|
will no longer be preserved when copying them to and from ssh remotes.
|
||||||
Other remotes never supported preserving that information, so
|
Other remotes never supported preserving that information, so
|
||||||
this is not considered a regression.
|
this is not considered a regression.
|
||||||
* Note that annex.verify=false does not currently prevent verification
|
* Some downloads will be verified, even when annex.verify=false.
|
||||||
of contents received from git-annex-shell when using the new protocol.
|
This is done in some edge cases where there's a likelyhood than an
|
||||||
|
object was downloaded incorrectly.
|
||||||
* Support exporttree=yes for rsync special remotes.
|
* Support exporttree=yes for rsync special remotes.
|
||||||
* Dial back optimisation when building on arm, which prevents
|
* Dial back optimisation when building on arm, which prevents
|
||||||
ghc and llc from running out of memory when optimising some files.
|
ghc and llc from running out of memory when optimising some files.
|
||||||
|
|
|
@ -151,12 +151,13 @@ instance ToUUID (RemoteA a) where
|
||||||
|
|
||||||
data Verification
|
data Verification
|
||||||
= UnVerified
|
= UnVerified
|
||||||
-- ^ content was not verified during transfer, but is probably
|
-- ^ Content was not verified during transfer, but is probably
|
||||||
-- ok, so if verification is disabled, don't verify it
|
-- ok, so if verification is disabled, don't verify it
|
||||||
| Verified
|
| Verified
|
||||||
-- ^ content of key was verified during transfer
|
-- ^ Content was verified during transfer, so don't verify it
|
||||||
|
-- again.
|
||||||
| MustVerify
|
| MustVerify
|
||||||
-- ^ content likely to have been altered during transfer,
|
-- ^ Content likely to have been altered during transfer,
|
||||||
-- verify even if verification is normally disabled
|
-- verify even if verification is normally disabled
|
||||||
|
|
||||||
unVerified :: Monad m => m Bool -> m (Bool, Verification)
|
unVerified :: Monad m => m Bool -> m (Bool, Verification)
|
||||||
|
|
|
@ -1226,6 +1226,10 @@ Here are all the supported configuration settings.
|
||||||
from remotes. If you trust a remote and don't want the overhead
|
from remotes. If you trust a remote and don't want the overhead
|
||||||
of these checksums, you can set this to `false`.
|
of these checksums, you can set this to `false`.
|
||||||
|
|
||||||
|
Note that even when this is set to `false`, git-annex does verification
|
||||||
|
in some edge cases, where it's likely the case than an
|
||||||
|
object was downloaded incorrectly.
|
||||||
|
|
||||||
* `remote.<name>.annex-export-tracking`
|
* `remote.<name>.annex-export-tracking`
|
||||||
|
|
||||||
When set to a branch name or other treeish, this makes what's exported
|
When set to a branch name or other treeish, this makes what's exported
|
||||||
|
|
|
@ -11,29 +11,30 @@ It would be nice to support annex.verify=false when it's safe but not
|
||||||
when the file got modified, but if it added an extra round trip
|
when the file got modified, but if it added an extra round trip
|
||||||
to the P2P protocol, that could lose some of the speed gains.
|
to the P2P protocol, that could lose some of the speed gains.
|
||||||
|
|
||||||
Resumes make this difficult. What if a file starts to be transferred,
|
The way that git-annex-shell recvkey handles this is the client
|
||||||
|
communicates to it if it's sending an unlocked file, which forces
|
||||||
|
verification. Otherwise, verification can be skipped.
|
||||||
|
|
||||||
|
Seems the best we could do with the P2P protocol, barring adding
|
||||||
|
rsync-style rolling hashing to it, is to detect when a file got modified
|
||||||
|
as it was being sent, and inform the peer that the data it got is invalid.
|
||||||
|
It can then force verification.
|
||||||
|
|
||||||
|
> [[done]] --[[Joey]]
|
||||||
|
|
||||||
|
----
|
||||||
|
|
||||||
|
A related problem is resumes. What if a file starts to be transferred,
|
||||||
gets changed while it's transferred so some bad bytes are sent, then the
|
gets changed while it's transferred so some bad bytes are sent, then the
|
||||||
transfer is interrupted, and later is resumed from a different remote
|
transfer is interrupted, and later is resumed from a different remote
|
||||||
that has the correct content. How can it tell that the bad data was sent
|
that has the correct content. How can it tell that the bad data was sent
|
||||||
in this case?
|
in this case?
|
||||||
|
|
||||||
----
|
|
||||||
|
|
||||||
The way that git-annex-shell recvkey handles this is the client
|
|
||||||
communicates to it if it's sending an unlocked file, which forces
|
|
||||||
verification. Otherwise, verification can be skipped.
|
|
||||||
|
|
||||||
In the case where an upload is started from one repository and later
|
In the case where an upload is started from one repository and later
|
||||||
resumed by another, rsync wipes out any differences, so if the first
|
resumed by another, rsync wipes out any differences, so if the first
|
||||||
repository was unlocked, and the second is locked, it's safe for recvkey to
|
repository was unlocked, and the second is locked, it's safe for recvkey to
|
||||||
treat it locked and skip verification.
|
treat it locked and skip verification.
|
||||||
|
|
||||||
Seems the best we could do with the P2P protocol, barring adding
|
|
||||||
rsync-style rolling hashing to it, is to detect when a file got modified
|
|
||||||
as it was being sent, and inform the peer that the data it got is bad.
|
|
||||||
It can then throw it away rather than putting the bad data into the
|
|
||||||
repository.
|
|
||||||
|
|
||||||
This is not really unique to the P2P protocol -- special remotes
|
This is not really unique to the P2P protocol -- special remotes
|
||||||
can be written to support resuming. The web special remote does; there may
|
can be written to support resuming. The web special remote does; there may
|
||||||
be external special remotes that do too. While the content of a key on
|
be external special remotes that do too. While the content of a key on
|
||||||
|
@ -47,9 +48,14 @@ AlwaysVerify, unless the remote returns Verified. This can be done in
|
||||||
Annex.Content.getViaTmp, so it will affect all downloads involving the tmp
|
Annex.Content.getViaTmp, so it will affect all downloads involving the tmp
|
||||||
key for a file.
|
key for a file.
|
||||||
|
|
||||||
|
> [[done]] --[[Joey]]
|
||||||
|
|
||||||
This would change handling of resumes of downloads using rsync too.
|
This would change handling of resumes of downloads using rsync too.
|
||||||
But those are always safe to skip verification of, although they don't
|
But those are always safe to skip verification of, although they don't
|
||||||
quite do a full verification of the key's hash. To still allow disabling of
|
quite do a full verification of the key's hash. To still allow disabling of
|
||||||
verification of those, could add a third state in between UnVerified and
|
verification of those, could add a third state in between UnVerified and
|
||||||
Verified, that means it's sure it's gotten exactly the same bytes as are on
|
Verified, that means it's sure it's gotten exactly the same bytes as are on
|
||||||
the remote.
|
the remote.
|
||||||
|
|
||||||
|
> Decided this added too much complexity for such an edge case, so
|
||||||
|
> skipped dealing with it. --[[Joey]]
|
||||||
|
|
Loading…
Reference in a new issue