force verification when resuming download

When resuming a download without a rolling checksummer like rsync, the
partial file we start with might contain garbage if the file changed
while it was being downloaded. So, disabling verification on resumes
risked putting a bad object into the annex.

Resumed downloads with rsync now get verified too, although they don't
need it. It didn't seem worth the added complexity to special case those
to skip verification, especially since git-annex is using rsync less
often now.

This commit was sponsored by Brock Spratlen on Patreon.
Joey Hess 2018-03-13 14:50:49 -04:00
parent 31e1adc005
commit 4015c5679a
5 changed files with 41 additions and 19 deletions

@@ -11,29 +11,30 @@ It would be nice to support annex.verify=false when it's safe but not
when the file got modified, but if it added an extra round trip
to the P2P protocol, that could lose some of the speed gains.
Resumes make this difficult. What if a file starts to be transferred,
The way that git-annex-shell recvkey handles this is the client
communicates to it if it's sending an unlocked file, which forces
verification. Otherwise, verification can be skipped.
Seems the best we could do with the P2P protocol, barring adding
rsync-style rolling hashing to it, is to detect when a file got modified
as it was being sent, and inform the peer that the data it got is invalid.
It can then force verification.
> [[done]] --[[Joey]]
----
A related problem is resumes. What if a file starts to be transferred,
gets changed while it's transferred so some bad bytes are sent, then the
transfer is interrupted, and later is resumed from a different remote
that has the correct content. How can it tell that the bad data was sent
in this case?
----
The way that git-annex-shell recvkey handles this is the client
communicates to it if it's sending an unlocked file, which forces
verification. Otherwise, verification can be skipped.
In the case where an upload is started from one repository and later
resumed by another, rsync wipes out any differences, so if the first
repository was unlocked, and the second is locked, it's safe for recvkey to
treat it locked and skip verification.
Seems the best we could do with the P2P protocol, barring adding
rsync-style rolling hashing to it, is to detect when a file got modified
as it was being sent, and inform the peer that the data it got is bad.
It can then throw it away rather than putting the bad data into the
repository.
This is not really unique to the P2P protocol -- special remotes
can be written to support resuming. The web special remote does; there may
be external special remotes that do too. While the content of a key on
@@ -47,9 +48,14 @@ AlwaysVerify, unless the remote returns Verified. This can be done in
Annex.Content.getViaTmp, so it will affect all downloads involving the tmp
key for a file.
> [[done]] --[[Joey]]
This would change handling of resumes of downloads using rsync too.
But those are always safe to skip verification of, although they don't
quite do a full verification of the key's hash. To still allow disabling of
verification of those, could add a third state in between UnVerified and
Verified, that means it's sure it's gotten exactly the same bytes as are on
the remote.
> Decided this added too much complexity for such an edge case, so
> skipped dealing with it. --[[Joey]]
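
The rule that was settled on (a resumed download is treated as AlwaysVerify, unless the remote returns Verified) can be modeled as a small decision function. This is only a sketch of the logic in Python, not the Haskell code in Annex.Content.getViaTmp. AlwaysVerify, UnVerified and Verified are names from the text above; `DEFAULT`, `effective_config`, `should_verify` and the `annex_verify_enabled` flag (standing in for the annex.verify setting) are hypothetical names made up for illustration.

```python
from enum import Enum


class Verification(Enum):
    """What the remote reports about the content it delivered."""
    UNVERIFIED = "UnVerified"
    VERIFIED = "Verified"


class VerifyConfig(Enum):
    """How the caller wants verification handled."""
    DEFAULT = "Default"            # hypothetical name: honor annex.verify
    ALWAYS_VERIFY = "AlwaysVerify"


def effective_config(requested, resuming):
    """A resume starts from a partial tmp file that may contain garbage,
    so the requested configuration is overridden with AlwaysVerify."""
    return VerifyConfig.ALWAYS_VERIFY if resuming else requested


def should_verify(requested, resuming, annex_verify_enabled, remote_result):
    """Decide whether the downloaded content must be checked locally."""
    # The remote already verified the content, so no local check is needed.
    if remote_result is Verification.VERIFIED:
        return False
    # Resumes force verification regardless of annex.verify.
    if effective_config(requested, resuming) is VerifyConfig.ALWAYS_VERIFY:
        return True
    # Otherwise fall back to the user's annex.verify setting.
    return annex_verify_enabled


if __name__ == "__main__":
    # Resume with annex.verify=false: verification is still forced.
    print(should_verify(VerifyConfig.DEFAULT, True, False,
                        Verification.UNVERIFIED))
```

In this model there is no third state between UnVerified and Verified, so a resumed rsync download that reports UnVerified is verified like any other resume, which matches the decision recorded above.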