Fix bug that prevented uploads to remotes using new-style chunking from resuming after the last successfully uploaded chunk.

"checkPresent baser" was wrong; the baser has a dummy checkPresent action
not the real one. So, to fix this, we need to call preparecheckpresent to
get a checkpresent action that can be used to check if chunks are present.

Note that, for remotes like S3, this means the preparer is run, which
opens an S3 handle that is then used for each checkpresent of a chunk.
That's a good thing; if we're resuming an upload that's already many
chunks in, it'll reuse the same http connection for each chunk it checks.
Still, it's not perfectly ideal, since this is a different http
connection than the one that will be used to upload chunks. It would be
nice to improve the API so that both use the same http connection.
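The preparer pattern described above can be sketched in isolation. This is a hypothetical simplification, not git-annex's real API: the names (`S3Handle`, `openHandle`, `prepareCheckPresent`) are stand-ins. The point it illustrates is that preparing once yields a checkpresent action that reuses a single opened handle for every chunk it checks.

```haskell
import Data.IORef

-- Stand-in for an expensive-to-open resource such as an S3 http handle.
newtype S3Handle = S3Handle (IORef Int)

openHandle :: IO S3Handle
openHandle = S3Handle <$> newIORef 0

-- The prepared checkpresent action closes over the one opened handle,
-- so every chunk check goes over the same "connection".
prepareCheckPresent :: IO (String -> IO Bool)
prepareCheckPresent = do
    S3Handle counter <- openHandle
    return $ \chunkKey -> do
        modifyIORef' counter (+ 1)      -- count requests on this handle
        n <- readIORef counter
        putStrLn ("check " ++ chunkKey ++ ": request #" ++ show n
                  ++ " on one handle")
        return True                     -- pretend every chunk is present

main :: IO ()
main = do
    checker <- prepareCheckPresent
    mapM_ checker ["chunk1", "chunk2", "chunk3"]
```

Running `main` shows all three checks numbered against the same handle, rather than a fresh handle per check.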
Joey Hess 2015-07-16 15:01:10 -04:00
parent 5de3b4d07a
commit afe6a53bca
3 changed files with 37 additions and 3 deletions


@@ -184,12 +184,14 @@ specialRemote' cfg c preparestorer prepareretriever prepareremover preparecheckp
 	-- chunk, then encrypt, then feed to the storer
 	storeKeyGen k f p enc = safely $ preparestorer k $ safely . go
 	  where
-		go (Just storer) = sendAnnex k rollback $ \src ->
+		go (Just storer) = preparecheckpresent k $ safely . go' storer
+		go Nothing = return False
+		go' storer (Just checker) = sendAnnex k rollback $ \src ->
 			displayprogress p k f $ \p' ->
 				storeChunks (uuid baser) chunkconfig k src p'
 					(storechunk enc storer)
-					(checkPresent baser)
-		go Nothing = return False
+					checker
+		go' _ Nothing = return False
 		rollback = void $ removeKey encr k
 		storechunk Nothing storer k content p = storer k content p
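The effect of the change above can be modeled in isolation. Everything below is a hypothetical toy, not git-annex code: a stripped-down `storeChunks` that takes a checkpresent action and skips chunks it reports as present. With a dummy checker (what `checkPresent baser` amounted to), every chunk is re-uploaded; with a real checker, already-present chunks are skipped, which is exactly what resuming needs.

```haskell
import qualified Data.Set as S

type Key = String

-- Toy storeChunks: upload chunks, skipping those the checker reports
-- as present; returns the list of chunks actually uploaded.
storeChunks :: (Key -> IO Bool) -> [Key] -> IO [Key]
storeChunks checker = go []
  where
    go acc [] = return (reverse acc)
    go acc (k:ks) = do
        present <- checker k
        if present then go acc ks else go (k : acc) ks

main :: IO ()
main = do
    let onRemote = S.fromList ["chunk1", "chunk2"]   -- uploaded before the interrupt
        dummyChecker _ = return False                -- like "checkPresent baser"
        realChecker k = return (k `S.member` onRemote)
        chunks = ["chunk1", "chunk2", "chunk3"]
    withDummy <- storeChunks dummyChecker chunks
    withReal  <- storeChunks realChecker chunks
    putStrLn ("dummy checker uploads: " ++ show withDummy)
    putStrLn ("real checker uploads:  " ++ show withReal)
```

The dummy checker re-uploads all three chunks; the real checker uploads only `chunk3`.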

debian/changelog

@@ -3,6 +3,8 @@ git-annex (5.20150714) UNRELEASED; urgency=medium
   * Improve bash completion code so that "git annex" will also tab
     complete. However, git's bash completion script needs a patch,
     which I've submitted, for this to work perfectly.
+  * Fix bug that prevented uploads to remotes using new-style chunking
+    from resuming after the last successfully uploaded chunk.
  -- Joey Hess <id@joeyh.name>  Thu, 16 Jul 2015 14:55:07 -0400


@@ -0,0 +1,30 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2015-07-16T17:57:44Z"
content="""
This should have been filed as a bug report... I will move the thread to
bugs after posting this comment.

In your obfuscated log, it tries to HEAD GPGHMACSHA1--1111111111,
and when that fails, it PUTs GPGHMACSHA1--2222222222. From this, we can
deduce that GPGHMACSHA1--1111111111 is not the first chunk but the full
non-chunked file, and that GPGHMACSHA1--2222222222 is actually the first chunk.

For testing, I modified the S3 remote to make file uploads succeed, but then
report to git-annex that they failed. So, git annex copy uploads the 1st
chunk and then fails, as if it had been interrupted there. Repeating the copy,
I see the same thing: it HEADs the full key, does not HEAD the first chunk,
and so doesn't notice it was uploaded before, and so re-uploads the first
chunk.

The HEAD of the full key is just done for backwards compatibility reasons.
The problem is that it's not checking whether the current chunk it's going
to upload is present in the remote. But there is code in seekResume that
is supposed to do that very check: `tryNonAsync (checker k)`

Aha, the problem seems to be in the checkpresent action that's passed to
that. It looks like a dummy checkpresent action is being passed in.

I've fixed this in git, and now it resumes properly in my test case.
"""]]