Fix bug that prevented uploads to remotes using new-style chunking from resuming after the last successfully uploaded chunk.

"checkPresent baser" was wrong; the baser has a dummy checkPresent action
not the real one. So, to fix this, we need to call preparecheckpresent to
get a checkpresent action that can be used to check if chunks are present.

Note that, for remotes like S3, this means the preparer is run, which
opens an S3 handle that is then used for each checkpresent of a chunk.
That's a good thing; if we're resuming an upload that's already many
chunks in, it'll reuse the same http connection for each chunk it checks.
Still, it's not perfectly ideal, since this is a different http
connection than the one that will be used to upload chunks. It would be
nice to improve the API so that both use the same http connection.
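The preparer pattern described above can be sketched in isolation. This is a hypothetical simplification, not git-annex's real API: the names (`S3Handle`, `openHandle`, `prepareCheckPresent`) are stand-ins. The point it illustrates is that preparing once yields a checkpresent action that reuses a single opened handle for every chunk it checks.

```haskell
import Data.IORef

-- Stand-in for an expensive-to-open resource such as an S3 http handle.
newtype S3Handle = S3Handle (IORef Int)

openHandle :: IO S3Handle
openHandle = S3Handle <$> newIORef 0

-- The prepared checkpresent action closes over the one opened handle,
-- so every chunk check goes over the same "connection".
prepareCheckPresent :: IO (String -> IO Bool)
prepareCheckPresent = do
    S3Handle counter <- openHandle
    return $ \chunkKey -> do
        modifyIORef' counter (+ 1)      -- count requests on this handle
        n <- readIORef counter
        putStrLn ("check " ++ chunkKey ++ ": request #" ++ show n
                  ++ " on one handle")
        return True                     -- pretend every chunk is present

main :: IO ()
main = do
    checker <- prepareCheckPresent
    mapM_ checker ["chunk1", "chunk2", "chunk3"]
```

Running `main` shows all three checks numbered against the same handle, rather than a fresh handle per check.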
Joey Hess 2015-07-16 15:01:10 -04:00
parent 5de3b4d07a
commit afe6a53bca
3 changed files with 37 additions and 3 deletions


@@ -184,12 +184,14 @@ specialRemote' cfg c preparestorer prepareretriever prepareremover preparecheckp
 	-- chunk, then encrypt, then feed to the storer
 	storeKeyGen k f p enc = safely $ preparestorer k $ safely . go
 	  where
-		go (Just storer) = sendAnnex k rollback $ \src ->
+		go (Just storer) = preparecheckpresent k $ safely . go' storer
+		go Nothing = return False
+		go' storer (Just checker) = sendAnnex k rollback $ \src ->
 			displayprogress p k f $ \p' ->
 				storeChunks (uuid baser) chunkconfig k src p'
 					(storechunk enc storer)
-					(checkPresent baser)
-		go Nothing = return False
+					checker
+		go' _ Nothing = return False
 		rollback = void $ removeKey encr k
 		storechunk Nothing storer k content p = storer k content p
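The effect of the change above can be modeled in isolation. Everything below is a hypothetical toy, not git-annex code: a stripped-down `storeChunks` that takes a checkpresent action and skips chunks it reports as present. With a dummy checker (what `checkPresent baser` amounted to), every chunk is re-uploaded; with a real checker, already-present chunks are skipped, which is exactly what resuming needs.

```haskell
import qualified Data.Set as S

type Key = String

-- Toy storeChunks: upload chunks, skipping those the checker reports
-- as present; returns the list of chunks actually uploaded.
storeChunks :: (Key -> IO Bool) -> [Key] -> IO [Key]
storeChunks checker = go []
  where
    go acc [] = return (reverse acc)
    go acc (k:ks) = do
        present <- checker k
        if present then go acc ks else go (k : acc) ks

main :: IO ()
main = do
    let onRemote = S.fromList ["chunk1", "chunk2"]   -- uploaded before the interrupt
        dummyChecker _ = return False                -- like "checkPresent baser"
        realChecker k = return (k `S.member` onRemote)
        chunks = ["chunk1", "chunk2", "chunk3"]
    withDummy <- storeChunks dummyChecker chunks
    withReal  <- storeChunks realChecker chunks
    putStrLn ("dummy checker uploads: " ++ show withDummy)
    putStrLn ("real checker uploads:  " ++ show withReal)
```

The dummy checker re-uploads all three chunks; the real checker uploads only `chunk3`.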

debian/changelog

@@ -3,6 +3,8 @@ git-annex (5.20150714) UNRELEASED; urgency=medium
   * Improve bash completion code so that "git annex" will also tab
     complete. However, git's bash completion script needs a patch,
     which I've submitted, for this to work perfectly.
+  * Fix bug that prevented uploads to remotes using new-style chunking
+    from resuming after the last successfully uploaded chunk.
  -- Joey Hess <id@joeyh.name>  Thu, 16 Jul 2015 14:55:07 -0400


@@ -0,0 +1,30 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2015-07-16T17:57:44Z"
content="""
This should have been filed as a bug report... I will move the thread to
bugs after posting this comment.

In your obfuscated log, it tries to HEAD GPGHMACSHA1--1111111111,
and when that fails, it PUTs GPGHMACSHA1--2222222222. From this, we can
deduce that GPGHMACSHA1--1111111111 is not the first chunk but the full
non-chunked file, and that GPGHMACSHA1--2222222222 is actually the first chunk.

For testing, I modified the S3 remote to make file uploads succeed, but then
report to git-annex that they failed. So, git annex copy uploads the 1st
chunk and then fails, as if it had been interrupted there. Repeating the copy,
I see the same thing: it HEADs the full key, does not HEAD the first chunk,
and so doesn't notice it was uploaded before, and so re-uploads the first
chunk.

The HEAD of the full key is just done for backwards compatibility reasons.
The problem is that it's not checking whether the current chunk it's going
to upload is present in the remote. But there is code in seekResume that
is supposed to do that very check: `tryNonAsync (checker k)`

Aha, the problem seems to be in the checkpresent action that's passed to
that. It looks like a dummy checkpresent action is being passed in.

I've fixed this in git, and now it resumes properly in my test case.
"""]]