move forum bug report to bugs, and close

Joey Hess 2015-07-16 15:01:55 -04:00
parent afe6a53bca
commit 22f3a5fdc2
5 changed files with 2 additions and 0 deletions


@@ -1,88 +0,0 @@
I'm trying to upload large files to an S3 remote. I'm using a very recent version of git-annex:
git-annex version: 5.20150616-g4d7683b
build flags: Assistant Webapp Webapp-secure Pairing Testsuite S3 WebDAV FsEvents XMPP DNS Feeds Quvi TDFA TorrentParser
key/value backends: SHA256E SHA1E SHA512E SHA224E SHA384E SKEIN256E SKEIN512E MD5E SHA256 SHA1 SHA512 SHA224 SHA384 SKEIN256 SKEIN512 MD5 WORM URL
remote types: git gcrypt S3 bup directory rsync web bittorrent webdav tahoe glacier ddar hook external
Here's how my chunking is set up:
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx bucket=mybucket chunk=256MiB cipher=xxxxxx cipherkeys=xxxxxx datacenter=US
host=s3.amazonaws.com name=mybucket port=80 s3creds=xxxxxx storageclass=STANDARD type=S3 timestamp=xxxxxx
If I start an upload, interrupt it with `^C` partway through, and then start it again, it always restarts from the beginning.
I've confirmed this using the `--debug` switch; see below. I've obfuscated some names for security reasons, but each placeholder consistently refers to the same object: GPGHMACSHA1--1111111111 is always the same chunk, GPGHMACSHA1--2222222222 is always the same chunk, and so on.
You can see that even after a chunk has been uploaded once, it tries to upload it again.
The same thing happens if I let it run for an hour, uploading about half of the large file, and then interrupt it: it starts again from scratch.
$ git annex copy --debug * --to mybucket
[2015-06-23 15:24:07 PDT] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","show-ref","git-annex"]
[2015-06-23 15:24:07 PDT] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
[2015-06-23 15:24:07 PDT] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","log","refs/heads/git-annex..xxx","-n1","--pretty=%H"]
[2015-06-23 15:24:07 PDT] chat: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","cat-file","--batch"]
[2015-06-23 15:24:07 PDT] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","ls-files","--cached","-z","--","aaa.tgz","xxx.tgz","xxx.tgz","xxx.tgz","xxx.tgz","xxx.tgz"]
copy aaa.tgz [2015-06-23 15:24:07 PDT] chat: gpg ["--quiet","--trust-model","always","--decrypt"]
(checking mybucket...) [2015-06-23 15:24:07 PDT] String to sign: "HEAD\n\n\nTue, 23 Jun 2015 22:24:07 GMT\n/mybucket/GPGHMACSHA1--1111111111"
[2015-06-23 15:24:07 PDT] Host: "mybucket.s3.amazonaws.com"
[2015-06-23 15:24:07 PDT] Path: "/GPGHMACSHA1--1111111111"
[2015-06-23 15:24:07 PDT] Query string: ""
[2015-06-23 15:24:07 PDT] Response status: Status {statusCode = 404, statusMessage = "Not Found"}
[2015-06-23 15:24:07 PDT] Response header 'x-amz-request-id': 'xxx'
[2015-06-23 15:24:07 PDT] Response header 'x-amz-id-2': 'xxx'
[2015-06-23 15:24:07 PDT] Response header 'Content-Type': 'application/xml'
[2015-06-23 15:24:07 PDT] Response header 'Transfer-Encoding': 'chunked'
[2015-06-23 15:24:07 PDT] Response header 'Date': 'Tue, 23 Jun 2015 22:24:03 GMT'
[2015-06-23 15:24:07 PDT] Response header 'Server': 'AmazonS3'
[2015-06-23 15:24:07 PDT] Response metadata: S3: request ID=<none>, x-amz-id-2=<none>
(to mybucket...)
0% 0.0 B/s 0s[2015-06-23 15:24:07 PDT] chat: gpg ["--quiet","--trust-model","always","--batch","--passphrase-fd","18","--symmetric","--force-mdc","--no-textmode"]
[2015-06-23 15:24:19 PDT] String to sign: "PUT\n\n\nTue, 23 Jun 2015 22:24:19 GMT\nx-amz-storage-class:STANDARD\n/mybucket/GPGHMACSHA1--2222222222"
[2015-06-23 15:24:19 PDT] Host: "mybucket.s3.amazonaws.com"
[2015-06-23 15:24:19 PDT] Path: "/GPGHMACSHA1--2222222222"
[2015-06-23 15:24:19 PDT] Query string: ""
3% 636.3KB/s 3h0m[2015-06-23 15:31:01 PDT] Response status: Status {statusCode = 200, statusMessage = "OK"}
[2015-06-23 15:31:01 PDT] Response header 'x-amz-id-2': 'xxx'
[2015-06-23 15:31:01 PDT] Response header 'x-amz-request-id': 'xxx'
[2015-06-23 15:31:01 PDT] Response header 'Date': 'Tue, 23 Jun 2015 22:24:17 GMT'
[2015-06-23 15:31:01 PDT] Response header 'ETag': '"xxx"'
[2015-06-23 15:31:01 PDT] Response header 'Content-Length': '0'
[2015-06-23 15:31:01 PDT] Response header 'Server': 'AmazonS3'
[2015-06-23 15:31:01 PDT] Response metadata: S3: request ID=xxx, x-amz-id-2=xxx
3% 633.2KB/s 3h1m[2015-06-23 15:31:01 PDT] chat: gpg ["--quiet","--trust-model","always","--batch","--passphrase-fd","18","--symmetric","--force-mdc","--no-textmode"]
[2015-06-23 15:31:13 PDT] String to sign: "PUT\n\n\nTue, 23 Jun 2015 22:31:13 GMT\nx-amz-storage-class:STANDARD\n/mybucket/GPGHMACSHA1--3333333333"
[2015-06-23 15:31:13 PDT] Host: "mybucket.s3.amazonaws.com"
[2015-06-23 15:31:13 PDT] Path: "/GPGHMACSHA1--3333333333"
[2015-06-23 15:31:13 PDT] Query string: ""
3% 617.2KB/s 3h6m^C
$ git annex copy --debug * --to mybucket
[2015-06-23 15:31:25 PDT] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","show-ref","git-annex"]
[2015-06-23 15:31:25 PDT] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
[2015-06-23 15:31:25 PDT] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","log","refs/heads/git-annex..xxx","-n1","--pretty=%H"]
[2015-06-23 15:31:25 PDT] chat: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","cat-file","--batch"]
[2015-06-23 15:31:25 PDT] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","ls-files","--cached","-z","--","aaa.tgz","xxx.tgz","xxx.tgz","xxx.tgz","xxx.tgz","xxx.tgz"]
copy aaa.tgz [2015-06-23 15:31:25 PDT] chat: gpg ["--quiet","--trust-model","always","--decrypt"]
(checking mybucket...) [2015-06-23 15:31:25 PDT] String to sign: "HEAD\n\n\nTue, 23 Jun 2015 22:31:25 GMT\n/mybucket/GPGHMACSHA1--1111111111"
[2015-06-23 15:31:25 PDT] Host: "mybucket.s3.amazonaws.com"
[2015-06-23 15:31:25 PDT] Path: "/GPGHMACSHA1--1111111111"
[2015-06-23 15:31:25 PDT] Query string: ""
[2015-06-23 15:31:25 PDT] Response status: Status {statusCode = 404, statusMessage = "Not Found"}
[2015-06-23 15:31:25 PDT] Response header 'x-amz-request-id': 'xxx'
[2015-06-23 15:31:25 PDT] Response header 'x-amz-id-2': 'xxx'
[2015-06-23 15:31:25 PDT] Response header 'Content-Type': 'application/xml'
[2015-06-23 15:31:25 PDT] Response header 'Transfer-Encoding': 'chunked'
[2015-06-23 15:31:25 PDT] Response header 'Date': 'Tue, 23 Jun 2015 22:31:21 GMT'
[2015-06-23 15:31:25 PDT] Response header 'Server': 'AmazonS3'
[2015-06-23 15:31:25 PDT] Response metadata: S3: request ID=<none>, x-amz-id-2=<none>
(to mybucket...)
0% 0.0 B/s 0s[2015-06-23 15:31:25 PDT] chat: gpg ["--quiet","--trust-model","always","--batch","--passphrase-fd","18","--symmetric","--force-mdc","--no-textmode"]
[2015-06-23 15:31:37 PDT] String to sign: "PUT\n\n\nTue, 23 Jun 2015 22:31:37 GMT\nx-amz-storage-class:STANDARD\n/mybucket/GPGHMACSHA1--2222222222"
[2015-06-23 15:31:37 PDT] Host: "mybucket.s3.amazonaws.com"
[2015-06-23 15:31:37 PDT] Path: "/GPGHMACSHA1--2222222222"
[2015-06-23 15:31:37 PDT] Query string: ""
0% 350.1KB/s 5h40m^C
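The timestamps from the first run can be checked against the progress meter to see whether a whole 256 MiB chunk went through before the interruption. This Python sketch assumes that the meter's "KB" means 1024 bytes and that its rate is averaged from 15:24:07, when gpg started and the meter showed 0%. The timestamps are copied directly from the log above.

```python
from datetime import datetime

CHUNK_BYTES = 256 * 1024 * 1024  # chunk=256MiB in the remote's configuration

# Timestamps from the first run in the debug log (same day, so parsing time-of-day is enough).
FMT = "%H:%M:%S"
encrypt_start = datetime.strptime("15:24:07", FMT)  # gpg starts, meter shows 0%
put_ok = datetime.strptime("15:31:01", FMT)  # 200 OK for the PUT of GPGHMACSHA1--2222222222

elapsed = (put_ok - encrypt_start).total_seconds()
rate_kib = CHUNK_BYTES / elapsed / 1024  # assumes the meter's "KB" is KiB
print(f"{elapsed:.0f} s, {rate_kib:.1f} KiB/s")  # 414 s, 633.2 KiB/s
```

The computed rate matches the `633.2KB/s` the meter printed right after the 200 OK. That is consistent with the full first chunk having been sent in that PUT.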


@@ -1,7 +0,0 @@
[[!comment format=mdwn
username="anarcat"
subject="comment 1"
date="2015-06-24T00:31:43Z"
content="""
Did a single chunk get transferred correctly? I believe git-annex can only resume at chunk granularity... that is what chunks are for, no? --[[anarcat]]
"""]]


@@ -1,11 +0,0 @@
[[!comment format=mdwn
username="digiuser"
subject="yes"
date="2015-06-24T00:49:22Z"
content="""
Yes, a single chunk did get transferred correctly.
Actually, I've run this experiment many times, and in many of those runs several chunks were transferred correctly. I've even verified that they are in S3, yet git-annex tries to re-upload them.
(I haven't checked their contents in S3, but the filenames and sizes are there.)
"""]]


@@ -1,7 +0,0 @@
[[!comment format=mdwn
username="digiuser"
subject="any updates?"
date="2015-06-29T03:05:53Z"
content="""
Sorry to post again, but I was wondering whether this message got lost. Does anyone have a solution? Thanks!
"""]]


@@ -1,30 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2015-07-16T17:57:44Z"
content="""
This should have been filed as a bug report... I will move the thread to
bugs after posting this comment.
In your obfuscated log, it tries to HEAD GPGHMACSHA1--1111111111
and when that fails, it PUTs GPGHMACSHA1--2222222222. From this, we can
deduce that GPGHMACSHA1--1111111111 is not the first chunk, but is the full
non-chunked file, and GPGHMACSHA1--2222222222 is actually the first chunk.
For testing, I modified the S3 remote so that file uploads succeed but are
then reported to git-annex as failed. So git annex copy uploads the 1st
chunk and then fails, as if it had been interrupted there. Repeating the copy,
I see the same thing: it HEADs the full key, does not HEAD the first chunk,
so it doesn't notice that the chunk was uploaded before, and it re-uploads the first
chunk.
The HEAD of the full key is just done for backwards compatibility reasons.
The problem is that it's not checking whether the chunk it's about to
upload is already present in the remote. But there is code in seekResume that
is supposed to do that very check: `tryNonAsync (checker k)`.
Aha, the problem seems to be in the checkpresent action that's passed to
that. Looks like it's passing in a dummy checkpresent action.
I've fixed this in git, and now it resumes properly in my test case.
"""]]