aws library now supports multipart; initial design
This commit is contained in:
parent 04e920d715
commit dcaf89da2f
1 changed file with 31 additions and 0 deletions
@@ -0,0 +1,31 @@
[[!comment format=mdwn
 username="joey"
 subject="""comment 9"""
 date="2014-10-28T16:42:21Z"
 content="""
The aws library now supports multipart uploads, using its
S3.Commands.Multipart module.

I don't think that multipart and chunking fit together: Typically the
chunks are too small to need multipart for individual chunks. And the
chunks shouldn't be combined together into a complete object at the end (at
least not if we care about using chunking to obscure object size).
Individual chunk sizes can vary when encryption is used, so combining them
all into one file wouldn't work.

Also, multipart uploads require at least 3 http calls, so there's no point
using it for small objects, as it would only add overhead.
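
For reference, the minimum sequence is just the standard S3 multipart
protocol, nothing specific to the aws library:

    POST /<key>?uploads                      initiate; returns an UploadId
    PUT  /<key>?partNumber=N&uploadId=<id>   one call per part; returns an ETag
    POST /<key>?uploadId=<id>                complete; sends the (part number, ETag) list
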
So, multipart uploads should be used when not chunking, and when the object to
upload exceeds some size, which should probably default to something in the
range of 100 mb to 1 gb.
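
A minimal sketch of that decision; shouldUseMultipart, multipartThreshold,
and the 1 gb default are made-up names and values, not settled design:

    import Data.Maybe (isNothing)

    type ChunkSize = Integer

    -- Use multipart only when chunking is disabled and the object is
    -- larger than the threshold; otherwise a single PUT is cheaper.
    shouldUseMultipart :: Maybe ChunkSize -> Integer -> Bool
    shouldUseMultipart chunkconfig objectsize =
        isNothing chunkconfig && objectsize >= multipartThreshold

    -- Default threshold, somewhere in the 100 mb to 1 gb range.
    multipartThreshold :: Integer
    multipartThreshold = 1000000000
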
It might be possible to support resuming of interrupted multipart uploads.
It seems that git-annex would need to store, locally, the UploadId,
as well as the list of uploaded parts, including the ETag returned for
each part (which is also needed when completing the multipart upload).
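
Roughly the per-key state that would have to be stored locally (just a
sketch; the type and field names are made up):

    -- State needed to resume an interrupted multipart upload.
    data MultipartState = MultipartState
        { multipartUploadId :: String
        -- ^ UploadId returned when the upload was initiated
        , multipartParts :: [(Integer, String)]
        -- ^ (part number, ETag) for each part uploaded so far
        }
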
Also it should probably set Expires when initiating the multipart upload,
so that incomplete ones get cleaned up after some period of time.
Otherwise, users would probably be billed for them.
"""]]