aws library now supports multipart; initial design

This commit is contained in:
Joey Hess 2014-10-28 13:06:24 -04:00
parent 04e920d715
commit dcaf89da2f

View file

@ -0,0 +1,31 @@
[[!comment format=mdwn
username="joey"
subject="""comment 9"""
date="2014-10-28T16:42:21Z"
content="""
The aws library now supports multipart uploads, using its
S3.Commands.Multipart module.
I don't think that multipart and chunking fit together: Typically the
chunks are too small to need multipart for individual chunks. And the
chunks shouldn't be combined together into a complete object at the end (at
least not if we care about using chunking to obscure object size).
Individual chunks sizes can vary when encryption is used, so combining them
all into one file wouldn't work.
Also, multipart uploads require at least 3 http calls, so there's no point
using it for small objects, as it would only add overhead.
So, multipart uploads should be used when not chunking, when the object to
upload exceeds some size, which should probably defaut to something in the
range of 100 mb to 1 gb.
It might be possible to support resuming of interrupted multipart uploads.
It seems that git-annex would need to store, locally, the UploadId,
as well as the list of uploaded parts, including the Etag for the upload
(which is needed when completing the multipart upload too).
Also it should probably set Expires when initiating the multipart upload,
so that incomplete ones get cleaned up after some period of time.
Otherwise, users would probably be billed for them.
"""]]