Merge branch 's3-aws'
commit 911ba8d972
12 changed files with 493 additions and 225 deletions
@@ -2,9 +2,13 @@ S3 has memory leaks

Sending a file to S3 causes a slow memory increase toward the file size.

> This is fixed, now that it uses aws. --[[Joey]]

Copying the file back from S3 causes a slow memory increase toward the
file size.

> [[fixed|done]] too! --[[Joey]]

The author of hS3 is aware of the problem, and is working on it. I think I
have identified the root cause of the buffering; it's done by hS3 so it can
resend the data if S3 sends it a 307 redirect. --[[Joey]]

@@ -52,3 +52,11 @@ Please provide any additional information below.

upgrade supported from repository versions: 0 1 2

[[!tag confirmed]]

> [[fixed|done]] This is now supported, when git-annex is built with a new
> enough version of the aws library. You need to configure the remote to
> use an appropriate value for multipart, eg:
>
>     git annex enableremote cloud multipart=1GiB
>
> --[[Joey]]

@@ -6,3 +6,5 @@ Amazon has opened up a new region in AWS with a datacenter in Frankfurt/Germany.

* Region: eu-central-1

This should be added to the "Adding an Amazon S3 repository" page in the Datacenter dropdown of the webapp.

> [[fixed|done]] --[[Joey]]
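
A minimal sketch of pointing a special remote at the new region from the
command line (the remote name "frankfurt" and the encryption setting are
illustrative assumptions, not part of this change):

    git annex initremote frankfurt type=S3 encryption=none datacenter=eu-central-1
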
@@ -18,11 +18,11 @@ the S3 remote.

* `encryption` - One of "none", "hybrid", "shared", or "pubkey".
  See [[encryption]].

* `keyid` - Specifies the gpg key to use for [[encryption]].

* `chunk` - Enables [[chunking]] when storing large files.
  `chunk=1MiB` is a good starting point for chunking.

* `embedcreds` - Optional. Set to "yes" to embed the login credentials inside
  the git repository, which allows other clones to also access them. This is
  the default when gpg encryption is enabled; the credentials are stored

@@ -33,7 +33,8 @@ the S3 remote.

  embedcreds without gpg encryption.

* `datacenter` - Defaults to "US". Other values include "EU",
  "us-west-1", and "ap-southeast-1".
  "us-west-1", "us-west-2", "ap-southeast-1", "ap-southeast-2", and
  "sa-east-1".

* `storageclass` - Default is "STANDARD". If you have configured git-annex
  to preserve multiple [[copies]], consider setting this to "REDUCED_REDUNDANCY"

@@ -46,11 +47,24 @@ the S3 remote.

  so by default, a bucket name is chosen based on the remote name
  and UUID. This can be specified to pick a bucket name.

* `partsize` - Amazon S3 only accepts uploads up to a certain file size,
  and storing larger files requires a multipart upload process.

  Setting `partsize=1GiB` is recommended for Amazon S3 when not using
  chunking; this will cause multipart uploads to be done using parts
  up to 1GiB in size. Note that setting partsize to less than 100MiB
  will cause Amazon S3 to reject uploads.

  This is not enabled by default, since other S3 implementations may
  not support multipart uploads or have different limits,
  but can be enabled or changed at any time.

* `fileprefix` - By default, git-annex places files in a tree rooted at the
  top of the S3 bucket. When this is set, it's prefixed to the filenames
  used. For example, you could set it to "foo/" in one special remote,
  and to "bar/" in another special remote, and both special remotes could
  then use the same bucket.

* `x-amz-*` are passed through as http headers when storing keys
* `x-amz-meta-*` are passed through as http headers when storing keys
  in S3.
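
Putting several of the options above together, here is a rough example of
creating an S3 special remote and later adjusting it (the remote name "mys3"
and the specific values are illustrative, not taken from this change):

    # create the remote; partsize enables multipart uploads for large files
    git annex initremote mys3 type=S3 encryption=shared \
        datacenter=EU storageclass=REDUCED_REDUNDANCY \
        fileprefix=mys3/ partsize=1GiB

    # settings such as partsize can be changed later on the existing remote
    git annex enableremote mys3 partsize=1GiB
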
doc/todo/S3_multipart_interruption_cleanup.mdwn (new file, 14 lines)

@@ -0,0 +1,14 @@

When a multipart S3 upload is being made, and gets interrupted,
the parts remain in the bucket, and S3 may charge for them.

I am not sure what happens if the same object gets uploaded again. Is S3
nice enough to remove the old parts? I need to find out.

If not, this needs to be dealt with somehow. One way would be to configure an
expiry of the uploaded parts, but this is tricky as a huge upload could
take arbitrarily long. Another way would be to record the uploadid and the
etags of the parts, and then resume where it left off the next time the
object is sent to S3. (Or at least cancel the old upload; resume isn't
practical when uploading an encrypted object.)

It could store that info in either the local FS or the git-annex branch.
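
Until git-annex handles this itself, stale parts can at least be found and
cleaned up by hand; a rough sketch using the aws command line tool (the
bucket name "mybucket" is illustrative, and the key and upload id come from
the listing output):

    # show any incomplete multipart uploads left in the bucket
    aws s3api list-multipart-uploads --bucket mybucket

    # abort one of them, which makes S3 delete its stored parts
    aws s3api abort-multipart-upload --bucket mybucket \
        --key somekey --upload-id someuploadid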