git-annex/doc/special_remotes/Amazon_S3.mdwn
Joey Hess 66ab18325e mention archive.org's S3 server
git-annex + archive.org could be an interesting combo for public archivists
2011-04-02 13:35:57 -04:00


This special remote type stores file contents in a bucket in Amazon S3
or a similar service, such as
[Archive.org's S3 API](http://www.archive.org/help/abouts3.txt).
See [[walkthrough/using_Amazon_S3]] for usage examples.
## configuration
A number of parameters can be passed to `git annex initremote` to configure
the S3 remote.
* `encryption` - Required. Either "none" to disable encryption,
or a value that can be looked up (using `gpg -k`) to find a gpg encryption
key that will be given access to the remote. Note that additional gpg
keys can be given access to a remote by rerunning `git annex initremote`
with the new key id.
* `datacenter` - Defaults to "US". Other values include "EU",
"us-west-1", and "ap-southeast-1".
* `storageclass` - Default is "STANDARD". If you have configured git-annex
to preserve multiple [[copies]], consider setting this to "REDUCED_REDUNDANCY"
to save money.
* `host` and `port` - Specify these in order to use a different,
S3-compatible service.
* `bucket` - S3 requires that buckets have a globally unique name,
so by default, a bucket name is chosen based on the remote name
and UUID. Set this parameter to choose the bucket name yourself.
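
As a rough illustration of how these parameters combine, here is a minimal Python sketch that assembles an initremote command line. The helper `initremote_command` and the remote name `mys3` are hypothetical, and the `type=S3` parameter is taken from the walkthrough rather than from the list above. The sketch only prints the command; it does not run it.

```python
import shlex


def initremote_command(name, **params):
    """Build the argument list for `git annex initremote` (hypothetical helper)."""
    # The encryption parameter is documented above as required.
    if "encryption" not in params:
        raise ValueError("encryption is required ('none' or a gpg key id)")
    # type=S3 is assumed from the walkthrough, not from the parameter list.
    args = ["git", "annex", "initremote", name, "type=S3"]
    args += [f"{key}={value}" for key, value in params.items()]
    return args


cmd = initremote_command(
    "mys3",  # hypothetical remote name
    encryption="none",
    datacenter="EU",
    storageclass="REDUCED_REDUNDANCY",
)
print(shlex.join(cmd))
```

Parameters that are left out, such as `bucket`, keep the defaults described above.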
## data security
When encryption=none, there is **no** protection against your data being read
as it is sent to/from S3, or by Amazon when it is stored in S3. This should
only be used for public data.
**Encryption is not yet supported.**
When encryption is enabled, all files stored in the bucket are
encrypted with gpg. Additionally, the filenames themselves are obfuscated
(using HMAC). The size of the encrypted files, and
access patterns of the data, should be the only clues to what type of
data you are storing in S3.
[[!template id=note text="""
This scheme was originally developed by Lars Wirzenius et al
[for Obnam](http://braawi.org/obnam/encryption/).
"""]]
The data stored in S3 is encrypted by gpg with a symmetric cipher. The
passphrase of the cipher is itself checked into your git repository,
encrypted using one or more gpg public keys. This scheme allows additional gpg
keys to be given access to a bucket's content after the bucket is created
and is in use. The symmetric cipher is also hashed together with filenames
used in the bucket, in order to obfuscate the filenames.
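
The filename obfuscation can be sketched in a few lines of Python. This is only an illustration of the idea, not git-annex's actual code: the choice of HMAC-SHA1, the 64-byte random cipher, and the name `obfuscated_name` are all assumptions made for the example, and the gpg step that protects the cipher is not shown.

```python
import hashlib
import hmac
import os


def obfuscated_name(cipher: bytes, filename: str) -> str:
    """Return a keyed hash of filename, using the symmetric cipher as the key."""
    # HMAC-SHA1 is an assumption made for this sketch.
    return hmac.new(cipher, filename.encode("utf-8"), hashlib.sha1).hexdigest()


# Stand-in for the symmetric cipher that gpg would protect (hypothetical size).
cipher = os.urandom(64)

stored_as = obfuscated_name(cipher, "example-file-name")
print(stored_as)  # 40 hex characters that depend on the cipher
```

Anyone holding the cipher can recompute the stored name for a given file, while someone who only sees the bucket listing cannot work backwards to the original name.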