This special remote type stores file contents in a bucket in Amazon S3
or a similar service.

See [[tips/using_Amazon_S3]],
[[tips/Internet_Archive_via_S3]], and [[tips/using_Google_Cloud_Storage]]
for usage examples.

## configuration

The standard environment variables `AWS_ACCESS_KEY_ID` and
`AWS_SECRET_ACCESS_KEY` are used to supply login credentials for S3. You
need to set these only when running `git annex initremote` (or
`enableremote`), as they will be cached in a file only you can read inside
the local git repository. If you’re working with temporary security
credentials, you can also set the `AWS_SESSION_TOKEN` environment variable.
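
For scripted setups, here is a minimal Python sketch that supplies these variables only to the `git annex initremote` (or `enableremote`) process instead of exporting them in your shell. The helper name `s3_credentials_env` and its `base` parameter are hypothetical. The returned dict is meant to be passed as `env=` to `subprocess.run`.

```python
import os


def s3_credentials_env(access_key_id, secret_access_key, session_token=None, base=None):
    """Build an environment for running `git annex initremote` or `enableremote`.

    `base` defaults to the current process environment (hypothetical parameter,
    useful for testing). The credentials only need to be present for those two
    commands, because git-annex caches them in the local repository.
    """
    env = dict(os.environ if base is None else base)
    env["AWS_ACCESS_KEY_ID"] = access_key_id
    env["AWS_SECRET_ACCESS_KEY"] = secret_access_key
    if session_token is not None:
        # Temporary security credentials also need the session token.
        env["AWS_SESSION_TOKEN"] = session_token
    else:
        # Don't pass along a stale token inherited from the parent environment.
        env.pop("AWS_SESSION_TOKEN", None)
    return env
```

For example, `subprocess.run(["git", "annex", "enableremote", "mys3"], env=s3_credentials_env(key_id, secret), check=True)`, where `mys3`, `key_id` and `secret` are placeholders.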

A number of parameters can be passed to `git annex initremote` to configure
the S3 remote.
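
As a sketch, assuming the parameters are kept in a Python dict, the following builds the argument list for `git annex initremote`. Each parameter becomes a single `name=value` argument, and `type=S3` selects this special remote type. The function names and the remote name `mys3` are hypothetical.

```python
import subprocess


def initremote_command(name, params):
    """Build argv for `git annex initremote NAME type=S3 name=value ...`."""
    args = ["git", "annex", "initremote", name, "type=S3"]
    for key, value in params.items():
        # Each parameter is one name=value argument, so "=" can't be in the name.
        if not key or "=" in key:
            raise ValueError(f"invalid parameter name: {key!r}")
        args.append(f"{key}={value}")
    return args


def initremote(name, params, env=None):
    """Run the command, raising CalledProcessError if git-annex fails."""
    subprocess.run(initremote_command(name, params), env=env, check=True)
```

Passing an argument list rather than a shell string means values containing spaces need no quoting.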

* `encryption` - One of "none", "hybrid", "shared", or "pubkey".
  See [[encryption]].

* `keyid` - Specifies the gpg key to use for [[encryption]].

* `chunk` - Enables [[chunking]] when storing large files.
  `chunk=1MiB` is a good starting point for chunking.

* `embedcreds` - Optional. Set to "yes" to embed the login credentials inside
  the git repository, which allows other clones to also access them. This is
  the default when gpg encryption is enabled; the credentials are stored
  encrypted and only those with the repository's keys can access them.

  It is not the default when using shared encryption, or no encryption.
  Think carefully about who can access your repository before using
  embedcreds without gpg encryption.

* `datacenter` - Defaults to "US". Other values include "EU" (which is EU/Ireland),
  "us-west-1", "us-west-2", "ap-southeast-1", "ap-southeast-2", and
  "sa-east-1".

* `storageclass` - Default is "STANDARD".
  Consult S3 provider documentation for pricing details and available
  storage classes. For example, the s3cmd(1) man page lists valid storage class names for Amazon S3.

  When using Amazon S3,
  if the remote will be used for backup or archival,
  and so its files are Infrequently Accessed, `STANDARD_IA` is a
  good choice to save money (requires a git-annex built with aws-0.13.0).
  If you have configured git-annex to preserve
  multiple [[copies]], also consider setting this to `ONEZONE_IA`
  to save even more money.

  Amazon S3's `DEEP_ARCHIVE` is similar to Amazon Glacier. For that,
  use the [[glacier]] special remote, rather than this one.

  When using Google Cloud Storage, to make a nearline bucket, set this to
  `NEARLINE`. (Requires a git-annex built with aws-0.13.0.)

  Note that changing the storage class of an existing S3 remote will
  affect new objects sent to the remote, but not objects already
  stored there.

* `host` - Specify this to use a different, S3-compatible
  service.

* `protocol` - Either "http" (the default) or "https". Setting
  protocol=https implies port=443.

  This option was added in git-annex version 7.20190322; to make
  a special remote that uses https with older versions of git-annex,
  explicitly specify port=443.

* `port` - Specify the port to connect to. Only needed when using a service
  on an unusual port. Setting port=443 implies protocol=https.

* `requeststyle` - Set to "path" to use path style requests, instead of the
  default DNS style requests. This is needed with some S3 services.

  If you get an error about a host name not existing, it's a good
  indication that you need to use this.

* `signature` - This controls the S3 signature version to use.
  "v2" is currently the default; "v4" is needed to use some S3 services.
  If you get some kind of authentication error, try "v4".
  To access an S3 bucket anonymously, use "anonymous".

* `bucket` - S3 requires that buckets have a globally unique name,
  so by default, a bucket name is chosen based on the remote name
  and UUID. Set this to choose the bucket name yourself.

* `versioning` - Indicate whether the S3 bucket should have versioning
  enabled. Set to "yes" to enable.

  Enabling versioning along with "exporttree=yes"
  allows git-annex to access old versions of files that were
  exported to the special remote by [[git-annex export|git-annex-export]].

  Similarly, enabling versioning along with "importtree=yes"
  allows [[git-annex import|git-annex-import]] to import the whole
  history of files in the bucket, synthesizing a series of git commits.

  Note that git-annex does not support dropping content from versioned
  S3 buckets, since the versioning preserves the content.

* `exporttree` - Set to "yes" to make this special remote usable
  by [[git-annex export|git-annex-export]].
  It will not be usable as a general-purpose special remote.

* `importtree` - Set to "yes" to make this special remote usable
  by [[git-annex-import]]. When set in combination with exporttree,
  this lets files be imported from it, and changes exported back to it.

  Note that exporting files to an S3 bucket may overwrite changes that
  have been made to files in the bucket by other software since the last
  time git-annex imported from the bucket. When versioning is enabled,
  the content of files overwritten in this way can still be recovered,
  but you may have to look through the git history to find them.
  When versioning is not enabled, this risks data loss, and so git-annex
  will not let you enable a remote with that configuration unless forced.

* `public` - Set to "yes" to allow public read access to files sent
  to the S3 remote. This is accomplished by setting an ACL when each
  file is uploaded to the remote, so changing this setting only
  affects subsequent uploads.

* `publicurl` - Configure the URL that is used to download files
  from the bucket. Using this in combination with public=yes allows
  git-annex to download files from the S3 remote without needing to
  know the S3 credentials.

* `partsize` - Amazon S3 only accepts uploads up to a certain file size,
  and storing larger files requires a multipart upload process.

  Setting `partsize=1GiB` is recommended for Amazon S3 when not using
  chunking; this will cause multipart uploads to be done using parts
  up to 1GiB in size. Note that setting partsize to less than 100MiB
  will cause Amazon S3 to reject uploads.

  Multipart uploads are not enabled by default, since other S3
  implementations may not support them or may have different limits,
  but partsize can be set or changed at any time.

* `fileprefix` - By default, git-annex places files in a tree rooted at the
  top of the S3 bucket. When this is set, it's prefixed to the filenames
  used. For example, you could set it to "foo/" in one special remote,
  and to "bar/" in another special remote, and both special remotes could
  then use the same bucket.

* `x-amz-meta-*` are passed through as http headers when storing keys
  in S3.

* `x-archive-meta-*` are passed through as http headers when storing keys
  in the Internet Archive. See
  [the Internet Archive S3 interface documentation](https://archive.org/help/abouts3.txt)
  for example headers.
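
The `chunk` and `partsize` settings both split a large file into fixed-size pieces, so the number of pieces is a ceiling division. For example, with `chunk=1MiB` a 10MiB file is stored as 10 chunks, and with `partsize=1GiB` a 5GiB file is uploaded in 5 parts. The sketch below uses a hypothetical, simplified size parser that only handles the IEC units seen on this page (`MiB`, `GiB`, ...); it is not git-annex's own parser. The 100MiB floor comes from the `partsize` description above.

```python
import re

# Hypothetical, simplified size parser: it only handles IEC units such as
# "1MiB" or "1GiB"; it is not git-annex's own parser.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

# Per the partsize description, Amazon S3 rejects partsize below 100MiB.
AMAZON_MIN_PARTSIZE = 100 * 1024**2


def parse_size(text):
    """Convert a size like "1MiB" into a number of bytes."""
    match = re.fullmatch(r"(\d+)\s*(B|KiB|MiB|GiB|TiB)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(number) * UNITS[unit]


def pieces_needed(file_size, piece_size):
    """How many chunks (or multipart parts) a file of file_size bytes needs."""
    if piece_size <= 0:
        raise ValueError("piece size must be positive")
    return -(-file_size // piece_size)  # ceiling division on integers


def check_amazon_partsize(text):
    """Return partsize in bytes, or raise if Amazon S3 would reject it."""
    size = parse_size(text)
    if size < AMAZON_MIN_PARTSIZE:
        raise ValueError(f"partsize={text} is below 100MiB")
    return size
```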
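
The way `host`, `protocol`, `port`, and `requeststyle` fit together can be sketched as URL construction. This sketch assumes the usual S3 addressing conventions rather than git-annex's own code. DNS-style requests put the bucket in the host name (`bucket.host`), which is why an error about a nonexistent host name suggests switching to path style. Path-style requests put the bucket in the path (`host/bucket/...`). Treating port 80 as the http default is an assumption, and the function names are hypothetical.

```python
from urllib.parse import quote

# Standard ports; port 80 as the http default is an assumption.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_endpoint(protocol=None, port=None):
    """Resolve protocol and port using the implications described above."""
    if protocol is None:
        # port=443 implies https; otherwise http is the default protocol.
        protocol = "https" if port == 443 else "http"
    if protocol not in DEFAULT_PORTS:
        raise ValueError(f"protocol must be 'http' or 'https', not {protocol!r}")
    if port is None:
        # protocol=https implies port=443.
        port = DEFAULT_PORTS[protocol]
    return protocol, port


def object_url(host, bucket, object_name, requeststyle="dns", protocol=None, port=None):
    """Build an object URL; any requeststyle other than "path" means DNS style."""
    protocol, port = effective_endpoint(protocol, port)
    netloc = host if port == DEFAULT_PORTS[protocol] else f"{host}:{port}"
    path = quote(object_name)  # keeps "/" so prefixes stay path segments
    if requeststyle == "path":
        # Path style: the bucket is the first component of the path.
        return f"{protocol}://{netloc}/{bucket}/{path}"
    # DNS style: the bucket is part of the host name, which must resolve.
    return f"{protocol}://{bucket}.{netloc}/{path}"
```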
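
This sketch shows how `fileprefix` lets two special remotes share one bucket, and which `initremote` parameters are passed through as headers. It assumes the object name is simply the prefix followed by the git-annex key, with no further escaping. The key string is made up for illustration, and the function names are hypothetical.

```python
# Hypothetical key string, for illustration only.
EXAMPLE_KEY = "SHA256E-s1048576--0123abcd.bin"

# Parameter name prefixes that are passed through as HTTP headers.
HEADER_PREFIXES = ("x-amz-meta-", "x-archive-meta-")


def object_name(key, fileprefix=""):
    """Object name in the bucket, assuming fileprefix is prepended verbatim."""
    return fileprefix + key


def metadata_headers(params):
    """Select the initremote parameters that are passed through as headers."""
    return {name: value for name, value in params.items() if name.startswith(HEADER_PREFIXES)}
```

With `fileprefix=foo/` in one remote and `fileprefix=bar/` in another, the same key maps to `foo/...` and `bar/...`, so the two remotes never write to the same object.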
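
The configuration rules stated above for `embedcreds`, `versioning`, `importtree`, and `exporttree` can be collected into a pre-flight check. This is a hypothetical sketch that mirrors the documented behaviour. It is not git-annex's own validation, which may check more.

```python
def is_yes(params, name):
    """True when an initremote parameter is set to "yes"."""
    return params.get(name) == "yes"


def embedcreds_default(encryption):
    """embedcreds defaults to yes only with gpg key-based encryption."""
    return encryption in ("hybrid", "pubkey")


def check_tree_config(params, force=False):
    """Refuse importtree+exporttree without versioning unless forced."""
    risky = (
        is_yes(params, "importtree")
        and is_yes(params, "exporttree")
        and not is_yes(params, "versioning")
    )
    if risky and not force:
        raise ValueError(
            "importtree=yes with exporttree=yes and no versioning risks data loss"
        )


def can_drop(params):
    """git-annex does not support dropping content from versioned buckets."""
    return not is_yes(params, "versioning")
```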