This special remote type stores file contents in a bucket in Amazon S3
or a similar service.

See [[tips/using_Amazon_S3]],
[[tips/Internet_Archive_via_S3]], and [[tips/using_Google_Cloud_Storage]]
for usage examples.

## configuration

The standard environment variables `AWS_ACCESS_KEY_ID` and
`AWS_SECRET_ACCESS_KEY` are used to supply login credentials for S3. You
need to set these only when running `git annex initremote` (or
`enableremote`), as they will be cached in a file only you can read inside
the local git repository. If you’re working with temporary security
credentials, you can also set the `AWS_SESSION_TOKEN` environment variable.
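
As a minimal sketch of the setup described above, the following sets
placeholder credentials and wraps the `initremote` call in a small helper.
The remote name `mys3`, the helper name `init_s3_remote`, the key values,
and the choice of `encryption=shared` and `chunk=1MiB` are illustrative
assumptions, not recommendations.

```shell
# Placeholder credentials; substitute your own and never commit real keys.
export AWS_ACCESS_KEY_ID="AKIAEXAMPLEKEYID"
export AWS_SECRET_ACCESS_KEY="example-secret-key"
# Only needed when working with temporary security credentials:
# export AWS_SESSION_TOKEN="example-session-token"

# Hypothetical helper: initializes an S3 special remote named by $1.
# It must be run inside a git-annex repository.
init_s3_remote() {
    git annex initremote "$1" type=S3 encryption=shared chunk=1MiB
}

# Example invocation (commented out so the snippet has no side effects):
#   init_s3_remote mys3
```

Once the remote has been initialized, the variables can be unset, since the
credentials are cached in the local repository.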

A number of parameters can be passed to `git annex initremote` to configure
the S3 remote.

* `encryption` - One of "none", "hybrid", "shared", or "pubkey".
  See [[encryption]].

* `keyid` - Specifies the gpg key to use for [[encryption]].

* `chunk` - Enables [[chunking]] when storing large files.
  `chunk=1MiB` is a good starting point for chunking.

* `embedcreds` - Optional. Set to "yes" to embed the login credentials inside
  the git repository, which allows other clones to also access them. This is
  the default when gpg encryption is enabled; the credentials are stored
  encrypted and only those with the repository's keys can access them.

  It is not the default when using shared encryption, or no encryption.
  Think carefully about who can access your repository before using
  embedcreds without gpg encryption.

* `datacenter` - Specifies which Amazon datacenter
  to use for the bucket. Defaults to "US". Other values include "EU"
  (which is EU/Ireland), "us-west-1", "us-west-2", "ap-southeast-1",
  "ap-southeast-2", and "sa-east-1". See Amazon's documentation for a
  complete list. Configuring this is equivalent to configuring both
  `host` and `region`.

* `storageclass` - Default is "STANDARD".
  Consult S3 provider documentation for pricing details and available
  storage classes. For example, the s3cmd(1) man page lists valid
  storage class names for Amazon S3.

  When using Amazon S3,
  if the remote will be used for backup or archival,
  and so its files are Infrequently Accessed, `STANDARD_IA` is a
  good choice to save money (requires a git-annex built with aws-0.13.0).
  If you have configured git-annex to preserve
  multiple [[copies]], also consider setting this to `ONEZONE_IA`
  to save even more money.

  Amazon S3's `DEEP_ARCHIVE` is similar to Amazon Glacier. For that,
  use the [[glacier]] special remote, rather than this one.

  When using Google Cloud Storage, to make a nearline bucket, set this to
  `NEARLINE`. (Requires a git-annex built with aws-0.13.0.)

  Note that changing the storage class of an existing S3 remote will
  affect new objects sent to the remote, but not objects already
  stored there.

* `host` - Specify in order to use a different, S3 compatible
  service.

* `region` - Specify the region to use. Only makes sense to use when
  you also set `host`.
  (Requires a git-annex built with aws-0.24.)

* `protocol` - Either "http" (the default) or "https". Setting
  protocol=https implies port=443.

  This option was added in git-annex version 7.20190322; to make
  a special remote that uses https with older versions of git-annex,
  explicitly specify port=443.

* `port` - Specify the port to connect to. Only needed when using a service
  on an unusual port. Setting port=443 implies protocol=https.

* `requeststyle` - Set to "path" to use path style requests, instead of the
  default DNS style requests. This is needed with some S3 services.

  If you get an error about a host name not existing, it's a good
  indication that you need to use this.

* `signature` - This controls the S3 signature version to use.
  "v2" is currently the default; "v4" is needed to use some S3 services.
  If you get some kind of authentication error, try "v4".
  To access an S3 bucket anonymously, use "anonymous".

* `bucket` - S3 requires that buckets have a globally unique name,
  so by default, a bucket name is chosen based on the remote name
  and UUID. This can be specified to pick a bucket name.

* `versioning` - Indicate whether the S3 bucket should have versioning
  enabled. Set to "yes" to enable.

  Enabling versioning along with "exporttree=yes"
  allows git-annex to access old versions of files that were
  exported to the special remote by [[git-annex export|git-annex-export]].

  And enabling versioning along with "importtree=yes"
  allows [[git-annex import|git-annex-import]] to import the whole
  history of files in the bucket, synthesizing a series of git commits.

  Note that git-annex does not support dropping content from versioned
  S3 buckets, since the versioning preserves the content.

* `exporttree` - Set to "yes" to make this special remote usable
  by [[git-annex export|git-annex-export]].
  It will not be usable as a general-purpose special remote.

* `importtree` - Set to "yes" to make this special remote usable
  by [[git-annex-import]]. When set in combination with exporttree,
  this lets files be imported from it, and changes exported back to it.

  Note that exporting files to an S3 bucket may overwrite changes that
  have been made to files in the bucket by other software since the last
  time git-annex imported from the bucket. When versioning is enabled,
  the content of files overwritten in this way can still be recovered,
  but you may have to look through the git history to find them.
  When versioning is not enabled, this risks data loss, and so git-annex
  will not let you enable a remote with that configuration unless forced.

* `publicurl` - Configure the URL that is used to download files
  from the bucket. Using this with an S3 bucket that has been configured
  to allow anyone to download its content allows git-annex to download
  files from the S3 remote without needing to know the S3 credentials.

  To configure the S3 bucket to allow anyone to download its content,
  refer to S3 documentation to set a Bucket Policy.

* `public` - Deprecated. This enables public read access to files sent to
  the S3 remote using ACLs. Note that Amazon S3 buckets created after April
  2023 do not support using ACLs in this way and a Bucket Policy must instead
  be used. This should only be set for older buckets.

* `partsize` - Amazon S3 only accepts uploads up to a certain file size,
  and storing larger files requires a multipart upload process.

  Setting `partsize=1GiB` is recommended for Amazon S3 when not using
  chunking; this will cause multipart uploads to be done using parts
  up to 1GiB in size. Note that setting partsize to less than 100MiB
  will cause Amazon S3 to reject uploads.

  This is not enabled by default, since other S3 implementations may
  not support multipart uploads or have different limits,
  but can be enabled or changed at any time.

* `fileprefix` - By default, git-annex places files in a tree rooted at the
  top of the S3 bucket. When this is set, it's prefixed to the filenames
  used. For example, you could set it to "foo/" in one special remote,
  and to "bar/" in another special remote, and both special remotes could
  then use the same bucket.

* `x-amz-meta-*` are passed through as http headers when storing keys
  in S3.

* `x-archive-meta-*` are passed through as http headers when storing keys
  in the Internet Archive. See
  [the Internet Archive S3 interface documentation](https://archive.org/help/abouts3.txt)
  for example headers.
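
To illustrate how several of the parameters above might combine for an S3
compatible service, here is a rough sketch. The host name `s3.example.com`,
the bucket name, the remote name `compat`, and the helper
`s3_compat_params` are all hypothetical. Whether a given service needs
`requeststyle=path` or `signature=v4` depends on that service.

```shell
# Hypothetical values for an S3 compatible service.
S3_HOST="s3.example.com"
S3_BUCKET="my-annex-bucket"

# Hypothetical helper: prints one initremote parameter per line.
s3_compat_params() {
    printf '%s\n' \
        "type=S3" \
        "encryption=none" \
        "host=$S3_HOST" \
        "protocol=https" \
        "requeststyle=path" \
        "signature=v4" \
        "bucket=$S3_BUCKET" \
        "fileprefix=foo/"
}

# Inside a git-annex repository, with credentials in the environment:
#   git annex initremote compat $(s3_compat_params)
```

The command substitution is left unquoted on purpose so that each line
becomes a separate argument; this only works because none of the example
values contain whitespace.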