28b29f63dc
Works but some commands may need changes to support special remotes configured this way.
170 lines
7.5 KiB
Markdown
170 lines
7.5 KiB
Markdown
This special remote type stores file contents in a bucket in Amazon S3
|
||
or a similar service.
|
||
|
||
See [[tips/using_Amazon_S3]],
|
||
[[tips/Internet_Archive_via_S3]], and [[tips/using_Google_Cloud_Storage]]
|
||
for usage examples.
|
||
|
||
## configuration
|
||
|
||
The standard environment variables `AWS_ACCESS_KEY_ID` and
|
||
`AWS_SECRET_ACCESS_KEY` are used to supply login credentials for S3. You
|
||
need to set these only when running `git annex initremote` (or
|
||
`enableremote`), as they will be cached in a file only you can read inside
|
||
the local git repository. If you’re working with temporary security
|
||
credentials, you can also set the `AWS_SESSION_TOKEN` environment variable.
|
||
|
||
A number of parameters can be passed to `git annex initremote` to configure
|
||
the S3 remote.
|
||
|
||
* `encryption` - One of "none", "hybrid", "shared", or "pubkey".
|
||
See [[encryption]].
|
||
|
||
* `keyid` - Specifies the gpg key to use for [[encryption]].
|
||
|
||
* `chunk` - Enables [[chunking]] when storing large files.
|
||
`chunk=1MiB` is a good starting point for chunking.
|
||
|
||
* `embedcreds` - Optional. Set to "yes" embed the login credentials inside
|
||
the git repository, which allows other clones to also access them. This is
|
||
the default when gpg encryption is enabled; the credentials are stored
|
||
encrypted and only those with the repository's keys can access them.
|
||
|
||
It is not the default when using shared encryption, or no encryption.
|
||
Think carefully about who can access your repository before using
|
||
embedcreds without gpg encryption.
|
||
|
||
* `datacenter` - Specifies which Amazon datacenter
|
||
to use for the bucket. Defaults to "US". Other values include "EU"
|
||
(which is EU/Ireland), "us-west-1", "us-west-2", "ap-southeast-1",
|
||
"ap-southeast-2", and "sa-east-1". See Amazon's documentation for a
|
||
complete list. Configuring this is equivilant to configuring both
|
||
`host` and `region`.
|
||
|
||
* `storageclass` - Default is "STANDARD".
|
||
Consult S3 provider documentation for pricing details and available
|
||
storage classes. For example, the s3cmd(1) man page lists valid storage class names for Amazon S3.
|
||
|
||
When using Amazon S3,
|
||
if the remote will be used for backup or archival,
|
||
and so its files are Infrequently Accessed, `STANDARD_IA` is a
|
||
good choice to save money (requires a git-annex built with aws-0.13.0).
|
||
If you have configured git-annex to preserve
|
||
multiple [[copies]], also consider setting this to `ONEZONE_IA`
|
||
to save even more money.
|
||
|
||
Amazon S3's `DEEP_ARCHIVE` is similar to Amazon Glacier. For that,
|
||
use the [[glacier]] special remote, rather than this one.
|
||
|
||
When using Google Cloud Storage, to make a nearline bucket, set this to
|
||
`NEARLINE`. (Requires a git-annex built with aws-0.13.0)
|
||
|
||
Note that changing the storage class of an existing S3 remote will
|
||
affect new objects sent to the remote, but not objects already
|
||
stored there.
|
||
|
||
* `host` - Specify in order to use a different, S3 compatible
|
||
service.
|
||
|
||
* `region` - Specify the region to use. Only makes sense to use when
|
||
you also set `host`.
|
||
(Requires a git-annex built with aws-0.24.)
|
||
|
||
* `protocol` - Either "http" (the default) or "https". Setting
|
||
protocol=https implies port=443.
|
||
|
||
This option was added in git-annex version 7.20190322; to make
|
||
a special remote that uses http with older versions of git-annex,
|
||
explicitly specify port=443.
|
||
|
||
* `port` - Specify the port to connect to. Only needed when using a service
|
||
on an unusual port. Setting port=443 implies protocol=https.
|
||
|
||
* `requeststyle` - Set to "path" to use path style requests, instead of the
|
||
default DNS style requests. This is needed with some S3 services.
|
||
|
||
If you get an error about a host name not existing, it's a good
|
||
indication that you need to use this.
|
||
|
||
* `signature` - This controls the S3 signature version to use.
|
||
"v2" is currently the default, "v4" is needed to use some S3 services.
|
||
If you get some kind of authentication error, try "v4".
|
||
To access a S3 bucket anonymously, use "anonymous".
|
||
|
||
* `bucket` - S3 requires that buckets have a globally unique name,
|
||
so by default, a bucket name is chosen based on the remote name
|
||
and UUID. This can be specified to pick a bucket name.
|
||
|
||
* `versioning` - Indicate whether the S3 bucket should have versioning
|
||
enabled. Set to "yes" to enable.
|
||
|
||
Enabling versioning along with "exporttree=yes"
|
||
allows git-annex to access old versions of files that were
|
||
exported to the special remote by [[git-annex export|git-annex-export]].
|
||
|
||
And enabling versioning along with "importtree=yes"
|
||
allows [[git-annex import|git-annex-import]] to import the whole
|
||
history of files in the bucket, synthesizing a series of git commits.
|
||
|
||
Note that git-annex does not support dropping content from versioned
|
||
S3 buckets, since the versioning preserves the content.
|
||
|
||
* `exporttree` - Set to "yes" to make this special remote usable
|
||
by [[git-annex export|git-annex-export]].
|
||
It will not be usable as a general-purpose special remote.
|
||
|
||
* `importtree` - Set to "yes" to make this special remote usable
|
||
by [[git-annex-import]]. When set in combination with exporttree,
|
||
this lets files be imported from it, and changes exported back to it.
|
||
|
||
Note that exporting files to a S3 bucket may overwrite changes that
|
||
have been made to files in the bucket by other software since the last
|
||
time git-annex imported from the bucket. When versioning is enabled,
|
||
the content of files overwritten in this way can still be recovered,
|
||
but you may have to look through the git history to find them.
|
||
When versioning is not enabled, this risks data loss, and so git-annex
|
||
will not let you enable a remote with that configuration unless forced.
|
||
|
||
* `annexobjects` - When set to "yes" along with "exporttree=yes",
|
||
this allows storing other objects in the remote along with the
|
||
exported tree. They will be stored under .git/annex/objects/ in the
|
||
remote.
|
||
|
||
* `publicurl` - Configure the URL that is used to download files
|
||
from the bucket. Using this with a S3 bucket that has been configured
|
||
to allow anyone to download its content allows git-annex to download
|
||
files from the S3 remote without needing to know the S3 credentials.
|
||
|
||
To configure the S3 bucket to allow anyone to download its content,
|
||
refer to S3 documentation to set a Bucket Policy.
|
||
|
||
* `public` - Deprecated. This enables public read access to files sent to
|
||
the S3 remote using ACLs. Note that Amazon S3 buckets created after April
|
||
2023 do not support using ACLs in this way and a Bucket Policy must instead
|
||
be used. This should only be set for older buckets.
|
||
|
||
* `partsize` - Amazon S3 only accepts uploads up to a certian file size,
|
||
and storing larger files requires a multipart upload process.
|
||
|
||
Setting `partsize=1GiB` is recommended for Amazon S3 when not using
|
||
chunking; this will cause multipart uploads to be done using parts
|
||
up to 1GiB in size. Note that setting partsize to less than 100MiB
|
||
will cause Amazon S3 to reject uploads.
|
||
|
||
This is not enabled by default, since other S3 implementations may
|
||
not support multipart uploads or have different limits,
|
||
but can be enabled or changed at any time.
|
||
|
||
* `fileprefix` - By default, git-annex places files in a tree rooted at the
|
||
top of the S3 bucket. When this is set, it's prefixed to the filenames
|
||
used. For example, you could set it to "foo/" in one special remote,
|
||
and to "bar/" in another special remote, and both special remotes could
|
||
then use the same bucket.
|
||
|
||
* `x-amz-meta-*` are passed through as http headers when storing keys
|
||
in S3.
|
||
|
||
* `x-archive-meta-*` are passed through as http headers when storing keys
|
||
in the Internet Archive. See
|
||
[the Internet Archive S3 interface documentation](https://archive.org/help/abouts3.txt)
|
||
for example headers.
|