git-annex/doc/encryption.mdwn
Joey Hess 7fba5dee61
correct documentation, keyid= only works once
keyid+= can be used to add additional key ids later.

I wonder if this broke with changes to remote configs? But I think it's
always been a map, and so only one keyid can be stored and later ones
overwrite earlier ones.

Sponsored-by: Brett Eisenberg on Patreon
2022-08-15 14:25:13 -04:00

149 lines
6.5 KiB
Markdown

[[!toc]]
git-annex mostly does not use encryption. Anyone with access to a git
repository can see all the filenames in it, its history, and can access
any annexed file contents.
Encryption is needed when using [[special_remotes]] like Amazon S3, where
file content is sent to an untrusted party who does not have access to the
git repository.
Such an encrypted remote uses strong ([[symmetric|design/encryption]] or
asymmetric) encryption on the contents of files, as well as HMAC hashing
of the filenames. The size of the encrypted files, and access patterns
of the data, should be the only clues to what is stored in such a
remote.
You should decide whether to use encryption with a special remote before
any data is stored in it. So, `git annex initremote` requires you
to specify "encryption=none" when first setting up a remote in order
to disable encryption. To use encryption, you run
`git-annex initremote` in one of these ways:
* `git annex initremote newremote type=... encryption=hybrid keyid=KEYID`
* `git annex initremote newremote type=... encryption=shared`
* `git annex initremote newremote type=... encryption=pubkey keyid=KEYID`
* `git annex initremote newremote type=... encryption=sharedpubkey keyid=KEYID`
To see what encryption is used for a special remote, run
`git annex info $remote` and look for a line like:
encryption: hybrid (to gpg keys: AEC828149D85C538 C910D9122512E3C8)
## hybrid encryption keys (encryption=hybrid)
The [[hybrid_key_design|design/encryption]] allows additional
encryption keys to be added on to a special remote later. Due to this
flexibility, it is the default and recommended encryption scheme.
git annex initremote newremote type=... [encryption=hybrid] keyid=KEYID
The KEYID is passed to `gpg` to find a gpg key.
Typically, you will say "keyid=2512E3C7" to use a specific gpg key.
Or, you might say "keyid=id@joeyh.name" to search for matching keys.
To add a new key and allow it to access all the content that is stored
in the encrypted special remote, just run `git annex
enableremote` specifying the keyid to add:
git annex enableremote myremote keyid+=788A3F4C
You can repeat this process to add any number of gpg keys, including
your own gpg keys and any public keys of others who you want to give
access. Anyone with a corresponding secret key will be able to decrypt
all content that is stored in the remote.
While a key can later be removed from the list, note that it will **not**
prevent the owner of the key from accessing data on the remote (which is by
design impossible to prevent, short of deleting the remote). In fact the
only sound use of `keyid-=` is probably to replace a revoked key:
git annex enableremote myremote keyid-=2512E3C7 keyid+=788A3F4C
See also [[encryption_design|design/encryption]] for other security
risks associated with encryption.
## shared encryption key (encryption=shared)
Alternatively, you can configure git-annex to use a shared cipher to
encrypt data stored in a remote. This shared cipher is stored,
**unencrypted** in the git repository. So it's shared among every
clone of the git repository.
git annex initremote newremote type=... encryption=shared
The advantage is you don't need to set up gpg keys. The disadvantage is
that this is **insecure** unless you trust every clone of the git
repository with access to the encrypted data stored in the special remote.
## regular public key encryption (encryption=pubkey)
This alternative simply encrypts the files in the special remotes to one or
more public keys. The corresponding private key is needed to store
anything in the remote, or access anything stored in it.
It might be considered more secure due to its simplicity and since
it's exactly the way everyone else uses gpg.
git annex initremote newremote type=.... encryption=pubkey keyid=KEYID
A disadvantage is that it is not easy to later add additional public keys
to the special remote. While the `enableremote` parameters `keyid+=` and
`keyid-=` can be used, they have **no effect** on encrypted files that
are already stored in the remote.
So if you need other public keys to also have access, it's best to add them
immediately after initializing the remote:
git-annex initremote newremote keyid+=788A3F4C
Another use for these parameters is to replace a revoked key:
git annex enableremote myremote keyid-=2512E3C7 keyid+=788A3F4C
But even in this case, since the files are not re-encrypted, the revoked
key has to be kept around to be able to decrypt those files.
(Of course, if the reason for revocation is
that the key has been compromised, it is **insecure** to leave files
encrypted using that old key, and the user should re-encrypt everything.)
(A cipher still needs to be generated (and is encrypted to the given key IDs).
It is only used for HMAC encryption of filenames.)
## regular public key encryption with shared filename encryption (encryption=sharedpubkey)
This is a variation on encryption=pubkey which lets anyone who
has access to the gpg public keys store files in the special remote.
But, only owners of the corresponding gpg private keys can retrieve the files
from the special remote.
git annex initremote newremote type=... encryption=sharedpubkey keyid=KEYID
This might be useful if you want to let others drop off files for you in a
special remote, so that only you can access them.
The filenames used on the special remote are encrypted using HMAC,
which prevents the special remote from seeing the filenames. But, anyone
who can clone the git repository can access the HMAC cipher; it's stored
**unencrypted** in the git repository.
## MAC algorithm
The default MAC algorithm to be applied on the filenames is HMACSHA1. A
stronger one, for instance HMACSHA512, can be chosen upon creation
of the special remote with the option `mac=HMACSHA512`. The available
MAC algorithms are HMACSHA1, HMACSHA224, HMACSHA256, HMACSHA384, and
HMACSHA512. Note that it is not possible to change algorithm for a
non-empty remote.
## credentials storage
Special remotes that need some form of credentials, such as a password,
may support embedding the credentials in the git repository, using
embedcreds=yes. See individual special remotes' documentation for details.
When credentials are embedded in the repository, they're also encrypted using
whatever encryption setting has been selected for the repository.
Such credentials are also cached locally in a file only you can read,
in `.git/annex/creds/`. If you prefer to not expose the credentials on disk
in unencrypted form, you can disable this cache, by setting the
`annex.cachecreds` config to `false`.