git-annex/doc/encryption.mdwn
guilhem 8293ed619f Allow public-key encryption of file content.
With the initremote parameters "encryption=pubkey keyid=788A3F4C".

/!\ Adding or removing a key has NO effect on files that have already
been copied to the remote. Hence using keyid+= and keyid-= with such
remotes should be used with care, and make little sense unless the point
is to replace a (sub-)key by another. /!\

Also, a test case has been added to ensure that the cipher and file
contents are encrypted as specified by the chosen encryption scheme.
2013-09-03 14:34:16 -04:00


git-annex mostly does not use encryption. Anyone with access to a git
repository can see all the filenames in it, its history, and can access
any annexed file contents.
Encryption is needed when using [[special_remotes]] like Amazon S3, where
file content is sent to an untrusted party who does not have access to the
git repository.
Such an encrypted remote uses strong ([[symmetric|design/encryption]] or
asymmetric) encryption on the contents of files, as well as HMAC hashing
of the filenames. The size of the encrypted files, and access patterns
of the data, should be the only clues to what is stored in such a
remote.
You should decide whether to use encryption with a special remote before
any data is stored in it. So, `git annex initremote` requires you
to specify "encryption=none" when first setting up a remote in order
to disable encryption.
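As a minimal sketch of the unencrypted case, the helper below prints the command for a directory special remote with encryption explicitly disabled. The function name, the remote name `usbdrive`, and the path are placeholders chosen for illustration.

```shell
# Prints the initremote command for an unencrypted directory special remote.
# Remove the leading "echo" to run it inside a git-annex repository.
# "usbdrive" and "/media/usbdrive" are example values, not required names.
init_unencrypted_remote() {
    # $1 = remote name, $2 = directory that will hold the content
    echo git annex initremote "$1" type=directory directory="$2" encryption=none
}

init_unencrypted_remote usbdrive /media/usbdrive
```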
If you want to generate a cipher that will be used to symmetrically
encrypt file contents, run `git annex initremote` with
"encryption=hybrid keyid=USERID". The `keyid` value will be passed to `gpg` to
find encryption keys. Typically, you will say "keyid=2512E3C7" to use a
specific gpg key. Or, you might say "keyid=joey@kitenet.net" to search
for matching keys.
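The two ways of naming a key can be sketched as follows. The helper only prints the command; the function name and the remote name `cloud` are assumptions for illustration, and the S3 remote type is used because it is the example mentioned above.

```shell
# Prints the initremote command for a hybrid-encrypted S3 special remote.
# Remove the leading "echo" to run it for real; that also requires gpg to
# hold the named key and S3 credentials to be configured.
# "cloud" is an example remote name; the key values are those used in the text.
init_hybrid_remote() {
    # $1 = remote name, $2 = value handed to gpg to look up the key
    echo git annex initremote "$1" type=S3 encryption=hybrid keyid="$2"
}

init_hybrid_remote cloud 2512E3C7          # a specific gpg key ID
init_hybrid_remote cloud joey@kitenet.net  # search for matching keys
```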
The default MAC algorithm applied to the filenames is HMACSHA1. A
stronger one, for instance HMACSHA512, can be chosen upon creation
of the special remote with the option `mac=HMACSHA512`. The available
MAC algorithms are HMACSHA1, HMACSHA224, HMACSHA256, HMACSHA384, and
HMACSHA512. Note that it is not possible to change the algorithm for a
non-empty remote.
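One way to see why the algorithm is fixed for a non-empty remote: the same name MAC'ed with two different algorithms yields unrelated results, so content stored under one algorithm would presumably not be found under another. The sketch below assumes `openssl` and `awk` are installed. The secret and the key name are made up, and this illustrates HMAC in general rather than git-annex's exact derivation.

```shell
# Illustration only: made-up secret and key name, not git-annex internals.
secret='example-mac-secret'
name='SHA256E-s1048576--0123abcd.jpg'

hmac_name() {
    # $1 = digest name understood by openssl (sha1, sha512, ...)
    # Prints the hex HMAC of $name keyed with $secret.
    printf '%s' "$name" | openssl dgst "-$1" -hmac "$secret" | awk '{print $NF}'
}

sha1_name=$(hmac_name sha1)
sha512_name=$(hmac_name sha512)

echo "HMACSHA1:   $sha1_name"
echo "HMACSHA512: $sha512_name"
```

The two printed values differ in both length and content, while repeating either call gives the same value again.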
The [[encryption_design|design/encryption]] allows additional encryption keys
to be added on to a special remote later. Once a key is added, it is able
to access content that has already been stored in the special remote.
To add a new key, just run `git annex enableremote` specifying the
new encryption key:

    git annex enableremote myremote keyid+=788A3F4C

While a key can later be removed from the list, note that doing so does
**not** necessarily prevent the owner of the corresponding private key
from accessing data on the remote (preventing that is impossible by
design, short of deleting the remote). In fact the only sound use of
`keyid-=` is probably to replace a (sub-)key by another, where the
private part of both is owned by the same person/entity:

    git annex enableremote myremote keyid-=2512E3C7 keyid+=788A3F4C

See also [[encryption_design|design/encryption]] for other security
risks associated with encryption.
## shared cipher mode
Alternatively, you can configure git-annex to use a shared cipher to
encrypt data stored in a remote. This shared cipher is stored
**unencrypted** in the git repository, so it is shared among every
clone of the git repository. The advantage is you don't need to set up gpg
keys. The disadvantage is that this is **insecure** unless you
trust every clone of the git repository with access to the encrypted data
stored in the special remote.
To use shared encryption, specify "encryption=shared" when first setting
up a special remote.
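A minimal sketch of such a setup, using a directory special remote. The function name, the remote name `teamshare`, and the path are placeholders, and no `keyid` is given because no gpg keys are involved.

```shell
# Prints the initremote command for a remote using the shared cipher.
# Remove the leading "echo" to run it inside a git-annex repository.
# "teamshare" and "/srv/annex-share" are example values.
init_shared_remote() {
    # $1 = remote name, $2 = directory that will hold the encrypted content
    echo git annex initremote "$1" type=directory directory="$2" encryption=shared
}

init_shared_remote teamshare /srv/annex-share
```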
## strict public-key encryption
Special remotes can also be configured to encrypt file contents using
public-key cryptography. It is significantly slower than symmetric
encryption, but is also generally considered more secure. Note that
because filenames are MAC'ed, a cipher needs to be generated (and
encrypted to the given key ID).
A disadvantage is that it is not possible to give or revoke anyone's
access to a non-empty remote. Indeed, although the parameters `keyid+=`
and `keyid-=` still apply, they have **no effect** on files that are
already present on the remote. In fact the only sound use of `keyid+=`
and `keyid-=` is probably, as with `keyid-=` for "encryption=hybrid", to
replace a (sub-)key by another.
Also, since already uploaded files are not re-encrypted, one needs to
keep the private part of any key removed with `keyid-=` to be able to
decrypt these files. On the other hand, if the reason for revocation is
that the key has been compromised, it is **insecure** to leave files
encrypted using that old key, and the user should re-encrypt everything.
To use strict public-key encryption, specify "encryption=pubkey
keyid=USERID" when first setting up a special remote.
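The setup and a later key replacement can be sketched together. Both helpers only print the commands; the function names and the remote name `offsite` are assumptions for illustration, and the key IDs are the ones used elsewhere on this page.

```shell
# Prints the commands for a pubkey-encrypted remote and a later key swap.
# Remove the leading "echo"s to run them inside a git-annex repository.
# "offsite" is an example remote name.
init_pubkey_remote() {
    # $1 = remote name, $2 = key the file contents are encrypted to
    echo git annex initremote "$1" type=S3 encryption=pubkey keyid="$2"
}

replace_remote_key() {
    # $1 = remote name, $2 = key to drop, $3 = key to add
    echo git annex enableremote "$1" keyid-="$2" keyid+="$3"
}

init_pubkey_remote offsite 2512E3C7
replace_remote_key offsite 2512E3C7 788A3F4C
```

Files uploaded before the swap stay encrypted to the old key, so its private part is still needed to read them.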