git-annex/doc/todo/encrypted_keys_in_git_repository.mdwn

The idea is to have a type of key that is based on a hash of the annexed
file, but with some form of encryption or password protection. 

So someone who can access the git repository is prevented from checking
known hashes of files, and learning about content in the repository that is
not available to them. (However, they would be able to see filenames
and commit metadata, which might also expose information about what the
files are. This is not a fully encrypted git repository like can be made
with gcrypt.)

This might be an added layer of security, or the git repository might be
made public, including perhaps some of the annexed files, while this is
used to obscure the hashes of some other annexed files. For example, a
scientific dataset might make public files that are derived from patient
data, and use this for the patient data files.

----

If two such repositories were merged, git-annex would need to somehow
be able to tell how to decrypt a key, which could have come from either
repository. So it seems that the key needs to include within it some
identifier of the secret that is used to encrypt it. For example:

    ENC--ident--foo

Where ident would be something like a UUID of the secret, and foo is the
file's hash (and perhaps size, or indeed a whole regular key) that is
encrypted by the secret.

----

Should the key's size be included only in encrypted form, or in plaintext?

Since git-annex constructs a Key without using the associated Backend,
it currently parses the size field the same way for each type of key.
So, it does not seem to be possible to encrypt the size field and 
use that encrypted size to populate keySize. What would be
doable though, is to replace uses of keySize with a new Backend method that
returns the size.

----

Should the encryption method be reversible?

If the encryption method is not reversible, the key's size would need to be
included in plaintext, or left out entirely.

But it does not seem necessary for the encryption method to be reversible
otherwise. Consider if scrypt was used. When adding a file, git-annex would
first hash it, and then run it through scrypt. That is not reversible,
so when fscking a file, just repeat the same process and compare the
resulting scrypt keys.

Not being reversible is a nice benefit, because it makes it much harder
for an attacker to brute-force. If it's reversible the attacker can
brute-force the user's password, looking for a password that decrypts
to something that looks right.

--[[Joey]]
idea 2022-12-27 17:48:53 +00:00			`The idea is to have a type of key that is based on a hash of the annexed`
			`file, but with some form of encryption or password protection.`

			`So someone who can access the git repository is prevented from checking`
			`known hashes of files, and learning about content in the repository that is`
			`not available to them. (However, they would be able to see filenames`
			`and commit metadata, which might also expose information about what the`
			`files are. This is not a fully encrypted git repository like can be made`
			`with gcrypt.)`

			`This might be an added layer of security, or the git repository might be`
			`made public, including perhaps some of the annexed files, while this is`
			`used to obscure the hashes of some other annexed files. For example, a`
			`scientific dataset might make public files that are derived from patient`
			`data, and use this for the patient data files.`

			`----`

			`If two such repositories were merged, git-annex would need to somehow`
			`be able to tell how to decrypt a key, which could have come from either`
			`repository. So it seems that the key needs to include within it some`
			`identifier of the secret that is used to encrypt it. For example:`

			`ENC--ident--foo`

			`Where ident would be something like a UUID of the secret, and foo is the`
			`file's hash (and perhaps size, or indeed a whole regular key) that is`
			`encrypted by the secret.`

			`----`

			`Should the key's size be included only in encrypted form, or in plaintext?`

			`Since git-annex constructs a Key without using the associated Backend,`
			`it currently parses the size field the same way for each type of key.`
			`So, it does not seem to be possible to encrypt the size field and`
			`use that encrypted size to populate keySize. What would be`
			`doable though, is to replace uses of keySize with a new Backend method that`
			`returns the size.`

			`----`

			`Should the encryption method be reversible?`

			`If the encryption method is not reversible, the key's size would need to be`
			`included in plaintext, or left out entirely.`

			`But it does not seem necessary for the encryption method to be reversible`
			`otherwise. Consider if scrypt was used. When adding a file, git-annex would`
			`first hash it, and then run it through scrypt. That is not reversible,`
			`so when fscking a file, just repeat the same process and compare the`
			`resulting scrypt keys.`

			`Not being reversible is a nice benefit, because it makes it much harder`
			`for an attacker to brute-force. If it's reversible the attacker can`
			`brute-force the user's password, looking for a password that decrypts`
			`to something that looks right.`

			`--[[Joey]]`