Git uses SHA1, which is becoming increasingly broken. Using git-annex
and signed commits, we can work around the weaknesses of SHA1, and
let anyone who clones a repository verify that the data they receive
is the same data that was originally committed to it.

This is recommended if you are storing any kind of binary
files in a git repository.

## Configuring git-annex

You need git-annex 6.20170228 or newer. Upgrade if you have an older version.

git-annex can use many types of [[backends]], and not all of them are
secure. So, you need to configure git-annex to only use
cryptographically secure hashes.

	git annex config --set annex.securehashesonly true

Each new clone of the repository will then inherit that configuration.
But existing clones will not, so this should be run in each of them:

	git config annex.securehashesonly true

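To confirm that an existing clone has the setting, something like the following can be used. This is only a sketch: `check_securehashesonly` is a hypothetical helper name, not a git-annex command, and it looks only at the git config visible from the current directory.

```shell
# Hypothetical helper (not part of git-annex): succeed only if
# annex.securehashesonly is set to true in the git config.
check_securehashesonly() {
    # "git config --get" exits non-zero when the key is unset;
    # treat that the same as false.
    value=$(git config --bool --get annex.securehashesonly 2>/dev/null) || value=false
    [ "$value" = "true" ]
}

if check_securehashesonly; then
    echo "annex.securehashesonly is enabled"
else
    echo "not enabled; run: git config annex.securehashesonly true"
fi
```
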
## Signed commits

It's important that all commits to the git repository are signed.
Use `git commit --gpg-sign`, or enable the `commit.gpgSign` configuration.

Use `git log --show-signature` to check the signatures of commits.
If the signature is valid, it guarantees that all annexed files
have the same content that was originally committed.

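If you want to spot unsigned commits without reading the whole log, the check could be scripted roughly like this. `list_unverified_commits` is a hypothetical helper name. It relies on git's `%G?` format placeholder, which prints `G` for a good signature and other letters otherwise (for example `N` for no signature and `B` for a bad one).

```shell
# Hypothetical helper: print every commit reachable from the given
# revision (default HEAD) whose signature status is not "G" (good).
# Each output line is "<status> <commit hash> <subject>".
list_unverified_commits() {
    # grep exits non-zero when nothing is printed; "|| true" keeps
    # that from being reported as a failure of the helper.
    git log --format='%G? %H %s' "${1:-HEAD}" | grep -v '^G ' || true
}
```

Empty output means every commit carried a signature that git considered good. Whether a signature counts as good depends on the keys in your gpg keyring.
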
## Why is this more secure than git alone?

SHA1 collisions exist now, and can be produced using a common-prefix
attack. See <https://shattered.io/>. Let's assume that a chosen-prefix
attack against SHA1 will become feasible too. However, a full preimage
attack still seems unlikely, so we won't consider such attacks in the
analysis below.

The reason that git-annex can work around git's problematic use of SHA1 is
that git-annex uses other, [[stronger hashes|backends]] of the contents of
annexed files. For example, an annexed file may be a symlink to
".git/annex/objects/Ab/Cd/SHA256--eb45a55eb8756646e244e6c5f47349294568d58a9321244f4ee09a163da23a27".

Such a symlink is stored as a git blob object. The SHA1s of the git blobs
are listed in a git tree object, and the git commit object contains the
SHA1 of the tree. Finally, the commit object is gpg signed.

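The chain from commit to tree to blob to symlink target can be inspected with plain git commands. The following is a minimal sketch. `show_annex_chain` is a hypothetical helper name, and it assumes it is run from the top of the repository with the path of a file that is committed as a symlink.

```shell
# Hypothetical helper: walk from HEAD down to the symlink target of
# one annexed file, printing the object id at each link of the chain.
show_annex_chain() {
    file=$1
    commit=$(git rev-parse HEAD)
    tree=$(git rev-parse 'HEAD^{tree}')
    blob=$(git rev-parse "HEAD:$file")
    echo "commit $commit"
    echo "tree   $tree"
    echo "blob   $blob"
    # For a symlink, the blob's content is the link target, which
    # embeds the git-annex key and so the strong hash.
    echo "target $(git cat-file -p "$blob")"
}
```

Running `show_annex_chain somefile` prints the three SHA1 object ids followed by the symlink target stored in the blob.
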
So, by checking the signature of a commit (`git log --show-signature`),
you can verify that this is the same commit that was originally made
to the repository. As far as the git developers know, there is no way
to produce multiple colliding git tree objects (at least not without
creating files with spectacularly ugly and long names), so you
know that the tree object pointed to by the signed commit is the original one.

Now, what about the blob objects that the tree lists? If these blobs
were regular git files, a SHA1 collision could mean your git repository
does not contain the same file that was originally committed, and the signed
commit would not help.

But if the blob object is a git-annex symlink target, it has to contain the
strong hash of the file content. If a SHA1 collision swaps in some other
blob object, it will need to contain the strong hash of a different file's
content. The current common-prefix attack cannot do that.

A chosen-prefix attack could make two blobs containing different strong
hashes have the same SHA1, but it would need to include additional data
after the hash to do it. Since git-annex version 6.20170224, there is no
place for an attacker to put such data in a git-annex symlink target. (See
[[todo/sha1_collision_embedding_in_git-annex_keys]] for details
of how this was prevented.)

So, we have a SHA1 chain from the gpg signature to the git-annex symlink target,
and at no point in the chain is a SHA1 collision attack feasible.
Finally, git-annex verifies the strong hash when transferring
the content of a file into the repository (and `git annex fsck` verifies it
too), and so the content that the symlink is pointing to must be the same
content that was originally committed.
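
To illustrate what verifying the strong hash means, here is a rough sketch of the comparison for a SHA256 key. `verify_sha256_key` is a hypothetical helper, not how git-annex itself is implemented. It assumes the `sha256sum` tool is available and handles only keys of the form `SHA256--<hash>` or `SHA256-s<size>--<hash>`.

```shell
# Hypothetical helper: check that a file's SHA256 matches the hash
# embedded in a git-annex key. Other backends are rejected.
verify_sha256_key() {
    key=$1
    content=$2
    case "$key" in
        SHA256-*) ;;
        *) echo "unsupported key: $key" >&2; return 2 ;;
    esac
    expected=${key##*--}    # hex digest after the last "--"
    actual=$(sha256sum "$content" | cut -d ' ' -f 1)
    [ "$expected" = "$actual" ]
}
```

In practice, `git annex fsck` is the tool to use for this check. The sketch only shows why content that does not match the key is detected.
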