make fsck check annex.securehashesonly, and new tip for working around SHA1 collisions with git-annex

This commit was sponsored by andrea rota.
Joey Hess 2017-02-27 13:50:00 -04:00
parent 07f1e638ee
commit 942e0174b3
No known key found for this signature in database
GPG key ID: C910D9222512E3C7
4 changed files with 106 additions and 4 deletions


@@ -2,9 +2,11 @@ git-annex (6.20170215) UNRELEASED; urgency=medium
   * Cryptographically secure hashes can be forced to be used in a
     repository, by setting annex.securehashesonly.
-    This does not prevent the git repository from containing files
-    with insecure hashes, but it does prevent the content of such files
-    from being added to .git/annex/objects.
+    This does not prevent the git repository from containing links
+    to insecure hashes, but it does prevent the content of such files
+    from being added to .git/annex/objects by any method.
+  * fsck: Warn about any files whose content is present, that don't
+    use secure hashes, when annex.securehashesonly is set.
   * sync, merge: Fail when the current branch has no commits yet, instead
     of not merging in anything from remotes and appearing to succeed.
   * Run ssh with -n whenever input is not being piped into it,


@@ -1,6 +1,6 @@
 {- git-annex command
  -
- - Copyright 2010-2016 Joey Hess <id@joeyh.name>
+ - Copyright 2010-2017 Joey Hess <id@joeyh.name>
  -
  - Licensed under the GNU GPL version 3 or higher.
  -}
@@ -35,6 +35,7 @@ import Utility.PID
 import qualified Database.Keys
 import qualified Database.Fsck as FsckDb
 import Types.CleanupActions
+import Types.Key
 import Data.Time.Clock.POSIX
 import System.Posix.Types (EpochTime)
@@ -234,6 +235,14 @@ verifyLocationLog key keystatus desc = do
     whenM (liftIO $ doesDirectoryExist $ parentDir obj) $
         freezeContentDir obj
+    {- Warn when annex.securehashesonly is set and content using an
+     - insecure hash is present. This should only be able to happen
+     - if the repository already contained the content before the
+     - config was set. -}
+    when (present && not (cryptographicallySecure (keyVariety key))) $
+        whenM (annexSecureHashesOnly <$> Annex.getGitConfig) $
+            warning $ "** Despite annex.securehashesonly being set, " ++ obj ++ " has content present in the annex using an insecure " ++ formatKeyVariety (keyVariety key) ++ " key"
     {- In direct mode, modified files will show up as not present,
      - but that is expected and not something to do anything about. -}
     if direct && not present


@@ -829,6 +829,18 @@ Here are all the supported configuration settings.
   This is overridden by annex annex.backend configuration in the
   .gitattributes files.
+* `annex.securehashesonly`
+  Set to true to indicate that the repository should only use
+  cryptographically secure hashes
+  (SHA2, SHA3) and not insecure hashes (MD5, SHA1) for content.
+  When this is set, the contents of files using cryptographically
+  insecure hashes will not be allowed to be added to the repository.
+  Also, `git-annex fsck` will complain about any files present in
+  the repository that use insecure hashes.
 * `annex.diskreserve`
   Amount of disk space to reserve. Disk space is checked when transferring


@@ -0,0 +1,79 @@
Git uses SHA1, which is becoming increasingly broken. Using git-annex
and signed commits, we can work around the weaknesses of SHA1, and
let anyone who clones a repository verify that the data they receive
is the same data that was originally committed to it.
This is recommended if you are storing any kind of binary
files in a git repository.
### How to do it
You need git-annex 6.20170228. Upgrade if you don't have it.
git-annex can use many types of [[backends]] and not all of them are
secure. So, you need to configure git-annex to only use
cryptographically secure hashes. Also, let's make sure annex.verify
is set (it is by default, but let's override any global gitconfig setting
for it).

    git config annex.securehashesonly true
    git config annex.verify true

That needs to be run in every clone of the repository. This will prevent
any annexed object using an insecure hash from reaching your repository,
and it will verify the hashes when transferring objects.
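
Since the settings have to be repeated in every clone, it's easy to miss one. Here is a minimal sketch of a check you could run in each clone. The function name `check_annex_hardening` is made up for this example; it only relies on `git config --bool --get`, and it deliberately treats an unset `annex.verify` as a failure, since the advice above is to set it explicitly.

```shell
# Hypothetical helper, not part of git or git-annex: report whether a clone
# has both settings from this tip explicitly set to true.
check_annex_hardening() {
    repo=${1:-.}
    status=0
    for key in annex.securehashesonly annex.verify; do
        # --bool normalizes values such as "yes" or "1" to "true".
        value=$(git -C "$repo" config --bool --get "$key" 2>/dev/null)
        if [ "$value" != "true" ]; then
            echo "$key is not set to true in $repo" >&2
            status=1
        fi
    done
    return $status
}

# Usage (run from anywhere, passing the path to a clone):
#   check_annex_hardening /path/to/clone && echo "settings look ok"
```

The exit status is 0 only when both settings are true, so this can be dropped into a script that loops over all your clones.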
It's important that all commits to the git repository are signed.
Use `git commit --gpg-sign`, or enable the commit.gpgSign configuration.
Use `git log --show-signature` to check the signatures of commits.
If the signature is valid, it guarantees that all annexed files
have the same content that was originally committed.
### Why is this more secure than git alone?
SHA1 collisions exist now, and can be produced using a common-prefix
attack. See <https://shattered.io/>. Let's assume that a chosen-prefix
attack against SHA1 will also become feasible. However, a full preimage
attack still seems unlikely, so we won't consider such attacks in the
analysis below.
The reason that git-annex can work around git's problematic use of SHA1 is
that git-annex uses other, [[stronger hashes|backends]] of the contents of
annexed files. For example, an annexed file may be a symlink to
".git/annex/objects/Ab/Cd/SHA256--eb45a55eb8756646e244e6c5f47349294568d58a9321244f4ee09a163da23a27".
Such a symlink is stored as a git blob object. The SHA1s of the git blobs
are listed in a git tree object, and the git commit object contains the
SHA1 of the tree. Finally, the commit object is gpg signed.
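
To make the first link of that chain concrete, here is a rough sketch of how the SHA1 of such a blob is derived from the symlink target. The helper name `git_blob_sha1` is invented for this example, and it assumes the coreutils `sha1sum` tool is available; `git hash-object --stdin` computes the same value from the same bytes.

```shell
# Hypothetical helper, not part of git: compute the SHA1 that git assigns
# to a blob whose content is the given string (no trailing newline).
git_blob_sha1() {
    content=$1
    # Byte length of the content; the arithmetic strips any padding from wc.
    len=$(( $(printf '%s' "$content" | wc -c) ))
    # A blob is hashed as the header "blob <len>", a NUL byte, then the content.
    { printf 'blob %d\0' "$len"; printf '%s' "$content"; } | sha1sum | cut -d' ' -f1
}

# The symlink target from the example above. The strong hash is part of the
# bytes being hashed, so a different strong hash means different blob content.
target=".git/annex/objects/Ab/Cd/SHA256--eb45a55eb8756646e244e6c5f47349294568d58a9321244f4ee09a163da23a27"
git_blob_sha1 "$target"
```

The point of the sketch is only that the SHA256 is embedded in what SHA1 is computed over; it says nothing about how hard it is to collide that SHA1.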
So, by checking the signature of a commit (`git log --show-signature`),
you can verify that this is the same commit that was originally made
to the repository. As far as the git developers know, there is no way
to produce multiple colliding git tree objects (at least not without
creating files with spectacularly ugly and long names), so you
know that the tree object pointed to by the signed commit is the original one.
Now, what about the blob objects that the tree lists? If these blobs
were regular git files, a SHA1 collision could mean your git repository
does not contain the same file that was originally committed, and the signed
commit would not help.
But, if the blob object is a git-annex symlink target, it has to contain the
strong hash of the file content. If a SHA1 collision swaps in some other
blob object, it will need to contain the strong hash of a different file's
content. The current common-prefix attack cannot do that.
A chosen-prefix attack could produce two git blobs that contain different
strong hashes yet share the same SHA1, but it would need to include
additional data after the hash to do it. Since
git-annex version 6.20170224, there is no place for an attacker to
put such data in a git-annex symlink target. (See
[[todo/sha1_collision_embedding_in_git-annex_keys]] for details
of how this was prevented.)
So, we have a SHA1 chain from the gpg signature to the git-annex symlink target,
and at no point in the chain is a SHA1 collision attack feasible.
Finally, git-annex verifies the strong hash when transferring
the content of a file into the repository (and `git annex fsck` verifies it
too), and so the content that the symlink is pointing to must be the same
content that was originally committed.
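
As an illustration of that last step, here is a minimal sketch of checking a file against the SHA256 named in its key. This is not how git-annex implements verification; the function name `verify_sha256_key` is made up, it only handles SHA256 and SHA256E keys, and it assumes the hash is whatever follows the last `--` in the key (minus any extension), which matches the example key above.

```shell
# Hypothetical helper, not git-annex's own code: compare a file's SHA256
# with the hash embedded in a key like "SHA256-s0--<hex>" or
# "SHA256E-s0--<hex>.ext".
verify_sha256_key() {
    key=$1
    file=$2
    case $key in
        SHA256-*|SHA256E-*) ;;
        *) echo "not a SHA256 key: $key" >&2; return 2 ;;
    esac
    expected=${key##*--}      # drop everything up to the last "--"
    expected=${expected%%.*}  # drop any extension kept by SHA256E
    actual=$(sha256sum < "$file" | cut -d' ' -f1)
    [ "$expected" = "$actual" ]
}

# Usage:
#   verify_sha256_key "$key" "$file" && echo "content matches key"
```

In a real repository, `git annex fsck` is the tool to use; the sketch just shows why content that does not hash to the value in the symlink target cannot pass the check.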