Some git-annex backends allow embedding enough data in the names of keys that it could be used for a SHA1 collision attack. So, a signed git commit could point to a tree with such a key in it, and the blob for the key could have two versions with the same SHA1.

Users who want to use git-annex with signed commits to mitigate git's own SHA1 insecurities would like at least a way to disable the insecure git-annex backends:

* WORM can contain fairly arbitrary data in a key name.
* URL too. (Also, of course, URLs download arbitrary data from the web, so a signed git commit pointing at URL keys doesn't provide any security even without SHA1 collisions.)
* SHA1 and MD5 backends are insecure because there can be colliding versions of the data they point to.

There could be a config setting that would prevent git-annex from using keys with such insecure backends. A user who checks git commit signatures could enable the config setting when they initially clone their repository. This should prevent any file contents using insecure backends from being downloaded into the repository. (Even `git-annex-shell recvkey` would refuse data using such a key, since it would fail to parse the key.) The user would thus know that any file contents in their repository match the files in signed git commits.

Enabling the config setting in a repository that already contains file contents would be a mistake, because it might contain insecure keys. And since git-annex would skip over such files, `git annex fsck` could not warn about such a mistake. Perhaps, then, the config setting should be turned on by `git annex init`? Or we could document this gotcha.

> I've done some groundwork for this, but making git-annex not accept
> insecure keys into the repo at all requires changing `file2key`,
> which is a pure function that's used in, eg, instances for serialization.
>
> So, how to make it vary depending on git config? Can't.
> The alternative would be to add lots of checks everywhere a key is read
> from disk or network, which feels like it would be a hard security
> boundary to manage.
>
> It doesn't really matter if content under an insecure key is in the
> repo, as long as there's not a signed commit referencing such a key.
> So, we could say that it's up to the user constructing a signed commit
> not to put such keys in the commit.
>
> Or, we could use the pre-commit hook and, when the config setting
> disallows insecure keys, make it reject commits that contain them.
> But if a past commit added a file using an insecure key, and the current
> commit does not touch it, should the commit be rejected? Rejecting it
> would require a somewhat expensive look at the whole tree being committed.
>
> The user might be merging a branch from someone else; there seems to be
> no git hook that can sanity check a fast-forward merge.
>
> Perhaps leave it up to the person making signed commits to get it
> right, and make `git annex fsck` warn about such keys? That seems
> reasonable. --[[Joey]]

----

A few other potential problems:

* A symlink target like `.git/annex/objects/XX/YY/SHA256--foo` could be
  manipulated to add collision data in the path, for example
  `.git/annex/objects/collisiondata/../XX/YY/SHA256--foo`.
  I think this is not a valid attack, because, at least on Linux, such a
  symlink won't be followed unless the `.git/annex/objects/collisiondata`
  directory exists: the kernel has to look up `collisiondata` before `..`
  can step back out of it.
* `*E` backends could embed SHA1 collision data in a long filename extension
  in a key. Impact is limited, because even if an attacker does this, the key
  also contains the checksum (eg SHA2) of the annexed data. The current SHA1
  attack is only a common-prefix attack; it does not allow creating two
  colliding keys that contain two different SHA2 checksums. That would need
  a chosen-prefix attack.
  It might be worth limiting the length of an extension allowed in such a
  key to the longest such extension git-annex has ever supported (probably
  < 20 bytes or so), which would be less than the size of the data needed
  for current SHA1 collision attacks.
  Now done; git-annex refuses to use keys with super long extensions.
* It might be possible to embed colliding data in a specially constructed
  key name with an extra field in it, eg "SHA256-cXXXXXXXXXXXXXXX-...".
  Need to review the code and see if such extra fields are allowed.
  Update: All fields are numeric, but could contain arbitrary data after the
  number, which could have been used in a chosen-prefix attack. This has been
  fixed; git-annex refuses to parse such fields, so it won't work with files
  that try to exploit this.
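
The `git annex fsck` warning Joey settles on could look roughly like the Python sketch below. This is not git-annex's code (git-annex is written in Haskell). The sketch assumes keys have the form `BACKEND-fields--name`, so the backend is everything before the first `-`, and it treats the backends listed at the top of this page, plus the `SHA1E` and `MD5E` variants, as insecure. That set and the `fsck_warnings` helper are illustrative assumptions, not git-annex's actual list or API.

```python
from typing import Iterable, Iterator

# Assumed set: the insecure backends named at the top of this page, plus the
# extension-preserving SHA1E/MD5E variants. Not git-annex's authoritative list.
INSECURE_BACKENDS = frozenset({"WORM", "URL", "SHA1", "SHA1E", "MD5", "MD5E"})


def key_backend(key: str) -> str:
    """Return the backend name: everything before the first '-'."""
    return key.split("-", 1)[0]


def is_insecure_key(key: str) -> bool:
    """True if the key's backend is in the assumed insecure set."""
    return key_backend(key) in INSECURE_BACKENDS


def fsck_warnings(keys: Iterable[str]) -> Iterator[str]:
    """Hypothetical fsck-style pass: yield one warning per insecure key."""
    for key in keys:
        if is_insecure_key(key):
            yield f"warning: {key} uses insecure backend {key_backend(key)}"


for msg in fsck_warnings(["SHA256E-s5--0123abcd.txt", "WORM-s5-m1700000000--notes.txt"]):
    print(msg)
```

Only the WORM key produces a warning here. Warning rather than refusing fits the conclusion above that rejecting such keys at commit or merge time is hard to do reliably.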
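
The extension-length limit from the `*E` item can be sketched in Python as follows. The 20-byte cap comes from the rough "probably < 20 bytes" estimate in the text, and `E_BACKENDS` is an illustrative subset of extension-preserving backends; both are assumptions, not the limit or list git-annex actually uses. The sketch also assumes the extension is everything from the first `.` in the key name, which holds when the name begins with a hex checksum.

```python
# Hypothetical cap, based on the "probably < 20 bytes" estimate above.
MAX_EXTENSION_BYTES = 20

# Illustrative subset of extension-preserving ("*E") backends.
E_BACKENDS = frozenset({"MD5E", "SHA1E", "SHA256E", "SHA512E"})


def key_extension(key: str) -> str:
    """Return the extension carried in a *E key's name, or '' if none."""
    head, _, name = key.partition("--")
    backend = head.split("-", 1)[0]
    if backend not in E_BACKENDS:
        return ""
    # The checksum part is hex, so the first '.' starts the extension.
    dot = name.find(".")
    return name[dot:] if dot != -1 else ""


def extension_acceptable(key: str) -> bool:
    """Reject keys whose extension is long enough to hide collision data."""
    return len(key_extension(key).encode("utf-8")) <= MAX_EXTENSION_BYTES


normal = "SHA256E-s5--" + "a" * 64 + ".tar.gz"
padded = "SHA256E-s5--" + "a" * 64 + "." + "x" * 100
print(extension_acceptable(normal), extension_acceptable(padded))
```

Measuring the length in UTF-8 bytes rather than characters keeps multi-byte extensions from slipping extra data past the cap.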
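
The fix for the extra-field problem amounts to parsing key fields strictly: a field is one known letter followed by digits and nothing else. The Python sketch below assumes the field letters `s` (size), `m` (mtime), `S` (chunk size) and `C` (chunk number), and treats anything else, including trailing bytes after a number, as a parse error. The letter set and the `parse_key_fields` name are assumptions for illustration, not git-annex's parser.

```python
import re

# Assumed field letters: s (size), m (mtime), S (chunk size), C (chunk number).
# [0-9] is used instead of str.isdigit(), which also accepts non-ASCII digits.
FIELD_RE = re.compile(r"[smSC][0-9]+")


def parse_key_fields(key: str) -> dict:
    """Strictly parse the fields of a BACKEND-fields--name key.

    Raises ValueError on an unknown field letter or on any bytes after the
    number, so such data cannot ride along unnoticed in the key name.
    """
    head, sep, _name = key.partition("--")
    if not sep:
        raise ValueError("missing '--' separator")
    backend, *fields = head.split("-")
    if not backend:
        raise ValueError("empty backend name")
    parsed = {}
    for field in fields:
        if not FIELD_RE.fullmatch(field):
            raise ValueError(f"malformed field {field!r}")
        parsed[field[0]] = int(field[1:])
    return parsed


print(parse_key_fields("WORM-s5-m1700000000--notes.txt"))
```

A key like `SHA256-s5XXXX--ab` or `SHA256-cXXXX--ab` raises `ValueError` instead of being accepted with the extra data ignored.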
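
The argument that a `collisiondata/..` detour in a symlink target does not work rests on how path resolution works: each component is looked up in turn, so `..` is not collapsed lexically before `collisiondata` has been found. The Python sketch below checks this in a scratch directory, using a simplified `objects/XX/YY/KEY` layout that mirrors the paths in the text (an assumption; it is not git-annex's exact object layout). `os.path.exists()` follows the symlink and returns `False` for a dangling one.

```python
import os
import tempfile


def collision_path_resolves(create_collision_dir: bool) -> bool:
    """Report whether a symlink whose target detours through
    'collisiondata/..' can be followed to the object file."""
    with tempfile.TemporaryDirectory() as root:
        objects = os.path.join(root, "objects")
        os.makedirs(os.path.join(objects, "XX", "YY"))
        with open(os.path.join(objects, "XX", "YY", "KEY"), "w") as f:
            f.write("annexed content\n")
        if create_collision_dir:
            os.mkdir(os.path.join(objects, "collisiondata"))
        link = os.path.join(root, "file")
        # Relative target with the extra "collisiondata/.." component in it.
        os.symlink("objects/collisiondata/../XX/YY/KEY", link)
        # os.path.exists() follows the symlink; a dangling link gives False.
        return os.path.exists(link)


print(collision_path_resolves(False))
print(collision_path_resolves(True))
```

On Linux the first call prints `False` and the second `True`: the link only resolves once the `collisiondata` directory exists.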