Some git-annex backends allow embedding enough data in the names of keys
that it could be used for a SHA1 collision attack. So, a signed git commit
could point to a tree with such a key in it, and the blob for the key could
have two versions with the same SHA1.

Users who want to use git-annex with signed commits to mitigate git's own
SHA1 insecurities would like at least a way to disable the insecure
git-annex backends:

* WORM can contain fairly arbitrary data in a key name
* URL keys can too (and, of course, URLs download arbitrary data from the web,
  so a signed git commit pointing at URL keys provides no security
  even without SHA1 collisions)
* SHA1 and MD5 backends are insecure because there can be colliding
  versions of the data they point to.
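Such a config setting would need to classify a key by its backend. Below is a minimal sketch of that classification. It assumes the backend name is everything before the first `-` in a key such as `SHA256E-s1048576--<hex>.jpg`. `INSECURE_BACKENDS`, `key_backend` and `is_insecure_key` are hypothetical names, not git-annex code. Treating the `E` variants of SHA1 and MD5 the same as their base backends is also an assumption.

```python
# Hypothetical set of backends the list above calls insecure. The E variants
# (SHA1E, MD5E) are assumed to be just as weak as SHA1 and MD5.
INSECURE_BACKENDS = frozenset({"WORM", "URL", "SHA1", "SHA1E", "MD5", "MD5E"})


def key_backend(key: str) -> str:
    """Return the backend name of a key like 'SHA256E-s1048576--<hex>.jpg'.

    Assumes the layout BACKEND[-field...]--NAME, so the backend is
    everything before the first '-'.
    """
    backend, sep, _rest = key.partition("-")
    if not sep or not backend:
        raise ValueError(f"not a key: {key!r}")
    return backend


def is_insecure_key(key: str) -> bool:
    """True if the key uses one of the backends listed as insecure."""
    return key_backend(key) in INSECURE_BACKENDS
```

For example, `is_insecure_key("WORM-s1234-m1500000000--notes.txt")` is true, while a `SHA256E` key is not flagged.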

There could be a config setting that would prevent git-annex from using
keys with such insecure backends. A user who checks git commit signatures
could enable the config setting when they initially clone their repository.
This should prevent any file contents using insecure backends from being
downloaded into the repository. (Even git-annex-shell recvkey would
refuse data using such a key, since it would fail parsing the key.)
The user would thus know that any file contents in their repository match
the files in signed git commits.

Enabling the config setting in a repository that already contains
file contents would be a mistake, because it might contain insecure keys.
And since git-annex would skip over such files, `git annex fsck` cannot
warn about such a mistake.

Perhaps, then, the config setting should be turned on by `git annex init`?
Or, we can document this gotcha.

> I've done some groundwork for this, but making git-annex not accept
> insecure keys into the repo at all requires changing file2key,
> which is a pure function used in, e.g., instances for serialization.
> 
> So, how to make it vary depending on git config? It can't. The alternative
> would be to add lots of checks everywhere a key is read from disk
> or network, which feels like it would be a hard security boundary to
> manage.
> 
> It doesn't really matter if content under an insecure key is in the
> repo, as long as there's not a signed commit referencing such a key.
> So, we could say it's up to the user constructing a signed commit not to
> put such keys in the commit.
> 
> Or, we could use the pre-commit hook, and when 
> the config setting disallows insecure keys, make it reject commits
> that contain them. But, if a past commit added a file using an insecure
> key, and the current commit does not touch it, should it be rejected?
> Rejecting it would then require a somewhat expensive look at the tree
> being committed.
> 
> The user might be merging a branch from someone else; there seems to be no
> git hook that can sanity-check a fast-forward merge.
> 
> Perhaps leave it up to the person making signed commits to get it 
> right, and make git annex fsck warn about such keys? That seems
> reasonable. --[[Joey]]
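Whether it runs in `git annex fsck` or in a pre-commit hook, the core of such a warning could look like the sketch below. It assumes the caller already has `(path, symlink target)` pairs for the annexed symlinks in the tree, and that the key is the last component of each symlink target. `insecure_paths` and `INSECURE_BACKENDS` are hypothetical names, and listing the tree from git is left out.

```python
import posixpath
from typing import Iterable, Iterator, Tuple

# Hypothetical classification, repeated here so this sketch stands alone.
INSECURE_BACKENDS = frozenset({"WORM", "URL", "SHA1", "SHA1E", "MD5", "MD5E"})


def insecure_paths(entries: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yield (path, key) for each annexed symlink whose key uses an insecure backend.

    `entries` is assumed to be (path, symlink target) pairs for the tree
    being checked; the key is taken to be the final path component.
    """
    for path, target in entries:
        key = posixpath.basename(target)
        backend = key.partition("-")[0]  # backend name precedes the first '-'
        if backend in INSECURE_BACKENDS:
            yield path, key
```

Run over every entry of the tree, this is the "somewhat expensive look at the tree" mentioned above. A hook that only checked files touched by the commit would not notice an insecure key added by an earlier commit.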

----

A few other potential problems:

* A symlink target like .git/annex/objects/XX/YY/SHA256--foo
  might be able to be manipulated to add collision data in the path.
  For example, .git/annex/objects/collisiondata/../XX/YY/SHA256--foo

  I think this is not a valid attack, because at least on Linux,
  such a symlink won't be followed unless the
  .git/annex/objects/collisiondata directory exists.

* `*E` backends could embed SHA1 collision data in a long filename
  extension in a key.

  Impact is limited, because even if an attacker does this, the key also
  contains the checksum (e.g. SHA2) of the annexed data. The current SHA1
  attack is only a common-prefix attack; it does not allow creating two
  colliding keys that contain two different SHA2 checksums. That would
  need a chosen-prefix attack.
  
  It might be worth limiting the length
  of an extension allowed in such a key to the longest such extension
  git-annex has ever supported (probably < 20 bytes or so), which would
  be less than the size of the data needed for current SHA1 collision
  attacks. Now done; git-annex refuses to use keys with super
  long extensions.

* It might be possible to embed colliding data in a specially constructed
  key name with an extra field in it, eg "SHA256-cXXXXXXXXXXXXXXX-...".
  Need to review the code and see if such extra fields are allowed.  

  Update: All fields are numeric, but could contain arbitrary data
  after the number. Could have been used in a chosen-prefix attack.
  This has been fixed; git-annex refuses to parse
  such fields, so it won't work with files that try to exploit this.

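For the first problem in the list (collision data hidden in a symlink target as `collisiondata/..`), a checker could avoid depending on how the OS resolves the path at all. It would accept only targets that are already in canonical form. The sketch below assumes the simplified layout written in that bullet, `.git/annex/objects/XX/YY/KEY`, spelled relative to the repository root. `OBJECTS_DIR` and `is_well_formed_object_link` are hypothetical, and the segment count would need adjusting for other layouts.

```python
import posixpath

# Hypothetical prefix matching the bullet's example paths.
OBJECTS_DIR = ".git/annex/objects"


def is_well_formed_object_link(target: str) -> bool:
    """Accept only OBJECTS_DIR/<XX>/<YY>/<KEY> with no '', '.' or '..' segments.

    Rejecting non-canonical paths outright means no extra bytes can ride
    along in the link target, whether or not the OS would follow it.
    """
    prefix = OBJECTS_DIR + "/"
    if not target.startswith(prefix):
        return False
    parts = target[len(prefix):].split("/")
    if len(parts) != 3:
        return False
    return all(p not in ("", ".", "..") for p in parts)


# posixpath.normpath is purely lexical: it collapses "collisiondata/.." even
# though the kernel would need that directory to exist to resolve the path.
tricky = ".git/annex/objects/collisiondata/../XX/YY/SHA256--foo"
canonical = posixpath.normpath(tricky)
```

Here `canonical` comes out as `.git/annex/objects/XX/YY/SHA256--foo`, which passes the check, while `tricky` itself is rejected. Comparing a target against its own normalized form is another way to get the same strictness.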
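The `*E` extension problem above turns on the difference between the two kinds of collision. A rough formalization follows, using the standard textbook definitions rather than anything specific to git-annex. Let $H(X)$ be SHA1's internal chaining value after processing a block-aligned input $X$, where $P$, $P_1$, $P_2$, $M_1$ and $M_2$ are whole 512-bit blocks. Git hashes a symlink's blob as `blob <len>\0<target>`, and in an `*E` key the SHA2 checksum comes before the extension. So two keys with different SHA2 checksums already differ before any attacker-controlled bytes, which is the chosen-prefix case.

```latex
\begin{aligned}
&\text{common (identical) prefix: given } P,\ \text{find } M_1 \neq M_2 \text{ with } |M_1| = |M_2| \text{ and } H(P \,\|\, M_1) = H(P \,\|\, M_2) \\
&\quad\Longrightarrow\ \mathrm{SHA1}(P \,\|\, M_1 \,\|\, S) = \mathrm{SHA1}(P \,\|\, M_2 \,\|\, S) \text{ for any suffix } S \\
&\text{chosen prefix: given } P_1 \neq P_2,\ \text{find } M_1, M_2 \text{ with } |P_1 \,\|\, M_1| = |P_2 \,\|\, M_2| \text{ and } H(P_1 \,\|\, M_1) = H(P_2 \,\|\, M_2)
\end{aligned}
```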
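The extension-length limit for `*E` keys could be checked roughly as follows. This is a sketch, not git-annex's actual check. The cap of 20 bytes is a hypothetical value in the spirit of the "probably < 20 bytes or so" estimate above. The key's name part is assumed to be a lowercase hex checksum followed by an optional extension that starts with a dot. `MAX_EXTENSION_BYTES` and `extension_ok` are hypothetical names.

```python
import re

# Hypothetical cap on extension length (leading dot included).
MAX_EXTENSION_BYTES = 20

# Assumed shape of a hash key's name part: lowercase hex digest, then an
# optional extension such as ".jpg" or ".tar.gz".
_NAME_RE = re.compile(r"(?P<digest>[0-9a-f]+)(?P<ext>\..*)?", re.DOTALL)


def extension_ok(key: str) -> bool:
    """Reject hash keys whose extension is long enough to hide collision data.

    Only meant for hash-based keys; anything that does not match the assumed
    shape is rejected rather than guessed at.
    """
    _fields, sep, name = key.partition("--")
    if not sep:
        return False
    m = _NAME_RE.fullmatch(name)
    if m is None:
        return False
    ext = m.group("ext") or ""
    return len(ext.encode("utf-8")) <= MAX_EXTENSION_BYTES
```

Ordinary extensions like `.jpg` or `.tar.gz` pass. An extension padded with a few hundred bytes of collision data does not.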
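The fix for extra key fields, refusing fields with arbitrary data after the number, amounts to strict numeric parsing. The sketch below assumes each field between the backend name and `--` is a single letter followed by ASCII decimal digits, e.g. `s1048576` for size or `m1500000000` for mtime. The set of accepted letters is an assumption, and `fields_ok` is a hypothetical name, not git-annex's parser.

```python
import re

# One letter, then ASCII digits only. [0-9] is used instead of \d because
# Python's \d also matches non-ASCII digits such as Arabic-Indic numerals.
_FIELD_RE = re.compile(r"[a-zA-Z][0-9]+")


def fields_ok(key: str) -> bool:
    """Reject keys whose fields carry anything other than a plain number."""
    head, sep, _name = key.partition("--")
    if not sep:
        return False
    backend, *fields = head.split("-")
    if not backend:
        return False
    return all(_FIELD_RE.fullmatch(f) is not None for f in fields)
```

With this check, a field like `cXXXXXXXXXXXXXXX`, or `s1234` followed by trailing junk, makes the whole key unparseable. The key is refused instead of silently carrying the extra bytes.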