use SHA256 by default

To get old behavior, add a .gitattributes containing: * annex.backend=WORM

I feel that SHA256 is a better default for most people, as long as their
systems are fast enough that checksumming their files isn't a problem.
git-annex should default to preserving the integrity of data as well as git
does. Checksum backends also work better with editing files via
unlock/lock.

I considered just using SHA1, but since that hash is believed to be somewhat
near to being broken, and git-annex deals with large files which would be a
perfect exploit medium, I decided to go to a SHA-2 hash.

SHA512 is annoyingly long when displayed, and git-annex displays it in a
few places (and notably it is shown in ls -l), so I picked the shorter
hash. Considered SHA224 as it's even shorter, but feel it's a bit weird.

I expect git-annex will use SHA-3 at some point in the future, but
probably not soon!

Note that systems without a sha256sum (or sha256) program will fall back to
defaulting to SHA1.
This commit is contained in:
Joey Hess 2011-11-04 15:21:45 -04:00
parent 1089e85d48
commit ef3457196a
8 changed files with 37 additions and 30 deletions

View file

@ -5,17 +5,19 @@ to retrieve the file's content (its value).
Multiple pluggable key-value backends are supported, and a single repository
can use different ones for different files.
* `WORM` ("Write Once, Read Many") This assumes that any file with
the same basename, size, and modification time has the same content.
This is the default, and the least expensive backend.
* `SHA1` -- This uses a key based on a sha1 checksum. This allows
* `SHA256` -- The default backend for new files. This allows
verifying that the file content is right, and can avoid duplicates of
files with the same content. Its need to generate checksums
can make it slower for large files.
* `SHA512`, `SHA384`, `SHA256`, `SHA224` -- Like SHA1, but larger
checksums. Mostly useful for the very paranoid, or anyone who is
researching checksum collisions and wants to annex their colliding data. ;)
* `SHA1E`, `SHA512E`, etc -- Variants that preserve filename extension as
can make it slower for large files.
* `WORM` ("Write Once, Read Many") This assumes that any file with
the same basename, size, and modification time has the same content.
This is the the least expensive backend, recommended for really large
files or slow systems.
* `SHA512` -- Best currently available hash, for the very paranoid.
* `SHA1` -- Smaller hash than `SHA256` for those who want a checksum
but are not concerned about security.
* `SHA384`, `SHA224` -- Hashes for people who like unusual sizes.
* `SHA256E`, `SHA1E`, etc -- Variants that preserve filename extension as
part of the key. Useful for archival tasks where the filename extension
contains metadata that should be preserved.
@ -27,9 +29,11 @@ For finer control of what backend is used when adding different types of
files, the `.gitattributes` file can be used. The `annex.backend`
attribute can be set to the name of the backend to use for matching files.
For example, to use the SHA1 backend for sound files, which tend to be
smallish and might be modified or copied over time, you could set in
`.gitattributes`:
For example, to use the SHA256 backend for sound files, which tend to be
smallish and might be modified or copied over time,
while using the WORM backend for everything else, you could set
in `.gitattributes`:
*.mp3 annex.backend=SHA1
*.ogg annex.backend=SHA1
* annex.backend=WORM
*.mp3 annex.backend=SHA256
*.ogg annex.backend=SHA256