git-annex/doc/backends.mdwn
Joey Hess ef3457196a use SHA256 by default
To get old behavior, add a .gitattributes containing: * annex.backend=WORM

I feel that SHA256 is a better default for most people, as long as their
systems are fast enough that checksumming their files isn't a problem.
git-annex should default to preserving the integrity of data as well as git
does. Checksum backends also work better with editing files via
unlock/lock.

I considered just using SHA1, but since that hash is believed to be somewhat
near to being broken, and git-annex deals with large files which would be a
perfect exploit medium, I decided to go to a SHA-2 hash.

SHA512 is annoyingly long when displayed, and git-annex displays it in a
few places (and notably it is shown in ls -l), so I picked the shorter
hash. Considered SHA224 as it's even shorter, but feel it's a bit weird.

I expect git-annex will use SHA-3 at some point in the future, but
probably not soon!

Note that systems without a sha256sum (or sha256) program will fall back to
defaulting to SHA1.
2011-11-04 15:51:01 -04:00

39 lines
1.9 KiB
Markdown

When a file is annexed, a key is generated from its content and/or metadata.
The file checked into git symlinks to the key. This key can later be used
to retrieve the file's content (its value).
Multiple pluggable key-value backends are supported, and a single repository
can use different ones for different files.
* `SHA256` -- The default backend for new files. This allows
verifying that the file content is right, and can avoid duplicates of
files with the same content. Its need to generate checksums
can make it slower for large files.
* `WORM` ("Write Once, Read Many") This assumes that any file with
the same basename, size, and modification time has the same content.
This is the the least expensive backend, recommended for really large
files or slow systems.
* `SHA512` -- Best currently available hash, for the very paranoid.
* `SHA1` -- Smaller hash than `SHA256` for those who want a checksum
but are not concerned about security.
* `SHA384`, `SHA224` -- Hashes for people who like unusual sizes.
* `SHA256E`, `SHA1E`, etc -- Variants that preserve filename extension as
part of the key. Useful for archival tasks where the filename extension
contains metadata that should be preserved.
The `annex.backends` git-config setting can be used to list the backends
git-annex should use. The first one listed will be used by default when
new files are added.
For finer control of what backend is used when adding different types of
files, the `.gitattributes` file can be used. The `annex.backend`
attribute can be set to the name of the backend to use for matching files.
For example, to use the SHA256 backend for sound files, which tend to be
smallish and might be modified or copied over time,
while using the WORM backend for everything else, you could set
in `.gitattributes`:
* annex.backend=WORM
*.mp3 annex.backend=SHA256
*.ogg annex.backend=SHA256