use SHA256 by default
To get old behavior, add a .gitattributes containing: * annex.backend=WORM

I feel that SHA256 is a better default for most people, as long as their
systems are fast enough that checksumming their files isn't a problem.
git-annex should default to preserving the integrity of data as well as git
does. Checksum backends also work better with editing files via
unlock/lock.

I considered just using SHA1, but since that hash is believed to be somewhat
near to being broken, and git-annex deals with large files which would be a
perfect exploit medium, I decided to go to a SHA-2 hash.

SHA512 is annoyingly long when displayed, and git-annex displays it in a few
places (and notably it is shown in ls -l), so I picked the shorter hash.
Considered SHA224 as it's even shorter, but feel it's a bit weird.

I expect git-annex will use SHA-3 at some point in the future, but
probably not soon!

Note that systems without a sha256sum (or sha256) program will fall back to
defaulting to SHA1.
parent 1089e85d48
commit ef3457196a
8 changed files with 37 additions and 30 deletions
@@ -26,12 +26,12 @@ import Types.Key
 import qualified Types.Backend as B
 
 -- When adding a new backend, import it here and add it to the list.
-import qualified Backend.WORM
 import qualified Backend.SHA
+import qualified Backend.WORM
 import qualified Backend.URL
 
 list :: [Backend Annex]
-list = Backend.WORM.backends ++ Backend.SHA.backends ++ Backend.URL.backends
+list = Backend.SHA.backends ++ Backend.WORM.backends ++ Backend.URL.backends
 
 {- List of backends in the order to try them when storing a new key. -}
 orderedList :: Annex [Backend Annex]
@@ -16,12 +16,12 @@ import qualified Build.SysConfig as SysConfig
 
 type SHASize = Int
 
+-- order is slightly significant; want SHA256 first, and more general
+-- sizes earlier
 sizes :: [Int]
-sizes = [1, 256, 512, 224, 384]
+sizes = [256, 1, 512, 224, 384]
 
 backends :: [Backend Annex]
--- order is slightly significant; want sha1 first, and more general
--- sizes earlier
 backends = catMaybes $ map genBackend sizes ++ map genBackendE sizes
 
 genBackend :: SHASize -> Maybe (Backend Annex)
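The ordering of `sizes` matters because the first backend that survives `catMaybes` ends up at the head of the list, which is how the commit message's SHA1 fallback comes about on systems without a sha256sum program. The following is a minimal sketch of that idea, not the actual git-annex code: `genName`, `genNameE`, `backendNames`, and the `available` predicate are hypothetical stand-ins for `genBackend`, `genBackendE`, `backends`, and the build-time check for a checksum program.

```haskell
import Data.Maybe (catMaybes)

type SHASize = Int

-- Same ordering as the new `sizes` list in this commit.
sizes :: [SHASize]
sizes = [256, 1, 512, 224, 384]

-- Hypothetical stand-in for genBackend: yields a backend name only
-- when the caller-supplied predicate reports that a checksum program
-- for that size is available.
genName :: (SHASize -> Bool) -> SHASize -> Maybe String
genName available n
  | available n = Just ("SHA" ++ show n)
  | otherwise = Nothing

-- Hypothetical stand-in for genBackendE: the extension-preserving variant.
genNameE :: (SHASize -> Bool) -> SHASize -> Maybe String
genNameE available n = fmap (++ "E") (genName available n)

-- Mirrors the shape of `backends`: unavailable sizes are dropped, so
-- the head of the result is the first available plain SHA backend.
backendNames :: (SHASize -> Bool) -> [String]
backendNames available =
  catMaybes $ map (genName available) sizes ++ map (genNameE available) sizes
```

Under these assumptions, `take 1 (backendNames (const True))` evaluates to `["SHA256"]`, while `take 1 (backendNames (/= 256))`, which simulates a missing sha256sum, evaluates to `["SHA1"]`.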
3
debian/changelog
vendored
@@ -1,5 +1,8 @@
 git-annex (3.20111026) UNRELEASED; urgency=low
 
+* The default backend used when adding files to the annex is changed
+from WORM to SHA256.
+To get old behavior, add a .gitattributes containing: * annex.backend=WORM
 * Sped up some operations on remotes that are on the same host.
 * copy --to: Fixed leak when copying many files to a remote on the same
 host.
@@ -5,17 +5,19 @@ to retrieve the file's content (its value).
 Multiple pluggable key-value backends are supported, and a single repository
 can use different ones for different files.
 
-* `WORM` ("Write Once, Read Many") This assumes that any file with
-the same basename, size, and modification time has the same content.
-This is the default, and the least expensive backend.
-* `SHA1` -- This uses a key based on a sha1 checksum. This allows
+* `SHA256` -- The default backend for new files. This allows
 verifying that the file content is right, and can avoid duplicates of
 files with the same content. Its need to generate checksums
-can make it slower for large files.
-* `SHA512`, `SHA384`, `SHA256`, `SHA224` -- Like SHA1, but larger
-checksums. Mostly useful for the very paranoid, or anyone who is
-researching checksum collisions and wants to annex their colliding data. ;)
-* `SHA1E`, `SHA512E`, etc -- Variants that preserve filename extension as
+can make it slower for large files.
+* `WORM` ("Write Once, Read Many") This assumes that any file with
+the same basename, size, and modification time has the same content.
+This is the least expensive backend, recommended for really large
+files or slow systems.
+* `SHA512` -- Best currently available hash, for the very paranoid.
+* `SHA1` -- Smaller hash than `SHA256` for those who want a checksum
+but are not concerned about security.
+* `SHA384`, `SHA224` -- Hashes for people who like unusual sizes.
+* `SHA256E`, `SHA1E`, etc -- Variants that preserve filename extension as
 part of the key. Useful for archival tasks where the filename extension
 contains metadata that should be preserved.
 
@@ -27,9 +29,11 @@ For finer control of what backend is used when adding different types of
 files, the `.gitattributes` file can be used. The `annex.backend`
 attribute can be set to the name of the backend to use for matching files.
 
-For example, to use the SHA1 backend for sound files, which tend to be
-smallish and might be modified or copied over time, you could set in
-`.gitattributes`:
+For example, to use the SHA256 backend for sound files, which tend to be
+smallish and might be modified or copied over time,
+while using the WORM backend for everything else, you could set
+in `.gitattributes`:
 
-*.mp3 annex.backend=SHA1
-*.ogg annex.backend=SHA1
+* annex.backend=WORM
+*.mp3 annex.backend=SHA256
+*.ogg annex.backend=SHA256
@@ -2,8 +2,8 @@
 # cp /tmp/big_file .
 # cp /tmp/debian.iso .
 # git annex add .
-add big_file ok
-add debian.iso ok
+add big_file (checksum...) ok
+add debian.iso (checksum...) ok
 # git commit -a -m added
 
 When you add a file to the annex and commit it, only a symlink to
@@ -9,5 +9,5 @@ makes it very easy.
 move my_cool_big_file (to usbdrive...) ok
 # git annex move video/hackity_hack_and_kaxxt.mov --from fileserver
 move video/hackity_hack_and_kaxxt.mov (from fileserver...)
-WORM-s86050597-m1274316523--hackity_hack_and_kax 100% 82MB 199.1KB/s 07:02
+SHA256-s86050597--6ae2688bc533437766a48aa19f2c06be14d1bab9c70b468af445d4f07b65f41e 100% 82MB 199.1KB/s 07:02
 ok
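The key names in the transfer output change shape along with the default backend: the old WORM key carries a size and a modification time followed by the file's basename, while the new SHA256 key carries only a size followed by the checksum. As a rough sketch of that layout, inferred purely from the example keys shown in this diff, the hypothetical `Key` record and `showKey` function below render both forms. They are illustrative only and are not claimed to match git-annex's real key type or serializer.

```haskell
import Data.List (intercalate)
import Data.Maybe (catMaybes)

-- Hypothetical record for illustration; field names are assumptions.
data Key = Key
  { backendName :: String     -- e.g. "WORM" or "SHA256"
  , keySize :: Maybe Integer  -- rendered as -sSIZE when present
  , keyMtime :: Maybe Integer -- rendered as -mMTIME when present
  , keyName :: String         -- basename (WORM) or checksum (SHA*)
  }

-- Render a key as BACKEND[-sSIZE][-mMTIME]--NAME, the layout suggested
-- by the example keys in the documentation output.
showKey :: Key -> String
showKey k = intercalate "-" (backendName k : fields) ++ "--" ++ keyName k
  where
    fields = catMaybes
      [ fmap (\n -> 's' : show n) (keySize k)
      , fmap (\n -> 'm' : show n) (keyMtime k)
      ]
```

For instance, `showKey (Key "WORM" (Just 86050597) (Just 1274316523) "hackity_hack_and_kax")` reproduces the old key from the example, and leaving `keyMtime` as `Nothing` with a SHA256 backend name and a checksum as the name gives the new form.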
@@ -1,8 +1,8 @@
-It's possible for data to accumulate in the annex that no files point to
-anymore. One way it can happen is if you `git rm` a file without
-first calling `git annex drop`. And, when you modify an annexed file, the old
-content of the file remains in the annex. Another way is when migrating
-between key-value [[backends|backend]].
+It's possible for data to accumulate in the annex that no files in any
+branch point to anymore. One way it can happen is if you `git rm` a file
+without first calling `git annex drop`. And, when you modify an annexed
+file, the old content of the file remains in the annex. Another way is when
+migrating between key-value [[backends|backend]].
 
 This might be historical data you want to preserve, so git-annex defaults to
 preserving it. So from time to time, you may want to check for such data and
@@ -12,8 +12,8 @@ eliminate it to save space.
 unused . (checking for unused data...)
 Some annexed data is no longer used by any files in the repository.
 NUMBER KEY
-1 WORM-s3-m1289672605--file
-2 WORM-s14-m1289672605--file
+1 SHA256-s86050597--6ae2688bc533437766a48aa19f2c06be14d1bab9c70b468af445d4f07b65f41e
+2 SHA1-s14--f1358ec1873d57350e3dc62054dc232bc93c2bd1
 (To see where data was previously used, try: git log --stat -S'KEY')
 (To remove unwanted data: git-annex dropunused NUMBER)
 ok
@@ -13,7 +13,7 @@ Now you can get files and they will be transferred (using `rsync` via `ssh`):
 
 # git annex get my_cool_big_file
 get my_cool_big_file (getting UUID for origin...) (from origin...)
-WORM-s2159-m1285650548--my_cool_big_file 100% 2159 2.1KB/s 00:00
+SHA256-s86050597--6ae2688bc533437766a48aa19f2c06be14d1bab9c70b468af445d4f07b65f41e 100% 2159 2.1KB/s 00:00
 ok
 
 When you drop files, git-annex will ssh over to the remote and make