use SHA256 by default
To get old behavior, add a .gitattributes containing: * annex.backend=WORM

I feel that SHA256 is a better default for most people, as long as their systems are fast enough that checksumming their files isn't a problem. git-annex should default to preserving the integrity of data as well as git does. Checksum backends also work better with editing files via unlock/lock.

I considered just using SHA1, but since that hash is believed to be somewhat near to being broken, and git-annex deals with large files which would be a perfect exploit medium, I decided to go to a SHA-2 hash.

SHA512 is annoyingly long when displayed, and git-annex displays it in a few places (and notably it is shown in ls -l), so I picked the shorter hash. Considered SHA224 as it's even shorter, but feel it's a bit weird.

I expect git-annex will use SHA-3 at some point in the future, but probably not soon!

Note that systems without a sha256sum (or sha256) program will fall back to defaulting to SHA1.
parent 1089e85d48
commit ef3457196a
8 changed files with 37 additions and 30 deletions
@@ -26,12 +26,12 @@ import Types.Key
 import qualified Types.Backend as B
 
 -- When adding a new backend, import it here and add it to the list.
-import qualified Backend.WORM
 import qualified Backend.SHA
+import qualified Backend.WORM
 import qualified Backend.URL
 
 list :: [Backend Annex]
-list = Backend.WORM.backends ++ Backend.SHA.backends ++ Backend.URL.backends
+list = Backend.SHA.backends ++ Backend.WORM.backends ++ Backend.URL.backends
 
 {- List of backends in the order to try them when storing a new key. -}
 orderedList :: Annex [Backend Annex]
@@ -16,12 +16,12 @@ import qualified Build.SysConfig as SysConfig
 
 type SHASize = Int
 
+-- order is slightly significant; want SHA256 first, and more general
+-- sizes earlier
 sizes :: [Int]
-sizes = [1, 256, 512, 224, 384]
+sizes = [256, 1, 512, 224, 384]
 
 backends :: [Backend Annex]
--- order is slightly significant; want sha1 first, and more general
--- sizes earlier
 backends = catMaybes $ map genBackend sizes ++ map genBackendE sizes
 
 genBackend :: SHASize -> Maybe (Backend Annex)
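
The reordering of `sizes` matters because `backends` drops every size for which `genBackend` yields `Nothing`, so whichever size survives first becomes the default. A minimal sketch of that fallback, assuming availability is decided by whether a checksum program exists; the `Available` list, `shaCommand`, `genBackendName`, `backendNames` and `defaultBackendName` are hypothetical names for illustration and are not part of git-annex:

```haskell
import Data.Maybe (listToMaybe, mapMaybe)

type SHASize = Int

-- Same order as the new `sizes` list in the hunk above: SHA256 first.
sizes :: [SHASize]
sizes = [256, 1, 512, 224, 384]

-- Hypothetical stand-in for the build-time configuration: the names of
-- the checksum programs that were found on this system.
type Available = [String]

-- Hypothetical helper: the program name assumed for each hash size.
shaCommand :: SHASize -> String
shaCommand n = "sha" ++ show n ++ "sum"

-- Yield a backend name only when its checksum program is available.
genBackendName :: Available -> SHASize -> Maybe String
genBackendName avail n
  | shaCommand n `elem` avail = Just ("SHA" ++ show n)
  | otherwise = Nothing

-- Backend names in preference order, skipping unavailable ones.
backendNames :: Available -> [String]
backendNames avail = mapMaybe (genBackendName avail) sizes

-- The default is whatever ends up first in the list.
defaultBackendName :: Available -> Maybe String
defaultBackendName = listToMaybe . backendNames
```

With both `sha256sum` and `sha1sum` in the list the sketch picks `SHA256`; without `sha256sum` it falls through to `SHA1`, which matches the fallback described in the commit message. The real list also appends the `E` variants and the WORM and URL backends, which this sketch leaves out.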
debian/changelog (vendored, 3 changes)
@@ -1,5 +1,8 @@
 git-annex (3.20111026) UNRELEASED; urgency=low
 
+* The default backend used when adding files to the annex is changed
+from WORM to SHA256.
+To get old behavior, add a .gitattributes containing: * annex.backend=WORM
 * Sped up some operations on remotes that are on the same host.
 * copy --to: Fixed leak when copying many files to a remote on the same
 host.
@@ -5,17 +5,19 @@ to retrieve the file's content (its value).
 Multiple pluggable key-value backends are supported, and a single repository
 can use different ones for different files.
 
-* `WORM` ("Write Once, Read Many") This assumes that any file with
-the same basename, size, and modification time has the same content.
-This is the default, and the least expensive backend.
-* `SHA1` -- This uses a key based on a sha1 checksum. This allows
+* `SHA256` -- The default backend for new files. This allows
 verifying that the file content is right, and can avoid duplicates of
 files with the same content. Its need to generate checksums
 can make it slower for large files.
-* `SHA512`, `SHA384`, `SHA256`, `SHA224` -- Like SHA1, but larger
-checksums. Mostly useful for the very paranoid, or anyone who is
-researching checksum collisions and wants to annex their colliding data. ;)
-* `SHA1E`, `SHA512E`, etc -- Variants that preserve filename extension as
+* `WORM` ("Write Once, Read Many") This assumes that any file with
+the same basename, size, and modification time has the same content.
+This is the the least expensive backend, recommended for really large
+files or slow systems.
+* `SHA512` -- Best currently available hash, for the very paranoid.
+* `SHA1` -- Smaller hash than `SHA256` for those who want a checksum
+but are not concerned about security.
+* `SHA384`, `SHA224` -- Hashes for people who like unusual sizes.
+* `SHA256E`, `SHA1E`, etc -- Variants that preserve filename extension as
 part of the key. Useful for archival tasks where the filename extension
 contains metadata that should be preserved.
 
@@ -27,9 +29,11 @@ For finer control of what backend is used when adding different types of
 files, the `.gitattributes` file can be used. The `annex.backend`
 attribute can be set to the name of the backend to use for matching files.
 
-For example, to use the SHA1 backend for sound files, which tend to be
-smallish and might be modified or copied over time, you could set in
-`.gitattributes`:
+For example, to use the SHA256 backend for sound files, which tend to be
+smallish and might be modified or copied over time,
+while using the WORM backend for everything else, you could set
+in `.gitattributes`:
 
-*.mp3 annex.backend=SHA1
-*.ogg annex.backend=SHA1
+* annex.backend=WORM
+*.mp3 annex.backend=SHA256
+*.ogg annex.backend=SHA256
@@ -2,8 +2,8 @@
 # cp /tmp/big_file .
 # cp /tmp/debian.iso .
 # git annex add .
-add big_file ok
-add debian.iso ok
+add big_file (checksum...) ok
+add debian.iso (checksum...) ok
 # git commit -a -m added
 
 When you add a file to the annex and commit it, only a symlink to
@@ -9,5 +9,5 @@ makes it very easy.
 move my_cool_big_file (to usbdrive...) ok
 # git annex move video/hackity_hack_and_kaxxt.mov --from fileserver
 move video/hackity_hack_and_kaxxt.mov (from fileserver...)
-WORM-s86050597-m1274316523--hackity_hack_and_kax 100% 82MB 199.1KB/s 07:02
+SHA256-s86050597--6ae2688bc533437766a48aa19f2c06be14d1bab9c70b468af445d4f07b65f41e 100% 82MB 199.1KB/s 07:02
 ok
@@ -1,8 +1,8 @@
-It's possible for data to accumulate in the annex that no files point to
-anymore. One way it can happen is if you `git rm` a file without
-first calling `git annex drop`. And, when you modify an annexed file, the old
-content of the file remains in the annex. Another way is when migrating
-between key-value [[backends|backend]].
+It's possible for data to accumulate in the annex that no files in any
+branch point to anymore. One way it can happen is if you `git rm` a file
+without first calling `git annex drop`. And, when you modify an annexed
+file, the old content of the file remains in the annex. Another way is when
+migrating between key-value [[backends|backend]].
 
 This might be historical data you want to preserve, so git-annex defaults to
 preserving it. So from time to time, you may want to check for such data and
@@ -12,8 +12,8 @@ eliminate it to save space.
 unused . (checking for unused data...)
 Some annexed data is no longer used by any files in the repository.
 NUMBER KEY
-1 WORM-s3-m1289672605--file
-2 WORM-s14-m1289672605--file
+1 SHA256-s86050597--6ae2688bc533437766a48aa19f2c06be14d1bab9c70b468af445d4f07b65f41e
+2 SHA1-s14--f1358ec1873d57350e3dc62054dc232bc93c2bd1
 (To see where data was previously used, try: git log --stat -S'KEY')
 (To remove unwanted data: git-annex dropunused NUMBER)
 ok
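
The sample keys in these hunks all appear to share one shape: a backend name, some `-`-separated fields such as `s` (size) and `m` (mtime), then `--` and a name or checksum. A rough sketch of splitting such a key apart, assuming that shape holds; it is inferred only from the samples in this diff, and `Key`, `breakOnDashes`, `splitOn` and `parseKey` are hypothetical names, not git-annex's own parser:

```haskell
import Data.List (isPrefixOf)

-- Hypothetical record for the parts visible in the sample keys.
data Key = Key
  { keyBackend :: String   -- e.g. "SHA256" or "WORM"
  , keyFields  :: [String] -- e.g. ["s14"] or ["s3", "m1289672605"]
  , keyName    :: String   -- checksum (SHA backends) or file name (WORM)
  } deriving (Eq, Show)

-- Split a string at the first "--", returning the text on either side.
breakOnDashes :: String -> Maybe (String, String)
breakOnDashes = go ""
  where
    go acc rest
      | "--" `isPrefixOf` rest = Just (reverse acc, drop 2 rest)
      | otherwise = case rest of
          []       -> Nothing
          (c : cs) -> go (c : acc) cs

-- Split a string on every occurrence of a delimiter character.
splitOn :: Char -> String -> [String]
splitOn d s = case break (== d) s of
  (a, [])       -> [a]
  (a, _ : rest) -> a : splitOn d rest

-- Parse "BACKEND-field-field--name" into its parts.
parseKey :: String -> Maybe Key
parseKey s = do
  (meta, name) <- breakOnDashes s
  case splitOn '-' meta of
    (b : fs) | not (null b) -> Just (Key b fs name)
    _                       -> Nothing
```

Under these assumptions, the old `WORM` keys carry both an `s` and an `m` field, while the new `SHA256` and `SHA1` keys carry only the size and put the checksum after the `--`.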
@@ -13,7 +13,7 @@ Now you can get files and they will be transferred (using `rsync` via `ssh`):
 
 # git annex get my_cool_big_file
 get my_cool_big_file (getting UUID for origin...) (from origin...)
-WORM-s2159-m1285650548--my_cool_big_file 100% 2159 2.1KB/s 00:00
+SHA256-s86050597--6ae2688bc533437766a48aa19f2c06be14d1bab9c70b468af445d4f07b65f41e 100% 2159 2.1KB/s 00:00
 ok
 
 When you drop files, git-annex will ssh over to the remote and make