diff --git a/Backend.hs b/Backend.hs
index a09fc0e990..9a40e54598 100644
--- a/Backend.hs
+++ b/Backend.hs
@@ -26,12 +26,12 @@ import Types.Key
 import qualified Types.Backend as B
 
 -- When adding a new backend, import it here and add it to the list.
-import qualified Backend.WORM
 import qualified Backend.SHA
+import qualified Backend.WORM
 import qualified Backend.URL
 
 list :: [Backend Annex]
-list = Backend.WORM.backends ++ Backend.SHA.backends ++ Backend.URL.backends
+list = Backend.SHA.backends ++ Backend.WORM.backends ++ Backend.URL.backends
 
 {- List of backends in the order to try them when storing a new key. -}
 orderedList :: Annex [Backend Annex]
diff --git a/Backend/SHA.hs b/Backend/SHA.hs
index 3a54a8871b..d449821172 100644
--- a/Backend/SHA.hs
+++ b/Backend/SHA.hs
@@ -16,12 +16,12 @@ import qualified Build.SysConfig as SysConfig
 
 type SHASize = Int
 
+-- order is slightly significant; want SHA256 first, and more general
+-- sizes earlier
 sizes :: [Int]
-sizes = [1, 256, 512, 224, 384]
+sizes = [256, 1, 512, 224, 384]
 
 backends :: [Backend Annex]
--- order is slightly significant; want sha1 first, and more general
--- sizes earlier
 backends = catMaybes $ map genBackend sizes ++ map genBackendE sizes
 
 genBackend :: SHASize -> Maybe (Backend Annex)
diff --git a/debian/changelog b/debian/changelog
index e59b4f4048..e74a190ba5 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,5 +1,8 @@
 git-annex (3.20111026) UNRELEASED; urgency=low
 
+  * The default backend used when adding files to the annex is changed
+    from WORM to SHA256.
+    To get old behavior, add a .gitattributes containing: * annex.backend=WORM
   * Sped up some operations on remotes that are on the same host.
   * copy --to: Fixed leak when copying many files to a remote on the same
     host.
diff --git a/doc/backends.mdwn b/doc/backends.mdwn
index ebcdedc2a7..2030d107a3 100644
--- a/doc/backends.mdwn
+++ b/doc/backends.mdwn
@@ -5,17 +5,19 @@ to retrieve the file's content (its value).
 Multiple pluggable key-value backends are supported, and a single repository
 can use different ones for different files.
 
-* `WORM` ("Write Once, Read Many") This assumes that any file with
-  the same basename, size, and modification time has the same content.
-  This is the default, and the least expensive backend.
-* `SHA1` -- This uses a key based on a sha1 checksum. This allows
+* `SHA256` -- The default backend for new files. This allows
   verifying that the file content is right, and can avoid duplicates of
   files with the same content. Its need to generate checksums
-  can make it slower for large files.
-* `SHA512`, `SHA384`, `SHA256`, `SHA224` -- Like SHA1, but larger
-  checksums. Mostly useful for the very paranoid, or anyone who is
-  researching checksum collisions and wants to annex their colliding data. ;)
-* `SHA1E`, `SHA512E`, etc -- Variants that preserve filename extension as
+  can make it slower for large files. 
+* `WORM` ("Write Once, Read Many") This assumes that any file with
+  the same basename, size, and modification time has the same content.
+  This is the least expensive backend, recommended for really large
+  files or slow systems.
+* `SHA512` -- Best currently available hash, for the very paranoid.
+* `SHA1` -- Smaller hash than `SHA256` for those who want a checksum
+  but are not concerned about security.
+* `SHA384`, `SHA224` -- Hashes for people who like unusual sizes.
+* `SHA256E`, `SHA1E`, etc -- Variants that preserve filename extension as
   part of the key. Useful for archival tasks where the filename extension
   contains metadata that should be preserved.
 
@@ -27,9 +29,11 @@ For finer control of what backend is used when adding different types of
 files, the `.gitattributes` file can be used. The `annex.backend`
 attribute can be set to the name of the backend to use for matching files.
 
-For example, to use the SHA1 backend for sound files, which tend to be
-smallish and might be modified or copied over time, you could set in
-`.gitattributes`:
+For example, to use the SHA256 backend for sound files, which tend to be
+smallish and might be modified or copied over time,
+while using the WORM backend for everything else, you could set
+in `.gitattributes`:
 
-	*.mp3 annex.backend=SHA1
-	*.ogg annex.backend=SHA1
+	* annex.backend=WORM
+	*.mp3 annex.backend=SHA256
+	*.ogg annex.backend=SHA256
diff --git a/doc/walkthrough/adding_files.mdwn b/doc/walkthrough/adding_files.mdwn
index 77a7fbc154..d1b5a04f77 100644
--- a/doc/walkthrough/adding_files.mdwn
+++ b/doc/walkthrough/adding_files.mdwn
@@ -2,8 +2,8 @@
 	# cp /tmp/big_file .
 	# cp /tmp/debian.iso .
 	# git annex add .
-	add big_file ok
-	add debian.iso ok
+	add big_file (checksum...) ok
+	add debian.iso (checksum...) ok
 	# git commit -a -m added
 
 When you add a file to the annex and commit it, only a symlink to
diff --git a/doc/walkthrough/moving_file_content_between_repositories.mdwn b/doc/walkthrough/moving_file_content_between_repositories.mdwn
index 27dffe9138..3ffcc11750 100644
--- a/doc/walkthrough/moving_file_content_between_repositories.mdwn
+++ b/doc/walkthrough/moving_file_content_between_repositories.mdwn
@@ -9,5 +9,5 @@ makes it very easy.
 	move my_cool_big_file (to usbdrive...) ok
 	# git annex move video/hackity_hack_and_kaxxt.mov --from fileserver
 	move video/hackity_hack_and_kaxxt.mov (from fileserver...)
-	WORM-s86050597-m1274316523--hackity_hack_and_kax 100% 82MB 199.1KB/s 07:02
+	SHA256-s86050597--6ae2688bc533437766a48aa19f2c06be14d1bab9c70b468af445d4f07b65f41e 100% 82MB 199.1KB/s 07:02
 	ok
diff --git a/doc/walkthrough/unused_data.mdwn b/doc/walkthrough/unused_data.mdwn
index e142b576c0..bd6c398710 100644
--- a/doc/walkthrough/unused_data.mdwn
+++ b/doc/walkthrough/unused_data.mdwn
@@ -1,8 +1,8 @@
-It's possible for data to accumulate in the annex that no files point to
-anymore. One way it can happen is if you `git rm` a file without
-first calling `git annex drop`. And, when you modify an annexed file, the old
-content of the file remains in the annex. Another way is when migrating
-between key-value [[backends|backend]].
+It's possible for data to accumulate in the annex that no files in any
+branch point to anymore. One way it can happen is if you `git rm` a file
+without first calling `git annex drop`. And, when you modify an annexed
+file, the old content of the file remains in the annex. Another way is when
+migrating between key-value [[backends|backend]].
 
 This might be historical data you want to preserve, so git-annex defaults to
 preserving it. So from time to time, you may want to check for such data and
@@ -12,8 +12,8 @@ eliminate it to save space.
 	unused . (checking for unused data...)
 	Some annexed data is no longer used by any files in the repository.
 	NUMBER KEY
-	1 WORM-s3-m1289672605--file
-	2 WORM-s14-m1289672605--file
+	1 SHA256-s86050597--6ae2688bc533437766a48aa19f2c06be14d1bab9c70b468af445d4f07b65f41e
+	2 SHA1-s14--f1358ec1873d57350e3dc62054dc232bc93c2bd1
 	(To see where data was previously used, try: git log --stat -S'KEY')
 	(To remove unwanted data: git-annex dropunused NUMBER)
 	ok
diff --git a/doc/walkthrough/using_ssh_remotes.mdwn b/doc/walkthrough/using_ssh_remotes.mdwn
index fbbbbe0701..60011a200b 100644
--- a/doc/walkthrough/using_ssh_remotes.mdwn
+++ b/doc/walkthrough/using_ssh_remotes.mdwn
@@ -13,7 +13,7 @@ Now you can get files and they will be transferred (using `rsync` via `ssh`):
 
 	# git annex get my_cool_big_file
 	get my_cool_big_file (getting UUID for origin...) (from origin...)
-	WORM-s2159-m1285650548--my_cool_big_file 100% 2159 2.1KB/s 00:00
+	SHA256-s86050597--6ae2688bc533437766a48aa19f2c06be14d1bab9c70b468af445d4f07b65f41e 100% 2159 2.1KB/s 00:00
 	ok
 
 When you drop files, git-annex will ssh over to the remote and make