Joey Hess ef3457196a use SHA256 by default
To get old behavior, add a .gitattributes containing: * annex.backend=WORM

I feel that SHA256 is a better default for most people, as long as their
systems are fast enough that checksumming their files isn't a problem.
git-annex should default to preserving the integrity of data as well as git
does. Checksum backends also work better with editing files via
unlock/lock.

I considered just using SHA1, but since that hash is believed to be somewhat
near to being broken, and git-annex deals with large files which would be a
perfect exploit medium, I decided to go to a SHA-2 hash.

SHA512 is annoyingly long when displayed, and git-annex displays it in a
few places (notably, it is shown in ls -l), so I picked the shorter
hash. I also considered SHA224, since it's even shorter, but it feels a bit weird.

I expect git-annex will use SHA-3 at some point in the future, but
probably not soon!

Note that systems without a sha256sum (or sha256) program will fall back to
SHA1 as the default.
2011-11-04 15:51:01 -04:00


It's possible for data to accumulate in the annex that no file in any
branch points to anymore. One way this can happen is if you `git rm` a file
without first running `git annex drop`. Another is modifying an annexed
file: the old content of the file remains in the annex. A third is
migrating between key-value [[backends|backend]].

This might be historical data you want to keep, so git-annex preserves it
by default. From time to time, though, you may want to check for such data
and delete it to save space.

    # git annex unused
    unused . (checking for unused data...)
      Some annexed data is no longer used by any files in the repository.
        NUMBER  KEY
        1       SHA256-s86050597--6ae2688bc533437766a48aa19f2c06be14d1bab9c70b468af445d4f07b65f41e
        2       SHA1-s14--f1358ec1873d57350e3dc62054dc232bc93c2bd1
      (To see where data was previously used, try: git log --stat -S'KEY')
      (To remove unwanted data: git-annex dropunused NUMBER)
    ok

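Each key in the listing carries an `s` field (for example `s86050597`) that records the content's size in bytes, so the listing also tells you roughly how much space dropping would reclaim. The shell functions below are a minimal sketch, not part of git-annex: the names `key_size` and `unused_total_size` are hypothetical, and they assume keys of the form `BACKEND-s<size>--NAME` and table rows of the form `NUMBER KEY` as in the listing above. Keys without a size field are simply not counted.

```shell
# Hypothetical helper, not part of git-annex: print the size field of a key,
# e.g. SHA256-s86050597--6ae2... prints 86050597.
# Assumes the size is stored as an "-s<bytes>" field before the "--"
# separator; prints nothing if the key has no such field.
key_size() {
  # Keep only the backend name and its fields (everything before "--").
  fields=${1%%--*}
  # Split on "-", skip the backend name, and print the digits of the
  # first field that looks like s<digits>.
  printf '%s\n' "$fields" |
    awk -F- '{ for (i = 2; i <= NF; i++) if ($i ~ /^s[0-9]+$/) { print substr($i, 2); exit } }'
}

# Hypothetical helper: read a saved `git annex unused` listing on stdin and
# print the total size in bytes of the keys in its NUMBER/KEY table.
unused_total_size() {
  total=0
  while read -r number key rest; do
    # Only table rows have an all-digit first field followed by one key.
    case $number in
      ''|*[!0-9]*) continue ;;
    esac
    if [ -z "$key" ] || [ -n "$rest" ]; then
      continue
    fi
    size=$(key_size "$key")
    if [ -n "$size" ]; then
      total=$((total + size))
    fi
  done
  echo "$total"
}
```

If the listing above were saved to a file named `unused.txt` (a hypothetical name), `unused_total_size < unused.txt` would print `86050611`, the sum of the two keys' sizes.
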
After running `git annex unused`, you can follow the instructions to examine
the history of files that used the data, and if you decide you don't need that
data anymore, you can easily remove it:

    # git annex dropunused 1
    dropunused 1 ok

Hint: To drop a lot of unused data, use a command like this:

    # git annex dropunused `seq 1 1000`
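Using `seq 1 1000` asks for a fixed range regardless of how many entries the listing actually has. As an alternative, here is a minimal sketch of a hypothetical `unused_numbers` function (not part of git-annex) that extracts just the numbers from a saved `git annex unused` listing, assuming its table rows have the `NUMBER KEY` layout shown earlier.

```shell
# Hypothetical helper, not part of git-annex: read a saved
# `git annex unused` listing on stdin and print the numbers from its
# NUMBER/KEY table on a single line, separated by spaces.
unused_numbers() {
  # A table row has exactly two fields and the first is all digits;
  # the header, status lines and hints are skipped.
  awk 'NF == 2 && $1 ~ /^[0-9]+$/ { printf "%s%s", sep, $1; sep = " " }
       END { print "" }'
}
```

For example, assuming the table is written to standard output, you could run `git annex unused > unused.txt`, review the file, and then run `git annex dropunused $(unused_numbers < unused.txt)`. The numbers are only meaningful relative to the listing they came from, so use the listing from your most recent `git annex unused` run.
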