git-annex/doc/internals/hashing.mdwn
2013-03-31 20:20:41 -04:00

30 lines
1.2 KiB
Markdown

In both the .git/annex directory and the git-annex branch, two levels of
hash directories are used, to avoid issues with too many files in one
directory.
Two separate hash methods are used. One, the old hash format, is only used
for non-bare git repositories. The other, the new hash format, is used for
bare git repositories, the git-annex branch, and on special remotes as
well.
## new hash format
This uses two directories, each with a three-letter name, such as "f87/4d5"
The directory names come from the md5sum of the [[key|key_format]].
For example:
echo -n "SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" | md5sum
## old hash format
This uses two directories, each with a two-letter name, such as "pX/1J"
It takes the md5sum of the key, but rather than a string, represents it as 4
32bit words. Only the first word is used. It is converted into a string by the
same mechanism that would be used to encode a normal md5sum value into a
string, but where that would normally encode the bits using the 16 characters
0-9a-f, this instead uses the 32 characters "0123456789zqjxkmvwgpfZQJXKMVWGPF".
The first 2 letters of the resulting string are the first directory, and the
second 2 are the second directory.