document directory hashes
parent f968a40e04
commit e0f3d1a3ba
2 changed files with 38 additions and 1 deletion

@@ -10,6 +10,7 @@ to the file content.

 First there are two levels of directories used for hashing, to prevent
 too many things ending up in any one directory.
+See [[hashing]] for details.

 Each subdirectory has the [[name_of_a_key|key_format]] in one of the
 [[key-value_backends|backends]]. The file inside also has the name of the key.

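As an illustration of the layout this hunk describes (two hash directories, then a directory named after the key, holding a file that also has the key's name), here is a small Python sketch. The `object_path` helper, the `.git/annex/objects` base directory and the example key are assumptions for illustration; the two hash directory names are simply passed in, since how they are computed is the subject of doc/internals/hashing.mdwn.

```python
from pathlib import PurePosixPath


def object_path(base: str, hash1: str, hash2: str, key: str) -> PurePosixPath:
    """Hypothetical helper returning base/hash1/hash2/key/key.

    The two hash directories come first; inside them is a directory named
    after the key, and inside that, a file that also has the key's name.
    """
    return PurePosixPath(base) / hash1 / hash2 / key / key


# Illustrative values only: the base directory is assumed, the key is a
# made-up WORM-style key, and "pX"/"1J" is borrowed from the example in
# hashing.mdwn rather than computed from this key.
print(object_path(".git/annex/objects", "pX", "1J", "WORM-s1024-m1287290776--file.txt"))
```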
@@ -107,7 +108,9 @@ somewhere else.

 These log files record [[location_tracking]] information
 for file contents. Again these are placed in two levels of subdirectories
-for hashing. The name of the key is the filename, and the content
+for hashing. See [[hashing]] for details.
+
+The name of the key is the filename, and the content
 consists of a timestamp, either 1 (present) or 0 (not present), and
 the UUID of the repository that has or lacks the file content.

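The location log format described in this hunk could be read with a sketch like the following. Beyond what the text says, it assumes that each entry is one line with the three fields separated by whitespace in the order given (timestamp, 1 or 0, UUID). The timestamp is kept as an opaque string because its syntax is not specified here. The names `LocationLogEntry` and `parse_location_log`, and the sample line, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LocationLogEntry:
    timestamp: str  # kept as text; its exact syntax is not specified
    present: bool   # True for "1" (has the content), False for "0" (lacks it)
    uuid: str       # UUID of the repository the entry is about


def parse_location_log(text: str) -> list[LocationLogEntry]:
    """Parse one location log file, assuming one whitespace-separated entry per line."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Raises ValueError if the line does not have exactly three fields.
        timestamp, status, uuid = line.split()
        if status not in ("0", "1"):
            raise ValueError(f"unexpected status field: {status!r}")
        entries.append(LocationLogEntry(timestamp, status == "1", uuid))
    return entries


# Hypothetical sample entry, not taken from a real repository.
sample = "1287290776.765152s 1 e605dca6-446a-11e0-8b2a-002170d25c55\n"
for entry in parse_location_log(sample):
    print(entry)
```

Malformed lines raise `ValueError` instead of being skipped, so a mismatch between this assumed layout and a real log shows up immediately.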
doc/internals/hashing.mdwn (Normal file, +34)
@@ -0,0 +1,34 @@

In both the .git/annex directory and the git-annex branch, two levels of
hash directories are used, to avoid issues with too many files in one
directory.

Two separate hash formats are used. The old hash format is used only for
non-bare git repositories. The new hash format is used for bare git
repositories, the git-annex branch, and special remotes.

## new hash format

This uses two directories, each with a three-letter name, such as "f87/4d5".

The directory names come from the md5sum of the [[key|key_format]].

Note that you cannot use the `md5sum` utility from coreutils to generate
the same hash. Why it generates something else is unknown. The md5
libraries of programming languages do produce the expected hash, though.

For example:

    python3 -c 'import hashlib, os, sys; print(hashlib.md5(os.fsencode(sys.argv[1])).hexdigest())'

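The new format could be computed with a short Python function like the one below. It assumes, consistent with the "f87/4d5" example but not spelled out in the text, that the first directory is the first three hex digits of the key's md5 digest and the second directory is the next three, and that the key is hashed as its exact UTF-8 bytes with nothing appended. `hash_dirs_new` is a hypothetical name.

```python
import hashlib


def hash_dirs_new(key: str) -> str:
    """Sketch of the new-format hash directories, e.g. "f87/4d5".

    Assumption: directory 1 is hex digits 0-2 of md5(key) and
    directory 2 is hex digits 3-5.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{digest[0:3]}/{digest[3:6]}"


# Hypothetical WORM-style key, not taken from a real repository.
print(hash_dirs_new("WORM-s1024-m1287290776--file.txt"))
```

Because the digest covers exactly the bytes passed in, any extra byte changes the result; `echo KEY | md5sum`, for instance, also hashes the newline that `echo` appends unless given `-n`. Whether that explains the `md5sum` discrepancy noted in the text is only a guess.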
## old hash format

This uses two directories, each with a two-letter name, such as "pX/1J".

It takes the md5sum of the key, but rather than rendering it as a string,
treats it as four 32-bit words. Only the first word is used. It is converted
into a string by the same mechanism that would be used to encode a normal
md5sum value into a string, but where that would normally encode the bits
using the 16 characters 0-9a-f, this instead uses the 32 characters
"0123456789zqjxkmvwgpfZQJXKMVWGPF".
The first 2 letters of the resulting string are the first directory, and the
second 2 are the second directory.
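The old format is harder to picture from prose alone, so here is a Python sketch. Only the 32-character alphabet and the use of the first 32-bit word come from the text. The remaining details are assumptions, not taken from this document, and should be checked against paths in a real repository before relying on them: the key is hashed as UTF-8 bytes, the first word is read little-endian from the first four digest bytes, character i is `ALPHABET[(word >> 6*i) & 31]`, and adjacent characters are swapped in pairs, mirroring how a nibble-by-nibble hex encoder of a little-endian word prints each byte's high nibble first. `hash_dirs_old` is a hypothetical name.

```python
import hashlib

# The 32-character alphabet given in the text.
ALPHABET = "0123456789zqjxkmvwgpfZQJXKMVWGPF"


def hash_dirs_old(key: str) -> str:
    """Sketch of the old-format hash directories, e.g. "pX/1J" (assumed bit layout)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Only the first 32-bit word is used; assumed to be little-endian.
    word = int.from_bytes(digest[:4], "little")
    # Assumed: 5-bit groups taken at a 6-bit stride; four characters are needed.
    chars = [ALPHABET[(word >> (6 * i)) & 31] for i in range(4)]
    # Swap each adjacent pair: (c0, c1) -> c1 c0 and (c2, c3) -> c3 c2.
    first = chars[1] + chars[0]
    second = chars[3] + chars[2]
    return f"{first}/{second}"


# Hypothetical WORM-style key, not taken from a real repository.
print(hash_dirs_old("WORM-s1024-m1287290776--file.txt"))
```

Under these assumptions each character carries 5 bits, so the four characters that end up in the directory names come from bits 0 to 22 of the first word.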