update and bug closures for v2 layout

Joey Hess 2011-03-16 00:08:02 -04:00
parent dd5448eb07
commit 09a7689bc3
4 changed files with 25 additions and 3 deletions

debian/changelog

@@ -1,6 +1,13 @@
git-annex (0.24) UNRELEASED; urgency=low
* Reorganized annexed object store. annex.version=2
* Colons are now avoided in filenames, so bare clones of git repos
can be put on USB thumb drives formatted with vFAT or similar
filesystems.
* Added two levels of hashing to object directory and .git-annex logs,
to improve scalability with enormous numbers of annexed
objects. (With one hundred million annexed objects, each
directory would contain fewer than 1024 files.)
* The setkey, fromkey, and dropkey subcommands have changed how
the key is specified. --backend is no longer used with these.
* Add Suggests on graphviz. Closes: #618039


@@ -10,3 +10,6 @@ be VFAT formatted:
[[!tag wishlist]]
[[Done]]; in annex.version 2 repos, colons are entirely avoided in
filenames. So a bare git clone can be put on VFAT, and git-annex
used to move stuff --to and --from it, for sneakernet.


@@ -17,3 +17,11 @@ or anything in between to a paranoid
Also the use of a colon specifically breaks FAT32 ([[bugs/fat_support]]), must it be a colon or could an extra directory be used? i.e. `.git/annex/objects/SHA1/*/...`
`git annex init` could also create all but the last level directory on initialization. I'm thinking `SHA1/1/1, SHA1/1/2, ..., SHA256/f/f, ..., URL/f/f, ..., WORM/f/f`
> This is done now with a 2-level hash. It also hashes .git-annex/ log
> files which were the worse problem really. Scales to hundreds of millions
> of files with each dir having 1024 or fewer contents. Example:
>
> `me -> .git/annex/objects/71/9t/WORM-s3-m1300247299--me/WORM-s3-m1300247299--me`
>
> --[[Joey]]
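
As a rough check on the scaling claim above, the sketch below works out the average number of entries per leaf directory and rebuilds the example path. It assumes each hash level offers at most 1024 distinct directory names, which is inferred from the "1024 or fewer" figure rather than stated outright. `average_leaf_size` and `object_path` are hypothetical helper names, and the two hash directories are passed in as arguments because the hash function itself is not described here.

```python
# Assumption: each of the two hash levels has at most 1024 distinct names.
FANOUT = 1024


def average_leaf_size(num_objects, fanout=FANOUT, levels=2):
    """Average number of keys per leaf directory, assuming an even spread."""
    leaf_dirs = fanout ** levels
    return num_objects / leaf_dirs


def object_path(key, level1, level2):
    """Build the object path for a key, given its two hash directory names.

    The key appears twice: once as a directory and once as the file inside it.
    """
    return "/".join([".git/annex/objects", level1, level2, key, key])


if __name__ == "__main__":
    # One hundred million objects over 1024 * 1024 leaf directories.
    print(round(average_leaf_size(100_000_000), 1))  # 95.4
    print(object_path("WORM-s3-m1300247299--me", "71", "9t"))
```

Under the same assumption, the average leaf directory would only reach 1024 entries at 1024 ** 3 objects, a little over a billion, which is consistent with "hundreds of millions" leaving headroom.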


@@ -2,12 +2,15 @@ In the world of git, we're not scared about internal implementation
details, and sometimes we like to dive in and tweak things by hand. Here's
some documentation to that end.
-## `.git/annex/objects/*/*`
+## `.git/annex/objects/aa/bb/*/*`
This is where locally available file contents are actually stored.
Files added to the annex get a symlink checked into git that points
to the file content.
+First there are two levels of directories used for hashing, to prevent
+too many things ending up in any one directory.
Each subdirectory has the name of a key in one of the
[[key-value_backends|backends]]. The file inside also has the name of the key.
This two-level structure is used because it allows the write bit to be removed
@@ -41,10 +44,11 @@ Example:
e605dca6-446a-11e0-8b2a-002170d25c55 1
26339d22-446b-11e0-9101-002170d25c55 ?
-## `.git-annex/*.log`
+## `.git-annex/aa/bb/*.log`
The remainder of the log files record [[location_tracking]] information
-for file contents. The name of the key is the filename, and the content
+for file contents. Again these are placed in two levels of subdirectories
+for hashing. The name of the key is the filename, and the content
consists of a timestamp, either 1 (present) or 0 (not present), and
the UUID of the repository that has or lacks the file content.
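
A log line as described above can be read with a few lines of code. This is a minimal sketch, assuming the three fields are whitespace-separated and appear in the order given (timestamp, status, UUID). The timestamp is kept as an opaque string because its exact format is not specified here. `LocationEntry` and `parse_location_line` are hypothetical names, and the sample line is illustrative only.

```python
from typing import NamedTuple


class LocationEntry(NamedTuple):
    timestamp: str  # kept as text; the exact timestamp format is not assumed
    present: bool   # True for status 1 (present), False for status 0
    uuid: str       # UUID of the repository that has or lacks the content


def parse_location_line(line):
    """Parse one location-tracking log line into a LocationEntry.

    Assumes whitespace-separated fields: timestamp, status, UUID.
    """
    fields = line.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    timestamp, status, uuid = fields
    if status not in ("0", "1"):
        raise ValueError(f"status must be 0 or 1, got {status!r}")
    return LocationEntry(timestamp, status == "1", uuid)


if __name__ == "__main__":
    # Illustrative line; the timestamp value is made up for the example.
    sample = "1300247299 1 e605dca6-446a-11e0-8b2a-002170d25c55"
    print(parse_location_line(sample))
```

Rejecting any status other than 0 or 1 keeps a malformed line from being silently read as "not present".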