update and bug closures for v2 layout

Joey Hess 2011-03-16 00:08:02 -04:00
parent dd5448eb07
commit 09a7689bc3
4 changed files with 25 additions and 3 deletions

debian/changelog

@@ -1,6 +1,13 @@
 git-annex (0.24) UNRELEASED; urgency=low
 * Reorganized annexed object store. annex.version=2
+* Colons are now avoided in filenames, so bare clones of git repos
+can be put on USB thumb drives formatted with vFAT or similar
+filesystems.
+* Added two levels of hashing to object directory and .git-annex logs,
+to improve scalability with enormous numbers of annexed
+objects. (With one hundred million annexed objects, each
+directory would contain fewer than 1024 files.)
 * The setkey, fromkey, and dropkey subcommands have changed how
 the key is specified. --backend is no longer used with these.
 * Add Suggests on graphviz. Closes: #618039
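
The scalability figure in the changelog entry can be sanity-checked with a little arithmetic. The sketch below assumes that each of the two hash levels offers at most 1024 directory names (an inference from the "1024" figure, not something the commit states) and that keys spread evenly across directories. The constant names are illustrative only.

```python
# Worked arithmetic for the changelog's scalability claim.
# ASSUMPTION: each hash level has at most 1024 possible directory names,
# and objects are distributed evenly. Neither is stated in the commit.
NAMES_PER_LEVEL = 1024
LEVELS = 2
OBJECTS = 100_000_000  # "one hundred million annexed objects"

# Number of bottom-level hash directories available.
leaf_dirs = NAMES_PER_LEVEL ** LEVELS

# Average number of key directories that land in each one.
avg_per_leaf = OBJECTS / leaf_dirs

print(f"leaf directories: {leaf_dirs}")
print(f"average objects per leaf directory: {avg_per_leaf:.1f}")
```

Under these assumptions every directory at every level stays below 1024 entries, which is consistent with the changelog's statement.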


@@ -10,3 +10,6 @@ be VFAT formatted:
 [[!tag wishlist]]
+[[Done]]; in annex.version 2 repos, colons are entirely avoided in
+filenames. So a bare git clone can be put on VFAT, and git-annex
+used to move stuff --to and --from it, for sneakernet.


@@ -17,3 +17,11 @@ or anything in between to a paranoid
 Also the use of a colon specifically breaks FAT32 ([[bugs/fat_support]]), must it be a colon or could an extra directory be used? i.e. `.git/annex/objects/SHA1/*/...`
 `git annex init` could also create all but the last level directory on initialization. I'm thinking `SHA1/1/1, SHA1/1/2, ..., SHA256/f/f, ..., URL/f/f, ..., WORM/f/f`
+> This is done now with a 2-level hash. It also hashes .git-annex/ log
+> files which were the worse problem really. Scales to hundreds of millions
+> of files with each dir having 1024 or fewer contents. Example:
+>
+> `me -> .git/annex/objects/71/9t/WORM-s3-m1300247299--me/WORM-s3-m1300247299--me`
+>
+> --[[Joey]]
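
The layout in the example above can be sketched roughly as follows. This is only an illustration of the directory shape: `hash_dirs` is a hypothetical stand-in that slices an MD5 hex digest, which is not git-annex's actual hash encoding (the real example path contains `9t`, which is not hexadecimal). Reusing the same two hash directories for the `.git-annex/aa/bb/*.log` path is also an assumption.

```python
import hashlib
from pathlib import PurePosixPath


def hash_dirs(key: str) -> tuple[str, str]:
    """Hypothetical stand-in: derive two 2-character directory names.

    NOT git-annex's real scheme; it only demonstrates the two-level idea.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return digest[0:2], digest[2:4]


def object_path(key: str) -> PurePosixPath:
    # Shape taken from the commit: .git/annex/objects/aa/bb/KEY/KEY
    aa, bb = hash_dirs(key)
    return PurePosixPath(".git/annex/objects") / aa / bb / key / key


def log_path(key: str) -> PurePosixPath:
    # Shape taken from the commit: .git-annex/aa/bb/KEY.log
    # ASSUMPTION: the log uses the same two hash directories as the object.
    aa, bb = hash_dirs(key)
    return PurePosixPath(".git-annex") / aa / bb / f"{key}.log"


key = "WORM-s3-m1300247299--me"
print(object_path(key))
print(log_path(key))
```

Because the stand-in hash differs from the real one, the printed directories will not match `71/9t`; only the nesting depth and the repeated key name are meant to correspond.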


@@ -2,12 +2,15 @@ In the world of git, we're not scared about internal implementation
 details, and sometimes we like to dive in and tweak things by hand. Here's
 some documentation to that end.
-## `.git/annex/objects/*/*`
+## `.git/annex/objects/aa/bb/*/*`
 This is where locally available file contents are actually stored.
 Files added to the annex get a symlink checked into git that points
 to the file content.
+First there are two levels of directories used for hashing, to prevent
+too many things ending up in any one directory.
 Each subdirectory has the name of a key in one of the
 [[key-value_backends|backends]]. The file inside also has the name of the key.
 This two-level structure is used because it allows the write bit to be removed
@@ -41,10 +44,11 @@ Example:
 e605dca6-446a-11e0-8b2a-002170d25c55 1
 26339d22-446b-11e0-9101-002170d25c55 ?
-## `.git-annex/*.log`
+## `.git-annex/aa/bb/*.log`
 The remainder of the log files record [[location_tracking]] information
-for file contents. The name of the key is the filename, and the content
+for file contents. Again these are placed in two levels of subdirectories
+for hashing. The name of the key is the filename, and the content
 consists of a timestamp, either 1 (present) or 0 (not present), and
 the UUID of the repository that has or lacks the file content.
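
The log line format described above (timestamp, presence flag, repository UUID) could be read with something like the sketch below. It assumes the three fields are whitespace-separated and treats the timestamp as opaque text, since its exact format is not given here. The sample line, `LocationEntry`, and `parse_location_line` are hypothetical names invented for illustration, not part of git-annex.

```python
from typing import NamedTuple


class LocationEntry(NamedTuple):
    timestamp: str  # kept as opaque text; its exact format is not specified
    present: bool   # True for "1" (present), False for "0" (not present)
    uuid: str       # repository that has or lacks the file content


def parse_location_line(line: str) -> LocationEntry:
    """Parse one location log line.

    ASSUMPTION: fields are separated by whitespace, in the order
    timestamp, presence flag, UUID.
    """
    timestamp, status, uuid = line.split()
    if status not in ("0", "1"):
        raise ValueError(f"unexpected presence flag: {status!r}")
    return LocationEntry(timestamp, status == "1", uuid)


# Hypothetical sample line; the UUID is borrowed from the example in the diff.
sample = "1300247299 1 e605dca6-446a-11e0-8b2a-002170d25c55"
entry = parse_location_line(sample)
print(entry.present, entry.uuid)
```

Keeping the timestamp as a string avoids guessing at a numeric format the commit does not document.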