In the world of git, we're not scared about internal implementation
details, and sometimes we like to dive in and tweak things by hand. Here's
some documentation to that end.
## `.git/annex/objects/aa/bb/*/*`
This is where locally available file contents are actually stored.
Files added to the annex get a symlink checked into git that points
to the file content.

First there are two levels of directories used for hashing, to prevent
too many things ending up in any one directory.

Each subdirectory has the name of a key in one of the
[[key-value_backends|backends]]. The file inside also has the name of the key.
The content is kept in a per-key subdirectory, rather than directly in the
hash directory, because that allows the write bit to be removed from the
subdirectory as well as from the file. That prevents accidentally
deleting or changing the file contents.

## The git-annex branch
This branch is managed by git-annex, with the contents listed below.

The file `.git/annex/index` is a separate git index file that git-annex uses
to accumulate changes for the git-annex branch. Also, `.git/annex/journal/`
is used to record changes before they are added to git.

Note that for speed reasons, git-annex assumes only it will modify this
branch. If you go in and make changes directly, it will probably revert
your changes in its next commit to the branch.

The best way to make changes to the git-annex branch is instead
to create a branch of it, with a name like "my/git-annex", and then
use "git annex merge" to automerge your branch into the main git-annex
branch.

### `uuid.log`
Records the UUIDs of known repositories, and associates them with a
description of the repository. This allows git-annex to display something
more useful than a UUID when it refers to a repository that does not have
a configured git remote pointing at it.
The file format is simply one line per repository, with the UUID followed by a
space and then the description through to the end of the line. Example:

    e605dca6-446a-11e0-8b2a-002170d25c55 laptop
    26339d22-446b-11e0-9101-002170d25c55 usb disk

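
If you want to read this file from a script, the format described above can be parsed in a few lines. This is only a minimal sketch in Python, not code from git-annex itself; the function name `parse_uuid_log` is made up for illustration, and it assumes the log text has already been extracted from the git-annex branch.

```python
def parse_uuid_log(text):
    """Map each repository UUID to its description.

    Hypothetical helper, not part of git-annex.
    """
    descriptions = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Split on the first space only, since the description
        # runs through to the end of the line and may contain spaces.
        uuid, _, description = line.partition(" ")
        descriptions[uuid] = description
    return descriptions


sample = (
    "e605dca6-446a-11e0-8b2a-002170d25c55 laptop\n"
    "26339d22-446b-11e0-9101-002170d25c55 usb disk\n"
)
for uuid, description in parse_uuid_log(sample).items():
    print(uuid, "->", description)
```

Splitting on the first space only is what keeps a description like "usb disk" in one piece.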
### `remotes.log`
Holds persistent configuration settings for [[special_remotes]] such as
Amazon S3.

The file format is one line per remote, starting with the UUID of the
remote, followed by a space, and then a series of key=value pairs,
each separated by whitespace.
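
As a rough illustration of that format, here is a sketch in Python that splits one such line into a UUID and a dictionary of settings. The function name `parse_remote_log_line` and the keys `type` and `bucket` in the sample line are invented for the example and are not taken from git-annex. The sketch also assumes that values never contain whitespace, which follows from the description above but is not otherwise verified.

```python
def parse_remote_log_line(line):
    """Split one line into (uuid, config dict).

    Hypothetical helper, not part of git-annex.
    """
    # The first whitespace-separated field is the UUID,
    # everything after it is a key=value pair.
    uuid, *pairs = line.split()
    config = {}
    for pair in pairs:
        # Split on the first "=" so values may themselves contain "=".
        key, _, value = pair.partition("=")
        config[key] = value
    return uuid, config


# Made-up sample line; the keys shown are assumptions.
sample_line = "26339d22-446b-11e0-9101-002170d25c55 type=S3 bucket=example"
uuid, config = parse_remote_log_line(sample_line)
print(uuid, config)
```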
### `trust.log`
Records the [[trust]] information for repositories. Does not exist unless
[[trust]] values are configured.

The file format is one line per repository, with the UUID followed by a
space, and then either 1 (trusted), 0 (untrusted), or ? (semi-trusted).
Repositories not listed are semi-trusted.

Example:

    e605dca6-446a-11e0-8b2a-002170d25c55 1
    26339d22-446b-11e0-9101-002170d25c55 ?

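
The lookup rule, including the semi-trusted default for unlisted repositories, can be sketched as follows in Python. The names `parse_trust_log` and `trust_of` are hypothetical helpers for illustration, not git-annex functions. Skipping malformed lines is a choice made for this sketch, not documented behaviour.

```python
TRUST_LEVELS = {"1": "trusted", "0": "untrusted", "?": "semi-trusted"}


def parse_trust_log(text):
    """Map each listed repository UUID to its trust level name.

    Hypothetical helper, not part of git-annex.
    """
    trust = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip lines that do not look like "<uuid> <flag>" (sketch-only choice).
        if len(fields) != 2 or fields[1] not in TRUST_LEVELS:
            continue
        uuid, flag = fields
        trust[uuid] = TRUST_LEVELS[flag]
    return trust


def trust_of(trust, uuid):
    """Repositories not listed in the log are semi-trusted."""
    return trust.get(uuid, "semi-trusted")


sample = (
    "e605dca6-446a-11e0-8b2a-002170d25c55 1\n"
    "26339d22-446b-11e0-9101-002170d25c55 ?\n"
)
trust = parse_trust_log(sample)
print(trust_of(trust, "e605dca6-446a-11e0-8b2a-002170d25c55"))
print(trust_of(trust, "00000000-0000-0000-0000-000000000000"))
```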
### `aaa/bbb/*.log`

The remainder of the log files record [[location_tracking]] information
for file contents. Again these are placed in two levels of subdirectories
for hashing. The name of the key is the filename, and each line of the
content consists of a timestamp, either 1 (present) or 0 (not present), and
the UUID of the repository that has or lacks the file content.

Example:

    1287290776.765152s 1 e605dca6-446a-11e0-8b2a-002170d25c55
    1287290767.478634s 0 26339d22-446b-11e0-9101-002170d25c55

These files are designed to be auto-merged using git's union merge driver.
The timestamps allow the most recent information to be identified.
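
Because a union merge simply keeps the lines from both sides, a reader of such a file has to pick the newest line for each repository. The sketch below, in Python, shows one way to do that. It is an illustration under stated assumptions, not git-annex's actual implementation: the function name `current_locations` is made up, and the third line of the sample is invented to show a newer entry overriding an older one.

```python
def current_locations(text):
    """Return the set of UUIDs whose newest log line says "present".

    Hypothetical helper, not part of git-annex.
    """
    latest = {}  # uuid -> (timestamp, present)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip lines that do not match the described format
        stamp, status, uuid = fields
        # Timestamps look like "1287290776.765152s"; drop the trailing "s".
        when = float(stamp.rstrip("s"))
        if uuid not in latest or when >= latest[uuid][0]:
            latest[uuid] = (when, status == "1")
    return {uuid for uuid, (_, present) in latest.items() if present}


sample = (
    "1287290776.765152s 1 e605dca6-446a-11e0-8b2a-002170d25c55\n"
    "1287290767.478634s 0 26339d22-446b-11e0-9101-002170d25c55\n"
    # Invented later entry: the second repository now has the content.
    "1287290800.000000s 1 26339d22-446b-11e0-9101-002170d25c55\n"
)
print(sorted(current_locations(sample)))
```

The order of lines in the file does not matter here, which is what makes the format safe to merge by concatenation.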