In the world of git, we're not scared of internal implementation
details, and sometimes we like to dive in and tweak things by hand. Here's
some documentation to that end.

## The .git/ directory

### `.git/annex/objects/aa/bb/*/*`

This is where locally available file contents are actually stored.
Files added to the annex get a symlink or pointer file checked into git,
that points to the file content.

First there are two levels of directories used for hashing, to prevent
too many things ending up in any one directory.
See [[hashing]] for details.

Each subdirectory has the [[name_of_a_key|key_format]] in one of the
[[key-value_backends|backends]]. The file inside also has the name of the key.
This two-level structure is used because it allows the write bit to be removed
from the subdirectories as well as from the files. That prevents accidentally
deleting or changing the file contents. See [[lockdown]] for details.

In [[direct_mode]], file contents are not stored in here, and instead
are stored directly in the file. However, the same symlinks are still
committed to git, internally.

Also in [[direct_mode]], some additional data is stored in these directories.
`.cache` files contain cached file stats used in detecting when a file has
changed, and `.map` files contain a list of file(s) in the work directory
that contain the key.

### `.git/annex/tmp/`

This directory contains partially transferred objects.

### `.git/annex/misctmp/`

This is a temp directory for miscellaneous other temp files.

While `.git/annex/objects` and `.git/annex/tmp` can be put on different
filesystems if desired, `.git/annex/misctmp`
has to be on the same filesystem as the work tree and git repository.

### `.git/annex/bad/`

`git-annex fsck` puts any bad objects it finds in here.

### `.git/annex/transfers/`

Contains information files for uploads and downloads that are in progress,
as well as any that have failed. Used especially by the assistant.
It is safe to delete these files.

### `.git/annex/ssh/`

ssh connection caching files are written in here. It is safe to delete
these files.

### `.git/annex/index`

This is a git index file which git-annex uses to stage files
when preparing commits to the git-annex branch.

It's pretty safe to delete this file if git-annex is not currently running.
It will be re-created as necessary.

### `.git/annex/journal/`

git-annex uses this to journal changes to the git-annex branch,
before committing a set of changes.

## The git-annex branch

This branch is managed by git-annex, with the contents listed below.

This branch is not connected to your master, etc branches. It is used for
internal tracking of information about git-annex repositories and annexed
objects.

The files stored in this branch are all designed to be auto-merged
using git's [[union merge driver|git-union-merge]]. So each line
has a timestamp, to allow the most recent information to be identified.

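To illustrate why the timestamps matter, here is a minimal Python sketch of the idea. It is a simplification, not git-annex's implementation: it treats a union merge as "keep every distinct line from both sides" and then picks the newest line per repository. It assumes lines that start with a uuid and end with a `timestamp=...s` field; the function names `union_merge`, `timestamp_of` and `newest_per_uuid` are hypothetical.

```python
def union_merge(ours, theirs):
    """Keep every distinct non-empty line from both sides (simplified union merge)."""
    seen = set()
    merged = []
    for line in ours.splitlines() + theirs.splitlines():
        if line and line not in seen:
            seen.add(line)
            merged.append(line)
    return merged


def timestamp_of(line):
    """Read the trailing `timestamp=<seconds>s` field of a log line."""
    field = line.rsplit(" ", 1)[1]
    return float(field[len("timestamp="):].rstrip("s"))


def newest_per_uuid(lines):
    """For each uuid (first word of a line), keep only the most recent line."""
    newest = {}
    for line in lines:
        uuid = line.split(" ", 1)[0]
        if uuid not in newest or timestamp_of(line) > timestamp_of(newest[uuid]):
            newest[uuid] = line
    return newest
```

After such a merge, stale lines may still be present in the file, but the timestamps make it unambiguous which line wins.
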
### `uuid.log`

Records the UUIDs of known repositories, and associates them with a
description of the repository. This allows git-annex to display something
more useful than a UUID when it refers to a repository that does not have
a configured git remote pointing at it.

The file format is simply one line per repository, with the uuid followed by a
space and then the description, followed by a timestamp. Example:

    e605dca6-446a-11e0-8b2a-002170d25c55 laptop timestamp=1317929189.157237s
    26339d22-446b-11e0-9101-002170d25c55 usb disk timestamp=1317929330.769997s

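Since a description may itself contain spaces (as in `usb disk` above), a parser has to split off the uuid from the left and the timestamp from the right. A minimal Python sketch, assuming exactly the layout shown in the example; `UuidEntry` and `parse_uuid_log_line` are hypothetical names, not part of git-annex:

```python
from typing import NamedTuple


class UuidEntry(NamedTuple):
    uuid: str
    description: str
    timestamp: float


def parse_uuid_log_line(line):
    """Split `<uuid> <description> timestamp=<seconds>s` into its three parts."""
    uuid, rest = line.strip().split(" ", 1)
    # Split from the right so spaces inside the description are preserved.
    description, stamp = rest.rsplit(" timestamp=", 1)
    return UuidEntry(uuid, description, float(stamp.rstrip("s")))
```
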
### `numcopies.log`

Records the global numcopies setting.

The file format is simply a timestamp followed by a number.

### `config.log`

Records global configuration settings, which can be overridden by values
in `.git/config`.

The file format is a timestamp, followed by the name of the configuration,
followed by the value. For example:

    1317929189.157237s annex.autocommit false

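The precedence described above can be sketched in Python as follows. This is an illustration only: it assumes the newest line for a setting wins, and it models `.git/config` as a plain dictionary rather than reading a real git configuration. The names `parse_config_log` and `effective_config` are hypothetical.

```python
def parse_config_log(text):
    """Return {name: value}, keeping the newest line for each setting."""
    newest = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # The value is everything after the second space, so it may contain spaces.
        stamp, name, value = line.split(" ", 2)
        when = float(stamp.rstrip("s"))
        if name not in newest or when > newest[name][0]:
            newest[name] = (when, value)
    return {name: value for name, (_, value) in newest.items()}


def effective_config(global_settings, local_settings):
    """Values from the local configuration override the global ones."""
    merged = dict(global_settings)
    merged.update(local_settings)
    return merged
```
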
### `remote.log`

Holds persistent configuration settings for [[special_remotes]] such as
Amazon S3.

The file format is one line per remote, starting with the uuid of the
remote, followed by a space, and then a series of var=value pairs,
each separated by whitespace, and finally a timestamp.

Special remotes that are autoenabled have autoenable=true here.

Encrypted special remotes store their encryption key here,
in the "cipher" value. It is base64 encoded, and unless shared [[encryption]]
is used, is encrypted to one or more gpg keys. The first 256 bytes of
the cipher are used as the HMAC SHA1 key, to encrypt filenames
stored on the special remote. The remainder of the cipher is used as a gpg
symmetric encryption key, to encrypt the content of files stored on the special
remote.

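A line in this format could be parsed roughly as in the Python sketch below. It assumes that values contain no whitespace and that the timestamp is written as a `timestamp=...s` field, by analogy with the `uuid.log` example; the sample cipher in the usage note is made up. Each pair is split on the first `=` only, because a base64 value can itself end in `=` padding. `parse_remote_log_line` is a hypothetical name.

```python
def parse_remote_log_line(line):
    """Split `<uuid> var=value ... timestamp=<seconds>s` into its parts."""
    fields = line.split()
    uuid = fields[0]
    settings = {}
    timestamp = None
    for field in fields[1:]:
        # partition() splits on the first "=" only, keeping base64 padding intact.
        var, _, value = field.partition("=")
        if var == "timestamp":
            timestamp = float(value.rstrip("s"))
        else:
            settings[var] = value
    return uuid, settings, timestamp
```

For instance, a made-up line such as `e605dca6-446a-11e0-8b2a-002170d25c55 autoenable=true cipher=c2VjcmV0cw== timestamp=1317929189.157237s` yields the settings `autoenable` and `cipher`, with the padding of the cipher value preserved.
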
### `trust.log`

Records the [[trust]] information for repositories. Does not exist unless
[[trust]] values are configured.

The file format is one line per repository, with the uuid followed by a
space, and then one of `1` (trusted), `0` (untrusted), `?` (semi-trusted),
or `X` (dead), and finally a timestamp.

Example:

    e605dca6-446a-11e0-8b2a-002170d25c55 1 timestamp=1317929189.157237s
    26339d22-446b-11e0-9101-002170d25c55 ? timestamp=1317929330.769997s

Repositories not listed are semi-trusted.

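The mapping from codes to trust levels, including the semi-trusted default for unlisted repositories, can be sketched in Python like this. It assumes the newest line per uuid wins; `TRUST_LEVELS`, `parse_trust_log` and `trust_level` are hypothetical names used for illustration.

```python
TRUST_LEVELS = {"1": "trusted", "0": "untrusted", "?": "semi-trusted", "X": "dead"}


def parse_trust_log(text):
    """Return {uuid: trust level}, keeping the newest line for each uuid."""
    newest = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        uuid, code, stamp = line.split()
        when = float(stamp[len("timestamp="):].rstrip("s"))
        if uuid not in newest or when > newest[uuid][0]:
            newest[uuid] = (when, TRUST_LEVELS[code])
    return {uuid: level for uuid, (_, level) in newest.items()}


def trust_level(levels, uuid):
    """Repositories that are not listed default to semi-trusted."""
    return levels.get(uuid, "semi-trusted")
```
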
### `group.log`

Used to group repositories together.

The file format is one line per repository, with the uuid followed by a space,
and then a space-separated list of groups this repository is part of,
and finally a timestamp.

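A minimal Python sketch of reading one such line follows. It assumes the timestamp is written as a trailing `timestamp=...s` field, as in the `uuid.log` and `trust.log` examples, and that group names contain no spaces. The function name `parse_group_log_line` and the group names in the usage note are illustrative only.

```python
def parse_group_log_line(line):
    """Split `<uuid> <group> <group> ... timestamp=<seconds>s` into its parts."""
    fields = line.split()
    uuid, groups, stamp = fields[0], fields[1:-1], fields[-1]
    when = float(stamp[len("timestamp="):].rstrip("s"))
    return uuid, set(groups), when
```

Given a made-up line like `e605dca6-446a-11e0-8b2a-002170d25c55 client archive timestamp=1317929189.157237s`, this returns the uuid, the set `{"client", "archive"}`, and the timestamp as a float.
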
### `preferred-content.log`

Used to indicate which repositories prefer to contain which file contents.

The file format is one line per repository, with the uuid followed by a space,
then a boolean expression, and finally a timestamp.

Files matching the expression are preferred to be retained in the
repository, while files not matching it are preferred to be stored
somewhere else.

### `required-content.log`

Used to indicate which repositories are required to contain which file
contents.

File format is identical to preferred-content.log.

### `group-preferred-content.log`

Contains standard preferred content settings for groups. (Overriding or
supplementing the ones built into git-annex.)

The file format is one line per group, starting with a timestamp, then a
space, then the group name followed by a space and then the preferred
content expression.

### `export.log`

Tracks what trees have been exported to special remotes by
[[git-annex-export]](1).

Each line starts with a timestamp, then the uuid of the special remote,
followed by the sha1 of the tree that was exported to that special remote.

(The exported tree is also grafted into the git-annex branch, at
`export.tree`, to prevent git from garbage collecting it. However, the head
of the git-annex branch should never contain such a grafted in tree;
the grafted tree is removed in the same commit that updates `export.log`.)

### `aaa/bbb/*.log`

These log files record [[location_tracking]] information
for file contents. These are placed in two levels of subdirectories
for hashing. See [[hashing]] for details.

The name of the key is the filename, and the content
consists of a timestamp, then 1 (present), 0 (not present), or X (dead),
and the UUID of the repository that has or lacks the file content.

Example:

    1287290776.765152s 1 e605dca6-446a-11e0-8b2a-002170d25c55
    1287290767.478634s 0 26339d22-446b-11e0-9101-002170d25c55

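To show how such a log answers "which repositories have this content?", here is a small Python sketch. It assumes that the newest line per UUID reflects the current state; `repositories_with_content` is a hypothetical helper, not a git-annex function.

```python
def repositories_with_content(log_text):
    """Return the UUIDs whose most recent line has status 1 (present)."""
    newest = {}
    for line in log_text.splitlines():
        if not line.strip():
            continue
        stamp, status, uuid = line.split()
        when = float(stamp.rstrip("s"))
        if uuid not in newest or when > newest[uuid][0]:
            newest[uuid] = (when, status)
    return {uuid for uuid, (_, status) in newest.items() if status == "1"}
```
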
### `aaa/bbb/*.log.web`

These log files record urls used by the
[[web_special_remote|special_remotes/web]] and sometimes by other remotes.
Their format is similar to the location tracking files, but with urls
rather than UUIDs.

### `aaa/bbb/*.log.rmt`

These log files are used by remotes that need to record their own state
about keys. Each remote can store one line of data about a key, in
its own format.

Example:

    1287290776.765152s e605dca6-446a-11e0-8b2a-002170d25c55 blah blah
    1287290767.478634s 26339d22-446b-11e0-9101-002170d25c55 foo=bar

### `aaa/bbb/*.log.met`

These log files are used to store arbitrary [[design/metadata]] about keys.
Each key can have any number of metadata fields. Each field has a set of
values.

Lines are timestamped, and record when values are added (`field +value`),
but also when values are removed (`field -value`). Removed values
are retained in the log so that when merging an old line that sets a value
that was later unset, the value is not accidentally added back.

For example:

    1287290776.765152s tag +foo +bar author +joey
    1291237510.141453s tag -bar +baz

The value can be completely arbitrary data, although it's typically
reasonably short. If the value contains any whitespace
(including \r or \n), it will be base64 encoded. Base64 encoded values
are indicated by prefixing them with "!".

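The add and remove semantics can be sketched in Python by replaying the lines in timestamp order. This is a simplified illustration under stated assumptions: field names never start with `+` or `-`, base64 values decode to UTF-8 text, and replaying whole lines in order is enough to resolve conflicts. `decode_value` and `replay_metadata_log` are hypothetical names.

```python
import base64


def decode_value(raw):
    """Values prefixed with "!" are base64 encoded; others are used as-is."""
    if raw.startswith("!"):
        return base64.b64decode(raw[1:]).decode("utf-8")
    return raw


def replay_metadata_log(text):
    """Apply `field +value` / `field -value` entries, oldest line first."""
    lines = sorted(
        (line.split() for line in text.splitlines() if line.strip()),
        key=lambda tokens: float(tokens[0].rstrip("s")),
    )
    metadata = {}
    for tokens in lines:
        field = None
        for token in tokens[1:]:
            if field is not None and token[0] in "+-":
                values = metadata.setdefault(field, set())
                if token[0] == "+":
                    values.add(decode_value(token[1:]))
                else:
                    values.discard(decode_value(token[1:]))
            else:
                # Any other token names the field that following values belong to.
                field = token
    return metadata
```

Replaying the two example lines leaves `tag` set to `foo` and `baz`, and `author` set to `joey`, because the later `-bar` cancels the earlier `+bar`.
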
### `aaa/bbb/*.log.cnk`

These log files are used when objects are stored in chunked form on
remotes. They record the size(s) of the chunks, and the number of chunks.

For example, this logs that a remote has an object stored using both
9 chunks of size 10240, and 1 chunk of size 102400.

    1287290776.765152s e605dca6-446a-11e0-8b2a-002170d25c55:10240 9
    1287290776.765153s e605dca6-446a-11e0-8b2a-002170d25c55:102400 1

(When those chunks are removed from the remote, the 9 is changed to 0.)

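Reading these lines can be sketched in Python as below. The sketch assumes the layout shown in the example (`timestamp uuid:chunksize count`) and treats a count of 0 as "no longer stored with this chunk size"; it does not resolve multiple lines for the same uuid and chunk size by timestamp. `parse_chunk_log_line` and `chunk_layouts_in_use` are hypothetical names.

```python
def parse_chunk_log_line(line):
    """Split `<timestamp>s <uuid>:<chunksize> <count>` into typed fields."""
    stamp, target, count = line.split()
    # The uuid contains no colon, but split from the right to be safe.
    uuid, chunk_size = target.rsplit(":", 1)
    return float(stamp.rstrip("s")), uuid, int(chunk_size), int(count)


def chunk_layouts_in_use(text):
    """List (uuid, chunk size, chunk count) for entries with a nonzero count."""
    layouts = []
    for line in text.splitlines():
        if not line.strip():
            continue
        _, uuid, chunk_size, count = parse_chunk_log_line(line)
        if count > 0:
            layouts.append((uuid, chunk_size, count))
    return layouts
```
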
### `schedule.log`

Used to record scheduled events, such as periodic fscks.

The file format is simply one line per repository, with the uuid followed by a
space and then its schedule, followed by a timestamp.

There can be multiple events in the schedule, separated by "; ".

The format of the scheduled events is the same as described in
[[git-annex-schedule]].

Example:

    42bf2035-0636-461d-a367-49e9dfd361dd fsck self 30m every day at any time; fsck 4b3ebc86-0faf-4892-83c5-ce00cbe30f0a 1h every year at any time timestamp=1385646997.053162s

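Splitting such a line into its individual events can be sketched in Python as follows. The sketch only separates the events on "; " and leaves each event as an unparsed string; interpreting the event syntax itself is out of scope here. `parse_schedule_log_line` is a hypothetical name.

```python
def parse_schedule_log_line(line):
    """Split `<uuid> <event>; <event> ... timestamp=<seconds>s` into its parts."""
    uuid, rest = line.strip().split(" ", 1)
    # Take the timestamp from the right, since events contain spaces.
    schedule, stamp = rest.rsplit(" timestamp=", 1)
    events = schedule.split("; ")
    return uuid, events, float(stamp.rstrip("s"))
```
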
### `activity.log`

Used to record the times of activities, such as fscks.

Example:

    42bf2035-0636-461d-a367-49e9dfd361dd Fsck timestamp=1422387398.30395s

### `transitions.log`

Used to record transitions, e.g. by `git annex forget`.

Each line of the file is a transition, followed by a timestamp.

Example:

    ForgetGitHistory 1387325539.685136s
    ForgetDeadRemotes 1387325539.685136s

### `difference.log`

Used when a repository has fundamental differences from other repositories,
that should prevent merging.

Example:

    e605dca6-446a-11e0-8b2a-002170d25c55 [ObjectHashLower] timestamp=1422387398.30395s

### `multicast.log`

Records uftp public key fingerprints, for use by [[git-annex-multicast]].