aa56d433d5
Not used yet. (Or tested.) I did consider making the log start with the uuid of the node, followed by the cluster uuid (or uuids). That would perhaps mean a smaller write to the git-annex branch when adding a node, but overall the log file would be larger, and it will be read and cached near to startup on most git-annex runs.
400 lines
14 KiB
Markdown
400 lines
14 KiB
Markdown
In the world of git, we're not scared about internal implementation
|
|
details, and sometimes we like to dive in and tweak things by hand. Here's
|
|
some documentation to that end.
|
|
|
|
[[!toc ]]
|
|
|
|
## The .git/ directory
|
|
|
|
### `.git/annex/objects/aa/bb/*/*`
|
|
|
|
This is where locally available file contents are actually stored.
|
|
Files added to the annex get a symlink or [[pointer_file]] checked into git,
|
|
that points to the file content.
|
|
|
|
First there are two levels of directories used for hashing, to prevent
|
|
too many things ending up in any one directory.
|
|
See [[hashing]] for details.
|
|
|
|
Each subdirectory has the [[name_of_a_key|key_format]] in one of the
|
|
[[key-value_backends|backends]]. The file inside also has the name of the key.
|
|
This two-level structure is used because it allows the write bit to be removed
|
|
from the subdirectories as well as from the files. That prevents accidentally
|
|
deleting or changing the file contents. See [[lockdown]] for details.
|
|
|
|
### `.git/annex/tmp/`
|
|
|
|
This directory contains partially transferred objects.
|
|
|
|
### `.git/annex/othertmp/`
|
|
|
|
This is a temp directory for miscellaneous other temp files.
|
|
|
|
### `.git/annex/bad/`
|
|
|
|
git-annex fsck puts any bad objects it finds in here.
|
|
|
|
### `.git/annex/transfers/`
|
|
|
|
Contains information files for uploads and downloads that are in progress,
|
|
as well as any that have failed. Used especially by the assistant.
|
|
It is safe to delete these files.
|
|
|
|
### `.git/annex/ssh/`
|
|
|
|
ssh connection caching files are written in here. It is safe to delete
|
|
these files.
|
|
|
|
### `.git/annex/index`
|
|
|
|
This is a git index file which git-annex uses to stage files
|
|
when preparing commits to the git-annex branch.
|
|
|
|
It's pretty safe to delete this file if git-annex is not currently running.
|
|
It will be re-created as necessary.
|
|
|
|
### `.git/annex/journal/`
|
|
|
|
git-annex uses this to journal changes to the git-annex branch,
|
|
before committing a set of changes.
|
|
|
|
<a name="The_git-annex_branch"></a>
|
|
## The git-annex branch
|
|
|
|
This branch is managed by git-annex, with the contents listed below.
|
|
|
|
This branch is not connected to your master, etc branches. It it used for
|
|
internal tracking of information about git-annex repositories and annexed
|
|
objects.
|
|
|
|
The files stored in this branch are all designed to be auto-merged
|
|
using git's [[union merge driver|git-union-merge]]. So each line
|
|
has a timestamp, to allow the most recent information to be identified.
|
|
|
|
### `uuid.log`
|
|
|
|
Records the UUIDs of known repositories, and associates them with a
|
|
description of the repository. This allows git-annex to display something
|
|
more useful than a UUID when it refers to a repository that does not have
|
|
a configured git remote pointing at it.
|
|
|
|
The file format is simply one line per repository, with the uuid followed by a
|
|
space and then the description, followed by a timestamp. Example:
|
|
|
|
e605dca6-446a-11e0-8b2a-002170d25c55 laptop timestamp=1317929189.157237s
|
|
26339d22-446b-11e0-9101-002170d25c55 usb disk timestamp=1317929330.769997s
|
|
|
|
## `numcopies.log`
|
|
|
|
Records the global numcopies setting.
|
|
|
|
The file format is simply a timestamp followed by a number.
|
|
|
|
## `mincopies.log`
|
|
|
|
Records the global mincopies setting.
|
|
|
|
The file format is simply a timestamp followed by a number.
|
|
|
|
## `config.log`
|
|
|
|
Records global configuration settings, which can be overridden by values
|
|
in `.git/config`.
|
|
|
|
The file format is a timestamp, followed by the name of the configuration,
|
|
followed by the value. For example:
|
|
|
|
1317929189.157237s annex.autocommit false
|
|
|
|
## `remote.log`
|
|
|
|
Holds persistent configuration settings for [[special_remotes]] such as
|
|
Amazon S3.
|
|
|
|
The file format is one line per remote, starting with the uuid of the
|
|
remote, followed by a space, and then a series of var=value pairs,
|
|
each separated by whitespace, and finally a timestamp.
|
|
|
|
Special remotes that are autoenabled have autoenable=true here.
|
|
|
|
Encrypted special remotes store their encryption key here,
|
|
in the "cipher" value. It is base64 encoded, and unless shared [[encryption]]
|
|
is used, is encrypted to one or more gpg keys. The first 256 bytes of
|
|
the cipher is used as the HMAC SHA1 encryption key, to encrypt filenames
|
|
stored on the special remote. The remainder of the cipher is used as a gpg
|
|
symmetric encryption key, to encrypt the content of files stored on the special
|
|
remote.
|
|
|
|
## `trust.log`
|
|
|
|
Records the [[trust]] information for repositories. Does not exist unless
|
|
[[trust]] values are configured.
|
|
|
|
The file format is one line per repository, with the uuid followed by a
|
|
space, and then either `1` (trusted), `0` (untrusted), `?` (semi-trusted),
|
|
`X` (dead) and finally a timestamp.
|
|
|
|
Example:
|
|
|
|
e605dca6-446a-11e0-8b2a-002170d25c55 1 timestamp=1317929189.157237s
|
|
26339d22-446b-11e0-9101-002170d25c55 ? timestamp=1317929330.769997s
|
|
|
|
Repositories not listed are semi-trusted.
|
|
|
|
## `group.log`
|
|
|
|
Used to group repositories together.
|
|
|
|
The file format is one line per repository, with the uuid followed by a space,
|
|
and then a space-separated list of groups this repository is part of,
|
|
and finally a timestamp.
|
|
|
|
## `preferred-content.log`
|
|
|
|
Used to indicate which repositories prefer to contain which file contents.
|
|
|
|
The file format is one line per repository, with the uuid followed by a space,
|
|
then a boolean expression, and finally a timestamp.
|
|
|
|
Files matching the expression are preferred to be retained in the
|
|
repository, while files not matching it are preferred to be stored
|
|
somewhere else.
|
|
|
|
## `required-content.log`
|
|
|
|
Used to indicate which repositories are required to contain which file
|
|
contents.
|
|
|
|
File format is identical to preferred-content.log.
|
|
|
|
## `group-preferred-content.log`
|
|
|
|
Contains standard preferred content settings for groups. (Overriding or
|
|
supplementing the ones built into git-annex.)
|
|
|
|
The file format is one line per group, starting with a timestamp, then a
|
|
space, then the group name followed by a space and then the preferred
|
|
content expression.
|
|
|
|
## `export.log`
|
|
|
|
Tracks what trees have been exported to special remotes by
|
|
[[git-annex-export]](1).
|
|
|
|
Each line starts with a timestamp, then the uuid of the repository
|
|
that exported to the special remote, followed by a colon (`:`) and
|
|
the uuid of the special remote. Then, separated by a spaces,
|
|
the SHA of the tree that was exported, and optionally any number of
|
|
subsequent SHAs, of trees that have started to be exported but whose
|
|
export is not yet complete.
|
|
|
|
In order to record the beginning of the first export, where nothing
|
|
has been exported yet, the SHA of the exported tree can be
|
|
the empty tree (eg 4b825dc642cb6eb9a060e54bf8d69288fbee4904).
|
|
|
|
For example:
|
|
|
|
1317929100.012345s e605dca6-446a-11e0-8b2a-002170d25c55:26339d22-446b-11e0-9101-002170d25c55 4b825dc642cb6eb9a060e54bf8d69288fbee4904 bb08b1abd207aeecccbc7060e523b011d80cb35b
|
|
1317929100.012345s e605dca6-446a-11e0-8b2a-002170d25c55:26339d22-446b-11e0-9101-002170d25c55 bb08b1abd207aeecccbc7060e523b011d80cb35b
|
|
1317929189.157237s e605dca6-446a-11e0-8b2a-002170d25c55:26339d22-446b-11e0-9101-002170d25c55 bb08b1abd207aeecccbc7060e523b011d80cb35b 7c7af825782b7c8706039b855c72709993542be4
|
|
1317923000.251111s e605dca6-446a-11e0-8b2a-002170d25c55:26339d22-446b-11e0-9101-002170d25c55 7c7af825782b7c8706039b855c72709993542be4
|
|
|
|
(The trees are also grafted into the git-annex branch, at
|
|
`export.tree`, to prevent git from garbage collecting it. However, the head
|
|
of the git-annex branch should never contain such a grafted in tree;
|
|
the grafted tree is removed in the same commit that updates `export.log`.)
|
|
|
|
## `aaa/bbb/*.log`
|
|
|
|
These log files record [[location_tracking]] information
|
|
for file contents. These are placed in two levels of subdirectories
|
|
for hashing. See [[hashing]] for details.
|
|
|
|
The name of the key is the filename, and the content
|
|
consists of a timestamp, either 1 (present) or 0 (not present) or X (dead),
|
|
and the UUID of the repository that has or lacks the file content.
|
|
|
|
Example:
|
|
|
|
1287290776.765152s 1 e605dca6-446a-11e0-8b2a-002170d25c55
|
|
1287290767.478634s 0 26339d22-446b-11e0-9101-002170d25c55
|
|
|
|
## `aaa/bbb/*.log.web`
|
|
|
|
These log files record urls used by the
|
|
[[web_special_remote|special_remotes/web]] and sometimes by other remotes.
|
|
Their format is similar to the location tracking files, but with urls
|
|
rather than UUIDs.
|
|
|
|
## `aaa/bbb/*.log.ek`
|
|
|
|
These log files record other keys that are equivilant to the key
|
|
used in the filename. This is currently used for the `VURL` backend.
|
|
Their format is similar to the location tracking files, but with keys
|
|
rather than UUIDs.
|
|
|
|
## `aaa/bbb/*.log.rmt`
|
|
|
|
These log files are used by remotes that need to record their own state
|
|
about keys. Each remote can store one line of data about a key, in
|
|
its own format.
|
|
|
|
Note that only the most recently set state about a key is seen
|
|
by remotes using this. The `log.rmet` documented below does not have this
|
|
limitation.
|
|
|
|
Example:
|
|
|
|
1287290776.765152s e605dca6-446a-11e0-8b2a-002170d25c55 blah blah
|
|
1287290767.478634s 26339d22-446b-11e0-9101-002170d25c55 foo=bar
|
|
|
|
## `aaa/bbb/*.log.met`
|
|
|
|
These log files are used to store arbitrary [[design/metadata]] about keys.
|
|
Each key can have any number of metadata fields. Each field has a set of
|
|
values.
|
|
|
|
Lines are timestamped, and record when values are added (`field +value`),
|
|
but also when values are removed (`field -value`). Removed values
|
|
are retained in the log so that when merging an old line that sets a value
|
|
that was later unset, the value is not accidentally added back.
|
|
|
|
For example:
|
|
|
|
1287290776.765152s tag +foo +bar author +joey
|
|
1291237510.141453s tag -bar +baz
|
|
|
|
The value can be completely arbitrary data, although it's typically
|
|
reasonably short. If the value contains any whitespace
|
|
(including \r or \n), it will be base64 encoded. Base64 encoded values
|
|
are indicated by prefixing them with "!".
|
|
|
|
## `aaa/bbb/*.log.rmet`
|
|
|
|
These log files store per-remote metadata about keys. This metadata
|
|
is only used by the remote.
|
|
|
|
Format is the same as the metadata log files above, but each metadata key
|
|
is prefixed with "uuid:" to indicate the remote it belongs to.
|
|
|
|
For example:
|
|
|
|
1287290776.765152s e605dca6-446a-11e0-8b2a-002170d25c55:foo +bar
|
|
1287290776.765152s 26339d22-446b-11e0-9101-002170d25c55:x +1
|
|
1291237510.141453s 26339d22-446b-11e0-9101-002170d25c55:x -1 26339d22-446b-11e0-9101-002170d25c55:x +2
|
|
|
|
## `aaa/bbb/*.log.cid`
|
|
|
|
These log files store per-remote content identifiers for keys.
|
|
A given key may have any number of content identifiers.
|
|
|
|
The format is a timestamp, followed by the UUID of the remote,
|
|
followed by the content identifiers which are separated by colons.
|
|
If a content identifier contains a colon or \r or \n, it will be base64
|
|
encoded. Base64 encoded values are indicated by prefixing them with "!".
|
|
|
|
1287290776.765152s e605dca6-446a-11e0-8b2a-002170d25c55 5248916:5250378
|
|
|
|
## `aaa/bbb/*.log.cnk`
|
|
|
|
These log files are used when objects are stored in chunked form on
|
|
remotes. They record the size(s) of the chunks, and the number of chunks.
|
|
|
|
For example, this logs that a remote has an object stored using both
|
|
9 chunks of 1 mb size, and 1 chunk of 10 mb size.
|
|
|
|
1287290776.765152s e605dca6-446a-11e0-8b2a-002170d25c55:10240 9
|
|
1287290776.765153s e605dca6-446a-11e0-8b2a-002170d25c55:102400 1
|
|
|
|
(When those chunks are removed from the remote, the 9 is changed to 0.)
|
|
|
|
## `proxy.log`
|
|
|
|
Used to record what repositories are accessible via a proxy.
|
|
|
|
Each line starts with a timestamp, then the UUID of the repository
|
|
that can serve as a proxy, and then a list of the remotes that it can
|
|
proxy to, separated by spaces.
|
|
|
|
Each remote in the list consists of a repository's UUID,
|
|
followed by a colon (`:`) and then a remote name.
|
|
|
|
For example:
|
|
|
|
1317929100.012345s e605dca6-446a-11e0-8b2a-002170d25c55 26339d22-446b-11e0-9101-002170d25c55:foo c076460c-2290-11ef-be53-b7f0d194c863:bar
|
|
|
|
## `cluster.log`
|
|
|
|
Used to record the UUIDs of clusters, and the UUIDs of the nodes
|
|
comprising each cluster.
|
|
|
|
Each line starts with a timestamp, then the UUID the cluster,
|
|
followed by a list of the UUIDs of its nodes, separated by spaces.
|
|
|
|
For example:
|
|
|
|
1317929100.012345s 5b070cc8-29b8-11ef-80e1-0fd524be241b 5c0c97d2-29b8-11ef-b1d2-5f3d1c80940d 5c40375e-29b8-11ef-814d-872959d2c013
|
|
|
|
## `schedule.log`
|
|
|
|
Used to record scheduled events, such as periodic fscks.
|
|
|
|
The file format is simply one line per repository, with the uuid followed by a
|
|
space and then its schedule, followed by a timestamp.
|
|
|
|
There can be multiple events in the schedule, separated by "; ".
|
|
|
|
The format of the scheduled events is the same described in
|
|
[[git-annex-schedule]].
|
|
|
|
Example:
|
|
|
|
42bf2035-0636-461d-a367-49e9dfd361dd fsck self 30m every day at any time; fsck 4b3ebc86-0faf-4892-83c5-ce00cbe30f0a 1h every year at any time timestamp=1385646997.053162s
|
|
|
|
## `activity.log`
|
|
|
|
Used to record the times of activities, such as fscks.
|
|
|
|
Example:
|
|
|
|
42bf2035-0636-461d-a367-49e9dfd361dd Fsck timestamp=1422387398.30395s
|
|
|
|
## `transitions.log`
|
|
|
|
Used to record transitions, eg by `git annex forget`
|
|
|
|
Each line of the file is a transition, followed by a timestamp.
|
|
|
|
Example:
|
|
|
|
ForgetGitHistory 1387325539.685136s
|
|
ForgetDeadRemotes 1387325539.685136s
|
|
|
|
## `difference.log`
|
|
|
|
Used when a repository has fundamental differences from other repositories,
|
|
that should prevent merging.
|
|
|
|
Example:
|
|
|
|
e605dca6-446a-11e0-8b2a-002170d25c55 [ObjectHashLower] timestamp=1422387398.30395s
|
|
|
|
## `multicast.log`
|
|
|
|
Records uftp public key fingerprints, for use by [[git-annex-multicast]].
|
|
|
|
## `migrate.tree/old` and `migrate.tree/new`
|
|
|
|
These are used to record migrations done by `git-annex migrate`. By diffing
|
|
between the two, the old and new keys can be determined. This lets
|
|
migrations be recorded while using a minimum of space in the git
|
|
repository. The filenames in these trees have no connection to the names
|
|
of actual annexed files.
|
|
|
|
These trees are recorded in history of the git-annex branch, but the
|
|
head of the git-annex branch will never contain them.
|
|
|
|
## Other internals documentation
|
|
|
|
* [[git-remote-annex]] documents how git repositories are stored
|
|
on special remotes when using git with "annex::" urls.
|