In the world of git, we're not scared of internal implementation
details, and sometimes we like to dive in and tweak things by hand. Here's
some documentation to that end.

## The .git/ directory

### `.git/annex/objects/aa/bb/*/*`

This is where locally available file contents are actually stored.
Files added to the annex get a symlink or pointer file checked into git,
that points to the file content.

First there are two levels of directories used for hashing, to prevent
too many things ending up in any one directory.
See [[hashing]] for details.

Each subdirectory has the [[name_of_a_key|key_format]] in one of the
[[key-value_backends|backends]]. The file inside also has the name of the key.
This two-level structure is used because it allows the write bit to be removed
from the subdirectories as well as from the files. That prevents accidentally
deleting or changing the file contents. See [[lockdown]] for details.

### `.git/annex/tmp/`

This directory contains partially transferred objects.

### `.git/annex/othertmp/`

This is a temp directory for miscellaneous other temp files.

While `.git/annex/objects` and `.git/annex/tmp` can be put on different
filesystems if desired, `.git/annex/othertmp`
has to be on the same filesystem as the work tree and git repository.

### `.git/annex/bad/`

`git-annex fsck` puts any bad objects it finds in here.

### `.git/annex/transfers/`

Contains information files for uploads and downloads that are in progress,
as well as any that have failed. Used especially by the assistant.
It is safe to delete these files.

### `.git/annex/ssh/`

ssh connection caching files are written in here. It is safe to delete
these files.

### `.git/annex/index`

This is a git index file which git-annex uses to stage files
when preparing commits to the git-annex branch.

It's pretty safe to delete this file if git-annex is not currently running.
It will be re-created as necessary.

### `.git/annex/journal/`

git-annex uses this to journal changes to the git-annex branch,
before committing a set of changes.

<a name="The_git-annex_branch"></a>
## The git-annex branch

This branch is managed by git-annex, with the contents listed below.

This branch is not connected to your master branch or any of your other
branches. It is used for internal tracking of information about git-annex
repositories and annexed objects.

The files stored in this branch are all designed to be auto-merged
using git's [[union merge driver|git-union-merge]]. So each line
has a timestamp, to allow the most recent information to be identified.

### `uuid.log`

Records the UUIDs of known repositories, and associates them with a
description of the repository. This allows git-annex to display something
more useful than a UUID when it refers to a repository that does not have
a configured git remote pointing at it.

The file format is simply one line per repository, with the uuid followed by a
space and then the description, followed by a timestamp. Example:

    e605dca6-446a-11e0-8b2a-002170d25c55 laptop timestamp=1317929189.157237s
    26339d22-446b-11e0-9101-002170d25c55 usb disk timestamp=1317929330.769997s

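As a rough illustration of the `uuid.log` format described above, a line can be split into its uuid, description, and timestamp. This is a minimal sketch in Python, not git-annex's own parser: the names `UuidLogEntry` and `parse_uuid_log_line` are made up for this example, and it assumes the `timestamp=` field is always the last word on the line and that the description is not empty.

```python
from typing import NamedTuple


class UuidLogEntry(NamedTuple):
    uuid: str
    description: str
    timestamp: float


def parse_uuid_log_line(line: str) -> UuidLogEntry:
    """Split one uuid.log line into uuid, description, and timestamp.

    Assumes the layout shown above: uuid, a space, a description that may
    itself contain spaces, and a trailing timestamp=<seconds>s field.
    """
    uuid, _, rest = line.strip().partition(" ")
    # The description may contain spaces, so split off the timestamp from the right.
    description, _, stamp = rest.rpartition(" timestamp=")
    return UuidLogEntry(uuid, description, float(stamp.rstrip("s")))


entry = parse_uuid_log_line(
    "26339d22-446b-11e0-9101-002170d25c55 usb disk timestamp=1317929330.769997s"
)
print(entry.description)
```

Splitting from the right keeps descriptions with spaces, such as `usb disk`, in one piece.
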
### `numcopies.log`

Records the global numcopies setting.

The file format is simply a timestamp followed by a number.

### `mincopies.log`

Records the global mincopies setting.

The file format is simply a timestamp followed by a number.

### `config.log`

Records global configuration settings, which can be overridden by values
in `.git/config`.

The file format is a timestamp, followed by the name of the configuration,
followed by the value. For example:

    1317929189.157237s annex.autocommit false

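Because union merging can leave several lines for the same setting in `config.log`, a reader has to pick the most recent one. The following Python sketch shows one way to do that. It is an illustration only: the function name `latest_config` is hypothetical, and it assumes that the line with the newest timestamp wins and that the value is everything after the second space.

```python
from typing import Dict, Tuple


def latest_config(log_text: str) -> Dict[str, str]:
    """Return the most recently set value for each configuration name."""
    newest: Dict[str, Tuple[float, str]] = {}
    for line in log_text.splitlines():
        if not line.strip():
            continue
        # Layout: <timestamp>s <name> <value>; the value may contain spaces.
        stamp, name, value = line.split(" ", 2)
        when = float(stamp.rstrip("s"))
        if name not in newest or when >= newest[name][0]:
            newest[name] = (when, value)
    return {name: value for name, (_, value) in newest.items()}


log = (
    "1317929189.157237s annex.autocommit false\n"
    "1317929200.000000s annex.autocommit true\n"
)
print(latest_config(log))
```
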
### `remote.log`

Holds persistent configuration settings for [[special_remotes]] such as
Amazon S3.

The file format is one line per remote, starting with the uuid of the
remote, followed by a space, and then a series of var=value pairs,
each separated by whitespace, and finally a timestamp.

Special remotes that are autoenabled have autoenable=true here.

Encrypted special remotes store their encryption key here,
in the "cipher" value. It is base64 encoded, and unless shared [[encryption]]
is used, is encrypted to one or more gpg keys. The first 256 bytes of
the cipher are used as the HMAC SHA1 encryption key, to encrypt filenames
stored on the special remote. The remainder of the cipher is used as a gpg
symmetric encryption key, to encrypt the content of files stored on the special
remote.

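The `remote.log` line layout described above can be sketched as a small Python parser. This is an assumption-laden illustration rather than git-annex's implementation: `parse_remote_log_line` is a hypothetical name, the sample line is invented (only `autoenable=true` comes from the text above; `name=example` is made up), and the timestamp is assumed to be written as a `timestamp=` pair like in the other uuid-first logs.

```python
from typing import Dict, Tuple


def parse_remote_log_line(line: str) -> Tuple[str, Dict[str, str], float]:
    """Split a remote.log line into (uuid, settings, timestamp)."""
    uuid, *fields = line.split()
    settings: Dict[str, str] = {}
    for field in fields:
        # Split at the first "=" only, so values that contain "="
        # (such as base64 padding in a cipher) stay intact.
        var, _, value = field.partition("=")
        settings[var] = value
    stamp = settings.pop("timestamp")
    return uuid, settings, float(stamp.rstrip("s"))


# Invented sample line, for illustration only.
uuid, settings, when = parse_remote_log_line(
    "e605dca6-446a-11e0-8b2a-002170d25c55 autoenable=true name=example "
    "timestamp=1317929189.157237s"
)
print(settings)
```
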
### `trust.log`

Records the [[trust]] information for repositories. Does not exist unless
[[trust]] values are configured.

The file format is one line per repository, with the uuid followed by a
space, then one of `1` (trusted), `0` (untrusted), `?` (semi-trusted),
or `X` (dead), and finally a timestamp.

Example:

    e605dca6-446a-11e0-8b2a-002170d25c55 1 timestamp=1317929189.157237s
    26339d22-446b-11e0-9101-002170d25c55 ? timestamp=1317929330.769997s

Repositories not listed are semi-trusted.

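To make the `trust.log` symbols and the semi-trusted default concrete, here is a minimal Python sketch. The names `TRUST_SYMBOLS`, `read_trust_log`, and `trust_level` are hypothetical, and the sketch assumes that when a repository appears on several lines, the one with the newest timestamp applies.

```python
from typing import Dict, Tuple

# Symbols as documented above.
TRUST_SYMBOLS = {"1": "trusted", "0": "untrusted", "?": "semi-trusted", "X": "dead"}


def read_trust_log(log_text: str) -> Dict[str, str]:
    """Map each listed repository uuid to its most recent trust level."""
    newest: Dict[str, Tuple[float, str]] = {}
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines in this sketch
        uuid, symbol, stamp = parts
        when = float(stamp.split("=", 1)[1].rstrip("s"))
        if uuid not in newest or when >= newest[uuid][0]:
            newest[uuid] = (when, TRUST_SYMBOLS[symbol])
    return {uuid: level for uuid, (_, level) in newest.items()}


def trust_level(levels: Dict[str, str], uuid: str) -> str:
    """Repositories not listed in the log are semi-trusted."""
    return levels.get(uuid, "semi-trusted")


levels = read_trust_log(
    "e605dca6-446a-11e0-8b2a-002170d25c55 1 timestamp=1317929189.157237s\n"
    "26339d22-446b-11e0-9101-002170d25c55 ? timestamp=1317929330.769997s\n"
)
print(trust_level(levels, "e605dca6-446a-11e0-8b2a-002170d25c55"))
```
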
### `group.log`

Used to group repositories together.

The file format is one line per repository, with the uuid followed by a space,
and then a space-separated list of groups this repository is part of,
and finally a timestamp.

### `preferred-content.log`

Used to indicate which repositories prefer to contain which file contents.

The file format is one line per repository, with the uuid followed by a space,
then a boolean expression, and finally a timestamp.

Files matching the expression are preferred to be retained in the
repository, while files not matching it are preferred to be stored
somewhere else.

### `required-content.log`

Used to indicate which repositories are required to contain which file
contents.

File format is identical to preferred-content.log.

### `group-preferred-content.log`

Contains standard preferred content settings for groups, overriding or
supplementing the ones built into git-annex.

The file format is one line per group, starting with a timestamp, then a
space, then the group name followed by a space and then the preferred
content expression.

### `export.log`

Tracks what trees have been exported to special remotes by
[[git-annex-export]](1).

Each line starts with a timestamp, then the uuid of the repository
that exported to the special remote, followed by a colon (`:`) and
the uuid of the special remote. Then, separated by spaces,
the SHA of the tree that was exported, and optionally any number of
subsequent SHAs, of trees that have started to be exported but whose
export is not yet complete.

In order to record the beginning of the first export, where nothing
has been exported yet, the SHA of the exported tree can be
the empty tree (eg 4b825dc642cb6eb9a060e54bf8d69288fbee4904).

For example:

    1317929100.012345s e605dca6-446a-11e0-8b2a-002170d25c55:26339d22-446b-11e0-9101-002170d25c55 4b825dc642cb6eb9a060e54bf8d69288fbee4904 bb08b1abd207aeecccbc7060e523b011d80cb35b
    1317929100.012345s e605dca6-446a-11e0-8b2a-002170d25c55:26339d22-446b-11e0-9101-002170d25c55 bb08b1abd207aeecccbc7060e523b011d80cb35b
    1317929189.157237s e605dca6-446a-11e0-8b2a-002170d25c55:26339d22-446b-11e0-9101-002170d25c55 bb08b1abd207aeecccbc7060e523b011d80cb35b 7c7af825782b7c8706039b855c72709993542be4
    1317923000.251111s e605dca6-446a-11e0-8b2a-002170d25c55:26339d22-446b-11e0-9101-002170d25c55 7c7af825782b7c8706039b855c72709993542be4

(The trees are also grafted into the git-annex branch, at
`export.tree`, to prevent git from garbage collecting them. However, the head
of the git-annex branch should never contain such a grafted-in tree;
the grafted tree is removed in the same commit that updates `export.log`.)

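The fields of an `export.log` line, as described above, can be pulled apart as follows. This is a hedged Python sketch: `ExportEntry` and `parse_export_log_line` are names invented for the example, and the code does nothing beyond splitting the line.

```python
from typing import List, NamedTuple

# SHA of git's empty tree, as quoted in the text above.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


class ExportEntry(NamedTuple):
    timestamp: float
    exporter_uuid: str       # repository that ran the export
    remote_uuid: str         # special remote that was exported to
    exported_tree: str       # tree whose export completed
    incomplete_trees: List[str]  # trees whose export has started only


def parse_export_log_line(line: str) -> ExportEntry:
    stamp, pair, exported, *incomplete = line.split()
    exporter, _, remote = pair.partition(":")
    return ExportEntry(float(stamp.rstrip("s")), exporter, remote, exported, incomplete)


entry = parse_export_log_line(
    "1317929100.012345s "
    "e605dca6-446a-11e0-8b2a-002170d25c55:26339d22-446b-11e0-9101-002170d25c55 "
    "4b825dc642cb6eb9a060e54bf8d69288fbee4904 "
    "bb08b1abd207aeecccbc7060e523b011d80cb35b"
)
# True for the first export, where nothing has been fully exported yet.
print(entry.exported_tree == EMPTY_TREE)
```
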
### `aaa/bbb/*.log`

These log files record [[location_tracking]] information
for file contents. These are placed in two levels of subdirectories
for hashing. See [[hashing]] for details.

The name of the key is the filename, and the content
consists of a timestamp, then `1` (present), `0` (not present), or `X` (dead),
and then the UUID of the repository that has or lacks the file content.

Example:

    1287290776.765152s 1 e605dca6-446a-11e0-8b2a-002170d25c55
    1287290767.478634s 0 26339d22-446b-11e0-9101-002170d25c55

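A location log like the one above can be reduced to the set of repositories currently believed to hold the content. The Python sketch below is illustrative only: `current_locations` is a hypothetical name, and it assumes the newest line for each UUID decides its status.

```python
from typing import Dict, Set, Tuple


def current_locations(log_text: str) -> Set[str]:
    """Return the UUIDs whose most recent log line says the content is present."""
    newest: Dict[str, Tuple[float, str]] = {}
    for line in log_text.splitlines():
        if not line.strip():
            continue
        stamp, status, uuid = line.split()
        when = float(stamp.rstrip("s"))
        if uuid not in newest or when >= newest[uuid][0]:
            newest[uuid] = (when, status)
    # Only status "1" means present; "0" (not present) and "X" (dead) do not.
    return {uuid for uuid, (_, status) in newest.items() if status == "1"}


log = (
    "1287290776.765152s 1 e605dca6-446a-11e0-8b2a-002170d25c55\n"
    "1287290767.478634s 0 26339d22-446b-11e0-9101-002170d25c55\n"
)
print(current_locations(log))
```
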
### `aaa/bbb/*.log.web`

These log files record urls used by the
[[web_special_remote|special_remotes/web]] and sometimes by other remotes.
Their format is similar to the location tracking files, but with urls
rather than UUIDs.

### `aaa/bbb/*.log.rmt`

These log files are used by remotes that need to record their own state
about keys. Each remote can store one line of data about a key, in
its own format.

Note that only the most recently set state about a key is seen
by remotes using this. The `log.rmet` documented below does not have this
limitation.

Example:

    1287290776.765152s e605dca6-446a-11e0-8b2a-002170d25c55 blah blah
    1287290767.478634s 26339d22-446b-11e0-9101-002170d25c55 foo=bar

### `aaa/bbb/*.log.met`

These log files are used to store arbitrary [[design/metadata]] about keys.
Each key can have any number of metadata fields. Each field has a set of
values.

Lines are timestamped, and record when values are added (`field +value`),
but also when values are removed (`field -value`). Removed values
are retained in the log so that when merging an old line that sets a value
that was later unset, the value is not accidentally added back.

For example:

    1287290776.765152s tag +foo +bar author +joey
    1291237510.141453s tag -bar +baz

The value can be completely arbitrary data, although it's typically
reasonably short. If the value contains any whitespace
(including \r or \n), it will be base64 encoded. Base64 encoded values
are indicated by prefixing them with "!".

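The add/remove semantics of the metadata log can be illustrated by replaying the lines in timestamp order. The Python sketch below is a simplification and not git-annex's merge logic: `decode_value` and `current_metadata` are hypothetical names, field names are assumed not to start with `+` or `-`, and decoded base64 values are assumed to be UTF-8 text.

```python
import base64
from typing import Dict, List, Set, Tuple


def decode_value(raw: str) -> str:
    """Values prefixed with "!" are base64 encoded."""
    if raw.startswith("!"):
        return base64.b64decode(raw[1:]).decode("utf-8", errors="replace")
    return raw


def current_metadata(log_text: str) -> Dict[str, Set[str]]:
    """Replay +value and -value changes, oldest line first."""
    entries: List[Tuple[float, List[str]]] = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        stamp, *tokens = line.split()
        entries.append((float(stamp.rstrip("s")), tokens))
    entries.sort(key=lambda entry: entry[0])

    fields: Dict[str, Set[str]] = {}
    for _, tokens in entries:
        field = None
        for token in tokens:
            if field is not None and token[0] in "+-":
                values = fields.setdefault(field, set())
                value = decode_value(token[1:])
                if token[0] == "+":
                    values.add(value)
                else:
                    values.discard(value)
            else:
                field = token  # a bare word starts a new field
    # Drop fields whose values have all been removed.
    return {name: values for name, values in fields.items() if values}


log = (
    "1287290776.765152s tag +foo +bar author +joey\n"
    "1291237510.141453s tag -bar +baz\n"
)
print(current_metadata(log))
```

Sorting by timestamp first means the later `-bar` line still wins when the two lines arrive in the opposite order after a merge.
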
### `aaa/bbb/*.log.rmet`

These log files store per-remote metadata about keys. This metadata
is only used by the remote.

Format is the same as the metadata log files above, but each metadata key
is prefixed with "uuid:" to indicate the remote it belongs to.

For example:

    1287290776.765152s e605dca6-446a-11e0-8b2a-002170d25c55:foo +bar
    1287290776.765152s 26339d22-446b-11e0-9101-002170d25c55:x +1
    1291237510.141453s 26339d22-446b-11e0-9101-002170d25c55:x -1 26339d22-446b-11e0-9101-002170d25c55:x +2

### `aaa/bbb/*.log.cid`

These log files store per-remote content identifiers for keys.
A given key may have any number of content identifiers.

The format is a timestamp, followed by the uuid of the remote,
followed by the content identifiers which are separated by colons.
If a content identifier contains a colon or \r or \n, it will be base64
encoded. Base64 encoded values are indicated by prefixing them with "!".

For example:

    1287290776.765152s e605dca6-446a-11e0-8b2a-002170d25c55 5248916:5250378

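The colon-separated, optionally base64 encoded identifiers in a `.log.cid` line can be read back as shown in this Python sketch. It is an illustration under assumptions: `parse_cid_log_line` is a hypothetical name, and identifiers are returned as bytes because the text above does not say what encoding they use.

```python
import base64
from typing import List, Tuple


def parse_cid_log_line(line: str) -> Tuple[float, str, List[bytes]]:
    """Split a .log.cid line into (timestamp, remote uuid, content identifiers)."""
    stamp, uuid, *rest = line.split()
    identifiers: List[bytes] = []
    for raw in (rest[0].split(":") if rest else []):
        if raw.startswith("!"):
            # "!" marks a base64 encoded identifier.
            identifiers.append(base64.b64decode(raw[1:]))
        else:
            identifiers.append(raw.encode("utf-8"))
    return float(stamp.rstrip("s")), uuid, identifiers


when, uuid, cids = parse_cid_log_line(
    "1287290776.765152s e605dca6-446a-11e0-8b2a-002170d25c55 5248916:5250378"
)
print(cids)
```
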
### `aaa/bbb/*.log.cnk`

These log files are used when objects are stored in chunked form on
remotes. They record the size(s) of the chunks, and the number of chunks.

For example, this logs that a remote has an object stored using both
9 chunks of 1 mb size, and 1 chunk of 10 mb size.

    1287290776.765152s e605dca6-446a-11e0-8b2a-002170d25c55:10240 9
    1287290776.765153s e605dca6-446a-11e0-8b2a-002170d25c55:102400 1

(When those chunks are removed from the remote, the 9 is changed to 0.)

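Each chunk log line pairs a remote and chunk size with a chunk count. A minimal Python sketch of reading one line follows; `ChunkEntry` and `parse_chunk_log_line` are names invented here, and the chunk size is kept as the raw number from the log without assuming a unit.

```python
from typing import NamedTuple


class ChunkEntry(NamedTuple):
    timestamp: float
    remote_uuid: str
    chunk_size: int   # raw number as written in the log
    chunk_count: int  # 0 once the chunks have been removed from the remote


def parse_chunk_log_line(line: str) -> ChunkEntry:
    stamp, key, count = line.split()
    # The key is "<uuid>:<chunk size>"; split at the last colon.
    uuid, _, size = key.rpartition(":")
    return ChunkEntry(float(stamp.rstrip("s")), uuid, int(size), int(count))


entry = parse_chunk_log_line(
    "1287290776.765152s e605dca6-446a-11e0-8b2a-002170d25c55:10240 9"
)
print(entry.chunk_size, entry.chunk_count)
```
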
### `schedule.log`

Used to record scheduled events, such as periodic fscks.

The file format is simply one line per repository, with the uuid followed by a
space and then its schedule, followed by a timestamp.

There can be multiple events in the schedule, separated by "; ".

The format of the scheduled events is the same as described in
[[git-annex-schedule]].

Example:

    42bf2035-0636-461d-a367-49e9dfd361dd fsck self 30m every day at any time; fsck 4b3ebc86-0faf-4892-83c5-ce00cbe30f0a 1h every year at any time timestamp=1385646997.053162s

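Splitting a `schedule.log` line into its individual events might look like the Python sketch below. It only separates the pieces and does not interpret the event syntax, which is documented in [[git-annex-schedule]]. The function name `parse_schedule_log_line` is hypothetical.

```python
from typing import List, Tuple


def parse_schedule_log_line(line: str) -> Tuple[str, List[str], float]:
    """Split a schedule.log line into (uuid, events, timestamp)."""
    uuid, _, rest = line.strip().partition(" ")
    schedule, _, stamp = rest.rpartition(" timestamp=")
    # Events are separated by "; "; tolerate stray whitespace around them.
    events = [event.strip() for event in schedule.split(";") if event.strip()]
    return uuid, events, float(stamp.rstrip("s"))


uuid, events, when = parse_schedule_log_line(
    "42bf2035-0636-461d-a367-49e9dfd361dd fsck self 30m every day at any time; "
    "fsck 4b3ebc86-0faf-4892-83c5-ce00cbe30f0a 1h every year at any time "
    "timestamp=1385646997.053162s"
)
for event in events:
    print(event)
```
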
### `activity.log`

Used to record the times of activities, such as fscks.

Example:

    42bf2035-0636-461d-a367-49e9dfd361dd Fsck timestamp=1422387398.30395s

### `transitions.log`

Used to record transitions, eg by `git annex forget`.

Each line of the file is a transition, followed by a timestamp.

Example:

    ForgetGitHistory 1387325539.685136s
    ForgetDeadRemotes 1387325539.685136s

### `difference.log`

Used when a repository has fundamental differences from other repositories,
that should prevent merging.

Example:

    e605dca6-446a-11e0-8b2a-002170d25c55 [ObjectHashLower] timestamp=1422387398.30395s

### `multicast.log`

Records uftp public key fingerprints, for use by [[git-annex-multicast]].