373 lines
14 KiB
Markdown
373 lines
14 KiB
Markdown
# NAME
|
|
|
|
git-annex-preferred-content - which files are wanted in a repository
|
|
|
|
# DESCRIPTION
|
|
|
|
Each repository has a preferred content setting, which specifies content
|
|
that the repository wants to have present. These settings can be configured
|
|
using `git annex vicfg` or `git annex wanted`.
|
|
They are used by the `--auto` option, by `git annex sync --content`,
|
|
by clusters, and by the git-annex assistant.
|
|
|
|
While preferred content expresses a preference, it can be overridden
|
|
by simply using `git annex drop`. On the other hand, required content
|
|
settings are enforced; `git annex drop` will refuse to drop a file if
|
|
doing so would violate its required content settings. A repository's
|
|
required content can be configured using `git annex vicfg` or
|
|
`git annex required`.
|
|
|
|
# SYNTAX
|
|
|
|
Preferred content expressions use a similar syntax to
|
|
the [[git-annex-matching-options]](1), without the dashes.
|
|
For example:
|
|
|
|
exclude=archive/* and (include=*.mp3 or smallerthan=1mb)
|
|
|
|
The idea is that you write an expression that files are matched against. If
|
|
a file matches, the repository wants to store its content. If it doesn't,
|
|
the repository wants to drop its content (if there are enough copies
|
|
elsewhere to allow removing it).
|
|
|
|
# EXPRESSIONS
|
|
|
|
* `include=glob` / `exclude=glob`
|
|
|
|
Match files to include, or exclude.
|
|
|
|
While the command-line options --include=glob and --exclude=glob match
|
|
files relative to the current directory, preferred content expressions
|
|
match files relative to the top of the git repository.
|
|
|
|
A glob is something like `foo.*` or `b?r`.
|
|
Globs can also contain character classes,
|
|
like `foo[Bb]ar`, as well as additional POSIX character classes like
|
|
`[[:space:]]`. Which is useful, since a glob in a preferred content
|
|
expression cannot contain spaces. See the `glob(7)` man page for more
|
|
about globs.
|
|
|
|
For example, suppose you put files into `archive` directories
|
|
when you're done with them. Then you could configure your laptop to prefer
|
|
to not retain those files, like this: `exclude=*/archive/*`
|
|
|
|
When a subdirectory is being exported or imported to a special remote (see
|
|
[[git-annex-export]](1)) and [[git-annex-import]](1), these match relative
|
|
to the top of the subdirectory.
|
|
|
|
Note that, when a command is run with the `--all` option, or in a bare
|
|
repository, there is no filename associated with an annexed object,
|
|
and so "include=" and "exclude=" will not match.
|
|
|
|
* `copies=number`
|
|
|
|
Matches only files that git-annex believes to have the specified number
|
|
of copies, or more. Note that it does not check remotes to verify that
|
|
the copies still exist.
|
|
|
|
To decide if content should be dropped, git-annex evaluates the preferred
|
|
content expression under the assumption that the content has *already* been
|
|
dropped. If the content would not be wanted then, the drop can be done.
|
|
So, for example, `copies=2` in a preferred content expression lets
|
|
content be dropped only when there are currently 3 copies of it, including
|
|
the repo it's being dropped from. This is different than running `git annex
|
|
drop --copies=2`, which will drop files that currently have 2 copies.
|
|
|
|
* `copies=trustlevel:number`
|
|
|
|
Matches only files that git-annex believes have the specified number
|
|
copies, on remotes with the specified trust level. For example,
|
|
`copies=trusted:2`
|
|
|
|
To match any trust level at or higher than a given level,
|
|
use `trustlevel+`. For example, `copies=semitrusted+:2`
|
|
|
|
* `copies=groupname:number`
|
|
|
|
Matches only files that git-annex believes have the specified number of
|
|
copies, on remotes in the specified group. For example,
|
|
`copies=archive:2`
|
|
|
|
Preferred content expressions have no equivalent to the `--in`
|
|
option, but groups can accomplish similar things. You can add
|
|
repositories to groups, and match against the groups in a
|
|
preferred content expression. So rather than `--in=usbdrive`,
|
|
put all the USB drives into a "transfer" group, and use
|
|
`copies=transfer:1`
|
|
|
|
* `lackingcopies=number`
|
|
|
|
Matches only files that git-annex believes need the specified number or
|
|
more additional copies to be made in order to satisfy their numcopies
|
|
settings.
|
|
|
|
* `approxlackingcopies=number`
|
|
|
|
Like lackingcopies, but does not look at .gitattributes annex.numcopies
|
|
settings. This makes it significantly faster.
|
|
|
|
* `inbackend=backendname`
|
|
|
|
Matches only files whose content is stored using the specified key-value
|
|
backend.
|
|
|
|
See [[git-annex-backends]](1) for information about available backends.
|
|
|
|
* `securehash`
|
|
|
|
Matches only files whose content is hashed using a cryptographically
|
|
secure function.
|
|
|
|
* `inallgroup=groupname`
|
|
|
|
Matches only files that git-annex believes are present in all repositories
|
|
in the specified group.
|
|
|
|
* `onlyingroup=groupname`
|
|
|
|
Matches files that git-annex believes are present in at least one
|
|
repository that is in the specified group, and are not present in any
|
|
repositories that are not in the specified group.
|
|
|
|
* `smallerthan=size` / `largerthan=size`
|
|
|
|
Matches only files whose content is smaller than, or larger than the
|
|
specified size.
|
|
|
|
The size can be specified with any commonly used units, for example,
|
|
"0.5 gb" or "100 KiloBytes"
|
|
|
|
* `metadata=field=glob`
|
|
|
|
Matches only files that have a metadata field attached with a value that
|
|
matches the glob. The values of metadata fields are matched case
|
|
insensitively.
|
|
|
|
A glob is something like `foo.*` or `b?r`.
|
|
Globs can also contain character classes,
|
|
like `foo[Bb]ar`, as well as additional POSIX character classes like
|
|
`[[:space:]]`. Which is useful, since a glob in a preferred content
|
|
expression cannot contain spaces. See the `glob(7)` man page for more
|
|
about globs.
|
|
|
|
To match a tag "done", use `metadata=tag=done`
|
|
|
|
To match author metadata, use `metadata=author=*Smith`
|
|
|
|
* `metadata=field<number` / `metadata=field>number`
|
|
* `metadata=field<=number` / `metadata=field>=number`
|
|
|
|
Matches only files that have a metadata field attached with a value that
|
|
is a number and is less than or greater than the specified number.
|
|
|
|
To match PDFs with between 100 and 200 pages (assuming something has set
|
|
that metadata), use `metadata=pagecount>=100 and metadata=pagecount<=200`
|
|
|
|
* `present`
|
|
|
|
Makes content be wanted if it's present, but not otherwise.
|
|
|
|
This leaves it up to you to use git-annex manually
|
|
to move content around. You can use this to avoid preferred content
|
|
settings from affecting a subdirectory. For example:
|
|
`auto/* or (include=ad-hoc/* and present)`
|
|
|
|
Note that `not present` is a very bad thing to put in a preferred content
|
|
expression. It'll make it want to get content that's not present, and
|
|
drop content that is present! Don't go there..
|
|
|
|
* `inpreferreddir`
|
|
|
|
Makes content be preferred if it's in a directory (located anywhere
|
|
in the tree) with a particular name.
|
|
|
|
The name of the directory can be configured using
|
|
`git annex enableremote $remote preferreddir=$dirname`
|
|
|
|
(If no directory name is configured, it uses "public" by default.)
|
|
|
|
Note that, when a command is run with the `--all` option, or in a bare
|
|
repository, there is no filename associated with an annexed object,
|
|
and so "inpreferreddir" will not match.
|
|
|
|
* `standard`
|
|
|
|
git-annex comes with some built-in preferred content expressions, that
|
|
can be used with repositories that are in some standard groups
|
|
such as "client" and "transfer".
|
|
|
|
When a repository is in exactly one such group, you can use the "standard"
|
|
keyword in its preferred content expression, to match whatever content
|
|
the group's expression matches.
|
|
|
|
Most often, the whole preferred content expression is simply "standard".
|
|
But, you can do more complicated things, for example:
|
|
`standard or include=otherdir/*`
|
|
|
|
* `groupwanted`
|
|
|
|
The "groupwanted" keyword can be used to refer to a preferred content
|
|
expression that is associated with a group, as long as there is exactly
|
|
one such expression amoung the groups a repository is in. This is like
|
|
the "standard" keyword, but you can configure the preferred content
|
|
expressions using `git annex groupwanted`.
|
|
|
|
When writing a groupwanted preferred content expression,
|
|
you can use all the keywords documented here, including "standard".
|
|
(But not "groupwanted".)
|
|
|
|
For example, to make a variant of the standard client preferred content
|
|
expression that does not want files in the "out" directory, you
|
|
could run: `git annex groupwanted client "standard and exclude=out/*"`
|
|
|
|
Then repositories that are in the client group and have their preferred
|
|
content expression set to "groupwanted" will use that, while
|
|
other client repositories that have their preferred content expression
|
|
set to "standard" will use the standard expression.
|
|
|
|
Or, you could make a new group, with your own custom preferred content
|
|
expression tuned for your needs, and every repository you put in this
|
|
group and make its preferred content be "groupwanted" will use it.
|
|
|
|
For example, the archive group only wants to archive 1 copy of each file,
|
|
spread among every repository in the group.
|
|
Here's how to configure a group named redundantarchive, that instead
|
|
wants to contain 3 copies of each file:
|
|
|
|
git annex groupwanted redundantarchive "not (copies=redundantarchive:3)"
|
|
for repo in foo bar baz; do
|
|
git annex group $repo redundantarchive
|
|
git annex wanted $repo groupwanted
|
|
done
|
|
|
|
* `unused`
|
|
|
|
Matches only keys that `git annex unused` has determined to be unused.
|
|
|
|
This is related the the --unused option.
|
|
However, putting `unused` in a preferred content expression
|
|
doesn't make git-annex consider those unused keys. So when git-annex is
|
|
only checking preferred content expressions against files in the
|
|
repository (which are obviously used), `unused` in a preferred
|
|
content expression won't match anything.
|
|
|
|
So when is `unused` useful in a preferred content expression?
|
|
|
|
Using `git annex sync --content --all` will operate on all files,
|
|
including unused ones, and take `unused` in preferred content expressions
|
|
into account.
|
|
|
|
The git-annex assistant periodically scans for unused files, and
|
|
moves them to some repository whose preferred content expression
|
|
says it wants them. (Or, if annex.expireunused is set, it may just delete
|
|
them.)
|
|
|
|
* `balanced=groupname[:number]`
|
|
|
|
Makes content be evenly balanced amoung repositories in the group.
|
|
|
|
The number is the number of repositories in the group that will
|
|
want each file. When not specified, the default is 1.
|
|
|
|
For this to work, each repository in the group should have its preferred
|
|
content set to the same expression. Using `groupwanted` is a good
|
|
way to do that.
|
|
|
|
For example, "balanced=backup:2", when there are 3 members of the backup
|
|
group, will make each backup repository want 2/3rds of the files.
|
|
|
|
The sizes of files are not taken into account, so it's possible for
|
|
one repository to get larger than usual files and so fill up before
|
|
the other repositories. But files are only wanted by repositories that
|
|
have enough free space to hold them. So once a repository is full,
|
|
the remaining repositories will have any additional files balanced
|
|
amoung them. In order for this to work, you must use
|
|
[[git-annex-maxsize]](1) to specify the size of each repository in the
|
|
group.
|
|
|
|
This usually avoids moving files between repositories of the group, even
|
|
if that means that things are not optimally balanced. Some of the ways
|
|
that it can get out of balance include adding a new repository to the
|
|
group, or a file getting copied into more repositories in the group than
|
|
the specified number. Running git-annex commands with the `--rebalance`
|
|
option will make this expression instead behave like the `fullybalanced`
|
|
expression, which will make repositories want to move files around as
|
|
necessary in order to get fully balanced.
|
|
|
|
Using this in a perferred content expression makes git-annex need to do
|
|
some additional work to keep track of how full repositories are. Usually
|
|
that won't affect performance much. However, the first time git-annex
|
|
processes this in a given git repository, it will need to examine
|
|
all the locations of all files, which can be slow when there are a lot of
|
|
them. When this causes git-annex to do a lot of work, it will
|
|
display "(calculating repository sizes)".
|
|
|
|
Note that `not balanced` is a bad thing to put in a preferred content
|
|
expression for the same reason `not present` is.
|
|
|
|
* `fullybalanced=groupname[:number]`
|
|
|
|
This is like `balanced`, but allows moving content between repositories
|
|
in the group at any time to keep it fully balanced.
|
|
|
|
Normally "balanced=groupname:number" is the same as
|
|
"(fullybalanced=groupname:number and not copies=groupname:number) or present"
|
|
|
|
When the `--rebalance` option is used, `balanced` is the same as
|
|
`fullybalanced`.
|
|
|
|
* `anything`
|
|
|
|
Always matches.
|
|
|
|
* `nothing`
|
|
|
|
Never matches. (Same as "not anything")
|
|
|
|
* `not expression`
|
|
|
|
Inverts what the expression matches. For example, `not include=archive/*`
|
|
is the same as `exclude=archive/*`
|
|
|
|
* `and` / `or` / `( expression )`
|
|
|
|
These can be used to build up more complicated expressions.
|
|
|
|
# TESTING
|
|
|
|
To check at the command line which files are matched by a repository's
|
|
preferred content settings, you can use the --want-get and --want-drop
|
|
options.
|
|
|
|
For example, git annex find --want-get --not --in . will find all the files
|
|
that git annex get --auto will want to get, and git annex find --want-drop --in
|
|
. will find all the files that git annex drop --auto will want to drop.
|
|
|
|
The --explain option can be used to understand why a complex preferred
|
|
content expression matches or fails to match. The expression will
|
|
be displayed, with each term followed by "[TRUE]" or "[FALSE]" to indicate
|
|
the value. Irrelevant terms will be ommitted from the explanation,
|
|
for example `"exclude=* and copies=1"` will be displayed as
|
|
`"exclude=*[FALSE]"`
|
|
|
|
# SEE ALSO
|
|
|
|
[[git-annex]](1)
|
|
|
|
[[git-annex-vicfg]](1)
|
|
|
|
[[git-annex-wanted]](1)
|
|
|
|
[[git-annex-maxsize]](1)
|
|
|
|
<https://git-annex.branchable.com/preferred_content/>
|
|
|
|
<https://git-annex.branchable.com/preferred_content/standard_groups/>
|
|
|
|
# AUTHOR
|
|
|
|
Joey Hess <id@joeyh.name>
|
|
|
|
<http://git-annex.branchable.com/>
|
|
|
|
Warning: Automatically converted into a man page by mdwn2man. Edit with care.
|