git-annex/doc/git-annex-matching-options.mdwn
Joey Hess a56b151f90
fix longstanding indeterminite preferred content for duplicated file problem
* drop: When two files have the same content, and a preferred content
  expression matches one but not the other, do not drop the file.
* sync --content, assistant: Fix an edge case where a file that is not
  preferred content did not get dropped.

The sync --content edge case is that handleDropsFrom loaded associated files
and used them without verifying that the information from the database was
not stale.

It seemed best to avoid changing --want-drop's behavior, this way when
debugging a preferred content expression with it, the files matched will
still reflect the expression. So added a note to the --want-drop documentation,
to make clear it may not behave identically to git-annex drop --auto.

While it would be possible to introspect the preferred content
expression to see if it matches on filenames, and only look up the
associated files when it does, it's generally fairly rare for 2 files to
have the same content, and the database lookup is already avoided when
there's only 1 file, so I did not implement that further optimisation.

Note that there are still some situations where the associated files
database does not get locked files recorded in it, which will prevent
this fix from working.

Sponsored-by: Dartmouth College's Datalad project
2021-05-24 14:07:05 -04:00

230 lines
6.5 KiB
Markdown

# NAME
git-annex-matching-options - specifying files to act on
# DESCRIPTION
Many git-annex commands support using these options to specify which
files they act on.
Arbitrarily complicated expressions can be built using these options.
For example:
--include='*.mp3' --and -( --in=usbdrive --or --in=archive -)
The above example makes git-annex work on only mp3 files that are present
in either of two repositories.
# OPTIONS
* `--exclude=glob`
Skips files matching the glob pattern. The glob is matched relative to
the current directory. For example:
--exclude='*.mp3' --exclude='subdir/*'
Note that this will not match anything when using --all or --unused.
* `--include=glob`
Skips files not matching the glob pattern. (Same as `--not --exclude`.)
For example, to include only mp3 and ogg files:
--include='*.mp3' --or --include='*.ogg'
Note that this will not skip anything when using --all or --unused.
* `--in=repository`
Matches only files that git-annex believes have their contents present
in a repository. Note that it does not check the repository to verify
that it still has the content.
The repository should be specified using the name of a configured remote,
or the UUID or description of a repository. For the current repository,
use `--in=here`
* `--in=repository@{date}`
Matches files currently in the work tree whose content was present in
the repository on the given date.
The date is specified in the same syntax documented in
gitrevisions(7). Note that this uses the reflog, so dates far in the
past cannot be queried.
For example, you might need to run `git annex drop .` to temporarily
free up disk space. The next day, you can get back the files you dropped
using `git annex get . --in=here@{yesterday}`
* `--copies=number`
Matches only files that git-annex believes to have the specified number
of copies, or more. Note that it does not check remotes to verify that
the copies still exist.
* `--copies=trustlevel:number`
Matches only files that git-annex believes have the specified number of
copies, on remotes with the specified trust level. For example,
`--copies=trusted:2`
To match any trust level at or higher than a given level,
use 'trustlevel+'. For example, `--copies=semitrusted+:2`
* `--copies=groupname:number`
Matches only files that git-annex believes have the specified number of
copies, on remotes in the specified group. For example,
`--copies=archive:2`
* `--lackingcopies=number`
Matches only files that git-annex believes need the specified number or
more additional copies to be made in order to satisfy their numcopies
settings.
* `--approxlackingcopies=number`
Like lackingcopies, but does not look at .gitattributes annex.numcopies
settings. This makes it significantly faster.
* `--inbackend=name`
Matches only files whose content is stored using the specified key-value
backend.
* `--securehash`
Matches only files whose content is hashed using a cryptographically
secure function.
* `--inallgroup=groupname`
Matches only files that git-annex believes are present in all repositories
in the specified group.
* `--smallerthan=size`
* `--largerthan=size`
Matches only files whose content is smaller than, or larger than the
specified size.
The size can be specified with any commonly used units, for example,
"0.5 gb" or "100 KiloBytes"
* `--metadata field=glob`
Matches only files that have a metadata field attached with a value that
matches the glob. The values of metadata fields are matched case
insensitively.
* `--metadata field<number` / `--metadata field>number`
* `--metadata field<=number` / `--metadata field>=number`
Matches only files that have a metadata field attached with a value that
is a number and is less than or greater than the specified number.
(Note that you will need to quote the second parameter to avoid
the shell doing redirection.)
* `--want-get`
Matches files that the preferred content settings for the repository
make it want to get. Note that this will match even files that are
already present, unless limited with e.g., `--not --in .`
* `--want-drop`
Matches files that the preferred content settings for the repository
make it want to drop. Note that this will match even files that have
already been dropped, unless limited with e.g., `--in .`
Files that this matches will not necessarily be dropped by
`git-annex drop --auto`. This does not check that there are enough copies
to drop. Also the same content may be used by a file that is not wanted
to be dropped.
* `--accessedwithin=interval`
Matches files that were accessed recently, within the specified time
interval.
The interval can be in the form "5m" or "1h" or "2d" or "1y", or a
combination such as "1h5m".
So for example, `--accessedwithin=1d` matches files that have been
accessed within the past day.
If the OS or filesystem does not support access times, this will not
match any files.
* `--unlocked`
Matches annexed files that are unlocked.
* `--locked`
Matches annexed files that are locked.
* `--mimetype=glob`
Looks up the MIME type of a file, and checks if the glob matches it.
For example, `--mimetype="text/*"` will match many varieties of text files,
including "text/plain", but also "text/x-shellscript", "text/x-makefile",
etc.
The MIME types are the same that are displayed by running `file --mime-type`
If the file's annexed content is not present, the file will not match.
This is only available to use when git-annex was built with the
MagicMime build flag.
* `--mimeencoding=glob`
Looks up the MIME encoding of a file, and checks if the glob matches it.
For example, `--mimeencoding=binary` will match many kinds of binary
files.
The MIME encodings are the same that are displayed by running `file --mime-encoding`
If the file's annexed content is not present, the file will not match.
This is only available to use when git-annex was built with the
MagicMime build flag.
* `--not`
Inverts the next matching option. For example, to only act on
files with less than 3 copies, use `--not --copies=3`
* `--and`
Requires that both the previous and the next matching option matches.
The default.
* `--or`
Requires that either the previous, or the next matching option matches.
* `-(`
Opens a group of matching options.
* `-)`
Closes a group of matching options.
# SEE ALSO
[[git-annex]](1)
# AUTHOR
Joey Hess <id@joeyh.name>
Warning: Automatically converted into a man page by mdwn2man. Edit with care.