# NAME
git-annex - manage files with git, without checking their contents in
# SYNOPSIS
git annex command [params ...]
# DESCRIPTION
git-annex allows managing files with git, without checking the file
contents into git. While that may seem paradoxical, it is useful when
dealing with files larger than git can currently easily handle, whether due
to limitations in memory, checksumming time, or disk space.
Even without file content tracking, being able to manage files with git,
move files around and delete files with versioned directory trees, and use
branches and distributed clones, are all very handy reasons to use git. And
annexed files can co-exist in the same git repository with regularly
versioned files, which is convenient for maintaining documents, Makefiles,
etc. that are associated with annexed files but that benefit from full
revision control.
When a file is annexed, its content is moved into a key-value store, and
a symlink is made that points to the content. These symlinks are checked into
git and versioned like regular files. You can move them around, delete
them, and so on. Pushing to another git repository will make git-annex
there aware of the annexed file, and it can be used to retrieve its
content from the key-value store.
# EXAMPLES
    # git annex get video/hackity_hack_and_kaxxt.mov
    get video/hackity_hack_and_kaxxt.mov (not available)
    I was unable to access these remotes: server
    Try making some of these repositories available:
    5863d8c0-d9a9-11df-adb2-af51e6559a49 -- my home file server
    58d84e8a-d9ae-11df-a1aa-ab9aa8c00826 -- portable USB drive
    ca20064c-dbb5-11df-b2fe-002170d25c55 -- backup SATA drive
    failed
    # sudo mount /media/usb
    # git remote add usbdrive /media/usb
    # git annex get video/hackity_hack_and_kaxxt.mov
    get video/hackity_hack_and_kaxxt.mov (from usbdrive...) ok

    # git annex add iso
    add iso/Debian_5.0.iso ok

    # git annex drop iso/Debian_4.0.iso
    drop iso/Debian_4.0.iso ok

    # git annex move iso --to=usbdrive
    move iso/Debian_5.0.iso (moving to usbdrive...) ok
# COMMONLY USED COMMANDS
* `help`
Display built-in help.
For help on a specific command, use `git annex help command`.
* `add [path ...]`
Adds files to the annex.
See [[git-annex-add]](1) for details.
* `get [path ...]`
Makes the content of annexed files available in this repository.
See [[git-annex-get]](1) for details.
* `drop [path ...]`
Drops the content of annexed files from this repository.
See [[git-annex-drop]](1) for details.
* `move [path ...] [--from=remote|--to=remote]`
Moves the content of files from or to another remote.
See [[git-annex-move]](1) for details.
* `copy [path ...] [--from=remote|--to=remote]`
Copies the content of files from or to another remote.
See [[git-annex-copy]](1) for details.
* `status [path ...]`
Show the working tree status. (deprecated)
See [[git-annex-status]](1) for details.
* `unlock [path ...]`
Unlock annexed files for modification.
See [[git-annex-unlock]](1) for details.
* `edit [path ...]`
This is an alias for the unlock command. May be easier to remember,
if you think of this as allowing you to edit an annexed file.
* `lock [path ...]`
Use this to undo an unlock command if you don't want to modify
the files, or have made modifications you want to discard.
See [[git-annex-lock]](1) for details.
* `pull [remote ...]`
Pull content from remotes.
See [[git-annex-pull]](1) for details.
* `push [remote ...]`
Push content to remotes.
See [[git-annex-push]](1) for details.
* `sync [remote ...]`
Synchronize local repository with remotes.
See [[git-annex-sync]](1) for details.
* `assist [remote ...]`
Add files and sync changes with remotes.
See [[git-annex-assist]](1) for details.
* `satisfy [remote ...]`
Satisfy preferred content settings by transferring and dropping content.
See [[git-annex-satisfy]](1) for details.
* `mirror [path ...] [--to=remote|--from=remote]`
Mirror content of files to/from another repository.
See [[git-annex-mirror]](1) for details.
* `addurl [url ...]`
Downloads each url to its own file, which is added to the annex.
See [[git-annex-addurl]](1) for details.
* `rmurl file url`
Record that the file is no longer available at the url.
See [[git-annex-rmurl]](1) for details.
* `import --from remote branch[:subdir] | [path ...]`
Add a tree of files to the repository.
See [[git-annex-import]](1) for details.
* `importfeed [url ...]`
Imports the contents of podcast feeds into the annex.
See [[git-annex-importfeed]](1) for details.
* `export treeish --to remote`
Export content to a remote.
See [[git-annex-export]](1) for details.
* `undo [filename|directory] ...`
Undo last change to a file or directory.
See [[git-annex-undo]](1) for details.
* `multicast`
Multicast file distribution.
See [[git-annex-multicast]](1) for details.
* `watch`
Daemon to watch for changes and autocommit.
See [[git-annex-watch]](1) for details.
* `assistant`
Daemon to automatically sync changes.
See [[git-annex-assistant]](1) for details.
* `webapp`
Opens a web app, that allows easy setup of a git-annex repository,
and control of the git-annex assistant. If the assistant is not
already running, it will be started.
See [[git-annex-webapp]](1) for details.
* `remotedaemon`
Persistent communication with remotes.
See [[git-annex-remotedaemon]](1) for details.
# REPOSITORY SETUP COMMANDS
* `init [description]`
Until a repository (or one of its remotes) has been initialized,
git-annex will refuse to operate on it, to avoid accidentally
using it in a repository that was not intended to have an annex.
See [[git-annex-init]](1) for details.
* `describe repository description`
Changes the description of a repository.
See [[git-annex-describe]](1) for details.
* `initremote name type=value [param=value ...]`
Creates a new special remote, and adds it to `.git/config`.
See [[git-annex-initremote]](1) for details.
* `enableremote name [param=value ...]`
Enables use of an existing special remote in the current repository.
See [[git-annex-enableremote]](1) for details.
* `configremote name [param=value ...]`
Changes configuration of an existing special remote.
See [[git-annex-configremote]](1) for details.
* `renameremote`
Renames a special remote.
See [[git-annex-renameremote]](1) for details.
* `enable-tor`
Sets up a Tor hidden service.
See [[git-annex-enable-tor]](1) for details.
* `numcopies [N]`
Configure desired number of copies.
See [[git-annex-numcopies]](1) for details.
* `mincopies [N]`
Configure minimum number of copies.
See [[git-annex-mincopies]](1) for details.
* `trust [repository ...]`
Records that a repository is trusted to not unexpectedly lose
content. Use with care.
See [[git-annex-trust]](1) for details.
* `untrust [repository ...]`
Records that a repository is not trusted and could lose content
at any time.
See [[git-annex-untrust]](1) for details.
* `semitrust [repository ...]`
Returns a repository to the default semitrusted state.
See [[git-annex-semitrust]](1) for details.
* `group repository groupname`
Add a repository to a group.
See [[git-annex-group]](1) for details.
* `ungroup repository groupname`
Removes a repository from a group.
See [[git-annex-ungroup]](1) for details.
* `wanted repository [expression]`
Get or set preferred content expression.
See [[git-annex-wanted]](1) for details.
* `groupwanted groupname [expression]`
Get or set groupwanted expression.
See [[git-annex-groupwanted]](1) for details.
* `required repository [expression]`
Get or set required content expression.
See [[git-annex-required]](1) for details.
* `schedule repository [expression]`
Get or set scheduled jobs.
See [[git-annex-schedule]](1) for details.
* `config`
Get and set other configuration stored in git-annex branch.
See [[git-annex-config]](1) for details.
* `vicfg`
Opens EDITOR on a temp file containing most of the above configuration
settings, as well as a few others, and when it exits, stores any changes
made back to the git-annex branch.
See [[git-annex-vicfg]](1) for details.
* `adjust`
Switches a repository to use an adjusted branch, which can automatically
unlock all files, etc.
See [[git-annex-adjust]](1) for details.
* `direct`
Switches a repository to use direct mode. (deprecated)
See [[git-annex-direct]](1) for details.
* `indirect`
Switches a repository to use indirect mode. (deprecated)
See [[git-annex-indirect]](1) for details.
# REPOSITORY MAINTENANCE COMMANDS
* `fsck [path ...]`
Checks the annex consistency, and warns about or fixes any problems found.
This is a good complement to `git fsck`.
See [[git-annex-fsck]](1) for details.
* `expire [repository:]time ...`
Expires repositories that have not recently performed an activity
(such as a fsck).
See [[git-annex-expire]](1) for details.
* `unused`
Checks the annex for data that does not correspond to any files present
in any tag or branch, and prints a numbered list of the data.
See [[git-annex-unused]](1) for details.
* `dropunused [number|range ...]`
Drops the data corresponding to the numbers, as listed by the last
`git annex unused`.
See [[git-annex-dropunused]](1) for details.
* `addunused [number|range ...]`
Adds back files for the content corresponding to the numbers or ranges,
as listed by the last `git annex unused`.
See [[git-annex-addunused]](1) for details.
* `fix [path ...]`
Fixes up symlinks that have become broken to again point to annexed content.
See [[git-annex-fix]](1) for details.
* `merge`
Automatically merge changes from remotes.
See [[git-annex-merge]](1) for details.
* `upgrade`
Upgrades the repository.
See [[git-annex-upgrade]](1) for details.
* `dead [repository ...] [--key key]`
Indicates that a repository or a single key has been irretrievably lost.
See [[git-annex-dead]](1) for details.
* `forget`
Causes the git-annex branch to be rewritten, throwing away historical
data about past locations of files.
See [[git-annex-forget]](1) for details.
* `filter-branch`
Produces a filtered version of the git-annex branch.
See [[git-annex-filter-branch]](1) for details.
* `repair`
This can repair many of the problems with git repositories that `git fsck`
detects, but does not itself fix. It's useful if a repository has become
badly damaged. One way this can happen is if a repository used by git-annex
is on a removable drive that gets unplugged at the wrong time.
See [[git-annex-repair]](1) for details.
* `p2p`
Configure peer-to-peer links between repositories.
See [[git-annex-p2p]](1) for details.
# QUERY COMMANDS
* `find [path ...]`
Outputs a list of annexed files in the specified path. With no path,
finds files in the current directory and its subdirectories.
See [[git-annex-find]](1) for details.
* `whereis [path ...]`
Displays information about where the contents of files are located.
See [[git-annex-whereis]](1) for details.
* `list [path ...]`
Displays a table of remotes that contain the contents of the specified
files. This is similar to whereis but a more compact display.
See [[git-annex-list]](1) for details.
* `whereused`
Finds what files use or used a key.
See [[git-annex-whereused]](1) for details.
* `log [path ...]`
Displays the location log for the specified file or files,
showing each repository they were added to ("+") and removed from ("-").
See [[git-annex-log]](1) for details.
* `oldkeys [path ...]`
List keys used for old versions of files.
See [[git-annex-oldkeys]](1) for details.
* `info [directory|file|remote|uuid ...]`
Displays statistics and other information for the specified item,
which can be a directory, or a file, or a remote, or the uuid of a
repository.
When no item is specified, displays statistics and information
for the repository as a whole.
See [[git-annex-info]](1) for details.
* `version`
Shows the version of git-annex, as well as repository version information.
See [[git-annex-version]](1) for details.
* `map`
Generate map of repositories.
See [[git-annex-map]](1) for details.
* `inprogress`
Access files while they're being downloaded.
See [[git-annex-inprogress]](1) for details.
* `findkeys`
Similar to `git-annex find`, but operating on keys.
See [[git-annex-findkeys]](1) for details.
# METADATA COMMANDS
* `metadata [path ...]`
The content of an annexed file can have any number of metadata fields
attached to it to describe it. Each metadata field can in turn
have any number of values.
This command can be used to set metadata, or show the currently set
metadata.
See [[git-annex-metadata]](1) for details.
* `view [tag ...] [field=value ...] [field=glob ...] [?tag ...] [field?=glob] [!tag ...] [field!=value ...]`
Uses metadata to build a view branch of the files in the current branch,
and checks out the view branch. Only files in the current branch whose
metadata matches all the specified field values and tags will be
shown in the view.
See [[git-annex-view]](1) for details.
* `vpop [N]`
Switches from the currently active view back to the previous view.
Or, from the first view back to original branch.
See [[git-annex-vpop]](1) for details.
* `vfilter [tag ...] [field=value ...] [!tag ...] [field!=value ...]`
Filters the current view to only the files that have the
specified field values and tags.
See [[git-annex-vfilter]](1) for details.
* `vadd [field=glob ...] [field=value ...] [tag ...]`
Changes the current view, adding an additional level of directories
to categorize the files.
See [[git-annex-vadd]](1) for details.
* `vcycle`
When a view involves nested subdirectories, this cycles the order.
See [[git-annex-vcycle]](1) for details.
# UTILITY COMMANDS
* `migrate [path ...]`
Changes the specified annexed files to use a different key-value backend.
See [[git-annex-migrate]](1) for details.
* `reinject src dest`
Moves the src file into the annex as the content of the dest file.
This can be useful if you have obtained the content of a file from
elsewhere and want to put it in the local annex.
See [[git-annex-reinject]](1) for details.
* `unannex [path ...]`
Use this to undo an accidental `git annex add` command. It puts the
file back how it was before the add.
See [[git-annex-unannex]](1) for details.
* `uninit`
De-initialize git-annex and clean out repository.
See [[git-annex-uninit]](1) for details.
* `reinit uuid|description`
Initialize repository, reusing old UUID.
See [[git-annex-reinit]](1) for details.
# PLUMBING COMMANDS
* `pre-commit [path ...]`
This is meant to be called from git's pre-commit hook. `git annex init`
automatically creates a pre-commit hook using this.
See [[git-annex-pre-commit]](1) for details.
* `post-receive`
This is meant to be called from git's post-receive hook. `git annex init`
automatically creates a post-receive hook using this.
See [[git-annex-post-receive]](1) for details.
* `lookupkey [file ...]`
Looks up key used for file.
See [[git-annex-lookupkey]](1) for details.
* `calckey [file ...]`
Calculates the key that would be used to refer to a file.
See [[git-annex-calckey]](1) for details.
* `contentlocation [key ...]`
Looks up location of annexed content for a key.
See [[git-annex-contentlocation]](1) for details.
* `examinekey [key ...]`
Print information that can be determined purely by looking at the key.
See [[git-annex-examinekey]](1) for details.
* `matchexpression`
Checks if a preferred content expression matches provided data.
See [[git-annex-matchexpression]](1) for details.
* `fromkey [key file]`
Manually set up a file in the git repository to link to a specified key.
See [[git-annex-fromkey]](1) for details.
* `registerurl [key url]`
Registers an url for a key.
See [[git-annex-registerurl]](1) for details.
* `unregisterurl [key url]`
Unregisters an url for a key.
See [[git-annex-unregisterurl]](1) for details.
* `setkey key file`
Moves a file into the annex as the content of a key.
See [[git-annex-setkey]](1) for details.
* `dropkey [key ...]`
Drops annexed content for specified keys.
See [[git-annex-dropkey]](1) for details.
* `transferkey key [--from=remote|--to=remote]`
Transfers a key from or to a remote.
See [[git-annex-transferkey]](1) for details.
* `transferrer`
Used internally by git-annex to transfer content.
See [[git-annex-transferrer]](1) for details.
* `transferkeys`
Used internally by old versions of the assistant.
See [[git-annex-transferkeys]](1) for details.
* `setpresentkey key uuid [1|0]`
This plumbing-level command changes git-annex's records about whether
the specified key's content is present in a remote with the specified uuid.
See [[git-annex-setpresentkey]](1) for details.
* `readpresentkey key uuid`
Read records of where key is present.
See [[git-annex-readpresentkey]](1) for details.
* `checkpresentkey key remote`
Check if key is present in remote.
See [[git-annex-checkpresentkey]](1) for details.
* `rekey [file key ...]`
Change keys used for files.
See [[git-annex-rekey]](1) for details.
* `resolvemerge`
Resolves a conflicted merge, by adding both conflicting versions of the
file to the tree, using variants of their filename. This is done
automatically when using `git annex sync` or `git-annex pull`
or `git annex merge`.
See [[git-annex-resolvemerge]](1) for details.
* `diffdriver`
This can be used to make `git diff` diff the content of annexed files.
See [[git-annex-diffdriver]](1) for details.
* `smudge`
This command lets git-annex be used as a git filter driver, allowing
annexed files in the git repository to be unlocked regular files instead
of symlinks.
See [[git-annex-smudge]](1) for details.
* `filter-process`
An alternative implementation of a git filter driver, that is faster
in some situations and slower in others than `git-annex smudge`.
See [[git-annex-filter-process]](1) for details.
* `restage`
Restages unlocked files in the git index.
See [[git-annex-restage]](1) for details.
* `findref [ref]`
Lists files in a git ref. (deprecated)
See [[git-annex-findref]](1) for details.
* `proxy -- git cmd [options]`
Bypass direct mode guard. (deprecated)
See [[git-annex-proxy]](1) for details.
# TESTING COMMANDS
* `test`
This runs git-annex's built-in test suite.
See [[git-annex-test]](1) for details.
* `testremote remote`
This tests a remote by generating some random objects and sending them to
the remote, then redownloading them, removing them from the remote, etc.
It's safe to run in an existing repository (the repository contents are
not altered), although it may perform expensive data transfers.
See [[git-annex-testremote]](1) for details.
* `fuzztest`
Generates random changes to files in the current repository,
for use in testing the assistant.
See [[git-annex-fuzztest]](1) for details.
* `benchmark`
This runs git-annex's built-in benchmarks, if it was built with
benchmarking support.
See [[git-annex-benchmark]](1) for details.
# ADDON COMMANDS
In addition to all the commands listed above, more commands can be added to
git-annex by dropping commands named like "git-annex-foo" into a directory
in the PATH.
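
As a minimal sketch of the mechanism described above, the following creates an addon command and places it in the PATH. The name `git-annex-hello`, its output, and the temporary directory are hypothetical and chosen only for illustration.

```shell
# Hypothetical addon: the name "git-annex-hello" is made up for illustration.
addon_dir="$(mktemp -d)"

cat > "$addon_dir/git-annex-hello" <<'EOF'
#!/bin/sh
# Print the arguments passed to the addon command.
echo "hello from addon: $*"
EOF
chmod +x "$addon_dir/git-annex-hello"

# Make the addon discoverable by putting its directory in the PATH.
PATH="$addon_dir:$PATH"
export PATH

# Invoke the program directly to confirm it is found in the PATH.
# With git-annex installed, it would be run as: git annex hello world
git-annex-hello world
```

An addon can be written in any language, as long as the resulting program is executable and its name starts with "git-annex-".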
# CONFIGURATION
Like other git commands, git-annex is configured via `.git/config`.
These settings, as well as relevant git config settings, are
the ones git-annex uses.
(Some of these settings can also be set, across all clones of the
repository, using [[git-annex-config]]. See its man page for a list.)
* `annex.uuid`
A unique UUID for this repository (automatically set).
* `annex.backend`
Name of the default key-value backend to use when adding new files
to the repository. See [[git-annex-backends]](1) for information about
available backends.
This is overridden by annex.backend configuration in the
.gitattributes files, and by the --backend option.
(This used to be named `annex.backends`, and that will still be used
if set.)
* `annex.securehashesonly`
Set to true to indicate that the repository should only use
cryptographically secure hashes (SHA2, SHA3) and not insecure
hashes (MD5, SHA1) for content.
When this is set, the contents of files using cryptographically
insecure hashes will not be allowed to be added to the repository.
Also, `git-annex fsck` will complain about any files present in
the repository that use insecure hashes. And,
`git-annex import --no-content` will refuse to import files
from special remotes using insecure hashes.
To configure the behavior in new clones of the repository,
this can be set using [[git-annex-config]].
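
A minimal sketch of enabling this setting for a single repository, using only `git config`. The throwaway repository created with `mktemp` is an assumption made so the example does not touch real configuration.

```shell
# Throwaway repository (illustrative only).
repo="$(mktemp -d)"
git init -q "$repo"

# Allow only cryptographically secure hashes in this repository.
git -C "$repo" config annex.securehashesonly true

# Read the value back to confirm what is stored in .git/config.
git -C "$repo" config --get annex.securehashesonly
```

This affects only the local `.git/config`; it is not carried over to new clones.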
* `annex.maxextensionlength`
Maximum length, in bytes, of what is considered a filename extension.
This is used when adding a file to a backend that preserves filename extensions,
and also when generating a view branch.
The default length is 4, which allows extensions like "jpeg". The dot before
the extension is not counted as part of its length. At most two extensions
at the end of a filename will be preserved, e.g. .gz or .tar.gz.
* `annex.diskreserve`
Amount of disk space to reserve. Disk space is checked when transferring
annexed content to avoid running out, and additional free space can be
reserved via this option, to make space for other data (such as git
commit logs). Can be specified with any commonly used units, for
example, "0.5 gb", "500M", or "100 KiloBytes".
The default reserve is 100 megabytes.
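
A minimal sketch of raising the reserve for one repository, using a value in one of the unit formats listed above. The throwaway repository is an assumption made for illustration.

```shell
# Throwaway repository (illustrative only).
repo="$(mktemp -d)"
git init -q "$repo"

# Reserve half a gigabyte of free space instead of the default.
git -C "$repo" config annex.diskreserve "0.5 gb"

# Read the value back; it is stored exactly as written.
git -C "$repo" config --get annex.diskreserve
```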
* `annex.skipunknown`
Set to true to make commands like "git-annex get" silently skip over
items that are listed in the command line, but are not checked into git.
Set to false to make it an error for commands like "git-annex get"
to be asked to operate on files that are not checked into git.
(This is the default in recent versions of git-annex.)
Note that, when annex.skipunknown is false, a command like "git-annex get
." will fail if no files in the current directory are checked into git.
Commands like "git-annex get foo/" will fail if no files in the directory
are checked into git, but if at least one file is, it will ignore other
files that are not. This is all the same as the behavior of "git ls-files
--error-unmatch".
Also note that git-annex skips files that are checked into git, but are
not annexed files; this setting does not affect that.
* `annex.largefiles`
Used to configure which files are large enough to be added to the annex.
It is an expression that matches the large files, e.g.
"`include=*.mp3 or largerthan=500kb`".
See [[git-annex-matching-expression]](1) for details on the syntax.
Overrides any annex.largefiles attributes in `.gitattributes` files.
To configure a default annex.largefiles for all clones of the repository,
this can be set in [[git-annex-config]](1).
This configures the behavior of both git-annex and git when adding
files to the repository. By default, `git-annex add` adds all files
to the annex (except dotfiles), and `git add` adds files to git
(unless they were added to the annex previously).
When annex.largefiles is configured, both
`git annex add` and `git add` will add matching large files to the
annex, and the other files to git.
Other git-annex commands also honor annex.largefiles, including
`git annex import`, `git annex addurl`, `git annex importfeed`,
`git-annex assist`, and the `git-annex assistant`.
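For instance, the expression quoted above could be stored in a repository's git config like this. This is only a sketch: the scratch repository is an assumption, and the expression is the one from the text.

```shell
# Scratch repository, created only so this example is self-contained.
repo=$(mktemp -d)
git init -q "$repo"

# Treat mp3 files, and any file larger than 500kb, as large files.
git -C "$repo" config annex.largefiles "include=*.mp3 or largerthan=500kb"

# Read the expression back.
git -C "$repo" config annex.largefiles
```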
* `annex.dotfiles`
Normally, dotfiles are assumed to be files like .gitignore,
whose content should always be part of the git repository, so
they will not be added to the annex. Setting annex.dotfiles to true
makes dotfiles be added to the annex the same as any other file.
To annex only some dotfiles, set this and configure annex.largefiles
to match the ones you want. For example, to match only dotfiles ending
in ".big":
git config annex.largefiles "(include=.*.big or include=*/.*.big) or (exclude=.* and exclude=*/.*)"
git config annex.dotfiles true
To configure a default annex.dotfiles for all clones of the repository,
this can be set in [[git-annex-config]](1).
* `annex.gitaddtoannex`
Setting this to false will prevent `git add` from adding
files to the annex, despite the annex.largefiles configuration.
* `annex.addsmallfiles`
Controls whether small files (not matching annex.largefiles)
should be checked into git by `git annex add`. Defaults to true;
set to false to instead make small files be skipped.
* `annex.addunlocked`
Commands like `git-annex add` default to adding files to the repository
in locked form. This can make them add the files in unlocked form,
the same as if [[git-annex-unlock]](1) were run on the files.
This can be set to "true" to add everything unlocked, or it can be a more
complicated expression that matches files by name, size, or content. See
[[git-annex-matching-expression]](1) for details.
To configure a default annex.addunlocked for all clones of the repository,
this can be set in [[git-annex-config]](1).
(Using `git add` always adds files in unlocked form and it is not
affected by this setting.)
When a repository has core.symlinks set to false, or has an adjusted
unlocked branch checked out, this setting is ignored, and files are
always added to the repository in unlocked form.
* `annex.numcopies`
This is a deprecated setting. You should instead use the
`git annex numcopies` command to configure how many copies of files
are kept across all repositories, or the annex.numcopies .gitattributes
setting.
This config setting is only looked at when `git annex numcopies` has
never been configured, and when there's no annex.numcopies setting in the
.gitattributes file.
* `annex.genmetadata`
Set this to `true` to make git-annex automatically generate some metadata
when adding files to the repository.
In particular, it stores year, month, and day metadata, from the file's
modification date.
When importfeed is used, it stores additional metadata from the feed,
such as the author, title, etc.
* `annex.used-refspec`
This controls which refs `git-annex unused` considers to be used.
See REFSPEC FORMAT in [[git-annex-unused]](1) for details.
* `annex.jobs`
Configure the number of concurrent jobs to run. Default is 1.
Only git-annex commands that support the --jobs option will
use this.
Setting this to "cpus" will run one job per CPU core.
When the `--batch` option is used, this configuration is ignored.
* `annex.adjustedbranchrefresh`
When [[git-annex-adjust]](1) is used to set up an adjusted branch
that needs to be refreshed after getting or dropping files, this config
controls how frequently the branch is refreshed.
Refreshing the branch takes some time, so doing it after every file
can be too slow. (It also can generate a lot of dangling git objects.)
The default value is 0 (or false), which does not
refresh the branch. Setting 1 (or true) will refresh only once,
after git-annex has made other changes. Setting 2 refreshes after every
file, 3 after every other file, and so on; setting 100 refreshes after
every 99 files.
(If git-annex gets faster in the future, refresh rates will increase
proportional to the speed improvements.)
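The numbering of annex.adjustedbranchrefresh values of 2 and above can be sketched as simple arithmetic. The helper name `files_between_refreshes` is hypothetical and is not part of git-annex; it only restates the rule described above.

```shell
# files_between_refreshes N: for a setting of N (2 or more), print how many
# files are processed between refreshes of the adjusted branch.
# Hypothetical helper, for illustration only.
files_between_refreshes() {
    n=$1
    if [ "$n" -lt 2 ]; then
        # 0 and 1 have the special meanings described in the text.
        return 1
    fi
    echo $((n - 1))
}

files_between_refreshes 2
files_between_refreshes 100
```

With the values shown, the helper prints 1 and then 99, matching "after every file" and "after every 99 files".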
* `annex.queuesize`
git-annex builds a queue of git commands, in order to combine similar
commands for speed. By default the size of the queue is limited to
10240 commands; this can be used to change the size. If you have plenty
of memory and are working with very large numbers of files, increasing
the queue size can speed it up.
* `annex.bloomcapacity`
The `git annex unused` and `git annex sync --content` commands use
a bloom filter to determine what files are present in eg, the work tree.
The default bloom filter is sized to handle
up to 500000 files. If your repository is larger than that,
you should increase this value. Larger values will
make `git-annex unused` and `git annex sync --content` consume more memory;
run `git annex info` for memory usage numbers.
* `annex.bloomaccuracy`
Adjusts the accuracy of the bloom filter used by
`git annex unused` and `git annex sync --content`.
The default accuracy is 10000000 -- 1 unused file out of 10000000
will be missed by `git annex unused`. Increasing the accuracy will make
`git annex unused` consume more memory; run `git annex info`
for memory usage numbers.
* `annex.sshcaching`
By default, git-annex caches ssh connections using ssh's
ControlMaster and ControlPersist settings
(if built using a new enough ssh). To disable this, set to `false`.
* `annex.adviceNoSshCaching`
When git-annex is unable to use ssh connection caching, or has been
configured not to, and concurrency is enabled, it will warn that
this might result in multiple ssh processes prompting for passwords
at the same time. To disable that warning, eg if you have configured ssh
connection caching yourself, or have ssh agent caching passwords,
set this to `false`.
* `annex.alwayscommit`
By default, git-annex automatically commits data to the git-annex branch
after each command is run. If you have a series
of commands that you want to make a single commit, you can
run the commands with `-c annex.alwayscommit=false`. You can later
commit the data by running `git annex merge` (or by automatic merges)
or `git annex sync`.
* `annex.commitmessage`
When git-annex updates the git-annex branch, it usually makes up
its own commit message (eg "update"), since users rarely look at or
care about changes to that branch. If you do care, you can
specify this setting by running commands with
`-c annex.commitmessage=whatever`.
This works well in combination with annex.alwayscommit=false,
to gather up a set of changes and commit them with a message you specify.
* `annex.alwayscompact`
By default, git-annex compacts data it records in the git-annex branch.
Setting this to false avoids doing that compaction in some cases, which
can speed up operations that populate the git-annex branch with a lot
of data. However, when used with operations that overwrite old values in
the git-annex branch, that may cause the git-annex branch to use more disk
space, and so slow down reading data from it.
An example of a command that can be sped up by using
`-c annex.alwayscompact=false` is `git-annex registerurl --batch`,
when adding a large number of urls to the same key.
This option was first supported by git-annex version 10.20220724.
It is not entirely safe to set this option in a repository that may also
be used by an older version of git-annex at the same time as a version
that supports this option.
* `annex.allowsign`
By default git-annex avoids gpg signing commits that it makes when
they're not the purpose of a command, but only a side effect.
That default avoids lots of gpg password prompts when
commit.gpgSign is set. A command like `git annex sync` or `git annex merge`
will gpg sign its commit, but a command like `git annex get`,
that updates the git-annex branch, will not. The assistant also avoids
signing commits.
Setting annex.allowsign to true lets all commits be signed, as
controlled by commit.gpgSign and other git configuration.
* `annex.merge-annex-branches`
By default, git-annex branches that have been pulled from remotes
are automatically merged into the local git-annex branch, so that
git-annex has the most up-to-date possible knowledge.
To avoid that merging, set this to "false".
This can be useful particularly when you don't have write permission
to the repository. While git-annex is mostly able to work in a read-only
repository with unmerged git-annex branches, some things do not work,
and when it does work it will be slower due to needing to look at each of
the unmerged branches.
* `annex.private`
When this is set to true, no information about the repository will be
recorded in the git-annex branch.
For example, to make a repository without any mention of it ever
appearing in the git-annex branch:
git init myprivaterepo
cd myprivaterepo
git config annex.private true
git annex init
* `annex.hardlink`
Set this to `true` to make file contents be hard linked between the
repository and its remotes when possible, instead of a more expensive copy.
Use with caution -- This can invalidate numcopies counting, since
with hard links, fewer copies of a file can exist. So, it is a good
idea to mark a repository using this setting as untrusted.
When a repository is set up using `git clone --shared`, git-annex init
will automatically set annex.hardlink and mark the repository as
untrusted.
When `annex.thin` is also set, setting `annex.hardlink` has no effect.
* `annex.thin`
Set this to `true` to make unlocked files be a hard link to their content
in the annex, rather than a second copy. This can save considerable
disk space, but when a modification is made to a file, you will lose the
local (and possibly only) copy of the old version. Any other, locked
files in the repository that pointed to that content will get broken
as well (`git-annex fsck` will detect and clean up after that).
So, enable this with care.
After setting (or unsetting) this, you should run `git annex fix` to
fix up the annexed files in the work tree to be hard links (or copies).
Note that this has no effect when the filesystem does not support hard links.
And when multiple files in the work tree have the same content, only
one of them gets hard linked to the annex.
* `annex.supportunlocked`
By default git-annex supports unlocked files as well as locked files,
so this defaults to true. If set to false, git-annex will only support
locked files. That will avoid doing the work needed to support unlocked
files.
Note that setting this to false does not prevent a repository from
having unlocked files added to it, and in that case the content of the
files will not be accessible until they are locked.
After changing this config, you need to re-run `git-annex init` for it
to take effect.
* `annex.resolvemerge`
Set to false to prevent merge conflicts in the checked out branch
being automatically resolved by the `git-annex assistant`,
`git-annex assist`, `git-annex sync`, `git-annex pull`, `git-annex merge`,
and the git-annex post-receive hook.
To configure the behavior in all clones of the repository,
this can be set in [[git-annex-config]](1).
* `annex.synccontent`
Set to true to make `git-annex sync` default to transferring
annexed content.
Set to false to prevent `git-annex assist`, `git-annex pull` and
`git-annex push` from transferring annexed content.
To configure the behavior in all clones of the repository,
this can be set in [[git-annex-config]](1).
* `annex.synconlyannex`
Set to true to make `git-annex assist`, `git-annex sync`,
`git-annex pull`, and `git-annex push` default to only operating
on the git-annex branch and annexed content.
To configure the behavior in all clones of the repository,
this can be set in [[git-annex-config]](1).
* `annex.syncmigrations`
Set to false to prevent `git-annex sync` and `git-annex pull`
from scanning for migrations and updating the local
repository for those migrations.
* `annex.viewunsetdirectory`
This configures the name of a directory that is used in a view to contain
files that do not have metadata set. The default name for the directory
is `"_"`. See [[git-annex-view]](1) for details.
* `annex.debug`
Set to true to enable debug logging by default.
* `annex.debugfilter`
Set to configure which debug messages to display (when debug message
display has been enabled by annex.debug or --debug). The value is one
or more module names, separated by commas.
* `annex.version`
The current version of the git-annex repository. This is
maintained by git-annex and should never be manually changed.
* `annex.autoupgraderepository`
When an old git-annex repository version is no longer supported,
git-annex will normally automatically upgrade the repository to
the new version. It may also sometimes upgrade from an old repository
version that is still supported but that is not as good as a later
version.
If this is set to false, git-annex won't automatically upgrade the
repository. If the repository version is not supported, git-annex
will instead exit with an error message. If it is still supported,
git-annex will continue to work.
You can run `git annex upgrade` yourself when you are ready to upgrade the
repository.
* `annex.crippledfilesystem`
Set to true if the repository is on a crippled filesystem, such as FAT,
which does not support symbolic links, or hard links, or unix permissions.
This is automatically probed by "git annex init".
* `annex.pidlock`
Normally, git-annex uses fine-grained lock files to allow multiple
processes to run concurrently without getting in each others' way.
That works great, unless you are using git-annex on a filesystem that
does not support POSIX fcntl locks. This is sometimes the case when
using NFS or Lustre filesystems.
To support such situations, you can set annex.pidlock to true, and it
will fall back to a single top-level pid file lock.
Although, often, you'd really be better off fixing your networked
filesystem configuration to support POSIX locks. And, some networked
filesystems are so inconsistent that one node can't reliably tell when
the other node is holding a pid lock. Caveat emptor.
* `annex.pidlocktimeout`
git-annex will wait up to this many seconds for the pid lock
file to go away, and will then abort if it cannot continue. Default: 300.
When using pid lock files, it's possible for a stale lock file to get
left behind by a previous run of git-annex that crashed or was interrupted.
This is mostly avoided, but can occur especially when using a network
file system. This timeout prevents git-annex waiting forever in such a
situation.
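A sketch of enabling the pid lock together with a longer timeout follows. The scratch repository and the 600 second value are assumptions chosen for illustration.

```shell
# Scratch repository, created only so this example is self-contained.
repo=$(mktemp -d)
git init -q "$repo"

# Fall back to a single top-level pid file lock.
git -C "$repo" config annex.pidlock true

# Wait up to 600 seconds (an example value) instead of the default 300.
git -C "$repo" config annex.pidlocktimeout 600

# Show both settings.
git -C "$repo" config --get-regexp '^annex\.pidlock'
```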
* `annex.dbdir`
The directory where git-annex should store its sqlite databases.
The default location is inside `.git/annex/`.
Certain filesystems, such as cifs, may not support locking operations
that sqlite needs, and setting this to a directory on another filesystem
can work around such a problem.
This can safely be set to the same directory in the configuration of
multiple repositories; each repository will use a subdirectory for its
sqlite database.
* `annex.cachecreds`
When "true" (the default), git-annex will cache credentials used to
access special remotes in files in .git/annex/creds/
that only you can read. To disable that caching, set to "false",
and credentials will only be read from the environment, or if
they have been embedded in encrypted form in the git repository, will
be extracted and decrypted each time git-annex needs to access the
remote.
* `annex.secure-erase-command`
This can be set to a command that should be run whenever git-annex
removes the content of a file from the repository.
In the command line, %file is replaced with the file that should be
erased.
For example, to use the wipe command, set it to `wipe -f %file`.
* `annex.freezecontent-command`, `annex.thawcontent-command`
Usually the write permission bits are unset to protect annexed objects
from being modified or deleted. The freezecontent-command is run after
git-annex has removed (or attempted to remove) the write bit, and can
be used to prevent writing in some other way.
The thawcontent-command should undo its effect, and is run before
git-annex restores the write bit.
In the command line, %path is replaced with the file or directory to
operate on.
(When annex.crippledfilesystem is set, git-annex will not try to
remove/restore the write bit, but it will still run these hooks.)
* `annex.tune.objecthash1`, `annex.tune.objecthashlower`, `annex.tune.branchhash1`
These can be passed to `git annex init` to tune the repository.
They cannot be safely changed in a running repository and should never be
set in global git configuration.
For details, see <https://git-annex.branchable.com/tuning/>.
# CONFIGURATION OF REMOTES
Remotes are configured using these settings in `.git/config`.
* `remote.<name>.annex-cost`
When determining which repository to
transfer annexed files from or to, ones with lower costs are preferred.
The default cost is 100 for local repositories, and 200 for remote
repositories.
* `remote.<name>.annex-cost-command`
If set, the command is run, and the number it outputs is used as the cost.
This allows varying the cost based on e.g., the current network.
* `remote.<name>.annex-start-command`
A command to run when git-annex begins to use the remote. This can
be used to, for example, mount the directory containing the remote.
The command may be run repeatedly when multiple git-annex processes
are running concurrently.
* `remote.<name>.annex-stop-command`
A command to run when git-annex is done using the remote.
The command will only be run once *all* running git-annex processes
are finished using the remote.
* `remote.<name>.annex-shell`
Specify an alternative git-annex-shell executable on the remote
instead of looking for "git-annex-shell" on the PATH.
This is useful if the git-annex-shell program is outside the PATH
or has a non-standard name.
* `remote.<name>.annex-ignore`
If set to `true`, prevents git-annex from storing or retrieving annexed
file contents on this remote by default.
(You can still request it be used with the `--from` and `--to` options.)
This is, for example, useful if the remote is located somewhere
without git-annex-shell. (For example, if it's on GitHub).
Or, it could be used if the network connection between two
repositories is too slow to be used normally.
This does not prevent `git-annex sync`, `git-annex pull`, `git-annex push`,
`git-annex assist` or the `git-annex assistant` from operating on the
git repository. It only affects annexed content.
* `remote.<name>.annex-ignore-command`
If set, the command is run, and if it exits nonzero, that's the same
as setting annex-ignore to true. This allows controlling behavior based
on e.g., the current network.
* `remote.<name>.annex-sync`
If set to `false`, prevents `git-annex sync` (and `git-annex pull`,
`git-annex push`, `git-annex assist`, and the `git-annex assistant`)
from operating on this remote by default.
* `remote.<name>.annex-sync-command`
If set, the command is run, and if it exits nonzero, that's the same
as setting annex-sync to false. This allows controlling behavior based
on e.g., the current network.
* `remote.<name>.annex-pull`
If set to `false`, prevents `git-annex pull`, `git-annex sync`,
`git-annex assist` and the `git-annex assistant` from ever pulling
(or fetching) from the remote.
* `remote.<name>.annex-push`
If set to `false`, prevents `git-annex push`, `git-annex sync`,
`git-annex assist` and the `git-annex assistant` from ever pushing
to the remote.
* `remote.<name>.annex-readonly`
If set to `true`, prevents git-annex from making changes to a remote.
This prevents `git-annex sync` and `git-annex assist` from pushing
changes to a git repository. And it prevents storing or removing
files from a read-only remote.
* `remote.<name>.annex-verify`, `annex.verify`
By default, git-annex will verify the checksums of objects downloaded
from remotes. If you trust a remote and don't want the overhead
of these checksums, you can set this to `false`.
Note that even when this is set to `false`, git-annex does verification
in some edge cases, where it's likely that an
object was downloaded incorrectly, or when needed for security.
* `remote.<name>.annex-tracking-branch`
This is for use with special remotes that support exports and imports.
When set to eg, "master", this tells git-annex that you want the
special remote to track that branch.
When set to eg, "master:subdir", the special remote tracks only
the subdirectory of that branch.
Setting this enables some other commands to work with these special
remotes: `git-annex pull` will import changes from the remote and merge them into
the annex-tracking-branch. And `git-annex push` will export changes to
the remote. Higher-level commands `git-annex sync --content`
and `git-annex assist` both import and export.
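As a sketch, a tracking branch limited to a subdirectory could be configured like this. The remote name "myexport" and the scratch repository are hypothetical; "master:subdir" is the form described above.

```shell
# Scratch repository, created only so this example is self-contained.
repo=$(mktemp -d)
git init -q "$repo"

# "myexport" is a hypothetical special remote name.
# Track only the "subdir" subdirectory of the master branch.
git -C "$repo" config remote.myexport.annex-tracking-branch master:subdir

# Read the setting back.
git -C "$repo" config remote.myexport.annex-tracking-branch
```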
* `remote.<name>.annex-export-tracking`
Deprecated name for `remote.<name>.annex-tracking-branch`. Will still be used
if it's configured and `remote.<name>.annex-tracking-branch` is not.
* `remote.<name>.annexUrl`
Can be used to specify a different url than the regular `remote.<name>.url`
for git-annex to use when talking with the remote. Similar to the `pushUrl`
used by git-push.
* `remote.<name>.annex-uuid`
git-annex caches UUIDs of remote repositories here.
* `remote.<name>.annex-config-uuid`
Used for some special remotes, points to a different special remote
configuration to use.
* `remote.<name>.annex-retry`, `annex.retry`
Number of times a transfer that fails can be retried. (default 0)
* `remote.<name>.annex-forward-retry`, `annex.forward-retry`
If a transfer made some forward progress before failing,
this allows it to be retried even when `annex.retry` does not.
The value is the maximum number of times to do that. (default 5)
When both `annex.retry` and this are set, the maximum number of
retries is the larger of the two.
* `remote.<name>.annex-retry-delay`, `annex.retry-delay`
Number of seconds to delay before the first retry of a transfer.
When making multiple retries of the same transfer, the delay
doubles after each retry. (default 1)
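The doubling of the delay can be sketched with a few lines of shell arithmetic. The helper `retry_delays` is hypothetical and only illustrates the schedule described above; it is not how git-annex itself is implemented.

```shell
# retry_delays FIRST COUNT: print the delay in seconds before each of COUNT
# retries, starting at FIRST and doubling after each retry.
# Hypothetical helper, for illustration only.
retry_delays() {
    delay=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "$delay"
        delay=$((delay * 2))
        i=$((i + 1))
    done
}

# Delays before four retries, with the default first delay of 1 second.
retry_delays 1 4
```

With a first delay of 1 second, four retries wait 1, 2, 4, and 8 seconds respectively.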
* `remote.<name>.annex-bwlimit`, `annex.bwlimit`
This can be used to limit how much bandwidth is used for a transfer
from or to a remote.
For example, to limit transfers to 1 mebibyte per second:
`git config annex.bwlimit "1MiB"`
This will work with many remotes, including git remotes, but not
for remotes where the transfer is run by a separate program than
git-annex.
* `remote.<name>.annex-stalldetection`, `annex.stalldetection`
Configuring this lets stalled or too-slow transfers be detected, and
dealt with, so rather than getting stuck, git-annex will cancel the
stalled operation. The transfer will be considered to have failed, so
settings like annex.retry will control what it does next.
By default, git-annex detects transfers that have probably stalled,
and suggests configuring this. If it is incorrectly detecting
stalls, setting this to "false" will avoid that.
Set to "true" to enable automatic stall detection. If a remote does not
update its progress consistently, no automatic stall detection will be
done. And it may take a while for git-annex to decide a remote is really
stalled when using automatic stall detection, since it needs to be
conservative about what looks like a stall.
For more fine control over what constitutes a stall, set to a value in
the form "$amount/$timeperiod" to specify how much data git-annex should
expect to see flowing, minimum, over a given period of time.
For example, to detect outright stalls where no data has been transferred
after 30 seconds: `git config annex.stalldetection "1KB/30s"`
Or, if you have a remote on a USB drive that is normally capable of
several megabytes per second, but has bad sectors where it gets
stuck for a long time, you could use:
`git config remote.usbdrive.annex-stalldetection "1MB/1m"`
This is not enabled by default, because it can make git-annex use
more resources. To be able to cancel stalls, git-annex has to run
transfers in separate processes (one per concurrent job). So it
may need to open more connections to a remote than usual, or
the communication with those processes may make it a bit slower.
* `remote.<name>.annex-checkuuid`
This only affects remotes that have their url pointing to a directory on
the same system. git-annex normally checks the uuid of such
remotes each time it's run, which lets it transparently deal with
different drives being mounted to the location at different times.
Setting annex-checkuuid to false will prevent it from checking the uuid
at startup (although the uuid is still verified before making any
changes to the remote repository). This may be useful to set to prevent
unnecessary spin-up or automounting of a drive.
* `remote.<name>.annex-trustlevel`
Configures a local trust level for the remote. This overrides the value
configured by the trust and untrust commands. The value can be any of
"trusted", "semitrusted" or "untrusted".
* `remote.<name>.annex-availability`
This configuration setting is no longer used.
* `remote.<name>.annex-speculate-present`
Set to "true" to make git-annex speculate that this remote may contain the
content of any file, even though its normal location tracking does not
indicate that it does. This will cause git-annex to try to get all file
contents from the remote. Can be useful in setting up a caching remote.
* `remote.<name>.annex-private`
When this is set to true, no information about the remote will be
recorded in the git-annex branch. This is mostly useful for special
remotes, and is set when using [[git-annex-initremote]](1) with the
`--private` option.
* `remote.<name>.annex-bare`
Can be used to tell git-annex if a remote is a bare repository
or not. Normally, git-annex determines this automatically.
* `remote.<name>.annex-ssh-options`
Options to use when using ssh to talk to this remote.
* `remote.<name>.annex-rsync-options`
Options to use when using rsync
to or from this remote. For example, to force IPv6, and limit
the bandwidth to 100Kbyte/s, set it to `-6 --bwlimit 100`.
Note that git-annex-shell has a whitelist of allowed rsync options,
and others will not be passed to the remote rsync. So using some
options may break the communication between the local and remote rsyncs.
* `remote.<name>.annex-rsync-upload-options`
Options to use when using rsync to upload a file to a remote.
These options are passed after other applicable rsync options,
so can be used to override them. For example, to limit upload bandwidth
to 10Kbyte/s, set `--bwlimit 10`.
* `remote.<name>.annex-rsync-download-options`
Options to use when using rsync to download a file from a remote.
These options are passed after other applicable rsync options,
so can be used to override them.
* `remote.<name>.annex-rsync-transport`
The remote shell to use to connect to the rsync remote. Possible
values are `ssh` (the default) and `rsh`, together with their
arguments, for instance `ssh -p 2222 -c blowfish`. Note that the
remote hostname should not appear there, see rsync(1) for details.
When the transport used is `ssh`, connections are automatically cached
unless `annex.sshcaching` is unset.
* `remote.<name>.annex-bup-split-options`
Options to pass to bup split when storing content in this remote.
For example, to limit the bandwidth to 100Kbyte/s, set it to `--bwlimit 100k`.
(There is no corresponding option for bup join.)
* `remote.<name>.annex-gnupg-options`
Options to pass to GnuPG when it's encrypting data. For instance, to
use the AES cipher with a 256-bit key and disable compression, set it
to `--cipher-algo AES256 --compress-algo none`. (These options take
precedence over the default GnuPG configuration, which is otherwise
used.)
* `remote.<name>.annex-gnupg-decrypt-options`
Options to pass to GnuPG when it's decrypting data. (These options take
precedence over the default GnuPG configuration, which is otherwise
used.)
* `annex.ssh-options`, `annex.rsync-options`,
`annex.rsync-upload-options`, `annex.rsync-download-options`,
`annex.bup-split-options`, `annex.gnupg-options`,
`annex.gnupg-decrypt-options`
Default options to use if a remote does not have more specific options
as described above.
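The relationship between a default and a more specific per-remote setting can be sketched as follows. The remote name "server" and the scratch repository are hypothetical; the option values are the rsync examples used earlier in this section.

```shell
# Scratch repository, created only so this example is self-contained.
repo=$(mktemp -d)
git init -q "$repo"

# Default rsync options, used by remotes without a more specific setting.
git -C "$repo" config annex.rsync-options "--bwlimit 100"

# More specific options for one remote; "server" is a hypothetical name.
git -C "$repo" config remote.server.annex-rsync-options "-6 --bwlimit 100"

# List both settings.
git -C "$repo" config --get-regexp 'rsync-options'
```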
2010-12-10 21:30:13 +00:00
* `remote.<name>.annex-rsyncurl`
Used by rsync special remotes, this configures
the location of the rsync repository to use. Normally this is automatically
set up by `git annex initremote`, but you can change it if needed.
* `remote.<name>.annex-buprepo`
Used by bup special remotes, this configures
the location of the bup repository to use. Normally this is automatically
set up by `git annex initremote`, but you can change it if needed.
* `remote.<name>.annex-borgrepo`
Used by borg special remotes, this configures
the location of the borg repository to use. Normally this is automatically
set up by `git annex initremote`, but you can change it if needed.
* `remote.<name>.annex-ddarrepo`
Used by ddar special remotes, this configures
the location of the ddar repository to use. Normally this is automatically
set up by `git annex initremote`, but you can change it if needed.
* `remote.<name>.annex-directory`
Used by directory special remotes, this configures
the location of the directory where annexed files are stored for this
remote. Normally this is automatically set up by `git annex initremote`,
but you can change it if needed.
* `remote.<name>.annex-adb`
Used to identify remotes on Android devices accessed via adb.
Normally this is automatically set up by `git annex initremote`.
* `remote.<name>.annex-androiddirectory`
Used by adb special remotes, this is the directory on the Android
device where files are stored for this remote. Normally this is
automatically set up by `git annex initremote`, but you can change
it if needed.
* `remote.<name>.annex-androidserial`
Used by adb special remotes, this is the serial number of the Android
device used by the remote. Normally this is automatically set up by
`git annex initremote`, but you can change it if needed, e.g. when
upgrading to a new Android device.
* `remote.<name>.annex-s3`
Used to identify Amazon S3 special remotes.
Normally this is automatically set up by `git annex initremote`.
* `remote.<name>.annex-glacier`
Used to identify Amazon Glacier special remotes.
Normally this is automatically set up by `git annex initremote`.
* `remote.<name>.annex-web`
Used to identify web special remotes.
Normally this is automatically set up by `git annex initremote`.
* `remote.<name>.annex-webdav`
Used to identify webdav special remotes.
Normally this is automatically set up by `git annex initremote`.
* `remote.<name>.annex-tahoe`
Used to identify tahoe special remotes.
Points to the configuration directory for tahoe.
* `remote.<name>.annex-gcrypt`
Used to identify gcrypt special remotes.
Normally this is automatically set up by `git annex initremote`.
It is set to "true" if this is a gcrypt remote.
If the gcrypt remote is accessible over ssh and has git-annex-shell
available to manage it, it's set to "shell".
* `remote.<name>.annex-git-lfs`
Used to identify git-lfs special remotes.
Normally this is automatically set up by `git annex initremote`.
It is set to "true" if this is a git-lfs remote.
* `remote.<name>.annex-httpalso`
Used to identify httpalso special remotes.
Normally this is automatically set up by `git annex initremote`.
* `remote.<name>.annex-externaltype`
Used by external special remotes to record the type of the remote.
For example, if this is set to "foo", git-annex will run a "git-annex-remote-foo"
program to communicate with the external special remote.
If this is set to "readonly", then git-annex will not run any external
special remote program, but will try to access things stored in the
remote using http. That only works for some external special remotes,
so consult the documentation of the one you are using.
* `remote.<name>.annex-hooktype`
Used by hook special remotes to record the type of the remote.
* `annex.web-options`
Options to pass to curl when git-annex uses it to download urls
(rather than the default built-in url downloader).
For example, to force IPv4 only, set it to "-4".
Setting this option makes git-annex use curl, but only
when annex.security.allowed-ip-addresses is configured in a
specific way. See its documentation.
Setting this option prevents git-annex from using git-credential
for prompting for http passwords. Instead, you can include "--netrc"
to make curl use your ~/.netrc file and record the passwords there.
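A minimal sketch of the two curl options mentioned above, set together. It assumes a scratch repository created only for the example. As noted above, the options only matter when git-annex actually uses curl, which depends on annex.security.allowed-ip-addresses.

```shell
# Scratch repository, so the example does not touch a real configuration.
repo=$(mktemp -d)
git init -q "$repo"

# Force IPv4 only, and let curl read passwords from ~/.netrc.
git -C "$repo" config annex.web-options "-4 --netrc"

git -C "$repo" config --get annex.web-options
```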
* `annex.youtube-dl-options`
Options to pass to yt-dlp (or deprecated youtube-dl) when using it to
find the url to download for a video.
Some options may break git-annex's integration with yt-dlp. For
example, the --output option could cause it to store files somewhere
git-annex won't find them. Avoid setting here or in the yt-dlp config
file any options that cause it to download more than one file,
or to store the file anywhere other than the current working directory.
* `annex.youtube-dl-command`
The default is to use "yt-dlp" or, if that is not available in the PATH,
to use "youtube-dl".
* `annex.aria-torrent-options`
Options to pass to aria2c when using it to download a torrent.
* `annex.http-headers`
HTTP headers to send when downloading from the web. Multiple lines of
this option can be set, one per header.
* `annex.http-headers-command`
If set, the command is run and each line of its output is used as an HTTP
header. This overrides annex.http-headers.
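Since annex.http-headers can be set multiple times, one way to add several headers is `git config --add`, which appends a value rather than replacing the existing one. This is only a sketch: the scratch repository and both header values are made up for the example.

```shell
# Scratch repository, so the example does not touch a real configuration.
repo=$(mktemp -d)
git init -q "$repo"

# Each --add appends another value, giving one header per line.
# Both headers are hypothetical examples.
git -C "$repo" config --add annex.http-headers "X-Example-Client: demo"
git -C "$repo" config --add annex.http-headers "Accept-Language: en"

# List every configured header.
git -C "$repo" config --get-all annex.http-headers
```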
* `annex.security.allowed-url-schemes`
List of URL schemes that git-annex is allowed to download content from.
The default is "http https ftp".
Think very carefully before changing this; there are security
implications. For example, if it's changed to allow "file" URLs, then
anyone who can get a commit into your git-annex repository could
`git-annex addurl` a pointer to a private file located outside that
repository, possibly causing it to be copied into your repository
and transferred on to other remotes, exposing its content.
Any url schemes supported by curl can be listed here, but you will
also need to configure annex.security.allowed-ip-addresses to allow
using curl.
Some special remotes support their own domain-specific URL
schemes; those are not affected by this configuration setting.
* `annex.security.allowed-ip-addresses`
By default, git-annex only makes connections to public IP addresses;
it will refuse to use HTTP and other servers on localhost or on a
private network.
This setting can override that behavior, allowing access to particular
IP addresses that would normally be blocked. For example "127.0.0.1 ::1"
allows access to localhost (both IPv4 and IPv6).
To allow access to all IP addresses, use "all".
Think very carefully before changing this; there are security
implications. Anyone who can get a commit into your git-annex repository
could `git annex addurl` a URL on a private server, possibly
causing it to be downloaded into your repository and transferred to
other remotes, exposing its content.
Note that, since the interfaces of curl and yt-dlp do not allow
these IP address restrictions to be enforced, curl and yt-dlp will
never be used unless annex.security.allowed-ip-addresses=all.
To allow accessing local or private IP addresses on only specific ports,
use the syntax "[addr]:port". For example,
"[127.0.0.1]:80 [127.0.0.1]:443 [::1]:80 [::1]:443" allows
localhost on the http ports only.
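The two security settings above are ordinary git configuration values. A minimal sketch, assuming a scratch repository created only for the example: it narrows the allowed URL schemes to http and https, and allows localhost on the http ports only, using the value given above.

```shell
# Scratch repository, so the example does not touch a real configuration.
repo=$(mktemp -d)
git init -q "$repo"

# Drop ftp from the default "http https ftp" list.
git -C "$repo" config annex.security.allowed-url-schemes "http https"

# Allow localhost on ports 80 and 443 only (IPv4 and IPv6).
git -C "$repo" config annex.security.allowed-ip-addresses \
    "[127.0.0.1]:80 [127.0.0.1]:443 [::1]:80 [::1]:443"

# Show the resulting security settings.
git -C "$repo" config --get-regexp '^annex\.security\.'
```

Loosening either setting has the security implications described above, so it may be preferable to pass it for a single command with `git -c` rather than storing it.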
* `annex.security.allowed-http-addresses`
Old name for annex.security.allowed-ip-addresses.
If set, this is treated the same as having
annex.security.allowed-ip-addresses set.
* `annex.security.allow-unverified-downloads`
For security reasons, git-annex refuses to download content from
most special remotes when it cannot check a hash to verify
that the correct content was downloaded. This particularly impacts
downloading the content of URL or WORM keys, which lack hashes.
The best way to avoid problems due to this is to migrate files
away from such keys, before their content reaches a special remote.
See [[git-annex-migrate]](1).
When the content is only available from a special remote, you can
use this configuration to force git-annex to download it.
But you do so at your own risk, and it's very important you read and
understand the information below first!
Downloading unverified content from encrypted special remotes is
prevented, because the special remote could send some other encrypted
content than what you expect, causing git-annex to decrypt data that you
never checked into git-annex, and risking exposing the decrypted
data to any non-encrypted remotes you send content to.
Downloading unverified content from (non-encrypted)
external special remotes is prevented, because they could follow
http redirects to web servers on localhost or on a private network,
or in some cases to a file:/// url.
If you decide to bypass this security check, the best thing to do is
to only set it temporarily while running the command that gets the file.
The value to set the config to is "ACKTHPPT".
For example:

    git -c annex.security.allow-unverified-downloads=ACKTHPPT annex get myfile

It would be a good idea to check that it downloaded the file you expected,
too.
* `remote.<name>.annex-security-allow-unverified-downloads`
Per-remote configuration of annex.security.allow-unverified-downloads.
# CONFIGURATION OF ASSISTANT
* `annex.delayadd`
Makes the watch and assistant commands delay for the specified number of
seconds before adding a newly created file to the annex. Normally this
is not needed, because they already wait for all writers of the file
to close it.
Note that this only delays adding files created while the daemon is
running. Changes made when it is not running will be added immediately
the next time it is started up.
* `annex.expireunused`
Controls what the assistant does about unused file contents
that are stored in the repository.
The default is `false`, which causes
all old and unused file contents to be retained, unless the assistant
is able to move them to some other repository (such as a backup repository).
Can be set to a time specification, like "7d" or "1m", and then
file contents that have been known to be unused for a week or a
month will be deleted.
* `annex.fscknudge`
When set to false, prevents the webapp from reminding you when using
repositories that lack consistency checks.
* `annex.autoupgrade`
When set to ask (the default), the webapp will check for new versions
and prompt if they should be upgraded to. When set to true, automatically
upgrades without prompting (on some supported platforms). When set to
false, disables any upgrade checking.
Note that upgrade checking is only done when git-annex is installed
from one of the prebuilt images from its website. This does not
bypass e.g., a Linux distribution's own upgrade handling code.
This setting also controls whether to restart the git-annex assistant
when the git-annex binary is detected to have changed. That is useful
no matter how you installed git-annex.
* `annex.autocommit`
Set to false to prevent the `git-annex assistant`, `git-annex assist`,
and `git-annex sync` from automatically committing changes to files in
the repository.
To configure the behavior in all clones of the repository,
this can be set in [[git-annex-config]](1).
* `annex.startupscan`
Set to false to prevent the git-annex assistant from scanning the
repository for new and changed files on startup. This will prevent it
from noticing changes that were made while it was not running, but can be
a useful performance tweak for a large repository.
* `annex.listen`
Configures which address the webapp listens on. The default is localhost.
Can be either an IP address, or a hostname that resolves to the desired
address.
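As an illustration of the assistant settings above, the sketch below sets three of them in a scratch repository. The repository and the 5 second delay are assumptions chosen for the example; "7d" is the time specification mentioned above.

```shell
# Scratch repository, so the example does not touch a real configuration.
repo=$(mktemp -d)
git init -q "$repo"

# Wait 5 seconds (an arbitrary example value) before adding new files.
git -C "$repo" config annex.delayadd 5

# Delete file contents that have been known to be unused for a week.
git -C "$repo" config annex.expireunused 7d

# Skip the startup scan, which can help in a large repository.
git -C "$repo" config annex.startupscan false

# Show the resulting assistant settings.
git -C "$repo" config --get-regexp '^annex\.'
```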
# CONFIGURATION VIA .gitattributes
The key-value backend used when adding a new file to the annex can be
configured on a per-file-type basis via `.gitattributes` files. In the file,
the `annex.backend` attribute can be set to the name of the backend to
use. (See [[git-annex-backends]](1) for information about
available backends.)
For example, here's how to use the WORM backend by default,
but the SHA256E backend for ogg files:

    * annex.backend=WORM
    *.ogg annex.backend=SHA256E

There is an annex.largefiles attribute, which is used to configure which
files are large enough to be added to the annex. Since attributes cannot
contain spaces, it is difficult to use for more complex annex.largefiles
settings. Setting annex.largefiles in [[git-annex-config]](1) is an easier
way to configure it across all clones of the repository.
See [[git-annex-matching-expression]](1) for details on the syntax.
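
A minimal sketch of a simple, space-free `annex.largefiles` attribute follows. The scratch repository, the 100kb threshold, and the file names are assumptions chosen for illustration. `git check-attr` only reports the attribute value that git-annex would see; it does not evaluate the expression.

```shell
# Scratch repository so the sketch does not touch real data.
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"

# Treat files over 100kb as large, but never annex C source files.
# Later lines override earlier ones for paths they match.
printf '%s\n' \
    '* annex.largefiles=(largerthan=100kb)' \
    '*.c annex.largefiles=nothing' > .gitattributes

# Ask git which value applies to a given path.
git check-attr annex.largefiles -- video.mov
git check-attr annex.largefiles -- main.c
```

An expression that needs spaces does not fit in an attribute, which is why the text above points to `git-annex-config` for more complex settings.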
The numcopies and mincopies settings can also be configured on a
per-file-type basis via the `annex.numcopies` and `annex.mincopies`
attributes in `.gitattributes` files. This overrides other settings.
For example, this requires 2 copies of wav files and 3 copies
of flac files:
*.wav annex.numcopies=2
*.flac annex.numcopies=3
These settings are honored by git-annex whenever it's operating on a
matching file. However, when using --all, --unused, or --key to specify
keys to operate on, git-annex is operating on keys and not files, so will
not honor the settings from .gitattributes. For this reason, the `git annex
numcopies` and `git annex mincopies` commands are useful to configure a
global default.
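
The difference between path-based and key-based operation can be sketched with `git check-attr`, which shows the attribute git-annex would find for a path. The scratch repository and file names here are assumptions for illustration.

```shell
# Scratch repository so the sketch does not touch real data.
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"

# Per-file-type settings, as in the example above.
printf '%s\n' \
    '*.wav annex.numcopies=2' \
    '*.flac annex.numcopies=3' > .gitattributes

# A matching path picks up the attribute; other paths do not.
git check-attr annex.numcopies -- song.flac
git check-attr annex.numcopies -- notes.txt

# Keys selected with --all, --unused, or --key have no path to match,
# so only a global default can apply to them. In an initialized
# git-annex repository, that default would be set with, for example:
#   git annex numcopies 2
#   git annex mincopies 1
```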
Also note that when using views, only the toplevel .gitattributes file is
preserved in the view, so other settings in other files won't have any
effect.
# EXIT STATUS
git-annex itself will exit 0 on success and 1 on failure, unless
the `--size-limit` or `--time-limit` option is hit, in
which case it exits 101.
A few git-annex subcommands have other exit statuses used to indicate
specific problems, which are documented on their individual man pages.
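
A wrapper script can branch on these statuses. The sketch below uses a hypothetical helper, `annex_status_message`, which is not part of git-annex. It covers only the general statuses described here, not the subcommand-specific ones.

```shell
# Hypothetical helper: describe a git-annex exit status.
annex_status_message() {
    case "$1" in
        0)   echo "success" ;;
        101) echo "stopped by --size-limit or --time-limit" ;;
        *)   echo "failure (exit status $1)" ;;
    esac
}

# Usage, inside a git-annex repository:
#   git annex get --time-limit=30m .
#   annex_status_message "$?"
```

Treating 101 separately lets a script resume a limited run later rather than report it as an error.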
# ENVIRONMENT
These environment variables are used by git-annex when set:
* `GIT_WORK_TREE`, `GIT_DIR`
Handled the same as they are by git; see git(1).
* `GIT_SSH`, `GIT_SSH_COMMAND`
Handled similarly to how they are described in git(1).
The one difference is that git-annex will sometimes pass an additional
"-n" parameter to these, as the first parameter, to prevent ssh from
reading from stdin. Since that can break existing uses of these
environment variables that don't expect the extra parameter, you will
need to set `GIT_ANNEX_USE_GIT_SSH=1` to make git-annex support
these.
Note that setting either of these environment variables prevents
git-annex from automatically enabling ssh connection caching
(see `annex.sshcaching`), so it will slow down some operations with
remotes over ssh. It's up to you to enable ssh connection caching
if you need it; see ssh's documentation.
Also, `annex.ssh-options` and `remote.<name>.annex-ssh-options`
won't have any effect when these environment variables are set.
Usually it's better to configure any desired options through your
~/.ssh/config file, or by setting `annex.ssh-options`.
* `GIT_ANNEX_VECTOR_CLOCK`
Normally git-annex timestamps lines in the log files committed to the
git-annex branch. Setting this environment variable to a number
will make git-annex use that (or a larger number)
rather than the current number of seconds since the UNIX epoch.
Note that decimal seconds are supported.
This is only provided for advanced users who either have a better way to
tell which commit is current than the local clock, or who need to avoid
embedding timestamps for policy reasons.
* Some special remotes use additional environment variables
for authentication etc. For example, `AWS_ACCESS_KEY_ID`
and `GIT_ANNEX_P2P_AUTHTOKEN`. See special remote documentation.
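
As a minimal sketch, some of these variables could be set in a shell before running git-annex. The key path `~/.ssh/annex_key` and the clock value `1` are assumptions for illustration only.

```shell
# Custom ssh command (hypothetical key path). It must tolerate an
# extra "-n" as its first parameter, which git-annex sometimes adds.
export GIT_SSH_COMMAND='ssh -i ~/.ssh/annex_key'

# Opt in, so that git-annex honors GIT_SSH and GIT_SSH_COMMAND.
export GIT_ANNEX_USE_GIT_SSH=1

# Use this number (or a larger one) instead of the current number of
# seconds since the UNIX epoch in git-annex branch log files.
export GIT_ANNEX_VECTOR_CLOCK=1
```

With `GIT_SSH_COMMAND` set this way, ssh connection caching is no longer enabled automatically, as noted above.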
# FILES
These files are used by git-annex:
`.git/annex/objects/` in your git repository contains the annexed file
contents that are currently available. Annexed files in your git
repository symlink to that content.
`.git/annex/` in your git repository contains other run-time information
used by git-annex.
`~/.config/git-annex/autostart` is a list of git repositories
to start the git-annex assistant in.
`.git/hooks/pre-commit-annex` in your git repository will be run whenever
a commit is made to the HEAD branch, either by git commit, git-annex
sync, or the git-annex assistant.
`.git/hooks/post-update-annex` in your git repository will be run
whenever the git-annex branch is updated. You can make this hook run
`git update-server-info` when publishing a git-annex repository by http.
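
A minimal sketch of such a `post-update-annex` hook follows. It is created in a scratch repository here, which is an assumption for illustration; in practice the hook would go in the repository being published.

```shell
# Scratch repository so the sketch does not touch real data.
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"

# Write a hook that refreshes the files dumb http clients rely on.
mkdir -p .git/hooks
printf '%s\n' '#!/bin/sh' 'exec git update-server-info' \
    > .git/hooks/post-update-annex

# Hooks must be executable to be run.
chmod +x .git/hooks/post-update-annex
```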
# SEE ALSO
More git-annex documentation is available on its web site,
<https://git-annex.branchable.com/>
If git-annex is installed from a package, a copy of its documentation
should be included, in, for example, `/usr/share/doc/git-annex/`.
# AUTHOR
Joey Hess <id@joeyh.name>
<https://git-annex.branchable.com/>
Warning: Automatically converted into a man page by mdwn2man. Edit with care.