git-annex/doc/git-annex-move.mdwn
Joey Hess 579d9b60c1
improve concurrency of move/copy --from --to
Use separate stages for download and upload. In the common case where
it downloads the file from one remote and then uploads to the other,
those are by far the most expensive operations, and there's a decent
chance the two remotes bottleneck on different resources.

Suppose it's being run with -J2 and a bunch of 10 mb files. Two threads
will be started both downloading from the src remote. They will probably
finish at the same time. Then two threads will be started uploading to
the dst remote. They will probably take the same time as well. Before
this change, it would alternate back and forth, bottlenecking on src and dst.
With this change, as soon as the two threads start uploading to dst, two
more threads are able to start, downloading from src. So bandwidth to
both remotes is saturated more often.

Other commands that use transferStages only send in one direction at a
time. So the worker threads for the other direction will sit idle, and
there will be no change in their behavior.

Sponsored-by: Dartmouth College's DANDI project
2023-01-24 13:59:39 -04:00

141 lines
3.5 KiB
Markdown

# NAME
git-annex move - move content of files to/from another repository
# SYNOPSIS
git annex move `[path ...] [--from=remote|--to=remote|--to=here]`
# DESCRIPTION
Moves the content of files from or to another remote.
With no parameters, operates on all annexed files in the current directory.
Paths of files or directories to operate on can be specified.
# OPTIONS
* `--from=remote`
Move the content of files from the specified remote to the local repository.
* `--to=remote`
Move the content of files from the local repository to the specified remote.
* `--to=here`
Move the content of files from all reachable remotes to the local
repository.
* `--from=remote1 --to=remote2`
Move the content of files that are in remote1 to remote2. Does not change
what is stored in the local repository.
This is implemented by first downloading the content from remote1 to the
local repository (if not already present), then sending it to remote2, and
then deleting the content from the local repository (if it was not present
to start with).
* `--force`
Override numcopies and required content checking, and always remove
files from the source repository once the destination repository has a
copy.
Note that, even without this option, you can move the content of a file
from one repository to another when numcopies is not satisfied, as long
as the move does not result in there being fewer copies.
* `--jobs=N` `-JN`
Enables parallel transfers with up to the specified number of jobs
running at once. For example: `-J10`
Setting this to "cpus" will run one job per CPU core.
Note that when using --from with --to, twice this many jobs will
run at once, evenly split between the two remotes.
* `--all` `-A`
Rather than specifying a filename or path to move, this option can be
used to move all available versions of all files.
This is the default behavior when running git-annex in a bare repository.
* `--branch=ref`
Operate on files in the specified branch or treeish.
* `--unused`
Operate on files found by last run of git-annex unused.
* `--failed`
Operate on files that have recently failed to be transferred.
* `--key=keyname`
Use this option to move a specified key.
* matching options
The [[git-annex-matching-options]](1)
can be used to control what to move.
* `--batch`
Enables batch mode, in which lines containing names of files to move
are read from stdin.
As each specified file is processed, the usual progress output is
displayed. If a file's content does not need to be moved,
or it does not match specified matching options, or it
is not an annexed file, a blank line is output in response instead.
Since the usual output while moving a file is verbose and not
machine-parseable, you may want to use --json in combination with
--batch.
* `--batch-keys`
This is like `--batch` but the lines read from stdin are parsed as keys.
* `-z`
Makes batch input be delimited by nulls instead of the usual newlines.
* `--json`
Enable JSON output. This is intended to be parsed by programs that use
git-annex. Each line of output is a JSON object.
* `--json-progress`
Include progress objects in JSON output.
* `--json-error-messages`
Messages that would normally be output to standard error are included in
the json instead.
* Also the [[git-annex-common-options]](1) can be used.
# SEE ALSO
[[git-annex]](1)
[[git-annex-get]](1)
[[git-annex-copy]](1)
[[git-annex-drop]](1)
# AUTHOR
Joey Hess <id@joeyh.name>
Warning: Automatically converted into a man page by mdwn2man. Edit with care.