Commit graph

77 commits

Author SHA1 Message Date
Joey Hess
152be2948b
use transfer stages for copy --from
See commit e04a931439 for an explanation
of why move uses transfer stages for --from, but command stages for
--to. At the point of that commit, copy was actually already using
command stages for everything, so the commit was incorrect about
improving copy --to.

But, the same reasoning about --from applies to copy as to move; when
verification is not done incrementally, download and verification are
the main two stages. The cleanup stage for copy is even less work than
for move (it doesn't drop from the remote).

Sponsored-by: Dartmouth College's DANDI project
2023-01-24 14:07:49 -04:00
Joey Hess
579d9b60c1
improve concurrency of move/copy --from --to
Use separate stages for download and upload. In the common case where
it downloads the file from one remote and then uploads to the other,
those are by far the most expensive operations, and there's a decent
chance the two remotes bottleneck on different resources.

Suppose it's being run with -J2 and a bunch of 10 mb files. Two threads
will be started both downloading from the src remote. They will probably
finish at the same time. Then two threads will be started uploading to
the dst remote. They will probably take the same time as well. Before
this change, it would alternate back and forth, bottlenecking on src and dst.
With this change, as soon as the two threads start uploading to dst, two
more threads are able to start, downloading from src. So bandwidth to
both remotes is saturated more often.

Other commands that use transferStages only send in one direction at a
time. So the worker threads for the other direction will sit idle, and
there will be no change in their behavior.

Sponsored-by: Dartmouth College's DANDI project
2023-01-24 13:59:39 -04:00
Joey Hess
a6c1d9752b
move/copy: option parsing for --from with --to
Allowing --from and --to as an alternative to --from or --to
is hard to do with optparse-applicative!

The obvious approach of (pfrom <|> pto <|> pfromandto) does not work
when pfromandto uses the same option names as pfrom and pto do.
It compiles but the generated parser does not work for all desired
combinations.

Instead, have to parse optionally from and optionally to. When neither
is provided, the parser succeeds, but it's a result that can't be
handled. So, have to giveup after option parsing. There does not seem to
be a way to make an optparse-applicative Parser give up internally
either.

Also, need seek' because I first tried making fto be a where binding,
but that resulted in a hang when git-annex move was run without --from
or --to. I think because startConcurrency was not expecting the stages
value to contain an exception and so ended up blocking.

Sponsored-by: Dartmouth College's DANDI project
2023-01-18 14:42:39 -04:00
Joey Hess
be19a68276
new matching options --want-get-by and --want-drop-by
Sponsored-by: Graham Spencer on Patreon
2022-07-28 13:26:03 -04:00
Joey Hess
b223988e22
remove --backend from global options
--backend is no longer a global option, and is only accepted by commands
that actually need it.

Three commands that used to support backend but don't any longer are
watch, webapp, and assistant. It would be possible to make them support it,
but I doubt anyone used the option with these. And in the case of webapp
and assistant, the option was handled inconsistently, only taking affect
when the command is run with an existing git-annex repo, not when it
creates a new one.

Also, renamed GlobalOption etc to AnnexOption. Because there are many
options of this type that are not actually global (any more) and get
added to commands that need them.

Sponsored-by: Kevin Mueller on Patreon
2022-06-29 13:33:25 -04:00
Joey Hess
835c50966a
reject batch options combined with non-batch options
Reject combinations of --batch (or --batch-keys) with options like --all or
--key or with filenames.

Most commands ignored the non-batch items when batch mode was enabled.

For some reason, addurl and dropkey both processed first the specified
non-batch items, followed by entering batch mode. Changed them to also
error out, for consistency.

Sponsored-by: Dartmouth College's Datalad project
2022-01-26 13:00:19 -04:00
Joey Hess
ab7b5a492c
--batch-keys
New --batch-keys option added to these commands:  get, drop, move, copy, whereis

git-annex-matching-options had to be reworded since some of its options
can be used to match on keys, not only files.

Sponsored-by: Luke Shumaker on Patreon
2021-08-25 14:21:12 -04:00
Joey Hess
eb42cd4d46
more RawFilePath conversion
535/645

This commit was sponsored by Brett Eisenberg on Patreon.
2020-11-03 10:11:04 -04:00
Joey Hess
3a05d53761
add SeekInput (not yet used)
No behavior changes (hopefully), just adding SeekInput and plumbing it
through to the JSON display code for later use.

Over the course of 2 grueling days.

withFilesNotInGit reimplemented in terms of seekHelper
should be the only possible behavior change. It seems to test as
behaving the same.

Note that seekHelper dummies up the SeekInput in the case where
segmentPaths' gives up on sorting the expanded paths because there are
too many input paths. When SeekInput later gets exposed as a json field,
that will result in it being a little bit wrong in the case where
100 or more paths are passed to a git-annex command. I think this is a
subtle enough problem to not matter. If it does turn out to be a
problem, fixing it would require splitting up the input
parameters into groups of < 100, which would make git ls-files run
perhaps more than is necessary. May want to revisit this, because that
fix seems fairly low-impact.
2020-09-15 15:41:13 -04:00
Joey Hess
d732ef1a89
move, copy: Sped up seeking for annexed files to operate on by a factor of nearly 2x. 2020-07-24 12:56:02 -04:00
Joey Hess
00865cdae8
Fix a bug in find --branch in the previous version
inAnnex check was lost for that code path. To avoid more such mistakes,
made withKeyOptions check it when the AnnexedFileSeeker specifies.
2020-07-24 12:05:28 -04:00
Joey Hess
1be92381ec
unify batch mode with non-batch by using AnnexedFileSeeker 2020-07-22 14:23:28 -04:00
Joey Hess
75aab72d23
mostly done with location log precaching
Some nice wins.
2020-07-13 17:04:02 -04:00
Joey Hess
88a7fb5cbb
convert all applicable commands to new 2x faster annexed file seeking
This removes all calls to inAnnex, except for some involving --batch.
It may be that the batch code could get a similar speedup, but I don't
know if people habitually pass a huge number of files through --batch
that git-annex does not need to do anything to process, so I skipped it
for now.

A few calls to ifAnnexed remain, and might be worth doing more to
convert. In particular, Command.Sync has one that would probably speed
it up by a good amount.

(also removed some dead code from Command.Lock)
2020-07-10 15:45:38 -04:00
Joey Hess
89b2542d3c
annex.skipunknown with transition plan
Added annex.skipunknown git config, that can be set to false to change the
behavior of commands like `git annex get foo*`, to not skip over files/dirs
that are not checked into git and are explicitly listed in the command
line.

Significant complexity was needed to handle git-annex add, which uses some
git ls-files calls, but needs to not use --error-unmatch because of course
the files are not known to git.

annex.skipunknown is planned to change to default to false in a
git-annex release in early 2022. There's a todo for that.
2020-05-28 15:55:17 -04:00
Joey Hess
b88f89c1ef
get the most commonly used commands building again
A quick benchmark of whereis shows not much speed improvement, maybe a
few percent. Profiling it found a hotspot, adds to todo.
2019-12-04 13:45:18 -04:00
Joey Hess
53882ab4a7
make WorkerStage an open type
Rather than limiting it to PerformStage and CleanupStage, this opens it
up so any number of stages can be added as needed by commands.

Each concurrent command has a set of stages that it uses, and only
transitions between those can block waiting for a free slot in the
worker pool. Calling enteringStage for some other stage does not block,
and has very little overhead.

Note that while before the Annex state was duplicated on the first call
to commandAction, this now happens earlier, in startConcurrency.
That means that seek stage actions should that use startConcurrency
and then modify Annex state won't modify the state of worker threads
they then start. I audited all of them, and only Command.Seek
did so; prepMerge changes the working directory and so has to come
before startConcurrency.

Also, the remote list is built before duplicating the state, which means
that it gets built earlier now than it used to. This would only have an
effect of making commands that end up not needing to perform any actions
unncessary build the remote list (only when they're run with concurrency
enable), but that's a minor overhead compared to commands seeking
through the work tree and determining they don't need to do anything.
2019-06-19 13:05:03 -04:00
Joey Hess
40ecf58d4b
update licenses from GPL to AGPL
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.

Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.

(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
2019-03-13 15:48:14 -04:00
Joey Hess
53526136e8
move commandAction out of CmdLine.Seek
This is groundwork for nested seek loops, eg seeking over all files and
then performing commandActions on a list of remotes, which can be done
concurrently.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2018-10-01 14:12:06 -04:00
Joey Hess
1d1054faa6
added -z
Added -z option to git-annex commands that use --batch, useful for
supporting filenames containing newlines.

It only controls input to --batch, the output will still be line delimited
unless --json or etc is used to get some other output. While git often
makes -z affect both input and output, I don't like trying them together,
and making it affect output would have been a significant complication,
and also git-annex output is generally not intended to be machine parsed,
unless using --json or a format option.

Commands that take pairs like "file key" still separate them with a space
in --batch mode. All such commands take care to support filenames with
spaces when parsing that, so there was no need to change it, and it would
have needed significant changes to the batch machinery to separate tose
with a null.

To make fromkey and registerurl support -z, I had to give them a --batch
option. The implicit batch mode they enter when not provided with input
parameters does not support -z as that would have complicated option
parsing. Seemed better to move these toward using the same --batch as
everything else, though the implicit batch mode can still be used.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-09-20 16:11:47 -04:00
Joey Hess
12460fcea6
make --batch honor matching options
When --batch is used with matching options like --in, --metadata, etc, only
operate on the provided files when they match those options. Otherwise, a
blank line is output in the batch protocol.

Affected commands: find, add, whereis, drop, copy, move, get

In the case of find, the documentation for --batch already said it honored
the matching options. The docs for the rest didn't, but it makes sense to
have them honor them. While this is a behavior change, why specify the
matching options with --batch if you didn't want them to apply?

Note that the batch output for all of the affected commands could
already output a blank line in other cases, so batch users should
already be prepared to deal with it.

git-annex metadata didn't seem worth making support the matching options,
since all it does is output metadata or set metadata, the use cases for
using it in combination with the martching options seem small. Made it
refuse to run when they're combined, leaving open the possibility for later
support if a use case develops.

This commit was sponsored by Brett Eisenberg on Patreon.
2018-08-08 12:07:06 -04:00
Joey Hess
af8546990d
move: --safe/--unsafe and potential drop race fix
move: Added --safe option, which makes move honor numcopies settings.
Also --unsafe enables the default behavior, anticipating that the
default may one day change.

This commit was sponsored by Ethan Aubin.
2018-04-09 16:20:10 -04:00
Joey Hess
ae530f043e
disentagle copy and move option parsing 2018-04-09 14:38:46 -04:00
Joey Hess
6583448bab
add --json-error-messages (not yet implemented)
Added --json-error-messages option, which includes error messages in the
json output, rather than outputting them to stderr.

The actual rediretion of errors is not implemented yet, this is only
the docs and option plumbing.

This commit was supported by the NSF-funded DataLad project.
2018-02-19 14:32:15 -04:00
Joey Hess
85ed38a574
Avoid repeated checking that files passed on the command line exist.
git annex add, git annex lock etc make multiple seek passes,
and each seek pass checked that files existed. That was unncessary
redundant work.

Fixed by adding a new WorkTreeItem type, make seek actions use it,
and check that the files exist when constructing it.

This commit was supported by the NSF-funded DataLad project.
2017-10-16 14:10:20 -04:00
Joey Hess
2eb6309d3e
move, copy: Support --batch. 2017-08-15 12:39:10 -04:00
Joey Hess
bb18026b2c
move --to=here
* move --to=here moves from all reachable remotes to the local repository.

The output of move --from remote is changed slightly, when the remote and
local both have the content. It used to say:
move foo ok
Now:
move foo (from theremote...) ok

That was done so that, when move --to=here is used and the content is
locally present and also in several remotes, it's clear which remotes the
content gets dropped from.

Note that move --to=here will report an error if a non-reachable remote
contains the file, even if the local repository also contains the file. I
think that's reasonable; the user may be intending to move all other copies
of the file from remotes.

OTOH, if a copy of the file is believed to be present in some repository
that is not a configured remote, move --to=here does not report an error.
So a little bit inconsistent, but erroring in this case feels wrong.

copy --to=here came along for free, but it's basically the same behavior as
git-annex get, and probably with not as good messages in edge cases
(especially on failure), so I've not documented it.

This commit was sponsored by Anthony DeRobertis on Patreon.
2017-05-31 17:00:18 -04:00
Joey Hess
5ee6912cf3
support parsing options like --to=here
Reworked remote name parsing to allow things like that. Command.Move
uses it for --to=here, although there's not yet an implementation of
that option.

This commit was sponsored by Ignacio on Patreon.
2017-05-31 16:49:28 -04:00
Joey Hess
c8e1e3dada
AssociatedFile newtype
To prevent any further mistakes like 301aff34c4

This commit was sponsored by Francois Marier on Patreon.
2017-03-10 13:35:31 -04:00
Joey Hess
3e22d60549
copy, move, mirror: Support --json and --json-progress. 2016-09-09 16:24:26 -04:00
Joey Hess
737e45156e
remove 163 lines of code without changing anything except imports 2016-01-20 16:36:33 -04:00
Joey Hess
8ea594f565
missed adding allowConcurrentOutput here 2015-11-06 13:41:26 -04:00
Joey Hess
1ac79e6fe5 copy --auto was checking the wrong repo's preferred content. (--from was checking what --to should, and vice-versa.) Fixed this bug, which was introduced in version 5.20150727. 2015-10-06 17:29:44 -04:00
Joey Hess
b7a5d9c3e1 The last release accidentially removed a number of options from the copy command. (-J, file matching options, etc). These have been added back. 2015-07-30 13:33:35 -04:00
Joey Hess
a7f58634b8 wip 2015-07-09 16:05:45 -04:00
Joey Hess
8ad927dbc6 converted copy and move
Got a little tricky..
2015-07-09 15:23:14 -04:00
Joey Hess
6e5c1f8db3 convert all commands to work with optparse-applicative
Still no options though.
2015-07-08 15:08:02 -04:00
Joey Hess
a2ba701056 started converting to use optparse-applicative
This is a work in progress. It compiles and is able to do basic command
dispatch, including git autocorrection, while using optparse-applicative
for the core commandline parsing.

* Many commands are temporarily disabled before conversion.
* Options are not wired in yet.
* cmdnorepo actions don't work yet.

Also, removed the [Command] list, which was only used in one place.
2015-07-08 13:36:25 -04:00
Joey Hess
38c458b407 refactor 2015-04-30 14:02:56 -04:00
Joey Hess
cd6b62f35e --auto is no longer a global option; only get, drop, and copy accept it.
Not a behavior change unless you were passing it to a command that ignored it.
2015-03-25 17:06:14 -04:00
Joey Hess
afc5153157 update my email address and homepage url 2015-01-21 12:50:09 -04:00
Joey Hess
59f88558d5 doh't use "def" for command definitions, it conflicts with Data.Default.def 2014-10-14 14:20:10 -04:00
Joey Hess
7b50b3c057 fix some mixed space+tab indentation
This fixes all instances of " \t" in the code base. Most common case
seems to be after a "where" line; probably vim copied the two space layout
of that line.

Done as a background task while listening to episode 2 of the Type Theory
podcast.
2014-10-09 15:09:11 -04:00
Joey Hess
e880d0d22c replace (Key, Backend) with Key
Only fsck and reinject and the test suite used the Backend, and they can
look it up as needed from the Key. This simplifies the code and also speeds
it up.

There is a small behavior change here. Before, all commands would warn when
acting on an annexed file with an unknown backend. Now, only fsck and
reinject show that warning.
2014-04-17 18:03:39 -04:00
Joey Hess
86ffeb73d1 reorganize some files and imports 2014-01-26 16:25:55 -04:00
Joey Hess
3518c586cf fix transfers of key with no associated file
Several places assumed this would not happen, and when the AssociatedFile
was Nothing, did nothing.

As part of this, preferred content checks pass the Key around.

Note that checkMatcher is sometimes now called with Just Key and Just File.
It currently constructs a FileMatcher, ignoring the Key. However, if it
constructed a FileKeyMatcher, which contained both, then it might be
possible to speed up parts of Limit, which currently call the somewhat
expensive lookupFileKey to get the Key.

I have not made this optimisation yet, because I am not sure if the key is
always the same. Will need some significant checking to satisfy myself
that's the case..
2014-01-23 16:44:02 -04:00
Joey Hess
f7cdc40f7b reorg 2014-01-21 18:08:56 -04:00
Joey Hess
b40df4f0d0 reorganize numcopies code (no behavior changes)
Move stuff into Logs.NumCopies. Add a NumCopies newtype.

Better names for various serialization classes that are specific to one
thing or another.
2014-01-21 16:08:59 -04:00
Joey Hess
34c8af74ba fix inversion of control in CommandSeek (no behavior changes)
I've been disliking how the command seek actions were written for some
time, with their inversion of control and ugly workarounds.

The last straw to fix it was sync --content, which didn't fit the
Annex [CommandStart] interface well at all. I have not yet made it take
advantage of the changed interface though.

The crucial change, and probably why I didn't do it this way from the
beginning, is to make each CommandStart action be run with exceptions
caught, and if it fails, increment a failure counter in annex state.
So I finally remove the very first code I wrote for git-annex, which
was before I had exception handling in the Annex monad, and so ran outside
that monad, passing state explicitly as it ran each CommandStart action.

This was a real slog from 1 to 5 am.

Test suite passes.

Memory usage is lower than before, sometimes by a couple of megabytes, and
remains constant, even when running in a large repo, and even when
repeatedly failing and incrementing the error counter. So no accidental
laziness space leaks.

Wall clock speed is identical, even in large repos.

This commit was sponsored by an anonymous bitcoiner.
2014-01-20 04:57:36 -04:00
Joey Hess
0f921307e7 mirror: New command, makes two repositories contain the same set of files.
This is a simple approach for setting up a mirroring repository.

It will work with any type of remotes.

Mirror --from is more expensive than mirror --to in general.
OTOH, mirror --from will get the file from any remote that has it, not only
the named mirror remote. And if the named mirror remote is not the fastest
available remote with a file, that can speed things up.

It would be possible to make the assistant or watch command do a more
dynamic mirroring, that didn't need to scan every time.
2013-08-20 15:46:35 -04:00