Use separate stages for download and upload. In the common case where
it downloads the file from one remote and then uploads to the other,
those are by far the most expensive operations, and there's a decent
chance the two remotes bottleneck on different resources.
Suppose it's being run with -J2 and a bunch of 10 mb files. Two threads
will be started both downloading from the src remote. They will probably
finish at the same time. Then two threads will be started uploading to
the dst remote. They will probably take the same time as well. Before
this change, it would alternate back and forth, bottlenecking on src and dst.
With this change, as soon as the two threads start uploading to dst, two
more threads are able to start, downloading from src. So bandwidth to
both remotes is saturated more often.
Other commands that use transferStages only send in one direction at a
time. So the worker threads for the other direction will sit idle, and
there will be no change in their behavior.
Sponsored-by: Dartmouth College's DANDI project
Lock the local content for drop after getting it from src, to prevent another
process from using the local content as a copy and dropping it from src,
which would prevent dropping the local content after sending it to dest.
Support resuming an interrupted move that downloaded the content from
src, leaving the local content populated. In this case, the location log
has not been updated to say the content is present locally, so we can
assume that it's resuming and go ahead and drop the local content after
sending it to dest.
Note that if a `git-annex get` is being ran at the same time as a
`git-annex move --from --to`, it may get a file just before the move
processes it. So the location log has not been updated yet, and the move
thinks it's resuming. Resulting in local copy being dropped after it's
sent to the dest. This race is something we'll just have to live with,
it seems.
I also gave up on the idea of checking if the location log had been updated
by a `git-annex get` that is ran at the same time. That wouldn't work, because
the location log is precached in the seek stage, so reading it again after
sending the content to dest would not notice changes made to it, unless the cache
were invalidated, which would slow it down a lot. That idea anyway was subject
to races where it would not detect the concurrent `git-annex get`.
So concurrent `git-annex get` will have results that may be surprising.
To make that less surprising, updated the documentation of this feature to
be explicit that it downloads content to the local repository
temporarily.
Sponsored-by: Dartmouth College's DANDI project
When the destination already has a copy, it behaves the same as
drop --from really, but display it as a move and implement it
reusing the factored out code from fromPerform.
(Note that willDropMakeItWorse never returns DropAllowed in that
situation, because it's told that dest has a copy. So numcopies is
always checked.)
And when only the source and not the local repo or destination have a
copy, do the full copy from source to local, then copy from local to
dest, then drop from local, then drop from source dance.
This is complicated by fromPerform being hardcoded to assume there is a
local copy, but the local copy has already been dropped. That's why
it uses cleanupfromsrc RemoveNever to avoid the code that makes that
assumption, and finishes with a call to dropfromsrc.
And, since the location log has not yet been updated, checking numcopies
was not working, until I added UnVerifiedRemote dest to the list of
things to check.
This is not yet quite mergeable though. There are two things in the
comment above fromToPerform that are not implemented yet: Checking the
location log before dropping the local copy, and locking the temporary
local copy for drop.
Sponsored-by: Dartmouth College's DANDI project
Prep for move --to --from, which needs to download from a src repo
without updating the location log for the local repo, before sending the
content on to the dest repo.
Note that caller of download' already update the log themselves.
See previous commit a422a056f2
that pushed it up to download from getViaTmpFrom.
(Also removed in passing a debug print + readline that I accidentially
committed last week on this branch.)
Sponsored-by: Dartmouth College's DANDI project
This is rather trivial, since it does not need to temporarily get the
local copy.
Added fromPerform' to handle the situation where the local copy
is dropped by another process during the copy to the dest. This avoids
ever re-downloading the local copy before dropping from the src.
Sponsored-by: Dartmouth College's DANDI project
Allowing --from and --to as an alternative to --from or --to
is hard to do with optparse-applicative!
The obvious approach of (pfrom <|> pto <|> pfromandto) does not work
when pfromandto uses the same option names as pfrom and pto do.
It compiles but the generated parser does not work for all desired
combinations.
Instead, have to parse optionally from and optionally to. When neither
is provided, the parser succeeds, but it's a result that can't be
handled. So, have to giveup after option parsing. There does not seem to
be a way to make an optparse-applicative Parser give up internally
either.
Also, need seek' because I first tried making fto be a where binding,
but that resulted in a hang when git-annex move was run without --from
or --to. I think because startConcurrency was not expecting the stages
value to contain an exception and so ended up blocking.
Sponsored-by: Dartmouth College's DANDI project
I've long been asked for `git-annex find --all` or something like that,
but pushed back on it because I feel that the command is analagous to
find(1) and so it would be surprising for it to list keys rather than
files. So instead, add a new findkeys subcommand.
Note that the use of withKeyOptions is rather strange because usually
that is used to fall back to --all rather than listing files, but here
it's made to default to --all like behavior and never list files.
A performance thing that could be improved is that withKeyOptions
always reads and caches location logs. But findkeys with no options does
not need them, so it could be made faster. That caching does speed up
options like --in though. This is really just a subset of a more general
performance thing that --all reads location logs sometimes unncessarily.
Anyway, it needs to read the location log in order to checkDead,
and it seems good that findkeys does skip dead keys.
Also, cleaned up comments on git-annex-find man page asking for --all
option.
Sponsored-by: Dartmouth College's DANDI project