Commit graph

42832 commits

Author SHA1 Message Date
Joey Hess
579d9b60c1
improve concurrency of move/copy --from --to
Use separate stages for download and upload. In the common case where
it downloads the file from one remote and then uploads to the other,
those are by far the most expensive operations, and there's a decent
chance the two remotes bottleneck on different resources.

Suppose it's being run with -J2 and a bunch of 10 MB files. Two threads
will be started, both downloading from the src remote. They will probably
finish at the same time. Then two threads will be started uploading to
the dst remote. They will probably take the same time as well. Before
this change, it would alternate back and forth, bottlenecking on src and dst.
With this change, as soon as the two threads start uploading to dst, two
more threads are able to start, downloading from src. So bandwidth to
both remotes is saturated more often.

Other commands that use transferStages only send in one direction at a
time. So the worker threads for the other direction will sit idle, and
there will be no change in their behavior.

Sponsored-by: Dartmouth College's DANDI project
2023-01-24 13:59:39 -04:00
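
Illustration for 579d9b60c1: a rough timing model of the -J2 scenario described in the commit message above. This is a toy sketch, not git-annex code. It assumes every download takes a fixed time `d` and every upload a fixed time `u` regardless of how many run at once, and the function names `alternating_time` and `staged_time` are made up for this example.

```python
import math


def alternating_time(n_files, jobs, d, u):
    """Model of the old behavior: `jobs` threads work in lockstep,
    all downloading from src, then all uploading to dst."""
    batches = math.ceil(n_files / jobs)
    return batches * (d + u)


def staged_time(n_files, jobs, d, u):
    """Model of separate download and upload stages, each with its own
    pool of `jobs` workers, so downloads continue while uploads run."""
    download_free = [0.0] * jobs  # time at which each download worker is idle
    upload_free = [0.0] * jobs    # time at which each upload worker is idle
    finish = 0.0
    for _ in range(n_files):
        # Hand the next file to the download worker that frees up first.
        i = min(range(jobs), key=download_free.__getitem__)
        downloaded_at = download_free[i] + d
        download_free[i] = downloaded_at
        # The upload starts once the file is local and an upload worker is idle.
        j = min(range(jobs), key=upload_free.__getitem__)
        uploaded_at = max(upload_free[j], downloaded_at) + u
        upload_free[j] = uploaded_at
        finish = max(finish, uploaded_at)
    return finish
```

With 8 files, 2 jobs, and one time unit per transfer, this model gives 8.0 for the alternating schedule and 5.0 for the staged one. Real gains depend on where the two remotes actually bottleneck.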
jpds
50af8467e3 Added a comment 2023-01-24 15:55:36 +00:00
nobodyinperson
4edae1430a Added a comment 2023-01-23 22:55:11 +00:00
nobodyinperson
c2c405d45a Added a comment 2023-01-23 22:38:28 +00:00
Joey Hess
57987ed2cd
update 2023-01-23 18:08:55 -04:00
Joey Hess
62f8a26dd9
Merge branch 'fromto' 2023-01-23 18:01:57 -04:00
Joey Hess
2a9999f5f1
Merge branch 'master' of ssh://git-annex.branchable.com 2023-01-23 18:01:12 -04:00
Joey Hess
1ee72de32e
done 2023-01-23 17:57:15 -04:00
Joey Hess
77266e46dd
fix behavior of copy --from --to
Sponsored-by: Dartmouth College's DANDI project
2023-01-23 17:55:16 -04:00
Joey Hess
3585481470
Merge branch 'master' into fromto 2023-01-23 17:44:44 -04:00
Joey Hess
acc3f6211f
finishing up move --from --to
Lock the local content for drop after getting it from src, to prevent another
process from using the local content as a copy and dropping it from src,
which would prevent dropping the local content after sending it to dest.

Support resuming an interrupted move that downloaded the content from
src, leaving the local content populated. In this case, the location log
has not been updated to say the content is present locally, so we can
assume that it's resuming and go ahead and drop the local content after
sending it to dest.

Note that if a `git-annex get` is being run at the same time as a
`git-annex move --from --to`, it may get a file just before the move
processes it. The location log has not been updated yet at that point, so
the move thinks it's resuming, and the local copy is dropped after it's
sent to the dest. This race is something we'll just have to live with,
it seems.

I also gave up on the idea of checking if the location log had been updated
by a `git-annex get` that is run at the same time. That wouldn't work, because
the location log is precached in the seek stage, so reading it again after
sending the content to dest would not notice changes made to it, unless the cache
were invalidated, which would slow it down a lot. That idea anyway was subject
to races where it would not detect the concurrent `git-annex get`.

So concurrent `git-annex get` will have results that may be surprising.
To make that less surprising, updated the documentation of this feature to
be explicit that it downloads content to the local repository
temporarily.

Sponsored-by: Dartmouth College's DANDI project
2023-01-23 17:43:48 -04:00
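
Illustration for acc3f6211f: a toy model of the resume heuristic and the race with a concurrent `git-annex get` described above. This is not git-annex code. `ToyRepo`, `seek`, `concurrent_get`, `move_finish` and `demo_race` are hypothetical names, and the model assumes the move consults only the location log snapshot cached at seek time.

```python
class ToyRepo:
    """Local repository reduced to two sets of keys."""

    def __init__(self):
        self.content = set()       # keys whose content is present on disk
        self.location_log = set()  # keys the location log records as present


def seek(repo, keys):
    """Snapshot the location log once, as the seek stage is described doing."""
    return {k: (k in repo.location_log) for k in keys}


def concurrent_get(repo, key):
    """Another process fetches the key and records it in the log."""
    repo.content.add(key)
    repo.location_log.add(key)


def move_finish(repo, key, cached_log):
    """After sending to dest: content that is present locally but absent from
    the cached log is treated as a temporary copy and dropped."""
    treated_as_temporary = key in repo.content and not cached_log[key]
    if treated_as_temporary:
        repo.content.discard(key)
        repo.location_log.discard(key)  # a drop also updates the log
    return treated_as_temporary


def demo_race():
    repo = ToyRepo()
    cached = seek(repo, ["k"])   # the move caches the log: "k" not local
    concurrent_get(repo, "k")    # the user's get lands just before the move
    dropped = move_finish(repo, "k", cached)
    return dropped, "k" in repo.content
```

In this model `demo_race()` reports that the copy was dropped and is no longer present locally, which is the surprising outcome the commit message accepts.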
Joey Hess
f5f799f17e
fully working move --from --to (not release quality)
When the destination already has a copy, it really behaves the same as
drop --from, but it is displayed as a move and implemented by
reusing the code factored out of fromPerform.

(Note that willDropMakeItWorse never returns DropAllowed in that
situation, because it's told that dest has a copy. So numcopies is
always checked.)

And when only the source has a copy, and neither the local repo nor the
destination does, do the full dance: copy from source to local, then copy
from local to dest, then drop from local, then drop from source.

This is complicated by fromPerform being hardcoded to assume there is a
local copy, but the local copy has already been dropped. That's why
it uses cleanupfromsrc RemoveNever to avoid the code that makes that
assumption, and finishes with a call to dropfromsrc.

And, since the location log has not yet been updated, checking numcopies
was not working, until I added UnVerifiedRemote dest to the list of
things to check.

This is not yet quite mergeable though. There are two things in the
comment above fromToPerform that are not implemented yet: Checking the
location log before dropping the local copy, and locking the temporary
local copy for drop.

Sponsored-by: Dartmouth College's DANDI project
2023-01-23 16:12:33 -04:00
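
Illustration for f5f799f17e: the two cases spelled out in the commit message above, as a toy sketch over sets of keys. This is not git-annex code. `move_from_to` is a hypothetical name, numcopies checking and locking are left out entirely, and the case where the local repo already holds a copy is deliberately not modelled.

```python
def move_from_to(key, src, local, dest):
    """Move `key` from `src` to `dest`, where each repo is a set of keys.
    Returns the ordered list of steps that were taken."""
    steps = []
    if key not in src:
        return steps  # nothing to move
    if key in dest:
        # Destination already has a copy: behaves like drop --from.
        src.discard(key)
        steps.append("drop from src")
        return steps
    if key in local:
        raise ValueError("existing local copy is not modelled in this sketch")
    # Only the source has a copy: the full dance.
    local.add(key)
    steps.append("copy src -> local")
    dest.add(key)
    steps.append("copy local -> dest")
    local.discard(key)
    steps.append("drop from local")
    src.discard(key)
    steps.append("drop from src")
    return steps
```

Calling it with `src = {"k"}` and empty `local` and `dest` walks through all four steps and leaves the key only in `dest`.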
Joey Hess
1abd457e98
push location log updating up to callers of download
Prep for move --to --from, which needs to download from a src repo
without updating the location log for the local repo, before sending the
content on to the dest repo.

Note that callers of download' already update the log themselves.
See previous commit a422a056f2
that pushed it up to download from getViaTmpFrom.

(Also removed in passing a debug print + readline that I accidentally
committed last week on this branch.)

Sponsored-by: Dartmouth College's DANDI project
2023-01-23 13:47:41 -04:00
Joey Hess
8c349b8802
implement move --from --to when there is a local copy already
This is rather trivial, since it does not need to temporarily get the
local copy.

Added fromPerform' to handle the situation where the local copy
is dropped by another process during the copy to the dest. This avoids
ever re-downloading the local copy before dropping from the src.

Sponsored-by: Dartmouth College's DANDI project
2023-01-23 13:17:35 -04:00
Joey Hess
05b2ae30f0
update 2023-01-23 12:45:01 -04:00
meribold
aad9581d11 Fix typo in heading that seems to result in git-annex-drop not showing up in man -k output 2023-01-22 15:25:35 +00:00
Joey Hess
45c338204f
Merge branch 'master' of ssh://git-annex.branchable.com 2023-01-20 11:23:24 -04:00
Joey Hess
5645017a03
comment 2023-01-20 11:23:04 -04:00
Joey Hess
6f95f821cb
remove --fast from man page
git-annex move does not actually behave any differently with --fast than
without it. (git-annex copy does)

(cherry picked from commit f74904ee2c)
2023-01-20 11:11:31 -04:00
Joey Hess
a46c385aec
move/copy: started implementing --from src --to dest
This is not in a usable state, but I have a possible plan for how to do
it.

Sponsored-by: Dartmouth College's DANDI project
2023-01-20 11:10:38 -04:00
nobodyinperson
f14346bf07 2023-01-20 10:29:33 +00:00
jpds
73cc3fcd12 Added a comment 2023-01-19 16:28:19 +00:00
jpds
dce215e11a Added a comment 2023-01-19 15:09:01 +00:00
jpds
c071ea267d Added a comment 2023-01-18 22:57:58 +00:00
jpds
da6504ee13 2023-01-18 22:45:06 +00:00
Joey Hess
f74904ee2c
remove --fast from man page
git-annex move does not actually behave any differently with --fast than
without it. (git-annex copy does)
2023-01-18 15:15:41 -04:00
Joey Hess
a6c1d9752b
move/copy: option parsing for --from with --to
Allowing --from and --to as an alternative to --from or --to
is hard to do with optparse-applicative!

The obvious approach of (pfrom <|> pto <|> pfromandto) does not work
when pfromandto uses the same option names as pfrom and pto do.
It compiles but the generated parser does not work for all desired
combinations.

Instead, have to parse optionally from and optionally to. When neither
is provided, the parser succeeds, but it's a result that can't be
handled. So, have to giveup after option parsing. There does not seem to
be a way to make an optparse-applicative Parser give up internally
either.

Also, need seek' because I first tried making fto be a where binding,
but that resulted in a hang when git-annex move was run without --from
or --to. I think because startConcurrency was not expecting the stages
value to contain an exception and so ended up blocking.

Sponsored-by: Dartmouth College's DANDI project
2023-01-18 14:42:39 -04:00
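
Illustration for a6c1d9752b: the same shape of solution (parse both options as optional, then reject the unusable result after parsing) sketched with Python's argparse rather than optparse-applicative. This is an analogy only, not the git-annex parser. `parse_move_options`, the result tags and the error text are hypothetical.

```python
import argparse


def parse_move_options(argv):
    """Parse --from and --to, each optional, then validate afterwards."""
    parser = argparse.ArgumentParser(prog="move-sketch")
    # "from" is a Python keyword, so store the value under another name.
    parser.add_argument("--from", dest="src", default=None)
    parser.add_argument("--to", dest="dest", default=None)
    opts = parser.parse_args(argv)

    # The parser itself succeeds when neither option is given; that
    # result cannot be handled, so give up only after parsing.
    if opts.src is None and opts.dest is None:
        raise ValueError("specify --from, --to, or both")
    if opts.src is not None and opts.dest is not None:
        return ("fromto", opts.src, opts.dest)
    if opts.src is not None:
        return ("from", opts.src)
    return ("to", opts.dest)
```

For example, `parse_move_options(["--from", "a", "--to", "b"])` yields the combined result, while an empty argument list is rejected after parsing completes.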
yarikoptic
ea44f2416c Added a comment 2023-01-18 17:55:50 +00:00
Joey Hess
2a92f5cc2c
comment 2023-01-18 13:05:47 -04:00
Joey Hess
f8bc208e89
findkeys: New command, very similar to git-annex find but operating on keys
I've long been asked for `git-annex find --all` or something like that,
but pushed back on it because I feel that the command is analogous to
find(1) and so it would be surprising for it to list keys rather than
files. So instead, add a new findkeys subcommand.

Note that the use of withKeyOptions is rather strange because usually
that is used to fall back to --all rather than listing files, but here
it's made to default to --all like behavior and never list files.

A performance thing that could be improved is that withKeyOptions
always reads and caches location logs. But findkeys with no options does
not need them, so it could be made faster. That caching does speed up
options like --in though. This is really just a subset of a more general
performance thing that --all reads location logs sometimes unnecessarily.
Anyway, it needs to read the location log in order to checkDead,
and it seems good that findkeys does skip dead keys.

Also, cleaned up comments on git-annex-find man page asking for --all
option.

Sponsored-by: Dartmouth College's DANDI project
2023-01-17 14:51:57 -04:00
Joey Hess
a522a41a42
fix misleading helper function name
noworktreeitems was false for NoWorkTreeItems which was hard to understand.

Sponsored-by: Dartmouth College's DANDI project
2023-01-17 14:25:38 -04:00
Joey Hess
ce241f9aa9
comment 2023-01-17 13:10:28 -04:00
Joey Hess
eb5b072e2e
comment 2023-01-17 13:08:49 -04:00
daven.quinn@d0ed4e0e5e4462d9a74a5d5a8fbd1b17f85db13e
be6aec3100 Added a comment: comment 1 response 2023-01-16 21:45:35 +00:00
yarikoptic
e42ac8844e Added a comment 2023-01-16 21:43:40 +00:00
yarikoptic
fdca11e815 Added a comment 2023-01-16 21:32:09 +00:00
yarikoptic
4890c70a1b Added a comment 2023-01-16 21:27:12 +00:00
Joey Hess
fdd0b4bae0
comment 2023-01-16 16:12:16 -04:00
Joey Hess
c172855a7f
Merge branch 'master' of ssh://git-annex.branchable.com 2023-01-16 15:53:14 -04:00
Joey Hess
e97da33773
comment 2023-01-16 15:52:52 -04:00
yarikoptic
35435cd955 initial report on difficulty moving frozen file 2023-01-16 19:36:07 +00:00
Joey Hess
527e70fc69
comment 2023-01-16 15:26:21 -04:00
Joey Hess
cae12e0ccd
comment 2023-01-16 15:15:21 -04:00
Joey Hess
8bb078e0df
response 2023-01-16 15:13:00 -04:00
Joey Hess
9be56daf07
comment 2023-01-16 15:07:46 -04:00
Joey Hess
ec0107098d
close wontfix with submitter agreement 2023-01-16 14:40:30 -04:00
Joey Hess
321850a67d
close as dup 2023-01-16 14:37:44 -04:00
Joey Hess
62dd19e391
comment 2023-01-16 14:28:46 -04:00
Joey Hess
086cb30eb1
comment 2023-01-16 14:16:13 -04:00
Joey Hess
f87c74566a
close wontfix 2023-01-16 14:08:37 -04:00