improve concurrency of move/copy --from --to

Use separate stages for download and upload. In the common case where
it downloads the file from one remote and then uploads to the other,
those are by far the most expensive operations, and there's a decent
chance the two remotes bottleneck on different resources.

Suppose it's being run with -J2 and a bunch of 10 MB files. Two threads
will be started, both downloading from the src remote. They will probably
finish at the same time. Then two threads will be started uploading to
the dst remote. They will probably take the same time as well. Before
this change, it would alternate back and forth, bottlenecking first on
src and then on dst. With this change, as soon as the two threads start
uploading to dst, two more threads are able to start downloading from
src. So bandwidth to both remotes is saturated more often.
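
To put rough numbers on the -J2 example, here is a toy timing model. It
is only a sketch, not git-annex's scheduler. It assumes every file takes
the same d seconds to download and u seconds to upload, that files move
in batches of j, and that each stage has its own pool of j workers. The
names batches, unstagedTime and stagedTime are made up for illustration.

```haskell
-- Toy timing model; not git-annex code. All names here are hypothetical.
module Main where

-- | Number of batches needed to move n files, j at a time (assumes j >= 1).
batches :: Int -> Int -> Int
batches j n = (n + j - 1) `div` j

-- | One shared pool of j workers: each worker downloads a file and then
-- uploads it before taking the next one, so batches never overlap.
unstagedTime :: Int -> Int -> Int -> Int -> Int
unstagedTime j n d u = batches j n * (d + u)

-- | Separate pools of j workers per stage: while one batch uploads, the
-- next batch downloads, so only the slower stage limits throughput.
stagedTime :: Int -> Int -> Int -> Int -> Int
stagedTime j n d u
    | n <= 0 = 0
    | otherwise = d + u + (batches j n - 1) * max d u

main :: IO ()
main = do
    -- -J2, 8 files, 10 seconds to download and 10 to upload each file.
    print (unstagedTime 2 8 10 10)
    print (stagedTime 2 8 10 10)
```

Under those assumptions the gain is largest when download and upload
take about the same time. When one side is much slower, both models
are dominated by the slow side and the difference shrinks.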

Other commands that use transferStages only send in one direction at a
time. So the worker threads for the other direction will sit idle, and
there will be no change in their behavior.

Sponsored-by: Dartmouth College's DANDI project
Joey Hess 2023-01-24 13:45:01 -04:00
parent 57987ed2cd
commit 579d9b60c1
GPG key ID: DB12DB0FF05F8F38
12 changed files with 74 additions and 35 deletions

@@ -212,7 +212,7 @@ instance DeferredParseClass SyncOptions where
 seek :: SyncOptions -> CommandSeek
 seek o = do
 	prepMerge
-	startConcurrency downloadStages (seek' o)
+	startConcurrency transferStages (seek' o)
 seek' :: SyncOptions -> CommandSeek
 seek' o = do