git-annex/doc/todo/bwlimit.mdwn
Joey Hess 64cac1a721
avoid potentially very long bwlimit delay at start
I first saw this getting with -J2 over ssh, but later saw it also
without the -J2. It was resuming, and the calulated unboundDelay was
many minutes. The first update of the meter jumped to some large value,
because of the resuming, and so it thought the BW was super fast.

Avoid by waiting until the second meter update.

Might be a good idea to also guard for the delay being many seconds
and avoid waiting. But how many? If BW is legitimately super fast, and a
remote happens to read more than a 32kb or so chunk at a time, it could
in theory download megabytes or gigabytes of data before the first meter
update. It would actually be appropriate then to delay for a long time,
if the desired BW was low. Could make up some numbers that are sane now,
but tech may improve.

(BTW, pleased to see bwlimit does work with -J. I had worried that
it might not, if the meter update happened in a different thread than
the downloading, but it's done in the same thread.)

Sponsored-by: Brett Eisenberg on Patreon
2021-09-22 19:23:30 -04:00

38 lines
1.9 KiB
Markdown

Add a git config to limit the bandwidth of transfers to/from remotes.
rsync has --bwlimit, so used to work, but is not used with modern
git-annex for p2p transfers. (bup also has a --bwlimit)
This should be possible to implement in a way that works for any remote
that streams to/from a bytestring, by just pausing for a fraction of a
second when it's running too fast. The way the progress reporting interface
works, it will probably work to put the delay in there. --[[Joey]]
[[confirmed]]
> Implemented and works well. [[done]] --[[Joey]]
> Note: A local git remote, when resuming an interrupted
> transfer, has to hash the file (with default annex.verify settings),
> and that hashing updates the progress bar, and so the bwlimit can kick
> in and slow down that initial hashing, before any data copying begins.
> This seems perhaps ok; if you've bwlimited a local git remote,
> remote you're wanting to limit disk IO. Only reason it might not be ok
> is if the intent is to limit IO to the disk containing the remote
> but not the one containing the annex repo. (This also probably
> holds for the directory special remote.)
> Other remotes, including git over ssh, when resuming don't have that
> problem. Looks like chunked special remotes narrowly avoid it, just
> because their implementation choose to not do incremental verification
> when resuming. It might be worthwhile to differentiate between progress
> updates for incremental verification setup and for actual transfers, and
> only rate limit the latter, just to avoid fragility in the code.
> I have not done so yet though, and am closing this..
> --[[Joey]]
> (One other small caveat is that it pauses after each chunk, which means
> it pauses unncessarily after the last chunk of the file. It doesn't know
> it's the last chunk, and it would be hard to teach it. And the chunks
> tend to be 32kb or so, and the pauses a small fraction of a second. So
> mentioning this only for completeness.) --[[Joey]]