Added a comment: parallelizing checksum and get
This commit is contained in:
parent
2cf3d68fe0
commit
afeb00b19a
1 changed files with 24 additions and 0 deletions
|
@ -0,0 +1,24 @@
|
||||||
|
[[!comment format=mdwn
|
||||||
|
username="anarcat"
|
||||||
|
avatar="http://cdn.libravatar.org/avatar/4ad594c1e13211c1ad9edb81ce5110b7"
|
||||||
|
subject="parallelizing checksum and get"
|
||||||
|
date="2019-03-07T18:21:22Z"
|
||||||
|
content="""
|
||||||
|
one thing I would definitely like to see parallelize is CPU and network. right now `git annex get` will:
|
||||||
|
|
||||||
|
1. download file A
|
||||||
|
2. checksum file A
|
||||||
|
3. download file B
|
||||||
|
4. checksum file B
|
||||||
|
|
||||||
|
... serially. If parallelism (`-J2`) is enabled, the following happens, assuming files are roughly the same size:
|
||||||
|
|
||||||
|
1. download file A and B
|
||||||
|
2. checksum file A and B
|
||||||
|
|
||||||
|
This is not much of an improvement... We can get away with maximizing the bandwidth usage *if* file transfers are somewhat interleaved (because of size differences) but the above degenerate case happens actually quite often. The alternative (`-J3` or more) might just download more files in parallel, which is not optimal.
|
||||||
|
|
||||||
|
So could we at least batch the checksum jobs separately from downloads? This would already be an improvement and maximize resource usage while at the same time reducing total transfer time.
|
||||||
|
|
||||||
|
Thanks! :)
|
||||||
|
"""]]
|
Loading…
Add table
Add a link
Reference in a new issue