Added a comment: parallelizing checksum and get

2019-03-07 18:21:22 +00:00 · 2019-03-07 18:21:22 +00:00 · afeb00b19a
commit afeb00b19a
parent 2cf3d68fe0
1 changed files with 24 additions and 0 deletions
--- a/doc/forum/does_git-annex_parallelize_different_remotes63/comment_5_5098db1fad3290cba49ea1c1163cc168._comment
+++ b/doc/forum/does_git-annex_parallelize_different_remotes63/comment_5_5098db1fad3290cba49ea1c1163cc168._comment
@@ -0,0 +1,24 @@
+[[!comment format=mdwn
+ username="anarcat"
+ avatar="http://cdn.libravatar.org/avatar/4ad594c1e13211c1ad9edb81ce5110b7"
+ subject="parallelizing checksum and get"
+ date="2019-03-07T18:21:22Z"
+ content="""
+One thing I would definitely like to see parallelized is CPU and network usage. Right now `git annex get` will:
+
+ 1. download file A
+ 2. checksum file A
+ 3. download file B
+ 4. checksum file B
+
+... serially. If parallelism (`-J2`) is enabled, the following happens, assuming files are roughly the same size:
+
+ 1. download file A and B
+ 2. checksum file A and B
+
+This is not much of an improvement: the network sits idle while both files are checksummed, and the CPU sits idle while both are downloaded. We can get away with maximizing bandwidth usage *if* file transfers happen to be interleaved (because of size differences), but the degenerate case above actually happens quite often. The alternative (`-J3` or more) might just download more files in parallel, which is not optimal.
+
+So could we at least batch the checksum jobs separately from downloads? This would already be an improvement and maximize resource usage while at the same time reducing total transfer time.
+
+Thanks! :)
+"""]]
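The separation the comment asks for can be illustrated with a two-stage pipeline. What follows is a minimal sketch, not how git-annex is implemented: `fetch`, `pipeline`, and the in-memory "files" are hypothetical stand-ins, and a real transfer would go over the network rather than return generated bytes. The point is only that the checksum of file A can run while file B is still downloading.

```python
import hashlib
import queue
import threading


def fetch(name):
    # Hypothetical stand-in for a network transfer: returns fake content.
    return (name * 1000).encode()


def pipeline(names, download=fetch):
    # Stage 1 downloads files in order; stage 2 checksums each file as soon
    # as it arrives, so the two stages overlap instead of alternating.
    downloaded = queue.Queue()
    checksums = {}
    done = object()  # sentinel telling the checksum stage to stop

    def downloader():
        for name in names:
            downloaded.put((name, download(name)))
        downloaded.put(done)

    def checksummer():
        while True:
            item = downloaded.get()
            if item is done:
                break
            name, data = item
            checksums[name] = hashlib.sha256(data).hexdigest()

    threads = [
        threading.Thread(target=downloader),
        threading.Thread(target=checksummer),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return checksums


if __name__ == "__main__":
    for name, digest in pipeline(["A", "B"]).items():
        print(name, digest)
```

Calling `pipeline(["A", "B"])` returns a dictionary mapping each name to its SHA-256 hex digest. With one downloader and one checksummer, the download of B proceeds while A is being hashed, which is the interleaving the comment describes as missing in the same-size `-J2` case.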