From afeb00b19a1c2ea7d00b87e8fbae80dc55717318 Mon Sep 17 00:00:00 2001
From: anarcat
Date: Thu, 7 Mar 2019 18:21:22 +0000
Subject: [PATCH] Added a comment: parallelizing checksum and get

---
 ..._5098db1fad3290cba49ea1c1163cc168._comment | 24 +++++++++++++++++++
 1 file changed, 24 insertions(+)
 create mode 100644 doc/forum/does_git-annex_parallelize_different_remotes__63__/comment_5_5098db1fad3290cba49ea1c1163cc168._comment

diff --git a/doc/forum/does_git-annex_parallelize_different_remotes__63__/comment_5_5098db1fad3290cba49ea1c1163cc168._comment b/doc/forum/does_git-annex_parallelize_different_remotes__63__/comment_5_5098db1fad3290cba49ea1c1163cc168._comment
new file mode 100644
index 0000000000..fd12a7ef88
--- /dev/null
+++ b/doc/forum/does_git-annex_parallelize_different_remotes__63__/comment_5_5098db1fad3290cba49ea1c1163cc168._comment
@@ -0,0 +1,24 @@
+[[!comment format=mdwn
+ username="anarcat"
+ avatar="http://cdn.libravatar.org/avatar/4ad594c1e13211c1ad9edb81ce5110b7"
+ subject="parallelizing checksum and get"
+ date="2019-03-07T18:21:22Z"
+ content="""
+one thing I would definitely like to see parallelize is CPU and network. right now `git annex get` will:
+
+ 1. download file A
+ 2. checksum file A
+ 3. download file B
+ 4. checksum file B
+
+... serially. If parallelism (`-J2`) is enabled, the following happens, assuming files are roughly the same size:
+
+ 1. download file A and B
+ 2. checksum file A and B
+
+This is not much of an improvement... We can get away with maximizing the bandwidth usage *if* file transfers are somewhat interleaved (because of size differences) but the above degenerate case happens actually quite often. The alternative (`-J3` or more) might just download more files in parallel, which is not optimal.
+
+So could we at least batch the checksum jobs separately from downloads? This would already be an improvement and maximize resource usage while at the same time reducing total transfer time.
+
+Thanks! :)
+"""]]