From afeb00b19a1c2ea7d00b87e8fbae80dc55717318 Mon Sep 17 00:00:00 2001
From: anarcat <anarcat@web>
Date: Thu, 7 Mar 2019 18:21:22 +0000
Subject: [PATCH] Added a comment: parallelizing checksum and get

---
 ..._5098db1fad3290cba49ea1c1163cc168._comment | 24 +++++++++++++++++++
 1 file changed, 24 insertions(+)
 create mode 100644 doc/forum/does_git-annex_parallelize_different_remotes__63__/comment_5_5098db1fad3290cba49ea1c1163cc168._comment

diff --git a/doc/forum/does_git-annex_parallelize_different_remotes__63__/comment_5_5098db1fad3290cba49ea1c1163cc168._comment b/doc/forum/does_git-annex_parallelize_different_remotes__63__/comment_5_5098db1fad3290cba49ea1c1163cc168._comment
new file mode 100644
index 0000000000..fd12a7ef88
--- /dev/null
+++ b/doc/forum/does_git-annex_parallelize_different_remotes__63__/comment_5_5098db1fad3290cba49ea1c1163cc168._comment
@@ -0,0 +1,24 @@
+[[!comment format=mdwn
+ username="anarcat"
+ avatar="http://cdn.libravatar.org/avatar/4ad594c1e13211c1ad9edb81ce5110b7"
+ subject="parallelizing checksum and get"
+ date="2019-03-07T18:21:22Z"
+ content="""
+One thing I would definitely like to see parallelized is CPU and network usage. Right now, `git annex get` will:
+
+ 1. download file A
+ 2. checksum file A
+ 3. download file B
+ 4. checksum file B
+
+... serially. If parallelism (`-J2`) is enabled, the following happens, assuming files are roughly the same size:
+
+ 1. download file A and B
+ 2. checksum file A and B
+
+This is not much of an improvement: the network sits idle while both files are checksummed, and the CPU sits idle while both are downloaded. Bandwidth usage is only maximized *if* file transfers happen to be interleaved (because of size differences), but the degenerate case above actually happens quite often. The alternative (`-J3` or more) might just download more files in parallel, which is not optimal.
+
+So could we at least schedule the checksum jobs separately from the downloads? That would already be an improvement: it would make better use of both CPU and network, and reduce total transfer time.
+
+Thanks! :)
+"""]]