better HTTP connection reuse

Enable HTTP connection reuse across multiple files when git-annex
uses http-conduit. Previously, a new Manager was created each time
Utility.Url used one. Now a single Manager is created on first use and
shared, so connections are reused.
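The change boils down to "create once on first use, then share". Below is a minimal sketch of that pattern, not git-annex's actual Utility.Url code. The helper `once` is hypothetical and wraps an initializer so it runs only on the first call. `FakeManager` and `countingNewManager` are stand-ins so the sketch runs with only `base`. In real code the initializer would be http-client's `newManager` applied to settings such as `tlsManagerSettings` from http-client-tls.

```haskell
import Control.Concurrent.MVar (newMVar, modifyMVar)
import Data.IORef (IORef, newIORef, readIORef, atomicModifyIORef')

-- | Wrap an initializer so it runs at most once. The returned action
-- creates the value on its first call and hands back the cached value on
-- every later call. The MVar also serialises concurrent first calls, so
-- two threads cannot both run the initializer.
once :: IO a -> IO (IO a)
once create = do
  cache <- newMVar Nothing
  return $ modifyMVar cache $ \cached -> case cached of
    Just v  -> return (cached, v)
    Nothing -> do
      v <- create
      return (Just v, v)

-- | Hypothetical stand-in for http-client's Manager, so the sketch runs
-- without extra packages or network access. The Int records which
-- creation produced it.
newtype FakeManager = FakeManager Int
  deriving (Show, Eq)

-- | Hypothetical initializer that counts how often it is called. In real
-- code this would be something like @newManager tlsManagerSettings@.
countingNewManager :: IORef Int -> IO FakeManager
countingNewManager counter = do
  n <- atomicModifyIORef' counter (\c -> (c + 1, c + 1))
  return (FakeManager n)

-- | Simulate n url downloads that each ask for a manager, and report how
-- many managers were actually created.
managersCreatedFor :: Int -> IO Int
managersCreatedFor n = do
  counter <- newIORef 0
  getManager <- once (countingNewManager counter)
  mapM_ (const getManager) [1 .. n]
  readIORef counter
```

With this, `managersCreatedFor 3` returns 1: three simulated downloads share one manager. Keeping the cache in an MVar inside IO, rather than in a pure top-level value, fits the fact that creating a real Manager is itself an IO action.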

Doesn't help when external programs are used for url download,
but does speed up addurl --fast, fsck --from web, etc.

In a test of fsck --fast --from web with 3 files over high-latency
satellite internet, the run sped up from 19.37s to 14.96s.
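For scale, the two timings above work out as follows (arithmetic on the reported numbers only):

```latex
\Delta t = 19.37\,\mathrm{s} - 14.96\,\mathrm{s} = 4.41\,\mathrm{s},\qquad
\frac{4.41}{19.37} \approx 0.228,\qquad
\frac{19.37}{14.96} \approx 1.29
```

That is about 23% less wall-clock time, or roughly 1.29x faster, for the 3 files.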

This commit was supported by the NSF-funded DataLad project.
Joey Hess 2018-04-04 15:15:12 -04:00
parent 2ec07bc29f
commit 9b98d3f630
GPG key ID: DB12DB0FF05F8F38
12 changed files with 61 additions and 53 deletions

@@ -12,6 +12,11 @@ and pipelining requests, so it makes mass url downloads a lot slower than
if git-annex used http-conduit to do url downloads itself. [[users/yoh]]
has requested http pipelining.
(git-annex was creating a new http manager each time it hit a url,
except in the S3 remote, which reused a single manager. That's now been
improved, so all http-conduit use in git-annex reuses an http manager, and
so will do http pipelining.)
http-conduit can't support file:, ftp:, and more unusual urls.
git-annex does support those urls, and people rely on that, so it would
still need to use wget or curl for those.
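The split described above, with http-conduit for http(s) and wget or curl for everything else, can be sketched as a scheme check. This is a hypothetical illustration, not git-annex's code. `Downloader`, `urlScheme` and `chooseDownloader` are invented names, and a real implementation would parse urls with a proper library (for example network-uri) rather than splitting on the first ':'.

```haskell
import Data.Char (toLower)

-- | Hypothetical labels for the two ways a url could be fetched.
data Downloader = HttpConduit | ExternalProgram
  deriving (Show, Eq)

-- | Lowercased scheme of a url, e.g. @Just "https"@ for
-- @"HTTPS://example.com/"@. Nothing if there is no non-empty prefix
-- before a ':'.
urlScheme :: String -> Maybe String
urlScheme u = case break (== ':') u of
  (s, ':' : _) | not (null s) -> Just (map toLower s)
  _ -> Nothing

-- | http and https can go through http-conduit (and so share one Manager);
-- file:, ftp: and anything more unusual falls back to an external program
-- such as wget or curl.
chooseDownloader :: String -> Downloader
chooseDownloader u = case urlScheme u of
  Just "http"  -> HttpConduit
  Just "https" -> HttpConduit
  _            -> ExternalProgram
```

Matching on the lowercased scheme keeps `HTTP://...` and `http://...` on the same path, while anything unrecognised defaults to the external downloader that already handles it.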