git-annex/doc/todo/stop_using_curl_and_wget.mdwn
Joey Hess c34152777b
Use http-conduit for url downloads by default, annex.web-options enables curl
* For url downloads, git-annex now defaults to using a http library,
  rather than wget or curl. But, if annex.web-options is set, it will
  use curl. To use the .netrc file, run:
    git config annex.web-options --netrc
* git-annex no longer uses wget (and wget is no longer shipped with
  git-annex builds).

Note that curl is always run in silent mode, since the new API for
download has a MeterUpdate and doesn't make way for curl progress
output. It might be worth writing a parser for curl's progress output
to update the meter when using it, but I didn't bother with this edge
case for now.

This commit was supported by the NSF-funded DataLad project.
2018-04-06 17:36:20 -04:00

Currently git-annex uses wget and curl for downloading urls.
Which is used depends on the situation, since both have their limitations
and quirks.

This often confuses users, who expect annex.web-options to only apply
to whichever program git-annex was running, and put in an option that
breaks the other program. Or they configure a netrc file, which wget uses by
default, but curl does not.

Also, using these external programs prevents keeping a http connection open
and pipelining requests, so it makes mass url downloads a lot slower than
if git-annex used http-conduit to do url downloads itself. [[users/yoh]]
has requested http pipelining.

(git-annex was creating a new http manager each time it hit a url,
except for in the S3 remote which reused a single manager. That's now been
improved, so all http-conduit use in git-annex reuses a http manager, and
so will do http pipelining.)
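
To illustrate why reusing a connection helps with mass downloads, here is a minimal sketch in Python (not git-annex's Haskell code, and not http-conduit's API). It shows plain sequential reuse of one kept-alive connection, which is the kind of saving a shared http manager is meant to give. `CountingHandler`, `fetch_all` and `demo` are hypothetical names made up for this example, and the throwaway local server only exists so the number of connections can be counted.

```python
import http.client
import http.server
import threading


class CountingHandler(http.server.BaseHTTPRequestHandler):
    """Toy local server (an assumption for this sketch) that counts connections."""

    protocol_version = "HTTP/1.1"  # lets the server keep the connection open
    connections = 0

    def setup(self):
        # setup() runs once per accepted connection, not once per request.
        CountingHandler.connections += 1
        super().setup()

    def do_GET(self):
        body = self.path.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_all(host, port, paths):
    """Download every path over a single reused connection."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        bodies = []
        for path in paths:
            conn.request("GET", path)
            bodies.append(conn.getresponse().read())
        return bodies
    finally:
        conn.close()


def demo():
    """Fetch three urls from the toy server; return (bodies, connections used)."""
    CountingHandler.connections = 0
    server = http.server.HTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        bodies = fetch_all("127.0.0.1", server.server_port, ["/a", "/b", "/c"])
    finally:
        server.shutdown()
        server.server_close()
    return bodies, CountingHandler.connections
```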

For file:, ftp:, and more unusual urls, http-conduit can't be used, since
it does not support them.
git-annex does support those urls, and people rely on that, so it would
still need to use wget or curl for those.

wget is also not shipped with git-annex on Windows or OSX, only curl is,
and it would be good to only use one of the programs, not both, when
handling those unusual urls.

See also [[support_.netrc_for_fsck_--from_web]]. That some users rely on
git-annex using wget and a netrc file is kind of problematic if switching
to http-conduit, which does not support it. Maybe require users to set
`annex.web-download-command` if they want to make it use something that
supports netrc?

--[[Joey]]
> Implemented Utility.Url.downloadC, which does the (nontrivial) work of
> downloading a file with resume support using http-conduit.
> It falls back to curl to handle urls that http-conduit does not support.
> Now we only have to decide what to do about the above edge cases.
>
> > Let's drop use of wget entirely, as it was only using it because I
> > preferred wget's progress bar to curl's. The user can still force wget
> > with annex.web-download-command.
> >
> > That leaves users who have a .netrc file or want to use
> > annex.web-options. Since curl requires --netrc in order to use the
> > .netrc file, require users who want to use the .netrc to
> > set "annex.web-options = --netrc". When "annex.web-options" is
> > set, always use curl (unless overridden by annex.web-download-command).
> > Otherwise, use conduit.
[[done]] --[[Joey]]
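
The decision described above can be sketched roughly as follows. This is an illustration in Python, not the actual git-annex code: `choose_downloader` is a hypothetical name, the `config` dict stands in for git config, and the sketch assumes that only http: and https: urls are handled by the http library.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: only these schemes go through the http
# library; file:, ftp: and other unusual urls fall back to curl.
HTTP_SCHEMES = {"http", "https"}


def choose_downloader(url, config):
    """Return "custom", "curl" or "conduit" for a url.

    config is a plain dict standing in for git config values.
    """
    # An explicit download command overrides everything else.
    if config.get("annex.web-download-command"):
        return "custom"
    # Setting annex.web-options (eg, to --netrc) forces curl.
    if config.get("annex.web-options"):
        return "curl"
    # Urls the http library cannot handle fall back to curl.
    if urlsplit(url).scheme not in HTTP_SCHEMES:
        return "curl"
    return "conduit"
```

For example, `choose_downloader("https://example.com/f", {"annex.web-options": "--netrc"})` picks curl, while the same url with an empty config picks conduit.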