128 commits

Author SHA1 Message Date
Joey Hess
8ea5f3ff99
explicit export lists
Eliminated some dead code. In other cases, exported a currently unused
function, since it was a logical part of the API.

Of course this improves the API documentation. It may also sometimes
let ghc optimize code better, since it can know a function is internal
to a module.

364 modules still to go, according to
git grep -E 'module [A-Za-z.]+ where'
2019-11-21 16:08:37 -04:00
Joey Hess
890330f0fe
make --json-error-messages capture url download errors
Convert Utility.Url to return Either String so the error message can be
displayed in the annex monad and so captured.

(When curl is used, its errors are still not caught.)
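
A minimal sketch of the idea, not the actual Utility.Url code: returning
Either String lets the caller decide how to surface the message instead of
having the low-level function print it. The names fetch and capture are
hypothetical and no network access is performed.

```haskell
-- Hypothetical stand-in for a url action: Left carries a human-readable
-- error message, Right carries the result.
fetch :: String -> Either String String
fetch "" = Left "empty url"
fetch url = Right ("content of " ++ url)

-- The caller collects the error message rather than printing it, so it
-- can later be emitted in whatever form is wanted (for example as JSON).
capture :: Either String a -> ([String], Maybe a)
capture (Left err) = ([err], Nothing)
capture (Right v) = ([], Just v)
```

Here capture (fetch "") yields the message in a list with no result, which
is the shape a caller would need in order to report it later.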
2019-11-12 13:52:38 -04:00
Joey Hess
05d52f9699
fix display of http exceptions
2019-08-10 11:09:25 -04:00
Joey Hess
19defc7932
fix reversion
4af55c42bf reordered the exception
catching, preventing following ftp redirect
2019-08-04 14:32:06 -04:00
Joey Hess
4af55c42bf
factored out downloadConduit from download
useful when an API provides a Request to download
2019-08-04 12:31:54 -04:00
Joey Hess
7fd650355e
merge from http-client-restricted
I made some improvements to its API after splitting it out of git-annex,
so merge those back in.

This is groundwork for removing the embedded copy of it and depending on
it.

Also moved the managerResponseTimeout disabling to Annex.Url as it's
git-annex specific.

This commit was sponsored by Ethan Aubin on Patreon.
2019-07-17 16:48:50 -04:00
Joey Hess
9a5ddda511
remove many old version ifdefs
Drop support for building with ghc older than 8.4.4, and with older
versions of several haskell libraries than will be included in Debian 10.

The only remaining version ifdefs in the entire code base are now a couple
for aws!

This commit should only be merged after the Debian 10 release.
And perhaps it will need to wait longer than that; it would complicate
backporting new versions of git-annex to Debian 9 (stretch), which
has been actively happening as recently as this year.

This commit was sponsored by Ilya Shlyakhter.
2019-07-05 15:09:37 -04:00
Joey Hess
759fd9ea68
avoid url resume from 0
When downloading an url and the destination file exists but is empty,
avoid using http range to resume, since a range "bytes=0-" is an unusual
edge case that it's best to avoid relying on working.

This is known to fix a case where importfeed downloaded a partial feed from
such a server. Since importfeed uses withTmpFile, the destination always exists
empty, so it would particularly tickle such problem servers. Resuming from 0
is otherwise possible, but unlikely.
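
A minimal sketch of the decision described above, assuming the only input
is the size of the existing destination file. The function name resumeRange
is hypothetical and is not taken from the git-annex source.

```haskell
-- Decide which Range header value, if any, to send when the destination
-- file already holds existingSize bytes. An empty file gets no Range
-- header at all, so the server sees an ordinary full download request
-- rather than "bytes=0-".
resumeRange :: Integer -> Maybe String
resumeRange existingSize
  | existingSize > 0 = Just ("bytes=" ++ show existingSize ++ "-")
  | otherwise = Nothing
```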
2019-06-20 12:26:17 -04:00
Joey Hess
fe49747fc8
add missing case
and fix name shadowing warning
2019-06-04 11:24:32 -04:00
Joey Hess
6136e299a2
add back support for following http to ftp redirects
Did not test build with http-client < 0.5 and while I tried to support
it, the ifdefed parts may need some fixes.
2019-05-30 16:04:59 -04:00
Joey Hess
67c06f5121
add back support for ftp urls
Add back support for ftp urls, which was disabled as part of the fix for
security hole CVE-2018-10857 (except for configurations which enabled curl
and bypassed public IP address restrictions). Now it will work if allowed
by annex.security.allowed-ip-addresses.
2019-05-30 14:51:34 -04:00
Joey Hess
850d19d038
add dropFromEnd
2018-11-23 11:24:05 -04:00
Joey Hess
ff9bd9620e
Fix resume of download of url when the whole file content is already actually downloaded
Don't much like that there's no way to distinguish between having the whole
content and having an old version of the file that's bigger, but of course
resuming a http transfer can always yield the wrong result if the file on
the http server is changing, and git-annex will detect that when it
verifies the downloaded content.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-11-12 16:08:47 -04:00
Joey Hess
45e09ea7f3
debug the full adjusted Request
So that the user-agent etc are included in the debug.
2018-10-04 13:45:27 -04:00
Joey Hess
303d10cee6
Improve display when git config download from a http remote fails.
The error message displayed used to only come from curl/wget and perhaps
was clearer than the one displayed now that http-client is used. In any
case, it does make sense to hide it because git-annex prints its own
warning message.

This commit was sponsored by Jake Vosloo on Patreon.
2018-10-03 12:31:09 -04:00
Joey Hess
502c5a4917
remove support for old http-client version
git-annex already bumped to a newer version for the http security fix.

This commit was sponsored by mo on Patreon.
2018-10-03 12:00:07 -04:00
Joey Hess
c88e8c8249
unify error display
2018-10-03 11:56:52 -04:00
Joey Hess
26a02cb386
display error when an invalid url is downloaded
download is documented as displaying an error when download fails, but
it didn't when the url was not valid at all. That leads to confusing
behavior.

Also, display the url with --debug
2018-09-25 13:38:20 -04:00
Yaroslav Halchenko
b976eb5353
BF(minor): missing space after "Unsupported url scheme" msg before the scheme
2018-09-18 18:19:20 -04:00
Joey Hess
b3c9c59d3d
--debug urls
When git-annex used wget and curl, --debug would show urls. So there can't
be any new security problem with doing so.

This commit was sponsored by John Pellman on Patreon.
2018-09-14 12:46:39 -04:00
Joey Hess
e1ab01f94d
Fix reversion in display of http 404 errors.
Switch to using http-client for large file downloads caused the reversion;
the code for displaying a 404 response was instead displaying the raw html
document, which is not useful.

This commit was sponsored by Ryan Newton on Patreon.
2018-07-31 12:15:26 -04:00
Joey Hess
50609da787
fix User-Agent reversion
Send User-Agent and any configured annex.http-headers when downloading with
http, fixes reversion introduced when switching to http-client.

This commit was sponsored by mo on Patreon.
2018-07-16 11:56:47 -04:00
Joey Hess
22f49f216e
get android building the security fix
Had to update http-client and network, with follow-on dep changes.

This commit was sponsored by Brock Spratlen on Patreon.
2018-06-21 10:23:04 -04:00
Joey Hess
3c0a538335
allow ftp urls by default
They're certainly no worse than http. And, the backport of these
security fixes has to deal with wget, which supports http, https, and ftp
and has no way to turn off individual schemes, so this will make that
easier.
2018-06-18 15:37:17 -04:00
Joey Hess
b54b2cdc0e
prevent http connections to localhost and private ips by default
Security fix!

* git-annex will refuse to download content from http servers on
  localhost, or any private IP addresses, to prevent accidental
  exposure of internal data. This can be overridden with the
  annex.security.allowed-http-addresses setting.
* Since curl's interface does not have a way to prevent it from accessing
  localhost or private IP addresses, curl defaults to not being used
  for url downloads, even if annex.web-options enabled it before.
  Only when annex.security.allowed-http-addresses=all will curl be used.
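
A rough sketch of the kind of address check described above, limited to
IPv4 loopback and RFC 1918 private ranges. This is an illustration only:
the IPv4 type and function names are hypothetical, and a real check would
also need to cover IPv6, link-local and other special ranges, and would
have to be applied to the address actually connected to.

```haskell
-- An IPv4 address as four octets (hypothetical simplification).
type IPv4 = (Int, Int, Int, Int)

-- 127.0.0.0/8
isLoopback :: IPv4 -> Bool
isLoopback (a, _, _, _) = a == 127

-- 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16
isPrivate :: IPv4 -> Bool
isPrivate (a, b, _, _) =
  a == 10 || (a == 172 && b >= 16 && b <= 31) || (a == 192 && b == 168)

-- Default policy in this sketch: refuse loopback and private addresses.
allowedByDefault :: IPv4 -> Bool
allowedByDefault ip = not (isLoopback ip || isPrivate ip)
```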

Since S3 and WebDav use the Manager, the same policies apply to them too.

youtube-dl is not handled yet, and a http proxy configuration can bypass
these checks too. Those cases are still TBD.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2018-06-17 13:30:28 -04:00
Joey Hess
28720c795f
limit url downloads to whitelisted schemes
Security fix! Allowing any schemes, particularly file: and
possibly others like scp: allowed file exfiltration by anyone who had
write access to the git repository, since they could add an annexed file
using such an url, or using an url that redirected to such an url,
and wait for the victim to get it into their repository and send them a copy.

* Added annex.security.allowed-url-schemes setting, which defaults
  to only allowing http and https URLs. Note especially that file:/
  is no longer enabled by default.

* Removed annex.web-download-command, since its interface does not allow
  supporting annex.security.allowed-url-schemes across redirects.
  If you used this setting, you may want to instead use annex.web-options
  to pass options to curl.
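
A minimal sketch of a scheme allowlist check, assuming the allowed schemes
are given as a list of lowercase strings. The function names are
hypothetical; the point is that every url in a redirect chain has to pass
the check, not only the first one.

```haskell
import Data.Char (toLower)

-- Extract the lowercased scheme of an url, if it has one.
urlScheme :: String -> Maybe String
urlScheme url = case break (== ':') url of
  (scheme, ':' : _) | not (null scheme) -> Just (map toLower scheme)
  _ -> Nothing

-- Check a single url against the allowed schemes.
schemeAllowed :: [String] -> String -> Bool
schemeAllowed allowed url = maybe False (`elem` allowed) (urlScheme url)

-- Check every url visited while following redirects.
chainAllowed :: [String] -> [String] -> Bool
chainAllowed allowed = all (schemeAllowed allowed)
```

With an allowlist of http and https, a chain that starts at an http url
and redirects to a file: url is rejected as a whole.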

With annex.web-download-command removed, nearly all url accesses in
git-annex are made via Utility.Url via http-client or curl. http-client
only supports http and https, so no problem there.
(Disabling one and not the other is not implemented.)

Used curl --proto to limit the allowed url schemes.

Note that this will cause git annex fsck --from web to mark files using
a disallowed url scheme as not being present in the web. That seems
acceptable; fsck --from web also does that when a web server is not available.

youtube-dl already disabled file: itself (probably for similar
reasons). The scheme check was also added to youtube-dl urls for
completeness, although that check won't catch any redirects it might
follow. But youtube-dl goes off and does its own thing with other
protocols anyway, so that's fine.

Special remotes that support other domain-specific url schemes are not
affected by this change. In the bittorrent remote, aria2c can still
download magnet: links. The download of the .torrent file is
otherwise now limited by annex.security.allowed-url-schemes.

This does not address any external special remotes that might download
an url themselves. Current thinking is all external special remotes will
need to be audited for this problem, although many of them will use
http libraries that only support http and not curl's menagerie.

The related problem of accessing private localhost and LAN urls is not
addressed by this commit.

This commit was sponsored by Brett Eisenberg on Patreon.
2018-06-16 11:57:50 -04:00
Joey Hess
caaedb2993
fix http-client gzip decompression bug
Prevent haskell http-client from decompressing gzip files, so downloads of
such files work the same as it used to with wget and curl.

Explicitly setting accept-encoding to "identity" is probably not needed,
but that's what wget sends (curl does not send the header), and since
http-client is trying to be excessively smart, it seems we need to set
hAcceptEncoding to something to prevent it from inserting its own,
and this seems better than some hack like "".

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-05-21 15:10:25 -04:00
Joey Hess
6a63920732
fix build
2018-05-21 11:00:23 -04:00
Joey Hess
5204e1dd9d
Workaround for bug in an old version of cryptonite that broke https downloads, by using curl for downloads when git-annex is built with it.
This commit was supported by the NSF-funded DataLad project.
2018-05-20 14:12:37 -04:00
Joey Hess
86958fda5d
fix build with old http-client
2018-05-10 00:22:23 -04:00
Joey Hess
db720f6a9c
Display error message when http download fails.
* Display error message when http download fails.

  There's nothing in the http-client library to nicely format a http
  exception, so in some cases it has to fall back to using show on it.
  Seems better than just saying "it failed" or only showing the http
  status code.

* Avoid forward retry when 0 bytes were received.

  forwardRetry was comparing Nothing to Just 0, and so thought there had
  been progress made when 0 bytes were received.
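
The pitfall can be illustrated with a small sketch. The actual forwardRetry
code is not shown here, so both function names below are hypothetical; the
relevant fact is that the derived Ord instance for Maybe orders Nothing
before any Just value.

```haskell
import Data.Maybe (fromMaybe)

-- Naive comparison: Just 0 > Nothing is True, so receiving 0 bytes
-- looks like progress when nothing was recorded before.
madeProgressNaive :: Maybe Integer -> Maybe Integer -> Bool
madeProgressNaive before after = after > before

-- Treating a missing byte count as 0 avoids the false positive.
madeProgress :: Maybe Integer -> Maybe Integer -> Bool
madeProgress before after = fromMaybe 0 after > fromMaybe 0 before
```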

This commit was supported by the NSF-funded DataLad project.
2018-05-08 16:11:45 -04:00
Joey Hess
3c6e60dc69
fix build with old http-conduit
2018-04-24 21:23:40 -04:00
Joey Hess
558a0a9328
deal with conduit 1.3 change
I don't know if this will build with older conduit, it may need an
ifdef.
2018-04-22 13:14:55 -04:00
Joey Hess
e5a404ebe2
fix build with old version of http-client
2018-04-09 13:04:23 -04:00
Joey Hess
c8f2d302dc
run curl when configured to do it at runtime, even if not available at build time
2018-04-06 21:17:36 -04:00
Joey Hess
c34152777b
Use http-conduit for url downloads by default, annex.web-options enables curl
* For url downloads, git-annex now defaults to using a http library,
  rather than wget or curl. But, if annex.web-options is set, it will
  use curl. To use the .netrc file, run:
    git config annex.web-options --netrc
* git-annex no longer uses wget (and wget is no longer shipped with
  git-annex builds).

Note that curl is always run in silent mode, since the new API for
download has a MeterUpdate and doesn't make way for curl progress
output. It might be worth writing a parser for curl's progress output
to update the meter when using it, but I didn't bother with this edge
case for now.

This commit was supported by the NSF-funded DataLad project.
2018-04-06 17:36:20 -04:00
Joey Hess
36e6b8abbf
Fix resuming a download when using curl.
Noticed a bug; when using curl a workaround for its empty file behavior
overwrote the file content, so it never resumed and always started over.
2018-04-06 16:09:53 -04:00
Joey Hess
0f6775f1ff
refactor sinkResponseFile and add downloadC
Remote.S3 and Remote.Helper.Http both had similar code to sink a
http-conduit Response to a file; refactor out sinkResponseFile.

downloadC downloads an url to a file using http-conduit, and supports
resuming. Falls back to curl to handle urls that http-conduit does not
support. This is not used yet, but the goal is to replace download with
it.

git-annex.cabal: conduit-extra was not actually used for a long time,
remove the dep. conduit moves into the main dependency list, but since
http-conduit was already in there, and it depends on conduit, that's not
really adding a new build dep.

This commit was supported by the NSF-funded DataLad project.
2018-04-06 16:07:08 -04:00
Joey Hess
9b98d3f630
better HTTP connection reuse
Enable HTTP connection reuse across multiple files, when git-annex
uses http-conduit. Before, a new Manager was created each time
Utility.Url used it. Now, a single Manager gets created the first time,
so connections are reused.
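
The create-once pattern can be sketched generically as follows. This is an
illustration with a hypothetical getCached helper, not the git-annex
implementation; the cached value stands in for the Manager, and the sketch
is not safe for concurrent use (an MVar would be one way to handle that).

```haskell
import Data.IORef

-- Return the cached value if there is one; otherwise run the creation
-- action once, remember its result, and return it.
getCached :: IORef (Maybe a) -> IO a -> IO a
getCached ref create = do
  cached <- readIORef ref
  case cached of
    Just v -> return v
    Nothing -> do
      v <- create
      writeIORef ref (Just v)
      return v
```

Calling getCached twice with the same IORef runs the creation action only
the first time, which is what allows connections held by the value to be
reused.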

Doesn't help when external programs are used for url download,
but does speed up addurl --fast, fsck --from web, etc.

Testing fsck --fast --from web with 3 files, over high-latency
satellite internet, it sped up from 19.37s to 14.96s.

This commit was supported by the NSF-funded DataLad project.
2018-04-04 15:39:40 -04:00
Joey Hess
25703e1413
finally really add back custom-setup stanza
Fourth or fifth try at this and finally found a way to make it work.

Absurd amount of busy-work forced on me by change in cabal's behavior.
Split up Utility modules that need posix stuff out of ones used by
Setup. Various other hacks around inability for Setup to use anything
that ifdefs a use of unix.

Probably lost a full day of my life to this.
This is how build systems make their users hate them. Just saying.
2017-12-31 16:36:39 -04:00
Joey Hess
308cd1383c
fold Build/SysConfig.hs into BuildInfo via include
This avoids warnings from stack about the module not being listed in the
cabal file. So, the generated file is also renamed to Build/SysConfig.

Note that the setup program seems to be cached despite these changes; I
had to cabal clean to get cabal to update it so that Build/SysConfig was
written.

This commit was sponsored by Jochen Bartl on Patreon.
2017-12-14 12:46:57 -04:00
Joey Hess
70344d25c0
type signature works for both old and new versions of ifdef
2017-12-11 12:49:23 -04:00
Joey Hess
c6e4bc0a22
fix regression in addurl --file caused by youtube-dl support
Now youtubeDlCheck downloads the beginning of the url's content and
checks if it's html; only when it is does it pass it off to youtube-dl
to check if it supports it.
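
How the html check is done is not stated here. As a rough illustration
only, a sniffing heuristic on the first chunk of content might look like
the following; looksLikeHtml and the prefixes it tests are assumptions,
not the logic youtubeDlCheck actually uses.

```haskell
import Data.Char (isSpace, toLower)
import Data.List (isPrefixOf)

-- Guess whether the first chunk of a download is an html document by
-- looking at how it starts, ignoring leading whitespace and case.
looksLikeHtml :: String -> Bool
looksLikeHtml chunk = any (`isPrefixOf` start) ["<!doctype html", "<html"]
  where
    start = map toLower (dropWhile isSpace chunk)
```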

This means more work is done for urls that youtube-dl does support,
but is probably more efficient for other urls, since it only downloads
the first chunk of content, while youtube-dl probably downloads more.

As well as the reported bug, this also fixes behavior when an url
was added with youtube-dl, but the url content has now changed from
a html page to something else. Remote.Web.checkKey used to wrongly
succeed in that situation, since youtube-dl said sure it can download
that something else.

This commit was supported by the NSF-funded DataLad project.
2017-12-06 13:22:31 -04:00
Joey Hess
93d5951f11
remove redundant pattern match
2017-09-24 16:17:58 -04:00
Joey Hess
01068d8280
fix build with old http-client
2017-09-13 15:35:42 -04:00
Joey Hess
2ca1d3cc01
deal with box.com horrible infinite redirect behavior
webdav: Checking if a non-existent file is present on Box.com triggered a
bug in its webdav support that generates an infinite series of redirects.

It seems to redirect foo to foo/ to foo/index.php to
foo/index.php/index.php ... Why a webdav endpoint would behave this way
who knows.

Deal with such problems by assuming such behavior means the file is not
present.

Can't simply disable following redirects, because the webdav endpoint could
legitimately be redirected to a new endpoint. So, when this happens
10 redirects have to be followed, before it gives up and assumes this means
the file does not exist.
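
The give-up rule can be sketched as a pure simulation. The Response and
Outcome types and the followRedirects function are hypothetical; the step
function stands in for making one request, and no real webdav or http
library is involved.

```haskell
-- What a single (simulated) request returns.
data Response = Redirect String | Final String

-- Result of following a chain of redirects.
data Outcome = Found String | AssumedMissing
  deriving (Eq, Show)

-- Follow at most limit redirects. If yet another redirect arrives after
-- that, give up and treat the file as not present.
followRedirects :: Int -> (String -> Response) -> String -> Outcome
followRedirects limit step = go 0
  where
    go n url = case step url of
      Final body -> Found body
      Redirect next
        | n >= limit -> AssumedMissing
        | otherwise -> go (n + 1) next
```

With a limit of 10, a server that keeps appending /index.php to the url
ends in AssumedMissing, while a single legitimate redirect to a new
endpoint is still followed.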

This commit was supported by the NSF-funded DataLad project.
2017-09-12 15:13:42 -04:00
Joey Hess
0a2f7c261f
fix build with old http-client versions
2017-08-17 11:00:48 -04:00
Joey Hess
69dcb08d7a
Disable http-client's default 30 second response timeout when HEADing an url to check if it exists. Some web servers take quite a long time to answer a HEAD request.
2017-08-15 13:56:12 -04:00
Joey Hess
1c4e5f65fc
Drop support for building with old versions of directory, feed, and http-types.
2017-03-10 15:57:41 -04:00
Joey Hess
ca49a84ba5
Drop support for building with old versions of dns and http-conduit.
2017-03-10 15:49:14 -04:00