Commit graph

90 commits

Author SHA1 Message Date
Joey Hess
9b98d3f630
better HTTP connection reuse
Enable HTTP connection reuse across multiple files, when git-annex
uses http-conduit. Before, a new Manager was created each time
Utility.Url used it. Now, a single Manager gets created the first time,
so connections are reused.

Doesn't help when external programs are used for url download,
but does speed up addurl --fast, fsck --from web, etc.

Testing fsck --fast --from web with 3 files, over high-latency
satellite internet, it sped up from 19.37s to 14.96s.

This commit was supported by the NSF-funded DataLad project.
2018-04-04 15:39:40 -04:00
Joey Hess
25703e1413
finally really add back custom-setup stanza
Fourth or fifth try at this and finally found a way to make it work.

Absurd amount of busy-work forced on me by change in cabal's behavior.
Split up Utility modules that need posix stuff out of ones used by
Setup. Various other hacks around inability for Setup to use anything
that ifdefs a use of unix.

Probably lost a full day of my life to this.
This is how build systems make their users hate them. Just saying.
2017-12-31 16:36:39 -04:00
Joey Hess
308cd1383c
fold Build/SysConfig.hs into BuildInfo via include
This avoids warnings from stack about the module not being listed in the
cabal file. So, the generated file is also renamed to Build/SysConfig.

Note that the setup program seems to be cached despite these changes; I
had to cabal clean to get cabal to update it so that Build/SysConfig was
written.

This commit was sponsored by Jochen Bartl on Patreon.
2017-12-14 12:46:57 -04:00
Joey Hess
70344d25c0
type signature works for both old and new versions of ifdef 2017-12-11 12:49:23 -04:00
Joey Hess
c6e4bc0a22
fix regression in addurl --file caused by youtube-dl support
Now youtubeDlCheck downloads the beginning of the url's content and
checks if it's html, only when it is does it pass it off the youtube-dl
to check if it supports it.

This means more work is done for urls that youtube-dl does support,
but is probably more efficient for other urls, since it only downloads
the first chunk of content, while youtube-dl probably downloads more.

As well as the reported bug, this also fixes behavior when an url
was added with youtube-dl, but the url content has now changed from
a html page to something else. Remote.Web.checkKey used to wrongly
succeed in that situation, since youtube-dl said sure it can download
that something else.

This commit was supported by the NSF-funded DataLad project.
2017-12-06 13:22:31 -04:00
Joey Hess
93d5951f11
remove redundant pattern match 2017-09-24 16:17:58 -04:00
Joey Hess
01068d8280
fix build with old http-client 2017-09-13 15:35:42 -04:00
Joey Hess
2ca1d3cc01
deal with box.com horrible infinite redirect behavior
webdav: Checking if a non-existent file is present on Box.com triggered a
bug in its webdav support that generates an infinite series of redirects.

It seems to redirect foo to foo/ to foo/index.php to
foo/index.php/index.php ... Why a webdav endpoint would behave this way
who knows.

Deal with such problems by assuming such behavior means the file is not
present.

Can't simply disable following redirects, because the webdav endpoint could
legitimately be redirected to a new endpoint. So, when this happens
10 redirects have to be followed, before it gives up and assumes this means
the file does not exist.

This commit was supported by the NSF-funded DataLad project.
2017-09-12 15:13:42 -04:00
Joey Hess
0a2f7c261f
fix build with old http-client versions 2017-08-17 11:00:48 -04:00
Joey Hess
69dcb08d7a
Disable http-client's default 30 second response timeout when HEADing an url to check if it exists. Some web servers take quite a long time to answer a HEAD request. 2017-08-15 13:56:12 -04:00
Joey Hess
1c4e5f65fc
Drop support for building with old versions of directory, feed, and http-types. 2017-03-10 15:57:41 -04:00
Joey Hess
ca49a84ba5
Drop support for building with old versions of dns and http-conduit. 2017-03-10 15:49:14 -04:00
Joey Hess
7a0d6d81a0
make curl show http errors to stderr
* Run curl with -S, so HTTP errors are displayed, even when
  it's otherwise silent.
* When downloading in --json or --quiet mode, use curl in preference
  to wget, since curl is able to display only errors to stderr, unlike
  wget.

This does mean that downloadQuiet is only silent on stdout, not necessarily
on stderr, which affects a couple other calls of it. For example,
downloading the .git/config of a http remote may show an error message now,
perhaps with slightly suboptimal formatting due to other output.
2017-02-20 16:09:32 -04:00
Joey Hess
8dd3635acf
improve layout 2017-02-20 15:44:14 -04:00
Joey Hess
4a397b5313
Run wget with -nv instead of -q, so it will display HTTP errors.
This adds one extra line of output when a download is successful,
after the progress bar. I don't much like that, but wget does not provide a
way to show HTTP errors without it.
2017-02-20 15:25:02 -04:00
Joey Hess
9dabe85bb5
whitespace 2016-12-28 00:17:36 -04:00
Joey Hess
59fead6da3
Pass annex.web-options to wget and curl after other options, so that eg --no-show-progress can be set by the user to disable the default --show-progress. 2016-12-13 11:56:23 -04:00
Alper Nebi Yasak
93a22a1c97
Remove http-conduit (<2.2.0) constraint
Since https://github.com/aristidb/aws/issues/206 is resolved, this
constraint is no longer necessary. However, http-conduit (>=2.2.0)
requires http-client (>=0.5.0) which introduces some breaking changes.
This commit also implements those changes depending on the version.
Fixes: https://git-annex.branchable.com/bugs/Build_with_aws_head_fails/

Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
2016-12-10 10:45:52 -04:00
Joey Hess
de7b2ffa72
avoid deprecation warning from parseUrl 2016-09-07 12:02:38 -04:00
Joey Hess
79704528c0
Support checking presence of content at a http url that redirects to a ftp url. 2016-07-12 16:41:45 -04:00
Joey Hess
1f6f9a8d34
When annex.http-headers is used to set the User-Agent header, avoid sending User-Agent: git-annex 2016-01-11 12:10:38 -04:00
Joey Hess
45c9440cf9
refactor 2015-10-15 10:34:19 -04:00
Joey Hess
cdbce512bd deal with more backward-compatible breaking renamings in conduit
This is the kind of annoying thing that makes me not want to use a library.
conduitManagerSettings was a perfectly fine name and could have been kept
forever.
2015-10-02 15:18:54 -04:00
Joey Hess
9e3ac97608 avoid deprecation warnings when built with http-client >= 0.4.18
Since I want git-annex to keep building on debian stable, I need to still
support the old http-client, which required explicit calls to
closeManager, or use of withManager to get Managers to close at appropriate
times. This is not needed in the new version, and so they added a
deprecation warning. IMHO much too early, because look at the mess I had to
go through to avoid that deprecation warning while supporting both
versions..
2015-10-01 13:48:56 -04:00
Joey Hess
0f5d6c09ac importfeed --relaxed: Avoid hitting the urls of items in the feed. 2015-08-19 12:24:55 -04:00
Joey Hess
a6c56fb459 improve url parsing more
Now can handle eg, "http://[::1]/download/cdrom-fontzip[foo]", where
the first [] need to stay unescaped, but the rest have to be escaped.
2015-06-14 13:54:24 -04:00
Joey Hess
829007d629 Improve url parsing to handle some urls containing illegal [] characters in their paths.
Ie, "https://archive.org/download/zoom-2/Zoom - Release 2 (1996)(Active Software)[!].iso"
2015-06-14 13:39:44 -04:00
Joey Hess
eb33569f9d remove Params constructor from Utility.SafeCommand
This removes a bit of complexity, and should make things faster
(avoids tokenizing Params string), and probably involve less garbage
collection.

In a few places, it was useful to use Params to avoid needing a list,
but that is easily avoided.

Problems noticed while doing this conversion:

	* Some uses of Params "oneword" which was entirely unnecessary
	  overhead.
	* A few places that built up a list of parameters with ++
	  and then used Params to split it!

Test suite passes.
2015-06-01 13:52:23 -04:00
Joey Hess
6466dbc950 FlexibleContexts needed by ghc 7.10 2015-05-10 15:37:55 -04:00
Joey Hess
d9bfc78444 avoid using relative path from temp dir to dest file
That failed on OSX. The temp dir was
/var/folders/fb/pnwjj52n7fg0r9mnvpsfll180000gr/T/downloadurl
and the relative path
../../../../../../Volumes/Visitors/joeyh/git-annex/r/.git/...
Didn't work. I have no clue why, how did OSX manage to break this?

But, the relative path is longer most of the time anyway, so let's
just use the absolute path.
2015-05-07 18:47:24 -04:00
Joey Hess
cf786f42a4 Support checking ftp urls for file presence. 2015-05-05 14:05:02 -04:00
Joey Hess
0b18228516 Work around wget bug #784348 which could cause it to clobber git-annex symlinks when downloading from ftp. 2015-05-05 13:53:06 -04:00
Joey Hess
c65e71e6a5 cleanup 2015-04-09 12:57:30 -04:00
Joey Hess
b2ad3403c6 make downloadQuiet quiet again
This was broken in commit c64ede23cd
2015-04-03 20:38:20 -04:00
Joey Hess
b8f0b7309f Work around curl bug when asked to download an empty url to a file.
In this situation, curl -o exits successfully without creating the output
file.

There was already a workaround for curl file:/// but I did not realize this
also affected regular url downloads.

To fix it, pre-create the destination file before starting curl.
Since we cannot always know the size of an url before trying to download
it, let's always do this.

Note that since curl is told -C -, we have to consider if this
makes curl try to do a ranged download, which might fail on some servers
where a regular download would have succeeded. My testing indicates
this isn't a problem; since the file is empty, curl seems to not try to
do a ranged download.

Original report: https://github.com/datalad/datalad/issues/79
Curl bug report: https://github.com/bagder/curl/issues/183
2015-03-27 10:22:36 -04:00
Joey Hess
e8c376e0ad import Data.Default in Common 2015-01-28 16:11:28 -04:00
Joey Hess
587f6a919b addurl: When a Content-Disposition header suggests a filename to use, addurl will consider using it, if it's reasonable and doesn't conflict with an existing file. (--file overrides this) 2015-01-22 14:52:52 -04:00
Joey Hess
91f1b2bdcf excess indent 2015-01-22 13:47:06 -04:00
Joey Hess
afc5153157 update my email address and homepage url 2015-01-21 12:50:09 -04:00
Joey Hess
4f657aa14e add getFileSize, which can get the real size of a large file on Windows
Avoid using fileSize which maxes out at just 2 gb on Windows.
Instead, use hFileSize, which doesn't have a bounded size.
Fixes support for files > 2 gb on Windows.

Note that the InodeCache code only needs to compare a file size,
so it doesn't matter it the file size wraps. So it has been
left as-is. This was necessary both to avoid invalidating existing inode
caches, and because the code passed FileStatus around and would have become
more expensive if it called getFileSize.

This commit was sponsored by Christian Dietrich.
2015-01-20 17:09:24 -04:00
Joey Hess
c64ede23cd Use wget -q --show-progress for less verbose wget output, when built with wget 1.16. 2014-12-16 14:04:40 -04:00
Joey Hess
8b15af309a add compat cruft for old versions of http-types and http-conduit 2014-08-17 15:39:46 -04:00
Joey Hess
6ab0737a75 work around default Accept-Encoding in http-client 2014-08-15 18:02:17 -04:00
Joey Hess
e0227dfedf memoize construction of the Request -> Request function to apply the UrlOptions 2014-08-15 17:47:21 -04:00
Joey Hess
dd619c7166 Switched from the old haskell HTTP library to http-conduit.
The hoary old HTTP library was only used when checking if an url exists,
when curl was not available. It had many problems, including not supporting
https at all.

Now, this is done using http-conduit for all urls that it supports. Falls
back to curl for any url that http-conduit doesn't like (probably ftp etc,
but could also be an url that its parser chokes on for whatever reason).

This adds a new dependency on http-conduit, but webdav support already
indirectly depended on that, and the s3-aws branch also uses it.

This opens up the possibility of using http-conduit for large file
downloads, but for now I've left it using wget/curl.

This commit was sponsored by Paul Tötterman.
2014-08-15 17:37:42 -04:00
Joey Hess
c784ef4586 unify exception handling into Utility.Exception
Removed old extensible-exceptions, only needed for very old ghc.

Made webdav use Utility.Exception, to work after some changes in DAV's
exception handling.

Removed Annex.Exception. Mostly this was trivial, but note that
tryAnnex is replaced with tryNonAsync and catchAnnex replaced with
catchNonAsync. In theory that could be a behavior change, since the former
caught all exceptions, and the latter don't catch async exceptions.

However, in practice, nothing in the Annex monad uses async exceptions.
Grepping for throwTo and killThread only find stuff in the assistant,
which does not seem related.

Command.Add.undo is changed to accept a SomeException, and things
that use it for rollback now catch non-async exceptions, rather than
only IOExceptions.
2014-08-07 22:03:29 -04:00
Joey Hess
2427832bed relicense general utility library code to BSD
Omitted a couple of files what have had significant contributions from
others.
2014-05-10 11:01:27 -03:00
Joey Hess
16387edd00 avoid exception when curl exits nonzero (due to eg, bad domain name) 2014-03-27 13:01:57 -04:00
Joey Hess
003fc2b7e1
add UrlOptions sum type 2014-02-24 22:00:25 -04:00
Joey Hess
c69d6eb035 Make annex.web-options be used in several places that call curl. 2014-02-24 21:29:37 -04:00