Enable HTTP connection reuse across multiple files, when git-annex
uses http-conduit. Before, a new Manager was created each time
Utility.Url used it. Now, a single Manager gets created the first time,
so connections are reused.
Doesn't help when external programs are used for url download,
but does speed up addurl --fast, fsck --from web, etc.
Testing fsck --fast --from web with 3 files, over high-latency
satellite internet, it sped up from 19.37s to 14.96s.
This commit was supported by the NSF-funded DataLad project.
Fourth or fifth try at this and finally found a way to make it work.
Absurd amount of busy-work forced on me by change in cabal's behavior.
Split up Utility modules that need posix stuff out of ones used by
Setup. Various other hacks around inability for Setup to use anything
that ifdefs a use of unix.
Probably lost a full day of my life to this.
This is how build systems make their users hate them. Just saying.
This avoids warnings from stack about the module not being listed in the
cabal file. So, the generated file is also renamed to Build/SysConfig.
Note that the setup program seems to be cached despite these changes; I
had to cabal clean to get cabal to update it so that Build/SysConfig was
written.
This commit was sponsored by Jochen Bartl on Patreon.
Now youtubeDlCheck downloads the beginning of the url's content and
checks if it's html, only when it is does it pass it off the youtube-dl
to check if it supports it.
This means more work is done for urls that youtube-dl does support,
but is probably more efficient for other urls, since it only downloads
the first chunk of content, while youtube-dl probably downloads more.
As well as the reported bug, this also fixes behavior when an url
was added with youtube-dl, but the url content has now changed from
a html page to something else. Remote.Web.checkKey used to wrongly
succeed in that situation, since youtube-dl said sure it can download
that something else.
This commit was supported by the NSF-funded DataLad project.
webdav: Checking if a non-existent file is present on Box.com triggered a
bug in its webdav support that generates an infinite series of redirects.
It seems to redirect foo to foo/ to foo/index.php to
foo/index.php/index.php ... Why a webdav endpoint would behave this way
who knows.
Deal with such problems by assuming such behavior means the file is not
present.
Can't simply disable following redirects, because the webdav endpoint could
legitimately be redirected to a new endpoint. So, when this happens
10 redirects have to be followed, before it gives up and assumes this means
the file does not exist.
This commit was supported by the NSF-funded DataLad project.
* Run curl with -S, so HTTP errors are displayed, even when
it's otherwise silent.
* When downloading in --json or --quiet mode, use curl in preference
to wget, since curl is able to display only errors to stderr, unlike
wget.
This does mean that downloadQuiet is only silent on stdout, not necessarily
on stderr, which affects a couple other calls of it. For example,
downloading the .git/config of a http remote may show an error message now,
perhaps with slightly suboptimal formatting due to other output.
This adds one extra line of output when a download is successful,
after the progress bar. I don't much like that, but wget does not provide a
way to show HTTP errors without it.
This is the kind of annoying thing that makes me not want to use a library.
conduitManagerSettings was a perfectly fine name and could have been kept
forever.
Since I want git-annex to keep building on debian stable, I need to still
support the old http-client, which required explicit calls to
closeManager, or use of withManager to get Managers to close at appropriate
times. This is not needed in the new version, and so they added a
deprecation warning. IMHO much too early, because look at the mess I had to
go through to avoid that deprecation warning while supporting both
versions..
This removes a bit of complexity, and should make things faster
(avoids tokenizing Params string), and probably involve less garbage
collection.
In a few places, it was useful to use Params to avoid needing a list,
but that is easily avoided.
Problems noticed while doing this conversion:
* Some uses of Params "oneword" which was entirely unnecessary
overhead.
* A few places that built up a list of parameters with ++
and then used Params to split it!
Test suite passes.
That failed on OSX. The temp dir was
/var/folders/fb/pnwjj52n7fg0r9mnvpsfll180000gr/T/downloadurl
and the relative path
../../../../../../Volumes/Visitors/joeyh/git-annex/r/.git/...
Didn't work. I have no clue why, how did OSX manage to break this?
But, the relative path is longer most of the time anyway, so let's
just use the absolute path.
In this situation, curl -o exits successfully without creating the output
file.
There was already a workaround for curl file:/// but I did not realize this
also affected regular url downloads.
To fix it, pre-create the destination file before starting curl.
Since we cannot always know the size of an url before trying to download
it, let's always do this.
Note that since curl is told -C -, we have to consider if this
makes curl try to do a ranged download, which might fail on some servers
where a regular download would have succeeded. My testing indicates
this isn't a problem; since the file is empty, curl seems to not try to
do a ranged download.
Original report: https://github.com/datalad/datalad/issues/79
Curl bug report: https://github.com/bagder/curl/issues/183
Avoid using fileSize which maxes out at just 2 gb on Windows.
Instead, use hFileSize, which doesn't have a bounded size.
Fixes support for files > 2 gb on Windows.
Note that the InodeCache code only needs to compare a file size,
so it doesn't matter it the file size wraps. So it has been
left as-is. This was necessary both to avoid invalidating existing inode
caches, and because the code passed FileStatus around and would have become
more expensive if it called getFileSize.
This commit was sponsored by Christian Dietrich.
The hoary old HTTP library was only used when checking if an url exists,
when curl was not available. It had many problems, including not supporting
https at all.
Now, this is done using http-conduit for all urls that it supports. Falls
back to curl for any url that http-conduit doesn't like (probably ftp etc,
but could also be an url that its parser chokes on for whatever reason).
This adds a new dependency on http-conduit, but webdav support already
indirectly depended on that, and the s3-aws branch also uses it.
This opens up the possibility of using http-conduit for large file
downloads, but for now I've left it using wget/curl.
This commit was sponsored by Paul Tötterman.
Removed old extensible-exceptions, only needed for very old ghc.
Made webdav use Utility.Exception, to work after some changes in DAV's
exception handling.
Removed Annex.Exception. Mostly this was trivial, but note that
tryAnnex is replaced with tryNonAsync and catchAnnex replaced with
catchNonAsync. In theory that could be a behavior change, since the former
caught all exceptions, and the latter don't catch async exceptions.
However, in practice, nothing in the Annex monad uses async exceptions.
Grepping for throwTo and killThread only find stuff in the assistant,
which does not seem related.
Command.Add.undo is changed to accept a SomeException, and things
that use it for rollback now catch non-async exceptions, rather than
only IOExceptions.