webdav: Checking if a non-existent file is present on Box.com triggered a
bug in its webdav support that generates an infinite series of redirects.
It seems to redirect foo to foo/ to foo/index.php to
foo/index.php/index.php ... Why a webdav endpoint would behave this way
who knows.
Deal with such problems by assuming such behavior means the file is not
present.
Can't simply disable following redirects, because the webdav endpoint could
legitimately be redirected to a new endpoint. So, when this happens
10 redirects have to be followed, before it gives up and assumes this means
the file does not exist.
This commit was supported by the NSF-funded DataLad project.
* Run curl with -S, so HTTP errors are displayed, even when
it's otherwise silent.
* When downloading in --json or --quiet mode, use curl in preference
to wget, since curl is able to display only errors to stderr, unlike
wget.
This does mean that downloadQuiet is only silent on stdout, not necessarily
on stderr, which affects a couple other calls of it. For example,
downloading the .git/config of a http remote may show an error message now,
perhaps with slightly suboptimal formatting due to other output.
This adds one extra line of output when a download is successful,
after the progress bar. I don't much like that, but wget does not provide a
way to show HTTP errors without it.
This is the kind of annoying thing that makes me not want to use a library.
conduitManagerSettings was a perfectly fine name and could have been kept
forever.
Since I want git-annex to keep building on debian stable, I need to still
support the old http-client, which required explicit calls to
closeManager, or use of withManager to get Managers to close at appropriate
times. This is not needed in the new version, and so they added a
deprecation warning. IMHO much too early, because look at the mess I had to
go through to avoid that deprecation warning while supporting both
versions..
This removes a bit of complexity, and should make things faster
(avoids tokenizing Params string), and probably involve less garbage
collection.
In a few places, it was useful to use Params to avoid needing a list,
but that is easily avoided.
Problems noticed while doing this conversion:
* Some uses of Params "oneword" which was entirely unnecessary
overhead.
* A few places that built up a list of parameters with ++
and then used Params to split it!
Test suite passes.
That failed on OSX. The temp dir was
/var/folders/fb/pnwjj52n7fg0r9mnvpsfll180000gr/T/downloadurl
and the relative path
../../../../../../Volumes/Visitors/joeyh/git-annex/r/.git/...
Didn't work. I have no clue why, how did OSX manage to break this?
But, the relative path is longer most of the time anyway, so let's
just use the absolute path.
In this situation, curl -o exits successfully without creating the output
file.
There was already a workaround for curl file:/// but I did not realize this
also affected regular url downloads.
To fix it, pre-create the destination file before starting curl.
Since we cannot always know the size of an url before trying to download
it, let's always do this.
Note that since curl is told -C -, we have to consider if this
makes curl try to do a ranged download, which might fail on some servers
where a regular download would have succeeded. My testing indicates
this isn't a problem; since the file is empty, curl seems to not try to
do a ranged download.
Original report: https://github.com/datalad/datalad/issues/79
Curl bug report: https://github.com/bagder/curl/issues/183
Avoid using fileSize which maxes out at just 2 gb on Windows.
Instead, use hFileSize, which doesn't have a bounded size.
Fixes support for files > 2 gb on Windows.
Note that the InodeCache code only needs to compare a file size,
so it doesn't matter it the file size wraps. So it has been
left as-is. This was necessary both to avoid invalidating existing inode
caches, and because the code passed FileStatus around and would have become
more expensive if it called getFileSize.
This commit was sponsored by Christian Dietrich.
The hoary old HTTP library was only used when checking if an url exists,
when curl was not available. It had many problems, including not supporting
https at all.
Now, this is done using http-conduit for all urls that it supports. Falls
back to curl for any url that http-conduit doesn't like (probably ftp etc,
but could also be an url that its parser chokes on for whatever reason).
This adds a new dependency on http-conduit, but webdav support already
indirectly depended on that, and the s3-aws branch also uses it.
This opens up the possibility of using http-conduit for large file
downloads, but for now I've left it using wget/curl.
This commit was sponsored by Paul Tötterman.
Removed old extensible-exceptions, only needed for very old ghc.
Made webdav use Utility.Exception, to work after some changes in DAV's
exception handling.
Removed Annex.Exception. Mostly this was trivial, but note that
tryAnnex is replaced with tryNonAsync and catchAnnex replaced with
catchNonAsync. In theory that could be a behavior change, since the former
caught all exceptions, and the latter don't catch async exceptions.
However, in practice, nothing in the Annex monad uses async exceptions.
Grepping for throwTo and killThread only find stuff in the assistant,
which does not seem related.
Command.Add.undo is changed to accept a SomeException, and things
that use it for rollback now catch non-async exceptions, rather than
only IOExceptions.
addurl: Improve message when adding url with wrong size to existing file.
Before the message suggested the url didn't exist.
Fixed handling of URL keys that have no recorded size. Before, if the key
has no size, the url also had to not declare any size, which was unlikely
and wrong, or it was taken to not exist. This probably would mostly affect
keys that were added to the annex with addurl --relaxed.
Overridable with --user-agent option.
Not yet done for S3 or WebDAV due to limitations of libraries used --
nether allows a user-agent header to be specified.
This commit sponsored by Michael Zehrer.