This removes a bit of complexity, and should make things faster
(avoids tokenizing Params string), and probably involve less garbage
collection.
In a few places, it was useful to use Params to avoid needing a list,
but that is easily avoided.
Problems noticed while doing this conversion:
* Some uses of Params "oneword" which was entirely unnecessary
overhead.
* A few places that built up a list of parameters with ++
and then used Params to split it!
Test suite passes.
That failed on OSX. The temp dir was
/var/folders/fb/pnwjj52n7fg0r9mnvpsfll180000gr/T/downloadurl
and the relative path
../../../../../../Volumes/Visitors/joeyh/git-annex/r/.git/...
Didn't work. I have no clue why, how did OSX manage to break this?
But, the relative path is longer most of the time anyway, so let's
just use the absolute path.
In this situation, curl -o exits successfully without creating the output
file.
There was already a workaround for curl file:/// but I did not realize this
also affected regular url downloads.
To fix it, pre-create the destination file before starting curl.
Since we cannot always know the size of an url before trying to download
it, let's always do this.
Note that since curl is told -C -, we have to consider if this
makes curl try to do a ranged download, which might fail on some servers
where a regular download would have succeeded. My testing indicates
this isn't a problem; since the file is empty, curl seems to not try to
do a ranged download.
Original report: https://github.com/datalad/datalad/issues/79
Curl bug report: https://github.com/bagder/curl/issues/183
Avoid using fileSize which maxes out at just 2 gb on Windows.
Instead, use hFileSize, which doesn't have a bounded size.
Fixes support for files > 2 gb on Windows.
Note that the InodeCache code only needs to compare a file size,
so it doesn't matter it the file size wraps. So it has been
left as-is. This was necessary both to avoid invalidating existing inode
caches, and because the code passed FileStatus around and would have become
more expensive if it called getFileSize.
This commit was sponsored by Christian Dietrich.
The hoary old HTTP library was only used when checking if an url exists,
when curl was not available. It had many problems, including not supporting
https at all.
Now, this is done using http-conduit for all urls that it supports. Falls
back to curl for any url that http-conduit doesn't like (probably ftp etc,
but could also be an url that its parser chokes on for whatever reason).
This adds a new dependency on http-conduit, but webdav support already
indirectly depended on that, and the s3-aws branch also uses it.
This opens up the possibility of using http-conduit for large file
downloads, but for now I've left it using wget/curl.
This commit was sponsored by Paul Tötterman.
Removed old extensible-exceptions, only needed for very old ghc.
Made webdav use Utility.Exception, to work after some changes in DAV's
exception handling.
Removed Annex.Exception. Mostly this was trivial, but note that
tryAnnex is replaced with tryNonAsync and catchAnnex replaced with
catchNonAsync. In theory that could be a behavior change, since the former
caught all exceptions, and the latter don't catch async exceptions.
However, in practice, nothing in the Annex monad uses async exceptions.
Grepping for throwTo and killThread only find stuff in the assistant,
which does not seem related.
Command.Add.undo is changed to accept a SomeException, and things
that use it for rollback now catch non-async exceptions, rather than
only IOExceptions.
addurl: Improve message when adding url with wrong size to existing file.
Before the message suggested the url didn't exist.
Fixed handling of URL keys that have no recorded size. Before, if the key
has no size, the url also had to not declare any size, which was unlikely
and wrong, or it was taken to not exist. This probably would mostly affect
keys that were added to the annex with addurl --relaxed.
Overridable with --user-agent option.
Not yet done for S3 or WebDAV due to limitations of libraries used --
nether allows a user-agent header to be specified.
This commit sponsored by Michael Zehrer.
<RichiH> i richih@eudyptes (git)-[master] ~git/debconf-share/debconf13/photos/chrysn % rm /home/richih/work/git/debconf-share/.git/annex/tmp/SHA256E-s3044235--693b74fcb12db06b5e79a8b99d03e2418923866506ee62d24a4e9ae8c5236758.JPG
<RichiH> richih@eudyptes (git)-[master] ~git/debconf-share/debconf13/photos/chrysn % git annex get P8060008.JPG
<RichiH> get P8060008.JPG (from website...) --2013-08-21 21:42:45-- http://annex.debconf.org/debconf-share/.git//annex/objects/1a4/67d/SHA256E-s3044235--693b74fcb12db06b5e79a8b99d03e2418923866506ee62d24a4e9ae8c5236758.JPG/SHA256E-s3044235--693b74fcb12db06b5e79a8b99d03e2418923866506ee62d24a4e9ae8c5236758.JPG
<RichiH> Resolving annex.debconf.org (annex.debconf.org)... 5.153.231.227, 2001:41c8:1000:19::227:2
<RichiH> Connecting to annex.debconf.org (annex.debconf.org)|5.153.231.227|:80... connected.
<RichiH> HTTP request sent, awaiting response... 404 Not Found
<RichiH> 2013-08-21 21:42:45 ERROR 404: Not Found.
<RichiH> File `/home/richih/work/git/debconf-share/.git/annex/tmp/SHA256E-s3044235--693b74fcb12db06b5e79a8b99d03e2418923866506ee62d24a4e9ae8c5236758.JPG' already there; not retrieving.
<RichiH> Unable to access these remotes: website
<RichiH> Try making some of these repositories available:
<RichiH> 3e0356ac-0743-11e3-83a5-1be63124a102 -- website (annex.debconf.org)
<RichiH> a7495021-9f2d-474e-80c7-34d29d09fec6 -- chrysn@hephaistos:~/data/projects/debconf13/debconf-share
<RichiH> eb8990f7-84cd-4e6b-b486-a5e71efbd073 -- joeyh passport usb drive
<RichiH> f415f118-f428-4c68-be66-c91501da3a93 -- joeyh laptop
<RichiH> failed
<RichiH> git-annex: get: 1 failed
<RichiH> richih@eudyptes (git)-[master] ~git/debconf-share/debconf13/photos/chrysn %
I was not able to reproduce the failure, but I did reproduce that
wget -O http://404/ results in an empty file being written.
* addurl: Escape invalid characters in urls, rather than failing to
use an invalid url.
* addurl: Properly handle url-escaped characters in file:// urls.