Fix resume of download of url when the whole file content is already actually downloaded

I don't much like that there's no way to distinguish between having the
whole content and having an old version of the file that's bigger than the
one now on the server. But resuming an http transfer can always yield the
wrong result if the file on the http server is changing, and git-annex
will detect that when it verifies the downloaded content.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
Joey Hess 2018-11-12 16:08:47 -04:00
parent c24bdfd689
commit ff9bd9620e
4 changed files with 40 additions and 1 deletions

@ -8,6 +8,8 @@ git-annex (7.20181106) UNRELEASED; urgency=medium
* Fix bash completion of "git annex" to propertly handle files with
spaces and other problem characters. (Completion of "git-annex"
already did.)
* Fix resume of download of url when the whole file content is
already actually downloaded.
-- Joey Hess <id@joeyh.name> Tue, 06 Nov 2018 12:44:27 -0400

@ -348,7 +348,14 @@ download' noerror meterupdate url file uo =
-- This could be improved by fixing
-- https://github.com/aristidb/http-types/issues/87
Just crh -> crh == B8.fromString ("bytes */" ++ show sz)
Nothing -> False
-- Some http servers send no Content-Range header when
-- the range extends beyond the end of the file.
-- There is no way to distinguish between the file
-- being the same size on the http server, vs
-- it being shorter than the file we already have.
-- So assume we have the whole content of the file
-- already, the same as wget and curl do.
Nothing -> True
-- Resume download from where a previous download was interrupted,
-- when supported by the http server. The server may also opt to

@ -98,3 +98,5 @@ git-annex version: 6.20181011+git124-g94aa0e2f6-1~ndall+1
"""]]
Not sure why it ended up not moved into the proper location but I think upon redownload, size should be verified, if "Full" - try to proceed to checksum verification etc.
> [[fixed|done]] --[[Joey]]

@ -0,0 +1,28 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2018-11-12T19:50:04Z"
content="""
I was able to reproduce this with an apache web server. It seems apache
doesn't send back a Content-Range header when the requested range is empty,
though it does otherwise.
Both wget and curl seem to accept that as indicating that nothing more
needs to be downloaded.

    joey@darkstar:~>wget -c http://localhost/~joey/foo -O foo
    --2018-11-12 15:57:48--  http://localhost/~joey/foo
    Resolving localhost (localhost)... ::1, 127.0.0.1
    Connecting to localhost (localhost)|::1|:80... connected.
    HTTP request sent, awaiting response... 416 Requested Range Not Satisfiable
    The file is already fully retrieved; nothing to do.

Although, it's worth noting that the http server does the same thing
if a range larger than the url's size is requested. In that case wget
behaves the same as above, even though it hasn't actually downloaded the
current content of the file. So this seems like an ugly corner of http:
the two situations cannot be distinguished.
I suppose I'll make git-annex behave the same as wget and curl do.
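
That decision can be sketched roughly as below. This is only a minimal illustration, not the actual git-annex code: `alreadyDownloaded` is a hypothetical name, and it assumes the caller has already seen a response like the 416 above and pulled out the Content-Range header, if the server sent one.

```haskell
import qualified Data.ByteString.Char8 as B8

-- | Hypothetical helper (not a real git-annex function).
-- Given the size of the file we already have on disk, and the
-- Content-Range header of the server's response to a resume request
-- (if it sent one), decide whether to treat the download as complete.
alreadyDownloaded :: Integer -> Maybe B8.ByteString -> Bool
-- The server reported the total size; we are done only if it
-- matches the size of the local file exactly.
alreadyDownloaded sz (Just crh) = crh == B8.pack ("bytes */" ++ show sz)
-- No Content-Range header at all. The local file may be the same
-- size as the remote one, or bigger; these cannot be told apart,
-- so assume the download is complete, as wget and curl do.
alreadyDownloaded _ Nothing = True
```

With a 100 byte local file, `alreadyDownloaded 100 Nothing` and `alreadyDownloaded 100 (Just (B8.pack "bytes */100"))` are both `True`, while `alreadyDownloaded 100 (Just (B8.pack "bytes */50"))` is `False`. A wrong guess in the `Nothing` case is left for later content verification to catch.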
"""]]