Fix resume of download of url when the whole file content is already actually downloaded

I don't much like that there's no way to distinguish between having the whole content and having an old version of the file that's bigger than the one now on the server. But resuming an http transfer can always yield the wrong result if the file on the http server is changing, and git-annex will detect that when it verifies the downloaded content.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-11-12 16:08:47 -04:00
commit ff9bd9620e
parent c24bdfd689
4 changed files with 40 additions and 1 deletions
--- a/2
+++ b/2
@@ -8,6 +8,8 @@ git-annex (7.20181106) UNRELEASED; urgency=medium
  * Fix bash completion of "git annex" to properly handle files with
    spaces and other problem characters. (Completion of "git-annex"
    already did.)
+  * Fix resume of download of url when the whole file content is
+    already actually downloaded.

 -- Joey Hess <id@joeyh.name>  Tue, 06 Nov 2018 12:44:27 -0400

--- a/Utility/Url.hs
+++ b/Utility/Url.hs
@@ -348,7 +348,14 @@ download' noerror meterupdate url file uo =
 			-- This could be improved by fixing
 			-- https://github.com/aristidb/http-types/issues/87
 			Just crh -> crh == B8.fromString ("bytes */" ++ show sz)
-			Nothing -> False
+			-- Some http servers send no Content-Range header when
+			-- the range extends beyond the end of the file.
+			-- There is no way to distinguish between the file
+			-- being the same size on the http server, vs
+			-- it being shorter than the file we already have.
+			-- So assume we have the whole content of the file
+			-- already, the same as wget and curl do.
+			Nothing -> True

 	-- Resume download from where a previous download was interrupted, 
 	-- when supported by the http server. The server may also opt to
--- a/doc/bugs/annex_is_not_satisfied_by_full_and_correct_full_download.mdwn
+++ b/doc/bugs/annex_is_not_satisfied_by_full_and_correct_full_download.mdwn
@@ -98,3 +98,5 @@ git-annex version: 6.20181011+git124-g94aa0e2f6-1~ndall+1
 """]]

 Not sure why it ended up not moved into the proper location but I think upon redownload, size should be verified, if "Full" - try to proceed to checksum verification etc.
+
+> [[fixed|done]] --[[Joey]]
--- a/doc/bugs/annex_is_not_satisfied_by_full_and_correct_full_download/comment_2_786158af6b36de925afce8ba6102ae62._comment
+++ b/doc/bugs/annex_is_not_satisfied_by_full_and_correct_full_download/comment_2_786158af6b36de925afce8ba6102ae62._comment
@@ -0,0 +1,28 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 2"""
+ date="2018-11-12T19:50:04Z"
+ content="""
+I was able to reproduce this with an apache web server. It seems apache
+doesn't send back a Content-Range header when the requested range is empty,
+though it does otherwise.
+
+Both wget and curl seem to accept that as indicating that nothing more
+needs to be downloaded.
+
+	joey@darkstar:~>wget  -c http://localhost/~joey/foo -O foo
+	--2018-11-12 15:57:48--  http://localhost/~joey/foo
+	Resolving localhost (localhost)... ::1, 127.0.0.1
+	Connecting to localhost (localhost)|::1|:80... connected.
+	HTTP request sent, awaiting response... 416 Requested Range Not Satisfiable
+	
+	    The file is already fully retrieved; nothing to do.
+
+Although, it's worth noting that the http server does the same thing
+if a range larger than the url's size is requested. And in this case wget
+will behave the same as the above but hasn't actually downloaded the
+current content of the file. So this seems like an ugly corner of http
+that the two situations cannot be distinguished.
+
+I suppose I'll make git-annex behave the same as wget and curl do.
+"""]]
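
For reference, the decision made in the Utility/Url.hs hunk above can be sketched in isolation. This is only an illustration, not the code in git-annex: the function name `alreadyComplete` is hypothetical, it assumes the caller has already seen that the server rejected the range request (as in the 416 response wget shows above), and it uses `B8.pack` from `Data.ByteString.Char8` rather than the `B8.fromString` used in the real source.

```haskell
import qualified Data.ByteString.Char8 as B8

-- | Hypothetical helper, not part of git-annex. Given the size of the
-- partially downloaded file and the Content-Range header (if the server
-- sent one) from a rejected range request, guess whether the whole
-- content has already been downloaded.
alreadyComplete :: Integer -> Maybe B8.ByteString -> Bool
alreadyComplete sz mcrh = case mcrh of
    -- The server reports the total size; it must match what we have.
    Just crh -> crh == B8.pack ("bytes */" ++ show sz)
    -- No Content-Range header at all. The server's file may be the same
    -- size as ours or shorter, and the two cases cannot be told apart,
    -- so assume the download is complete, as wget and curl do.
    Nothing -> True
```

With this sketch, `alreadyComplete 1024 Nothing` is `True`, while a header that reports a different total size, such as `Just (B8.pack "bytes */2048")`, gives `False`. A wrong guess in the `Nothing` case is caught later when git-annex verifies the downloaded content.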