From ff9bd9620e92e264ed2adb72ea6fc8c975622682 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Mon, 12 Nov 2018 16:08:47 -0400
Subject: [PATCH] Fix resume of download of url when the whole file content is
 already actually downloaded

Don't much like that there's no way to distinguish between having the
whole content and having an old version of the file that's bigger, but of
course resuming a http transfer can always yield the wrong result if the
file on the http server is changing, and git-annex will detect that when
it verifies the downloaded content.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
---
 CHANGELOG                                     |  2 ++
 Utility/Url.hs                                |  9 +++++-
 ...ied_by_full_and_correct_full_download.mdwn |  2 ++
 ..._786158af6b36de925afce8ba6102ae62._comment | 28 +++++++++++++++++++
 4 files changed, 40 insertions(+), 1 deletion(-)
 create mode 100644 doc/bugs/annex_is_not_satisfied_by_full_and_correct_full_download/comment_2_786158af6b36de925afce8ba6102ae62._comment

diff --git a/CHANGELOG b/CHANGELOG
index cce447bd8e..60c0b75bb9 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -8,6 +8,8 @@ git-annex (7.20181106) UNRELEASED; urgency=medium
   * Fix bash completion of "git annex" to propertly handle files with
     spaces and other problem characters. (Completion of "git-annex"
     already did.)
+  * Fix resume of download of url when the whole file content is
+    already actually downloaded.
 
  -- Joey Hess Tue, 06 Nov 2018 12:44:27 -0400
 
diff --git a/Utility/Url.hs b/Utility/Url.hs
index 41ac2be674..c29db74352 100644
--- a/Utility/Url.hs
+++ b/Utility/Url.hs
@@ -348,7 +348,14 @@ download' noerror meterupdate url file uo =
 				-- This could be improved by fixing
 				-- https://github.com/aristidb/http-types/issues/87
 				Just crh -> crh == B8.fromString ("bytes */" ++ show sz)
-				Nothing -> False
+				-- Some http servers send no Content-Range header when
+				-- the range extends beyond the end of the file.
+				-- There is no way to distinguish between the file
+				-- being the same size on the http server, vs
+				-- it being shorter than the file we already have.
+				-- So assume we have the whole content of the file
+				-- already, the same as wget and curl do.
+				Nothing -> True
 
 		-- Resume download from where a previous download was interrupted,
 		-- when supported by the http server. The server may also opt to
diff --git a/doc/bugs/annex_is_not_satisfied_by_full_and_correct_full_download.mdwn b/doc/bugs/annex_is_not_satisfied_by_full_and_correct_full_download.mdwn
index ac953f6780..43080af33d 100644
--- a/doc/bugs/annex_is_not_satisfied_by_full_and_correct_full_download.mdwn
+++ b/doc/bugs/annex_is_not_satisfied_by_full_and_correct_full_download.mdwn
@@ -98,3 +98,5 @@ git-annex version: 6.20181011+git124-g94aa0e2f6-1~ndall+1
 """]]
 
 Not sure why it ended up not moved into the proper location but I think upon redownload, size should be verified, if "Full" - try to proceed to checksum verification etc.
+
+> [[fixed|done]] --[[Joey]]
diff --git a/doc/bugs/annex_is_not_satisfied_by_full_and_correct_full_download/comment_2_786158af6b36de925afce8ba6102ae62._comment b/doc/bugs/annex_is_not_satisfied_by_full_and_correct_full_download/comment_2_786158af6b36de925afce8ba6102ae62._comment
new file mode 100644
index 0000000000..b5b985449c
--- /dev/null
+++ b/doc/bugs/annex_is_not_satisfied_by_full_and_correct_full_download/comment_2_786158af6b36de925afce8ba6102ae62._comment
@@ -0,0 +1,28 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 2"""
+ date="2018-11-12T19:50:04Z"
+ content="""
+I was able to reproduce this with an apache web server. It seems apache
+doesn't send back a Content-Range header when the requested range is empty,
+though it does otherwise.
+
+Both wget and curl seem to accept that as indicating that nothing more
+needs to be downloaded.
+
+    joey@darkstar:~>wget -c http://localhost/~joey/foo -O foo
+    --2018-11-12 15:57:48-- http://localhost/~joey/foo
+    Resolving localhost (localhost)... ::1, 127.0.0.1
+    Connecting to localhost (localhost)|::1|:80... connected.
+    HTTP request sent, awaiting response... 416 Requested Range Not Satisfiable
+
+        The file is already fully retrieved; nothing to do.
+
+Although, it's worth noting that the http server does the same thing
+if a range larger than the url's size is requested.. And in this case wget
+will behave the same as the above but hasn't actually downloaded the
+current content of the file. So this seems like an ugly corner of http
+that the two situations cannot be distinguished.
+
+I suppose I'll make git-annex behave the same as wget and curl do.
+"""]]