diff --git a/CHANGELOG b/CHANGELOG
index 7d2f1453a7..8b050c6db6 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -6,6 +6,10 @@ git-annex (7.20190616) UNRELEASED; urgency=medium
   * Other commands also run their cleanup phase using a separate job pool
     than their perform phase, which may make some of them somewhat faster
     when running concurrently as well.
+  * When downloading an url and the destination file exists but is empty,
+    avoid using http range to resume, since a range "bytes=0-" is an unusual
+    edge case that it's best to avoid relying on working. This is known to
+    fix a case where importfeed downloaded a partial feed from such a server.
 
  -- Joey Hess <id@joeyh.name>  Sat, 15 Jun 2019 12:38:25 -0400
 
diff --git a/Utility/Url.hs b/Utility/Url.hs
index 865bc87629..2a80523fca 100644
--- a/Utility/Url.hs
+++ b/Utility/Url.hs
@@ -375,13 +375,13 @@ download' noerror meterupdate url file uo =
 	ftpport = 21
 
 	downloadconduit req = catchMaybeIO (getFileSize file) >>= \case
-		Nothing -> runResourceT $ do
+		Just sz | sz > 0 -> resumeconduit req' sz
+		_ -> runResourceT $ do
 			liftIO $ debugM "url" (show req')
 			resp <- http req' (httpManager uo)
 			if responseStatus resp == ok200
 				then store zeroBytesProcessed WriteMode resp
 				else showrespfailure resp
-		Just sz -> resumeconduit req' sz
 	  where
 		req' = applyRequest uo $ req
 		-- Override http-client's default decompression of gzip
diff --git a/doc/bugs/importfeed___34__parsing_the_feed_failed__34___without_further_info.mdwn b/doc/bugs/importfeed___34__parsing_the_feed_failed__34___without_further_info.mdwn
index 789f597d4b..937ec5a9e3 100644
--- a/doc/bugs/importfeed___34__parsing_the_feed_failed__34___without_further_info.mdwn
+++ b/doc/bugs/importfeed___34__parsing_the_feed_failed__34___without_further_info.mdwn
@@ -49,3 +49,5 @@ ok
 ### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
 
 <3
+
+> [[fixed|done]] --[[Joey]]
diff --git a/doc/bugs/importfeed___34__parsing_the_feed_failed__34___without_further_info/comment_1_25df292c558fb470b5db4a67461ce788._comment b/doc/bugs/importfeed___34__parsing_the_feed_failed__34___without_further_info/comment_1_25df292c558fb470b5db4a67461ce788._comment
new file mode 100644
index 0000000000..e0f053330d
--- /dev/null
+++ b/doc/bugs/importfeed___34__parsing_the_feed_failed__34___without_further_info/comment_1_25df292c558fb470b5db4a67461ce788._comment
@@ -0,0 +1,28 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2019-06-20T15:18:49Z"
+ content="""
+Somehow git-annex receives a truncated file from the web server,
+so it is unable to parse it.
+
+That only happens when using the haskell http library to download.
+When git-annex is configured to use curl, it works.
+
+So, workaround:
+
+    git -c annex.security.allowed-http-addresses=all -c annex.web-options=-4 annex importfeed \
+    https://www.deutschlandfunk.de/podcast-deutschlandfunk-der-tag.3417.de.podcast.xml
+
+git-annex addurl downloads the complete file, so the problem does not
+seem to be with the haskell http library, but something to do with how
+importfeed is using it that causes a truncation.
+
+Aha, importfeed uses withTmpFile, so the destination file exists with 0
+size. This triggers a resume code path. And it looks to me like this web
+server may not handle resume very well, it appears to send ~32kb
+of data and not the whole file in that case.
+
+So, the obvious fix is to not resume when the destination file is empty,
+and I've done that.
+"""]]