avoid url resume from 0
When downloading a url and the destination file exists but is empty, avoid using an http range to resume, since a range of "bytes=0-" is an unusual edge case that is best not relied on to work. This is known to fix a case where importfeed downloaded a partial feed from such a server.

Since importfeed uses withTmpFile, the destination always exists empty, so it would particularly tickle such problem servers. Resuming from 0 is otherwise possible, but unlikely.
parent 06ea1c4228
commit 759fd9ea68

4 changed files with 36 additions and 2 deletions
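For background, an HTTP resume is expressed as a Range request header: resuming from offset N sends "Range: bytes=N-", so resuming into an empty file sends "bytes=0-", which semantically asks for the whole file yet still invites a 206 partial response. A server with correct range handling returns the complete body either way; the server in the bug report below apparently truncates instead. A minimal sketch of building such a request with http-client (resumeFrom is an illustrative helper, not git-annex's own code):

{-# LANGUAGE OverloadedStrings #-}

import Network.HTTP.Client (Request, parseRequest, requestHeaders)
import qualified Data.ByteString.Char8 as B8

-- Attach a Range header resuming from the given byte offset.
-- With offset 0 this yields "Range: bytes=0-": it asks for the
-- whole file, but some servers mishandle it and send a truncated
-- body, which is the edge case this commit sidesteps.
resumeFrom :: Integer -> Request -> Request
resumeFrom offset req = req
    { requestHeaders =
        ("Range", B8.pack ("bytes=" ++ show offset ++ "-"))
            : requestHeaders req
    }

main :: IO ()
main = do
    req <- parseRequest "http://example.com/file"
    print (requestHeaders (resumeFrom 0 req))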
@@ -6,6 +6,10 @@ git-annex (7.20190616) UNRELEASED; urgency=medium
   * Other commands also run their cleanup phase using a separate job pool
     than their perform phase, which may make some of them somewhat faster
     when running concurrently as well.
+  * When downloading an url and the destination file exists but is empty,
+    avoid using http range to resume, since a range "bytes=0-" is an unusual
+    edge case that it's best to avoid relying on working. This is known to
+    fix a case where importfeed downloaded a partial feed from such a server.
 
  -- Joey Hess <id@joeyh.name>  Sat, 15 Jun 2019 12:38:25 -0400
 
@@ -375,13 +375,13 @@ download' noerror meterupdate url file uo =
 	ftpport = 21
 
 	downloadconduit req = catchMaybeIO (getFileSize file) >>= \case
-		Nothing -> runResourceT $ do
+		Just sz | sz > 0 -> resumeconduit req' sz
+		_ -> runResourceT $ do
 			liftIO $ debugM "url" (show req')
 			resp <- http req' (httpManager uo)
 			if responseStatus resp == ok200
 				then store zeroBytesProcessed WriteMode resp
 				else showrespfailure resp
-		Just sz -> resumeconduit req' sz
 	  where
 		req' = applyRequest uo $ req
 		-- Override http-client's default decompression of gzip
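Taken in isolation, the new case split resumes only when the size probe reports a positive size; both a missing file and an empty file now fall through to a fresh full download. A self-contained sketch of that guard, with simplified stand-ins for git-annex's catchMaybeIO and getFileSize helpers:

{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Exception (SomeException, try)
import System.IO (IOMode(ReadMode), hFileSize, withFile)

-- Stand-in for git-annex's catchMaybeIO: any IO exception
-- (e.g. file does not exist) becomes Nothing.
catchMaybeIO :: IO a -> IO (Maybe a)
catchMaybeIO a = either (\(_ :: SomeException) -> Nothing) Just <$> try a

-- Stand-in for git-annex's getFileSize.
getFileSize :: FilePath -> IO Integer
getFileSize f = withFile f ReadMode hFileSize

-- The guard from the patch: Just a positive offset means resume;
-- a missing or zero-size destination means a full download.
resumeOffset :: FilePath -> IO (Maybe Integer)
resumeOffset file = catchMaybeIO (getFileSize file) >>= \case
    Just sz | sz > 0 -> pure (Just sz)  -- resume: "Range: bytes=<sz>-"
    _ -> pure Nothing                   -- start from byte 0, no Range header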
@@ -49,3 +49,5 @@ ok
 ### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
 
 <3
+
+> [[fixed|done]] --[[Joey]]
@@ -0,0 +1,28 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2019-06-20T15:18:49Z"
+ content="""
+Somehow git-annex receives a truncated file from the web server,
+so it is unable to parse it.
+
+That only happens when using the haskell http library to download.
+When git-annex is configured to use curl, it works.
+
+So, workaround:
+
+	git -c annex.security.allowed-http-addresses=all -c annex.web-options=-4 annex importfeed \
+	https://www.deutschlandfunk.de/podcast-deutschlandfunk-der-tag.3417.de.podcast.xml
+
+git-annex addurl downloads the complete file, so the problem does not
+seem to be with the haskell http library, but something to do with how
+importfeed is using it that causes a truncation.
+
+Aha, importfeed uses withTmpFile, so the destination file exists with 0
+size. This triggers a resume code path. And it looks to me like this web
+server may not handle resume very well, it appears to send ~32kb
+of data and not the whole file in that case.
+
+So, the obvious fix is to not resume when the destination file is empty,
+and I've done that.
+"""]]
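The triggering state described in that comment is easy to reproduce: a freshly created temp file exists with size 0, so the size probe returns Just 0 and, before this fix, the download took the resume path. A small sketch using the temporary package's withSystemTempFile as a stand-in for git-annex's withTmpFile:

import System.Directory (getFileSize)
import System.IO (hClose)
import System.IO.Temp (withSystemTempFile)

main :: IO ()
main = withSystemTempFile "feed.xml" $ \path h -> do
    hClose h                 -- the file exists, but nothing has been written
    sz <- getFileSize path
    print sz                 -- prints 0: the state importfeed downloads into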