avoid url resume from 0
When downloading a URL, if the destination file exists but is empty, avoid using an HTTP Range header to resume, since a range of "bytes=0-" is an unusual edge case that is best not relied upon to work. This is known to fix a case where importfeed downloaded a partial feed from such a server. Since importfeed uses withTmpFile, the destination always exists and is empty, so it would particularly tickle such problem servers. Resuming from 0 is otherwise possible, but unlikely.
parent 06ea1c4228
commit 759fd9ea68
4 changed files with 36 additions and 2 deletions
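For illustration, a minimal http-client sketch (not git-annex code; the URL is a hypothetical stand-in and the use of httpLbs is an assumption) of the degenerate request that resuming from offset 0 produces. The header renders as "Range: bytes=0-"; a well-behaved server treats that like a plain GET, but the problem server answered it with a truncated body:

    -- Sketch, not git-annex code: a "resume" from offset 0 sends the
    -- degenerate header "Range: bytes=0-", which some servers mishandle.
    import qualified Data.ByteString.Char8 as B8
    import qualified Data.ByteString.Lazy as L
    import Network.HTTP.Client
    import Network.HTTP.Client.TLS (newTlsManager)
    import Network.HTTP.Types.Header (hRange)

    main :: IO ()
    main = do
        mgr <- newTlsManager
        -- hypothetical URL standing in for the problem feed
        req <- parseRequest "https://example.com/feed.xml"
        let sz = 0 :: Integer -- size of the existing, empty destination file
        let req' = req { requestHeaders =
                (hRange, B8.pack ("bytes=" ++ show sz ++ "-"))
                    : requestHeaders req }
        resp <- httpLbs req' mgr
        -- a problem server may return far fewer bytes than the full file
        print (L.length (responseBody resp))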
@@ -6,6 +6,10 @@ git-annex (7.20190616) UNRELEASED; urgency=medium
   * Other commands also run their cleanup phase using a separate job pool
     than their perform phase, which may make some of them somewhat faster
     when running concurrently as well.
+  * When downloading an url and the destination file exists but is empty,
+    avoid using http range to resume, since a range "bytes=0-" is an unusual
+    edge case that it's best to avoid relying on working. This is known to
+    fix a case where importfeed downloaded a partial feed from such a server.
 
  -- Joey Hess <id@joeyh.name>  Sat, 15 Jun 2019 12:38:25 -0400
 
@@ -375,13 +375,13 @@ download' noerror meterupdate url file uo =
 	ftpport = 21
 
 	downloadconduit req = catchMaybeIO (getFileSize file) >>= \case
-		Nothing -> runResourceT $ do
+		Just sz | sz > 0 -> resumeconduit req' sz
+		_ -> runResourceT $ do
 			liftIO $ debugM "url" (show req')
 			resp <- http req' (httpManager uo)
 			if responseStatus resp == ok200
 				then store zeroBytesProcessed WriteMode resp
 				else showrespfailure resp
-		Just sz -> resumeconduit req' sz
 	  where
 		req' = applyRequest uo $ req
 		-- Override http-client's default decompression of gzip
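resumeconduit itself is not shown in this hunk. A plausible sketch of how such a resume header is built with the http-types package (the name resumeFromHeader is an assumption, not confirmed by this diff) makes the edge case concrete: ByteRangeFrom 0 renders as exactly "bytes=0-", which is why the new guard routes empty destination files through the plain, non-resuming path:

    -- Sketch, assuming http-types' renderByteRanges; not necessarily
    -- the exact helper used by git-annex.
    import Network.HTTP.Types
        (ByteRange (ByteRangeFrom), Header, hRange, renderByteRanges)

    resumeFromHeader :: Integer -> Header
    resumeFromHeader sz = (hRange, renderByteRanges [ByteRangeFrom sz])
    -- resumeFromHeader 0 == ("Range", "bytes=0-")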
@@ -49,3 +49,5 @@ ok
 ### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
 
 <3
+
+> [[fixed|done]] --[[Joey]]
@@ -0,0 +1,28 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2019-06-20T15:18:49Z"
+ content="""
+Somehow git-annex receives a truncated file from the web server,
+so it is unable to parse it.
+
+That only happens when using the haskell http library to download.
+When git-annex is configured to use curl, it works.
+
+So, workaround:
+
+	git -c annex.security.allowed-http-addresses=all -c annex.web-options=-4 annex importfeed \
+		https://www.deutschlandfunk.de/podcast-deutschlandfunk-der-tag.3417.de.podcast.xml
+
+git-annex addurl downloads the complete file, so the problem does not
+seem to be with the haskell http library, but something to do with how
+importfeed is using it that causes a truncation.
+
+Aha, importfeed uses withTmpFile, so the destination file exists with 0
+size. This triggers a resume code path. And it looks to me like this web
+server may not handle resume very well, it appears to send ~32kb
+of data and not the whole file in that case.
+
+So, the obvious fix is to not resume when the destination file is empty,
+and I've done that.
+"""]]