avoid url resume from 0
When downloading an url and the destination file exists but is empty, avoid using http range to resume, since a range of "bytes=0-" is an unusual edge case that is best not relied on to work.

This is known to fix a case where importfeed downloaded a partial feed from such a server. Since importfeed uses withTmpFile, the destination always exists empty, so it was particularly likely to tickle such problem servers. Resuming from 0 is otherwise possible, but unlikely.
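For context: an http range resume is just a `Range` request header. Here is a minimal sketch using the haskell http-client library (illustrative only, not git-annex's code; the URL is a placeholder). "bytes=0-" asks the server for the entire file starting at offset 0, so a plain GET fetches the same bytes without depending on the server's range handling being correct:

{-# LANGUAGE OverloadedStrings #-}

import Network.HTTP.Client
import Network.HTTP.Client.TLS (tlsManagerSettings)
import qualified Data.ByteString.Char8 as B8

-- Add a Range header asking the server to send the file from the
-- given byte offset onward.
resumeFrom :: Integer -> Request -> Request
resumeFrom sz req = req
	{ requestHeaders =
		("Range", B8.pack ("bytes=" ++ show sz ++ "-"))
			: requestHeaders req
	}

main :: IO ()
main = do
	mgr <- newManager tlsManagerSettings
	req <- parseRequest "https://example.com/feed.xml"  -- placeholder URL
	-- resumeFrom 0 sends "Range: bytes=0-"; a server with buggy range
	-- support may truncate the response, which is what this commit avoids.
	resp <- httpLbs (resumeFrom 0 req) mgr
	print (responseStatus resp)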
parent 06ea1c4228
commit 759fd9ea68

4 changed files with 36 additions and 2 deletions
@@ -6,6 +6,10 @@ git-annex (7.20190616) UNRELEASED; urgency=medium
   * Other commands also run their cleanup phase using a separate job pool
     than their perform phase, which may make some of them somewhat faster
     when running concurrently as well.
+  * When downloading an url and the destination file exists but is empty,
+    avoid using http range to resume, since a range "bytes=0-" is an unusual
+    edge case that it's best to avoid relying on working. This is known to
+    fix a case where importfeed downloaded a partial feed from such a server.
 
  -- Joey Hess <id@joeyh.name>  Sat, 15 Jun 2019 12:38:25 -0400
 
@@ -375,13 +375,13 @@ download' noerror meterupdate url file uo =
 	ftpport = 21
 
 	downloadconduit req = catchMaybeIO (getFileSize file) >>= \case
-		Nothing -> runResourceT $ do
+		Just sz | sz > 0 -> resumeconduit req' sz
+		_ -> runResourceT $ do
 			liftIO $ debugM "url" (show req')
 			resp <- http req' (httpManager uo)
 			if responseStatus resp == ok200
 				then store zeroBytesProcessed WriteMode resp
 				else showrespfailure resp
-		Just sz -> resumeconduit req' sz
 	  where
 		req' = applyRequest uo $ req
 			-- Override http-client's default decompression of gzip
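The heart of the change is the new guard: a missing file (`Nothing`) and an empty file (`Just 0`) now both fall through to a plain download, and only a non-empty file resumes. A self-contained sketch of just that decision (`Plan` and `plan` are made-up names for illustration; git-annex itself uses its `catchMaybeIO` helper rather than `try`):

{-# LANGUAGE LambdaCase #-}

import Control.Exception (IOException, try)
import System.Directory (getFileSize)

data Plan = Resume Integer | FreshDownload
	deriving Show

-- Mirrors the new branch structure: only a pre-existing, non-empty
-- partial download triggers a range resume.
--
--   plan on a missing file  => FreshDownload
--   plan on an empty file   => FreshDownload (previously this resumed)
--   plan on a partial file  => Resume <size>
plan :: FilePath -> IO Plan
plan file = tryGetSize >>= \case
	Just sz | sz > 0 -> pure (Resume sz)
	_ -> pure FreshDownload
  where
	tryGetSize :: IO (Maybe Integer)
	tryGetSize = either ignore Just <$> try (getFileSize file)
	ignore :: IOException -> Maybe Integer
	ignore _ = Nothing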
@@ -49,3 +49,5 @@ ok
 ### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
 
 <3
+
+> [[fixed|done]] --[[Joey]]
@@ -0,0 +1,28 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2019-06-20T15:18:49Z"
+ content="""
+Somehow git-annex receives a truncated file from the web server,
+so it is unable to parse it.
+
+That only happens when using the haskell http library to download.
+When git-annex is configured to use curl, it works.
+
+So, workaround:
+
+	git -c annex.security.allowed-http-addresses=all -c annex.web-options=-4 annex importfeed \
+		https://www.deutschlandfunk.de/podcast-deutschlandfunk-der-tag.3417.de.podcast.xml
+
+git-annex addurl downloads the complete file, so the problem does not
+seem to be with the haskell http library, but something to do with how
+importfeed is using it that causes a truncation.
+
+Aha, importfeed uses withTmpFile, so the destination file exists with 0
+size. This triggers a resume code path. And it looks to me like this web
+server may not handle resume very well, it appears to send ~32kb
+of data and not the whole file in that case.
+
+So, the obvious fix is to not resume when the destination file is empty,
+and I've done that.
+"""]]
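To see why withTmpFile tickles the resume path: the temp file is created, empty, before the download writes a single byte. A sketch using withSystemTempFile from the temp package, assuming it behaves like git-annex's own withTmpFile in this respect:

import System.Directory (getFileSize)
import System.IO (hClose)
import System.IO.Temp (withSystemTempFile)

main :: IO ()
main = withSystemTempFile "feed.xml" $ \path h -> do
	hClose h
	sz <- getFileSize path
	-- The destination already exists with size 0 before any download
	-- writes to it, which used to send the downloader down the resume
	-- code path.
	putStrLn (path ++ " exists, size " ++ show sz)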