don't rely on exception for http 416
Fix a bug that could make resuming a download from the web fail when the entire content of the file is actually already present locally.

What a mess that Request can throw exceptions or not, depending on how it's configured. Makes it very hard if you need to handle some specific http status codes in a function like this! Implementing everything two ways did not seem appealing, if possible at all, so I decided to override the Request if it did come configured to throw exception on non-2xx http status.

Other exceptions, like from http-client-restricted, or due to a redirect to a non-http url, still get thrown.

This commit was sponsored by Luke Shumaker on Patreon.
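The override described above can be sketched roughly as follows, assuming the http-client library's `checkResponse` field on `Request`. The helper names `ignoreStatusCheck` and `exampleRequest` and the example URL are hypothetical and not part of git-annex; this is only an illustration of the idea, not the commit's code.

```haskell
import Network.HTTP.Client (Request, checkResponse, parseUrlThrow)

-- | Hypothetical helper (not from git-annex): replace whatever status
-- check the Request carries with a no-op, so a 401 or 416 comes back as
-- an ordinary Response instead of a thrown exception. Exceptions that
-- are unrelated to the status code are not affected by this.
ignoreStatusCheck :: Request -> Request
ignoreStatusCheck r = r { checkResponse = \_ _ -> return () }

-- | parseUrlThrow builds a Request that throws on non-2xx statuses;
-- after the override it no longer does. The URL is a placeholder.
exampleRequest :: IO Request
exampleRequest = ignoreStatusCheck <$> parseUrlThrow "http://example.com/file"
```

With the check disabled, the caller has to inspect the response status itself, which is what lets one function handle 206, 200, 401 and 416 in a single code path.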
parent 3991c8e43d
commit b90b9b936d
4 changed files with 60 additions and 18 deletions
@@ -15,6 +15,9 @@ git-annex (8.20201117) UNRELEASED; urgency=medium
   * Fix build on Windows.
   * Prevent windows assistant from trying (and failing) to upgrade
     itself, which has never been supported on windows.
+  * Fix a bug that could make resuming a download from the web fail
+    when the entire content of the file is actually already present
+    locally.
 
  -- Joey Hess <id@joeyh.name> Mon, 16 Nov 2020 09:38:32 -0400
 
@@ -452,7 +452,12 @@ download' nocurlerror meterupdate url file uo =
 {- Download a perhaps large file using conduit, with auto-resume
  - of incomplete downloads.
  -
- - Does not catch exceptions.
+ - A Request can be configured to throw exceptions for non-2xx http
+ - status codes, or not. That configuration is overridden by this,
+ - and if it is unable to download, it throws an exception containing
+ - a user-visible explanation of the problem. (However, exceptions
+ - thrown for reasons other than http status codes will still be thrown
+ - as usual.)
  -}
 downloadConduit :: MeterUpdate -> Request -> FilePath -> UrlOptions -> IO ()
 downloadConduit meterupdate req file uo =
@@ -480,17 +485,16 @@ downloadConduit meterupdate req file uo =
             filter ((/= hAcceptEncoding) . fst)
                 (requestHeaders req)
         , decompress = const False
+        -- Avoid throwing exceptions non-2xx http status codes,
+        -- since we rely on parsing the Response to handle
+        -- several such codes.
+        , checkResponse = \_ _ -> return ()
         }
 
     -- Resume download from where a previous download was interrupted,
     -- when supported by the http server. The server may also opt to
     -- send the whole file rather than resuming.
-    resumedownload sz = catchJust
-        (matchStatusCodeHeadersException (alreadydownloaded sz))
-        dl
-        (const noop)
-      where
-        dl = join $ runResourceT $ do
+    resumedownload sz = join $ runResourceT $ do
             let req'' = req' { requestHeaders = resumeFromHeader sz : requestHeaders req }
             liftIO $ debugM "url" (show req'')
             resp <- http req'' (httpManager uo)
@@ -502,6 +506,8 @@ downloadConduit meterupdate req file uo =
                     then do
                         store zeroBytesProcessed WriteMode resp
                         return (return ())
+                    else if alreadydownloaded sz resp
+                        then return (return ())
                     else do
                         rf <- extractFromResourceT (respfailure resp)
                         if responseStatus resp == unauthorized401
@@ -510,8 +516,9 @@ downloadConduit meterupdate req file uo =
                                 Just ba -> retryauthed ba
                             else return $ giveup rf
 
-    alreadydownloaded sz s h = s == requestedRangeNotSatisfiable416
-        && case lookup hContentRange h of
+    alreadydownloaded sz resp
+        | responseStatus resp /= requestedRangeNotSatisfiable416 = False
+        | otherwise = case lookup hContentRange (responseHeaders resp) of
             -- This could be improved by fixing
             -- https://github.com/aristidb/http-types/issues/87
             Just crh -> crh == B8.fromString ("bytes */" ++ show sz)
@@ -41,3 +41,5 @@ Stock git-annex on Fedora 32:
 ### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
 
 Yes! git-annex has been working great for me so far, and is powering the bioinformatics chat podcast (https://bioinformatics.chat/). Thanks!
+
+> [[fixed|done]] --[[Joey]]
@@ -0,0 +1,30 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2020-11-19T16:56:45Z"
+ content="""
+To reproduce this, interrupt git-annex after it downloads the whole file,
+but before it moves it from the download location into the annex. (Or,
+let it get the file, then move the object back to the temp object location.)
+
+This is a tricky case, because if the total file size is not
+known when resuming the download, how can it detect if it's got it all
+already? And git-annex does not always know the total file size, eg when
+git-annex addurl --relaxed is used, and then git-annex get is later used
+to download the content.
+
+What git-annex already tried to do to detect this is,
+when it got a 416 it looks for a Content-Range header "bytes */$size"
+where $size is the same as the size of the file on disk.
+
+That relied on the http library throwing an exception for the 416.
+Thing is, http may or may not throw exceptions for non-2xx
+responses, depending on the input Request. IMHO that is a very bad design,
+it leads to this kind of bug, rather than making it evident with the data
+types what is going on.
+
+Currently downloadConduit takes a Request, and assumes it throws exceptions
+for 416, but not for 401. Both can't be right.
+
+Ok, fixed this mess..
+"""]]
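
The 416 detection described in the comment above can be sketched in isolation like this. It is a simplified stand-in that uses only base: the status is a plain `Int`, headers are `String` pairs, and the names `alreadyDownloaded` and `Header` are hypothetical. Treating a missing Content-Range header as "not already downloaded" is an assumption, since that branch is not visible in the hunks shown here.

```haskell
import Data.Char (toLower)

-- Hypothetical simplified stand-in for an http header (name, value).
type Header = (String, String)

-- | True when the server answered 416 and its Content-Range header says
-- the complete file is exactly as large as what is already on disk,
-- i.e. "bytes */<size>".
alreadyDownloaded :: Integer -> Int -> [Header] -> Bool
alreadyDownloaded sizeOnDisk status headers
    | status /= 416 = False
    | otherwise = case lookupHeader "content-range" headers of
        Just v -> v == "bytes */" ++ show sizeOnDisk
        -- Assumption: without the header there is no way to tell.
        Nothing -> False
  where
    -- Header names are case-insensitive, so compare in lower case.
    lookupHeader name = lookup name . map (\(k, v) -> (map toLower k, v))
```

For example, `alreadyDownloaded 1024 416 [("Content-Range", "bytes */1024")]` evaluates to `True`, while the same call with `"bytes */2048"`, or with any status other than 416, evaluates to `False`.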