don't rely on exception for http 416

Fix a bug that could make resuming a download from the web fail when the
entire content of the file is actually already present locally.

What a mess that Request can throw exceptions or not, depending on how
it's configured. Makes it very hard if you need to handle some specific
http status codes in a function like this! Implementing everything two
ways did not seem appealing, if possible at all, so I decided to
override the Request if it did come configured to throw exception on
non-2xx http status. Other exceptions, like from http-client-restricted,
or due to a redirect to a non-http url, still get thrown.

This commit was sponsored by Luke Shumaker on Patreon.
This commit is contained in:
Joey Hess 2020-11-19 14:03:00 -04:00
parent 3991c8e43d
commit b90b9b936d
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
4 changed files with 60 additions and 18 deletions

View file

@ -15,6 +15,9 @@ git-annex (8.20201117) UNRELEASED; urgency=medium
* Fix build on Windows.
* Prevent windows assistant from trying (and failing) to upgrade
itself, which has never been supported on windows.
* Fix a bug that could make resuming a download from the web fail
when the entire content of the file is actually already present
locally.
-- Joey Hess <id@joeyh.name> Mon, 16 Nov 2020 09:38:32 -0400

View file

@ -452,7 +452,12 @@ download' nocurlerror meterupdate url file uo =
{- Download a perhaps large file using conduit, with auto-resume
- of incomplete downloads.
-
- Does not catch exceptions.
- A Request can be configured to throw exceptions for non-2xx http
- status codes, or not. That configuration is overridden by this,
- and if it is unable to download, it throws an exception containing
- a user-visible explanation of the problem. (However, exceptions
- thrown for reasons other than http status codes will still be thrown
- as usual.)
-}
downloadConduit :: MeterUpdate -> Request -> FilePath -> UrlOptions -> IO ()
downloadConduit meterupdate req file uo =
@ -480,28 +485,29 @@ downloadConduit meterupdate req file uo =
filter ((/= hAcceptEncoding) . fst)
(requestHeaders req)
, decompress = const False
-- Avoid throwing exceptions non-2xx http status codes,
-- since we rely on parsing the Response to handle
-- several such codes.
, checkResponse = \_ _ -> return ()
}
-- Resume download from where a previous download was interrupted,
-- when supported by the http server. The server may also opt to
-- send the whole file rather than resuming.
resumedownload sz = catchJust
(matchStatusCodeHeadersException (alreadydownloaded sz))
dl
(const noop)
where
dl = join $ runResourceT $ do
let req'' = req' { requestHeaders = resumeFromHeader sz : requestHeaders req }
liftIO $ debugM "url" (show req'')
resp <- http req'' (httpManager uo)
if responseStatus resp == partialContent206
resumedownload sz = join $ runResourceT $ do
let req'' = req' { requestHeaders = resumeFromHeader sz : requestHeaders req }
liftIO $ debugM "url" (show req'')
resp <- http req'' (httpManager uo)
if responseStatus resp == partialContent206
then do
store (toBytesProcessed sz) AppendMode resp
return (return ())
else if responseStatus resp == ok200
then do
store (toBytesProcessed sz) AppendMode resp
store zeroBytesProcessed WriteMode resp
return (return ())
else if responseStatus resp == ok200
then do
store zeroBytesProcessed WriteMode resp
return (return ())
else if alreadydownloaded sz resp
then return (return ())
else do
rf <- extractFromResourceT (respfailure resp)
if responseStatus resp == unauthorized401
@ -510,8 +516,9 @@ downloadConduit meterupdate req file uo =
Just ba -> retryauthed ba
else return $ giveup rf
alreadydownloaded sz s h = s == requestedRangeNotSatisfiable416
&& case lookup hContentRange h of
alreadydownloaded sz resp
| responseStatus resp /= requestedRangeNotSatisfiable416 = False
| otherwise = case lookup hContentRange (responseHeaders resp) of
-- This could be improved by fixing
-- https://github.com/aristidb/http-types/issues/87
Just crh -> crh == B8.fromString ("bytes */" ++ show sz)

View file

@ -41,3 +41,5 @@ Stock git-annex on Fedora 32:
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Yes! git-annex has been working great for me so far, and is powering the bioinformatics chat podcast (https://bioinformatics.chat/). Thanks!
> [[fixed|done]] --[[Joey]]

View file

@ -0,0 +1,30 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2020-11-19T16:56:45Z"
content="""
To reproduce this, interrupt git-annex after it downloads the whole file,
but before it moves it from the download location into the annex. (Or,
let it get the file, then move the object back to the temp object location.)
This is a tricky case, because if the total file size is not
known when resuming the download, how can it detect if it's got it all
already? And git-annex does not always know the total file size, eg when
git-annex addurl --relaxed is used, and then git-annex get is later used
to download the content.
What git-annex already tried to do to detect this is,
when it got a 416 it looks for a Content-Range header "bytes */$size"
where $size is the same as the size of the file on disk.
That relied on the http library throwing an exception for the 416.
Thing is, http may or may not throw exceptions for non-2xx
responses, depending on the input Request. IMHO that is a very bad design,
it leads to this kind of bug, rather than making it evident with the data
types what is going on.
Currently downloadConduit takes a Request, and assumes it throws exceptions
for 416, but not for 401. Both can't be right.
Ok, fixed this mess..
"""]]