Disable http-client's default 30 second response timeout when HEADing an url to check if it exists. Some web servers take quite a long time to answer a HEAD request.

Joey Hess 2017-08-15 13:56:12 -04:00
parent e5109468e2
commit 69dcb08d7a
GPG key ID: DB12DB0FF05F8F38
5 changed files with 32 additions and 3 deletions

@@ -11,6 +11,9 @@ git-annex (6.20170521) UNRELEASED; urgency=medium
     directories, by forking a worker process and only deleting the test
     directory once it exits.
   * move, copy: Support --batch.
+  * Disable http-client's default 30 second response timeout when HEADing
+    an url to check if it exists. Some web servers take quite a long time
+    to answer a HEAD request.
 
  -- Joey Hess <id@joeyh.name>  Sat, 17 Jun 2017 13:02:24 -0400

@@ -441,13 +441,11 @@ withS3HandleMaybe c gc u a = do
         Just creds -> do
             awscreds <- liftIO $ genCredentials creds
             let awscfg = AWS.Configuration AWS.Timestamp awscreds debugMapper
-            bracketIO (newManager httpcfg) closeManager $ \mgr ->
+            bracketIO (newManager managerSettings) closeManager $ \mgr ->
                 a $ Just $ S3Handle mgr awscfg s3cfg
         Nothing -> a Nothing
   where
     s3cfg = s3Configuration c
-    httpcfg = managerSettings
-        { managerResponseTimeout = responseTimeoutNone }
 
 s3Configuration :: RemoteConfig -> S3.S3Configuration AWS.NormalQuery
 s3Configuration c = cfg

@@ -56,6 +56,7 @@ managerSettings = tlsManagerSettings
 #else
 managerSettings = conduitManagerSettings
 #endif
+    { managerResponseTimeout = responseTimeoutNone }
 
 type URLString = String

@@ -47,3 +47,5 @@ git-annex: drop: 1 failed
 [[!meta author=yoh]]
+> [[done]] --[[Joey]]

@@ -0,0 +1,25 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2017-08-15T17:28:20Z"
+ content="""
+The normal reason for this to happen is if the size of the file
+on the website has changed. git-annex checks the reported size and if it
+differs from the versioned file, it knows that the website no longer
+contains the same file.
+
+In this case, it seems to be a cgi program generating a zip file, and the
+program actually generated two different zip files when I hit it twice with
+wget. (So if git-annex actually did drop the only copy of the version you
+downloaded, you'd not be able to download it again. Not that git-annex can know
+that; this kind of thing is why trusting the web is not a good idea..) They did
+have the same size, but it looks like the web server is not sending a size
+header anyway.
+
+The actual problem is the web server takes a long time to answer a HEAD request
+for this URL. It takes 35 seconds before curl is able to HEAD it. I suspect
+it's generating the 300 mb zip file before it gets around to finishing
+the HEAD request. Not the greatest server behavior, all around.
+
+That breaks http-client due to its default 30 second timeout. So, will remove that timeout then.
+"""]]
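
For illustration, the effect of this change can be sketched outside git-annex roughly as follows. This is a minimal standalone sketch, not git-annex's actual code: it assumes the http-client, http-client-tls and http-types packages, and `noTimeoutSettings`, `urlExists` and the example URL are hypothetical names introduced here. Only `managerResponseTimeout = responseTimeoutNone` is taken from the commit itself.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Illustrative sketch only; not git-annex's code.
-- noTimeoutSettings, urlExists and the URL in main are hypothetical.
module Main where

import Network.HTTP.Client
    ( Manager, ManagerSettings, httpNoBody, managerResponseTimeout
    , method, newManager, parseRequest, responseStatus
    , responseTimeoutNone )
import Network.HTTP.Client.TLS (tlsManagerSettings)
import Network.HTTP.Types.Status (statusIsSuccessful)

-- Manager settings with the default response timeout disabled, using the
-- same record update the commit applies to managerSettings.
noTimeoutSettings :: ManagerSettings
noTimeoutSettings = tlsManagerSettings
    { managerResponseTimeout = responseTimeoutNone }

-- Send a HEAD request and report whether the server answered with a
-- 2xx status. parseRequest does not throw on non-2xx responses, so the
-- status is inspected explicitly.
urlExists :: Manager -> String -> IO Bool
urlExists mgr url = do
    req <- parseRequest url
    resp <- httpNoBody req { method = "HEAD" } mgr
    return (statusIsSuccessful (responseStatus resp))

main :: IO ()
main = do
    mgr <- newManager noTimeoutSettings
    ok <- urlExists mgr "https://example.com/"
    putStrLn (if ok then "exists" else "missing")
```

With the timeout disabled, a server that needs 35 seconds to answer a HEAD request, as in the case above, is waited for instead of producing a timeout exception. The trade-off is that a server that never answers will block the request indefinitely.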