Use http-conduit for url downloads by default, annex.web-options enables curl

* For url downloads, git-annex now defaults to using an http library,
  rather than wget or curl. But, if annex.web-options is set, it will
  use curl. To use the .netrc file, run:
    git config annex.web-options --netrc
* git-annex no longer uses wget (and wget is no longer shipped with
  git-annex builds).

Note that curl is always run in silent mode, since the new download API
takes a MeterUpdate and so leaves no way to display curl's own progress
output. It might be worth writing a parser for curl's progress output
to update the meter when curl is used, but I didn't bother with that
edge case for now.
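
For the record, such a parser need not be complicated. Here is a rough
sketch (purely illustrative, not part of this commit; the function name
and the local type stand-ins for Utility.Metered are invented, and it
assumes the total download size is already known so that percentages
can be converted to byte counts):

    import Data.Maybe (mapMaybe)
    import Text.Read (readMaybe)

    -- Local stand-ins for Utility.Metered's types.
    newtype BytesProcessed = BytesProcessed Integer
    type MeterUpdate = BytesProcessed -> IO ()

    -- Scan whatever output curl's -# progress display has produced so
    -- far for "NN.N%" tokens, and feed each one to the meter as a byte
    -- count.
    curlProgressParser :: Integer -> MeterUpdate -> String -> IO ()
    curlProgressParser totalsize meterupdate = mapM_ update . percentages
      where
        -- Extract every percentage token from the output seen so far.
        percentages :: String -> [Double]
        percentages = mapMaybe (readMaybe . takeWhile (/= '%'))
            . filter ('%' `elem`)
            . words
        -- Convert a percentage into bytes and update the meter.
        update :: Double -> IO ()
        update p = meterupdate $ BytesProcessed $
            floor (fromIntegral totalsize * p / 100)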

This commit was supported by the NSF-funded DataLad project.
Joey Hess 2018-04-06 17:00:46 -04:00
parent 0791c24221
commit c34152777b
17 changed files with 104 additions and 181 deletions


@@ -939,16 +939,14 @@ saveState nocommit = doSideAction $ do
 {- Downloads content from any of a list of urls. -}
 downloadUrl :: Key -> MeterUpdate -> [Url.URLString] -> FilePath -> Annex Bool
-downloadUrl k p urls file = meteredFile file (Just p) k $
+downloadUrl k p urls file =
+	-- Poll the file to handle configurations where an external
+	-- download command is used.
+	meteredFile file (Just p) k $
 		go =<< annexWebDownloadCommand <$> Annex.getGitConfig
   where
-	go Nothing = do
-		a <- ifM commandProgressDisabled
-			( return Url.downloadQuiet
-			, return Url.download
-			)
-		Url.withUrlOptions $ \uo ->
-			liftIO $ anyM (\u -> a u file uo) urls
+	go Nothing = Url.withUrlOptions $ \uo ->
+		liftIO $ anyM (\u -> Url.download p u file uo) urls
 	go (Just basecmd) = anyM (downloadcmd basecmd) urls
 	downloadcmd basecmd url =
 		progressCommand "sh" [Param "-c", Param $ gencmd url basecmd]

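The reworked Url.download takes the MeterUpdate first, so callers thread
their progress callback straight through. A minimal sketch of a call site
(illustrative only, not from the patch; it assumes the withUrlOptions
wrapper seen above and nullMeterUpdate from Utility.Metered):

    -- Download a url to a local file, with no progress display.
    demo :: Annex Bool
    demo = Url.withUrlOptions $ \uo -> liftIO $
        Url.download nullMeterUpdate "http://example.com/big.iso" "big.iso" uo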

@@ -1,5 +1,5 @@
 {- Url downloading, with git-annex user agent and configured http
- - headers and wget/curl options.
+ - headers and curl options.
  -
  - Copyright 2013-2018 Joey Hess <id@joeyh.name>
  -


@@ -36,6 +36,7 @@ import Utility.Tmp.Dir
 import Utility.UserInfo
 import Utility.Gpg
 import Utility.FileMode
+import Utility.Metered
 import qualified Utility.Lsof as Lsof
 import qualified BuildInfo
 import qualified Utility.Url as Url
@@ -322,8 +323,8 @@ downloadDistributionInfo = do
 	liftIO $ withTmpDir "git-annex.tmp" $ \tmpdir -> do
 		let infof = tmpdir </> "info"
 		let sigf = infof ++ ".sig"
-		ifM (Url.downloadQuiet distributionInfoUrl infof uo
-			<&&> Url.downloadQuiet distributionInfoSigUrl sigf uo
+		ifM (Url.download nullMeterUpdate distributionInfoUrl infof uo
+			<&&> Url.download nullMeterUpdate distributionInfoSigUrl sigf uo
 			<&&> verifyDistributionSig gpgcmd sigf)
 			( parseInfoFile <$> readFileStrict infof
 			, return Nothing


@@ -69,14 +69,6 @@ preferredBundledPrograms = catMaybes
 	, Just "rsync"
 #ifndef mingw32_HOST_OS
 	, Just "sh"
 #endif
-#ifndef mingw32_HOST_OS
-#ifndef darwin_HOST_OS
-	-- wget on OSX has been problematic, looking for certs in the wrong
-	-- places. Don't ship it, use curl or the OSX's own wget if it has
-	-- one.
-	, ifset BuildInfo.wget "wget"
-#endif
-#endif
 	, BuildInfo.lsof
 	, BuildInfo.gcrypt


@@ -6,14 +6,11 @@ module Build.Configure where
 
 import Build.TestConfig
 import Build.Version
-import Utility.PartialPrelude
-import Utility.Process
 import Utility.SafeCommand
 import Utility.ExternalSHA
 import Utility.Env.Basic
 import Utility.Exception
 import qualified Git.Version
-import Utility.DottedVersion
 import Utility.Directory
 
 import Control.Monad.IfElse
@@ -34,8 +31,6 @@ tests =
 	, TestCase "xargs -0" $ testCmd "xargs_0" "xargs -0 </dev/null"
 	, TestCase "rsync" $ testCmd "rsync" "rsync --version >/dev/null"
 	, TestCase "curl" $ testCmd "curl" "curl --version >/dev/null"
-	, TestCase "wget" $ testCmd "wget" "wget --version >/dev/null"
-	, TestCase "wget unclutter options" checkWgetUnclutter
 	, TestCase "bup" $ testCmd "bup" "bup --version >/dev/null"
 	, TestCase "nice" $ testCmd "nice" "nice true >/dev/null"
 	, TestCase "ionice" $ testCmd "ionice" "ionice -c3 true >/dev/null"
@@ -106,19 +101,6 @@ getGitVersion = go =<< getEnv "FORCE_GIT_VERSION"
 			error $ "installed git version " ++ show v ++ " is too old! (Need " ++ show oldestallowed ++ " or newer)"
 		return $ Config "gitversion" $ StringConfig $ show v
 
-checkWgetUnclutter :: Test
-checkWgetUnclutter = Config "wgetunclutter" . BoolConfig
-	. maybe False (>= normalize "1.16")
-	<$> getWgetVersion
-
-getWgetVersion :: IO (Maybe DottedVersion)
-getWgetVersion = catchDefaultIO Nothing $
-	extract <$> readProcess "wget" ["--version"]
-  where
-	extract s = case lines s of
-		[] -> Nothing
-		(l:_) -> normalize <$> headMaybe (drop 2 $ words l)
-
 getSshConnectionCaching :: Test
 getSshConnectionCaching = Config "sshconnectioncaching" . BoolConfig <$>
 	boolSystem "sh" [Param "-c", Param "ssh -o ControlPersist=yes -V >/dev/null 2>/dev/null"]


@@ -9,7 +9,8 @@
  - for that.
  -
  - To build the installer, git-annex should already be built to
- - ./git-annex.exe and the necessary utility programs (rsync and wget)
+ - ./git-annex.exe and the necessary utility programs
+ - (specifically rsync)
  - already installed in PATH from msys32.
  -
  - Copyright 2013-2015 Joey Hess <id@joeyh.name>


@@ -1,6 +1,13 @@
 git-annex (6.20180317) UNRELEASED; urgency=medium
 
   * Added adb special remote which allows exporting files to Android devices.
+  * For url downloads, git-annex now defaults to using an http library,
+    rather than wget or curl. But, if annex.web-options is set, it will
+    use curl. To use the .netrc file, run:
+      git config annex.web-options --netrc
+  * git-annex no longer uses wget (and wget is no longer shipped with
+    git-annex builds).
+  * Enable HTTP connection reuse across multiple files for improved speed.
   * Fix calculation of estimated completion for progress meter.
   * OSX app: Work around libz/libPng/ImageIO.framework version skew
     by not bundling libz, assuming OSX includes a suitable libz.1.dylib.
@@ -12,8 +19,6 @@ git-annex (6.20180317) UNRELEASED; urgency=medium
     don't copy the data metadata from the old version of the file,
     instead use the mtime of the file.
   * Avoid running annex.http-headers-command more than once.
-  * Enable HTTP connection reuse across multiple files, when git-annex
-    uses http-conduit.
   * info: Added "combined size of repositories containing these files"
     stat when run on a directory.
   * info: Changed sorting of numcopies stats table, so it's ordered


@@ -27,6 +27,7 @@ import Types.KeySource
 import Types.UrlContents
 import Annex.FileMatcher
 import Logs.Location
+import Messages.Progress
 import Utility.Metered
 import Utility.FileSystemEncoding
 import Utility.HtmlDetect
@@ -260,9 +261,8 @@ downloadWeb o url urlinfo file =
 	go =<< downloadWith' downloader urlkey webUUID url (AssociatedFile (Just file))
   where
 	urlkey = addSizeUrlKey urlinfo $ Backend.URL.fromUrl url Nothing
-	downloader f p = do
-		showOutput
-		downloadUrl urlkey p [url] f
+	downloader f p = metered (Just p) urlkey (pure Nothing) $
+		\_ p' -> downloadUrl urlkey p' [url] f
 	go Nothing = return Nothing
 	-- If we downloaded a html file, try to use youtube-dl to
 	-- extract embedded media.


@@ -33,6 +33,7 @@ import Logs.Web
 import Logs.File
 import qualified Utility.Format
 import Utility.Tmp
+import Utility.Metered
 import Command.AddUrl (addUrlFile, downloadRemoteFile, parseDownloadOptions, DownloadOptions(..))
 import Annex.UUID
 import Backend.URL (fromUrl)
@@ -148,12 +149,10 @@ findDownloads u = go =<< downloadFeed u
 downloadFeed :: URLString -> Annex (Maybe Feed)
 downloadFeed url
 	| Url.parseURIRelaxed url == Nothing = giveup "invalid feed url"
-	| otherwise = do
-		showOutput
-		Url.withUrlOptions $ \uo ->
+	| otherwise = Url.withUrlOptions $ \uo ->
 		liftIO $ withTmpFile "feed" $ \f h -> do
 			hClose h
-			ifM (Url.download url f uo)
+			ifM (Url.download nullMeterUpdate url f uo)
 				( parseFeedString <$> readFileStrict f
 				, return Nothing


@@ -193,13 +193,13 @@ downloadTorrentFile u = do
 		( return True
 		, do
 			showAction "downloading torrent file"
-			showOutput
 			createAnnexDirectory (parentDir torrent)
 			if isTorrentMagnetUrl u
 				then do
 					tmpdir <- tmpTorrentDir u
 					let metadir = tmpdir </> "meta"
 					createAnnexDirectory metadir
+					showOutput
 					ok <- downloadMagnetLink u metadir torrent
 					liftIO $ removeDirectoryRecursive metadir
 					return ok
@@ -208,7 +208,7 @@ downloadTorrentFile u = do
 				withTmpFileIn misctmp "torrent" $ \f h -> do
 					liftIO $ hClose h
 					ok <- Url.withUrlOptions $
-						liftIO . Url.download u f
+						liftIO . Url.download nullMeterUpdate u f
 					when ok $
 						liftIO $ renameFile f torrent
 					return ok


@@ -252,7 +252,7 @@ tryGitConfigRead autoinit r
 	v <- liftIO $ withTmpFile "git-annex.tmp" $ \tmpfile h -> do
 		hClose h
 		let url = Git.repoLocation r ++ "/config"
-		ifM (Url.downloadQuiet url tmpfile uo)
+		ifM (Url.download nullMeterUpdate url tmpfile uo)
 			( Just <$> pipedconfig "git" [Param "config", Param "--null", Param "--list", Param "--file", File tmpfile]
 			, return Nothing
 			)


@@ -17,6 +17,7 @@ import Annex.Content
 import Config.Cost
 import Logs.Web
 import Annex.UUID
+import Messages.Progress
 import Utility.Metered
 import qualified Annex.Url as Url
 import Annex.YoutubeDl
@@ -74,13 +75,14 @@ downloadKey key _af dest p = unVerified $ get =<< getWebUrls key
 	get [] = do
 		warning "no known url"
 		return False
-	get urls = do
-		showOutput -- make way for download progress bar
-		untilTrue urls $ \u -> do
+	get urls = untilTrue urls $ \u -> do
 		let (u', downloader) = getDownloader u
 		case downloader of
-			YoutubeDownloader -> youtubeDlTo key u' dest
-			_ -> downloadUrl key p [u'] dest
+			YoutubeDownloader -> do
+				showOutput
+				youtubeDlTo key u' dest
+			_ -> metered (Just p) key (pure Nothing) $ \_ p' ->
+				downloadUrl key p' [u'] dest
 
 downloadKeyCheap :: Key -> AssociatedFile -> FilePath -> Annex Bool
 downloadKeyCheap _ _ _ = return False


@@ -25,8 +25,6 @@ module Utility.Url (
 	getUrlInfo,
 	assumeUrlExists,
 	download,
-	downloadQuiet,
-	downloadC,
 	sinkResponseFile,
 	downloadPartial,
 	parseURIRelaxed,
@@ -35,7 +33,6 @@
 ) where
 
 import Common
-import Utility.Tmp.Dir
 import Utility.Metered
 import qualified BuildInfo
@@ -72,23 +69,30 @@ type UserAgent = String
 data UrlOptions = UrlOptions
 	{ userAgent :: Maybe UserAgent
 	, reqHeaders :: Headers
-	, reqParams :: [CommandParam]
+	, urlDownloader :: UrlDownloader
 	, applyRequest :: Request -> Request
 	, httpManager :: Manager
 	}
 
+data UrlDownloader
+	= DownloadWithConduit
+	| DownloadWithCurl [CommandParam]
+
 defUrlOptions :: IO UrlOptions
 defUrlOptions = UrlOptions
 	<$> pure Nothing
 	<*> pure []
-	<*> pure []
+	<*> pure DownloadWithConduit
 	<*> pure id
 	<*> newManager managerSettings
 
 mkUrlOptions :: Maybe UserAgent -> Headers -> [CommandParam] -> Manager -> UrlOptions
 mkUrlOptions defuseragent reqheaders reqparams manager =
-	UrlOptions useragent reqheaders reqparams applyrequest manager
+	UrlOptions useragent reqheaders urldownloader applyrequest manager
   where
+	urldownloader = if null reqparams
+		then DownloadWithConduit
+		else DownloadWithCurl reqparams
 	applyrequest = \r -> r { requestHeaders = requestHeaders r ++ addedheaders }
 	addedheaders = uaheader ++ otherheaders
 	useragent = maybe defuseragent (Just . B8.toString . snd)
@@ -105,11 +109,16 @@ mkUrlOptions defuseragent reqheaders reqparams manager =
 		(' ':v') -> (h', B8.fromString v')
 		_ -> (h', B8.fromString v)
 
-addUserAgent :: UrlOptions -> [CommandParam] -> [CommandParam]
-addUserAgent uo ps = case userAgent uo of
-	Nothing -> ps
-	-- --user-agent works for both wget and curl commands
-	Just ua -> ps ++ [Param "--user-agent", Param ua]
+curlParams :: UrlOptions -> [CommandParam] -> [CommandParam]
+curlParams uo ps = ps ++ uaparams ++ headerparams ++ addedparams
+  where
+	uaparams = case userAgent uo of
+		Nothing -> []
+		Just ua -> [Param "--user-agent", Param ua]
+	headerparams = concatMap (\h -> [Param "-H", Param h]) (reqHeaders uo)
+	addedparams = case urlDownloader uo of
+		DownloadWithConduit -> []
+		DownloadWithCurl l -> l
 
 {- Checks that an url exists and could be successfully downloaded,
  - also checking that its size, if available, matches a specified size. -}
@@ -144,8 +153,8 @@ assumeUrlExists = UrlInfo True Nothing Nothing
  - also returning its size and suggested filename if available. -}
 getUrlInfo :: URLString -> UrlOptions -> IO UrlInfo
 getUrlInfo url uo = case parseURIRelaxed url of
-	Just u -> case parseUrlConduit (show u) of
-		Just req -> catchJust
+	Just u -> case (urlDownloader uo, parseUrlConduit (show u)) of
+		(DownloadWithConduit, Just req) -> catchJust
 			-- When http redirects to a protocol which
 			-- conduit does not support, it will throw
 			-- a StatusCodeException with found302.
@@ -155,7 +164,7 @@ getUrlInfo url uo = case parseURIRelaxed url of
 			`catchNonAsync` (const dne)
 		-- http-conduit does not support file:, ftp:, etc urls,
 		-- so fall back to reading files and using curl.
-		Nothing
+		_
 			| uriScheme u == "file:" -> do
 				let f = unEscapeString (uriPath u)
 				s <- catchMaybeIO $ getFileStatus f
@@ -171,12 +180,12 @@ getUrlInfo url uo = case parseURIRelaxed url of
 	dne = return $ UrlInfo False Nothing Nothing
 	found sz f = return $ UrlInfo True sz f
 
-	curlparams = addUserAgent uo $
+	curlparams = curlParams uo $
 		[ Param "-s"
 		, Param "--head"
 		, Param "-L", Param url
 		, Param "-w", Param "%{http_code}"
-		] ++ concatMap (\h -> [Param "-H", Param h]) (reqHeaders uo) ++ (reqParams uo)
+		]
 
 	extractlencurl s = case lastMaybe $ filter ("Content-Length:" `isPrefixOf`) (lines s) of
 		Just l -> case lastMaybe $ words l of
@@ -184,13 +193,11 @@ getUrlInfo url uo = case parseURIRelaxed url of
 			_ -> Nothing
 		_ -> Nothing
 
-	extractlen = readish . B8.toString <=< firstheader hContentLength
+	extractlen = readish . B8.toString
+		<=< lookup hContentLength . responseHeaders
 
 	extractfilename = contentDispositionFilename . B8.toString
-		<=< firstheader hContentDisposition
-
-	firstheader h = headMaybe . map snd .
-		filter (\p -> fst p == h) . responseHeaders
+		<=< lookup hContentDisposition . responseHeaders
 
 	existsconduit req = do
 		let req' = headRequest (applyRequest uo req)
@@ -240,102 +247,25 @@ headRequest r = r
 		(requestHeaders r)
 	}
 
-{- Download a perhaps large file, with auto-resume of incomplete downloads.
- -
- - Uses wget or curl program for its progress bar and resuming support.
- - Which program to use is determined at run time depending on which is
- - in path and which works best in a particular situation.
- -}
-download :: URLString -> FilePath -> UrlOptions -> IO Bool
-download = download' False
-
-{- No output to stdout. -}
-downloadQuiet :: URLString -> FilePath -> UrlOptions -> IO Bool
-downloadQuiet = download' True
-
-download' :: Bool -> URLString -> FilePath -> UrlOptions -> IO Bool
-download' quiet url file uo = do
-	case parseURIRelaxed url of
-		Just u
-			| uriScheme u == "file:" -> curl
-			-- curl is preferred in quiet mode, because
-			-- it displays http errors to stderr, while wget
-			-- does not display them in quiet mode
-			| quiet -> ifM (inPath "curl") (curl, wget)
-			-- wget is preferred mostly because it has a better
-			-- progress bar
-			| otherwise -> ifM (inPath "wget") (wget , curl)
-		_ -> return False
-  where
-	headerparams = map (\h -> Param $ "--header=" ++ h) (reqHeaders uo)
-	wget = go "wget" $ headerparams ++ quietopt "-q" ++ wgetparams
-	{- Regular wget needs --clobber to continue downloading an existing
-	 - file. On Android, busybox wget is used, which does not
-	 - support, or need that option.
-	 -
-	 - When the wget version is new enough, pass options for
-	 - a less cluttered download display. Using -nv rather than -q
-	 - avoids most clutter while still displaying http errors.
-	 -}
-#ifndef __ANDROID__
-	wgetparams = concat
-		[ if BuildInfo.wgetunclutter && not quiet
-			then [Param "-nv", Param "--show-progress"]
-			else []
-		, [ Param "--clobber", Param "-c", Param "-O"]
-		]
-#else
-	wgetparams = [Param "-c", Param "-O"]
-#endif
-	{- Uses the -# progress display, because the normal
-	 - one is very confusing when resuming, showing
-	 - the remainder to download as the whole file,
-	 - and not indicating how much percent was
-	 - downloaded before the resume. -}
-	curl = do
-		-- curl does not create destination file
-		-- if the url happens to be empty, so pre-create.
-		unlessM (doesFileExist file) $
-			writeFile file ""
-		go "curl" $ headerparams ++ quietopt "-sS" ++
-			[ Param "-f"
-			, Param "-L"
-			, Param "-C", Param "-"
-			, Param "-#"
-			, Param "-o"
-			]
-	{- Run wget in a temp directory because it has been buggy
-	 - and overwritten files in the current directory, even though
-	 - it was asked to write to a file elsewhere. -}
-	go cmd opts = withTmpDir "downloadurl" $ \tmp -> do
-		absfile <- absPath file
-		let ps = addUserAgent uo $ opts++reqParams uo++[File absfile, File url]
-		boolSystem' cmd ps $ \p -> p { cwd = Just tmp }
-	quietopt s
-		| quiet = [Param s]
-		| otherwise = []
-
 {- Download a perhaps large file, with auto-resume of incomplete downloads.
  -
  - By default, conduit is used for the download, except for file: urls,
  - which are copied. If the url scheme is not supported by conduit, falls
  - back to using curl.
  -}
-downloadC :: MeterUpdate -> URLString -> FilePath -> UrlOptions -> IO Bool
-downloadC meterupdate url file uo = go `catchNonAsync` (const $ return False)
+download :: MeterUpdate -> URLString -> FilePath -> UrlOptions -> IO Bool
+download meterupdate url file uo = go `catchNonAsync` (const $ return False)
   where
 	go = case parseURIRelaxed url of
-		Just u -> case parseUrlConduit (show u) of
-			Just req -> catchJust
+		Just u -> case (urlDownloader uo, parseUrlConduit (show u)) of
+			(DownloadWithConduit, Just req) -> catchJust
 				-- When http redirects to a protocol which
 				-- conduit does not support, it will throw
 				-- a StatusCodeException with found302.
 				(matchStatusCodeException (== found302))
 				(downloadconduit req)
				(const downloadcurl)
-			Nothing
+			_
 				| uriScheme u == "file:" -> do
 					let src = unEscapeString (uriPath u)
 					withMeteredFile src meterupdate $
@@ -371,7 +301,6 @@ downloadC meterupdate url file uo = go `catchNonAsync` (const $ return False)
 	dl = runResourceT $ do
 		let req' = req { requestHeaders = resumeFromHeader sz : requestHeaders req }
 		resp <- http req' (httpManager uo)
-		liftIO $ print ("XXX", responseStatus resp)
 		if responseStatus resp == partialContent206
 			then store (BytesProcessed sz) AppendMode resp
 			else if responseStatus resp == ok200
@@ -387,20 +316,13 @@ downloadC meterupdate url file uo = go `catchNonAsync` (const $ return False)
 		-- if the url happens to be empty, so pre-create.
 		unlessM (doesFileExist file) $
 			writeFile file ""
-		let headerparams = map (\h -> Param $ "--header=" ++ h) (reqHeaders uo)
-		let opts =
+		let ps = curlParams uo
 			[ Param "-sS"
 			, Param "-f"
 			, Param "-L"
 			, Param "-C", Param "-"
-			, Param "-o"
 			]
-		boolSystem "curl" $ addUserAgent uo $ concat
-			[ headerparams
-			, opts
-			, reqParams uo
-			, [File file, File url]
-			]
+		boolSystem "curl" (ps ++ [Param "-o", File file, File url])
 
 {- Sinks a Response's body to a file. The file can either be opened in
  - WriteMode or AppendMode. Updates the meter as data is received.

debian/control

@@ -83,7 +83,6 @@ Build-Depends:
 	libimage-magick-perl,
 	git (>= 1:1.8.1),
 	rsync,
-	wget,
 	curl,
 	openssh-client,
 	git-remote-gcrypt (>= 0.20130908-6),
@@ -101,7 +100,6 @@ Section: utils
 Depends: ${misc:Depends}, ${shlibs:Depends},
 	git (>= 1:1.8.1),
 	rsync,
-	wget,
 	curl,
 	openssh-client (>= 1:5.6p1)
 Recommends:


@@ -1356,8 +1356,11 @@ Here are all the supported configuration settings.
 * `annex.web-options`
 
-  Options to pass when running wget or curl.
-  For example, to force IPv4 only, set it to "-4"
+  Setting this makes git-annex use curl to download urls
+  (rather than the default built-in url downloader).
+
+  For example, to force IPv4 only, set it to "-4".
+  Or to make curl use your ~/.netrc file, set it to "--netrc".
 
 * `annex.youtube-dl-options`
@@ -1387,7 +1390,6 @@ Here are all the supported configuration settings.
 * `annex.web-download-command`
 
   Use to specify a command to run to download a file from the web.
-  (The default is to use wget or curl.)
 
   In the command line, %url is replaced with the url to download,
   and %file is replaced with the file that it should be saved to.

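To make the settings documented above concrete (these example commands are
illustrative, not part of the patch): enabling curl, and with it .netrc
support, is

    git config annex.web-options --netrc

and someone who prefers wget can still get it via the external download
command, with %url and %file substituted as described above (the wget flags
shown are the ones the old built-in wget support passed):

    git config annex.web-download-command 'wget -c -O %file %url'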

@@ -37,3 +37,16 @@ supports netrc?
 > download a file with resume support using http-conduit.
 > It falls back to curl to handle urls that http-conduit does not support.
 > Now we only have to decide what to do about the above edge cases..
+
+> > Let's drop use of wget entirely, as it was only using it because I
+> > preferred wget's progress bar to curl's. The user can still force wget
+> > with annex.web-download-command.
+> >
+> > That leaves users who have a .netrc file or want to use
+> > annex.web-options. Since curl requires --netrc in order to use the
+> > .netrc file, require users who want to use the .netrc to
+> > set "annex.web-options = --netrc". When "annex.web-options" is
+> > set, always use curl (unless overridden by annex.web-download-command).
+> > Otherwise, use conduit.
+
+[[done]] --[[Joey]]


@@ -8,3 +8,11 @@ is gone permanently).
 I do not want to encode my credentials into the URLs (eg.
 username:password@example.com) because my password changes frequently and I would
 have to update all of the URLs.
+
+> git-annex 6.20180406 and onwards use http-conduit for everything
+> by default. To use the .netrc file, run:
+>
+>     git config annex.web-options --netrc
+>
+> That will make git-annex use curl for all web accesses, and configures
+> curl to use your netrc file. [[done]] --[[Joey]]