addurl --preserve-filename and a few related changes

* addurl --preserve-filename: New option, uses server-provided filename
  without any sanitization, but with some security checking.

  Not yet implemented for remotes other than the web.

* addurl, importfeed: Avoid adding filenames with leading '.', instead
  it will be replaced with '_'.

  This might be considered a security fix, but a CVE seems unwattanted.
  It was possible for addurl to create a dotfile, which could change
  behavior of some program. It was also possible for a web server to say
  the file name was ".git" or "foo/.git". That would not overrwrite the
  .git directory, but would cause addurl to fail; of course git won't
  add "foo/.git".

sanitizeFilePath is too opinionated to remain in Utility, so moved it.

The changes to mkSafeFilePath are because it used sanitizeFilePath.
In particular:

	isDrive will never succeed, because "c:" gets munged to "c_"
	".." gets sanitized now
	".git" gets sanitized now
	It will never be null, because sanitizeFilePath keeps the length
	the same, and splitDirectories never returns a null path.

Also, on the off chance a web server suggests a filename of "",
ignore that, rather than trying to save to such a filename, which would
fail in some way.
This commit is contained in:
Joey Hess 2020-05-08 16:09:29 -04:00
parent 54599207f7
commit 6952060665
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
9 changed files with 132 additions and 39 deletions

View file

@ -0,0 +1,56 @@
{- handling untrusted filepaths
-
- Copyright 2010-2020 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
module Annex.UntrustedFilePath where
import Data.Char
import System.FilePath
{- Given a string that we'd like to use as the basis for FilePath, but that
- was provided by a third party and is not to be trusted, returns the closest
- sane FilePath.
-
- All spaces and punctuation and other wacky stuff are replaced
- with '_', except for '.'
-
- "../" becomes ".._", which is safe.
- "/foo" becomes "_foo", which is safe.
- "c:foo" becomes "c_foo", which is safe even on windows.
-
- Leading '.' is also replaced with '_', so ".git/foo" becomes "_git_foo"
- and so no dotfiles that might control a program are inadvertently created.
-}
sanitizeFilePath :: String -> FilePath
sanitizeFilePath = leadingdot . map sanitize
where
sanitize c
| c == '.' = c
| isSpace c || isPunctuation c || isSymbol c || isControl c || c == '/' = '_'
| otherwise = c
leadingdot ('.':s) = '_':s
leadingdot s = s
escapeSequenceInFilePath :: FilePath -> Bool
escapeSequenceInFilePath f = '\ESC' `elem` f
{- ../ is a path traversal, no matter where it appears.
-
- An absolute path is, of course.
-}
pathTraversalInFilePath :: FilePath -> Bool
pathTraversalInFilePath f
| isAbsolute f = True
| any (== "..") (splitPath f) = True
-- On windows, C:foo with no directory is not considered absolute
| hasDrive f = True
| otherwise = False
gitDirectoryInFilePath :: FilePath -> Bool
gitDirectoryInFilePath = any (== ".git")
. map dropTrailingPathSeparator
. splitPath

View file

@ -16,6 +16,10 @@ git-annex (8.20200502) UNRELEASED; urgency=medium
* git-lfs repos that encrypt the annexed content but not the git repo * git-lfs repos that encrypt the annexed content but not the git repo
only need --force passed to initremote, allow enableremote and only need --force passed to initremote, allow enableremote and
autoenable of such remotes without forcing again. autoenable of such remotes without forcing again.
* addurl, importfeed: Avoid adding filenames with leading '.', instead
it will be replaced with '_'.
* addurl --preserve-filename: New option, uses server-provided filename
without any sanitization, but with some security checking.
-- Joey Hess <id@joeyh.name> Mon, 04 May 2020 12:46:11 -0400 -- Joey Hess <id@joeyh.name> Mon, 04 May 2020 12:46:11 -0400

View file

@ -1,6 +1,6 @@
{- git-annex command {- git-annex command
- -
- Copyright 2011-2017 Joey Hess <id@joeyh.name> - Copyright 2011-2020 Joey Hess <id@joeyh.name>
- -
- Licensed under the GNU AGPL version 3 or higher. - Licensed under the GNU AGPL version 3 or higher.
-} -}
@ -23,6 +23,7 @@ import Annex.CheckIgnore
import Annex.Perms import Annex.Perms
import Annex.UUID import Annex.UUID
import Annex.YoutubeDl import Annex.YoutubeDl
import Annex.UntrustedFilePath
import Logs.Web import Logs.Web
import Types.KeySource import Types.KeySource
import Types.UrlContents import Types.UrlContents
@ -52,6 +53,7 @@ data DownloadOptions = DownloadOptions
{ relaxedOption :: Bool { relaxedOption :: Bool
, rawOption :: Bool , rawOption :: Bool
, fileOption :: Maybe FilePath , fileOption :: Maybe FilePath
, preserveFilenameOption :: Bool
} }
optParser :: CmdParamsDesc -> Parser AddUrlOptions optParser :: CmdParamsDesc -> Parser AddUrlOptions
@ -77,7 +79,7 @@ optParser desc = AddUrlOptions
) )
parseDownloadOptions :: Bool -> Parser DownloadOptions parseDownloadOptions :: Bool -> Parser DownloadOptions
parseDownloadOptions withfileoption = DownloadOptions parseDownloadOptions withfileoptions = DownloadOptions
<$> switch <$> switch
( long "relaxed" ( long "relaxed"
<> help "skip size check" <> help "skip size check"
@ -86,12 +88,18 @@ parseDownloadOptions withfileoption = DownloadOptions
( long "raw" ( long "raw"
<> help "disable special handling for torrents, youtube-dl, etc" <> help "disable special handling for torrents, youtube-dl, etc"
) )
<*> if withfileoption <*> (if withfileoptions
then optional (strOption then optional (strOption
( long "file" <> metavar paramFile ( long "file" <> metavar paramFile
<> help "specify what file the url is added to" <> help "specify what file the url is added to"
)) ))
else pure Nothing else pure Nothing)
<*> (if withfileoptions
then switch
( long "preserve-filename"
<> help "use filename provided by server as-is"
)
else pure False)
seek :: AddUrlOptions -> CommandSeek seek :: AddUrlOptions -> CommandSeek
seek o = startConcurrency commandStages $ do seek o = startConcurrency commandStages $ do
@ -207,16 +215,35 @@ startWeb addunlockedmatcher o urlstring = go $ fromMaybe bad $ parseURI urlstrin
file <- adjustFile o <$> case fileOption (downloadOptions o) of file <- adjustFile o <$> case fileOption (downloadOptions o) of
Just f -> pure f Just f -> pure f
Nothing -> case Url.urlSuggestedFile urlinfo of Nothing -> case Url.urlSuggestedFile urlinfo of
Nothing -> pure $ url2file url (pathdepthOption o) pathmax Just sf | not (null sf) -> if preserveFilenameOption (downloadOptions o)
Just sf -> do then do
let f = truncateFilePath pathmax $ checkPreserveFileNameSecurity sf
sanitizeFilePath sf return sf
ifM (liftIO $ doesFileExist f <||> doesDirectoryExist f) else do
( pure $ url2file url (pathdepthOption o) pathmax let f = truncateFilePath pathmax $
, pure f sanitizeFilePath sf
) ifM (liftIO $ doesFileExist f <||> doesDirectoryExist f)
( pure $ url2file url (pathdepthOption o) pathmax
, pure f
)
_ -> pure $ url2file url (pathdepthOption o) pathmax
performWeb addunlockedmatcher o urlstring file urlinfo performWeb addunlockedmatcher o urlstring file urlinfo
-- sanitizeFilePath avoids all these security problems
-- (and probably others, but at least this catches the most egrarious ones).
checkPreserveFileNameSecurity :: FilePath -> Annex ()
checkPreserveFileNameSecurity f = do
checksecurity escapeSequenceInFilePath False "escape sequence"
checksecurity pathTraversalInFilePath True "path traversal"
checksecurity gitDirectoryInFilePath True "contains a .git directory"
where
checksecurity p canshow d = when (p f) $
giveup $ concat
[ "--preserve-filename was used, but the filename "
, if canshow then "(" ++ f ++ ") " else ""
, "has a security problem (" ++ d ++ "), not adding."
]
performWeb :: AddUnlockedMatcher -> AddUrlOptions -> URLString -> FilePath -> Url.UrlInfo -> CommandPerform performWeb :: AddUnlockedMatcher -> AddUrlOptions -> URLString -> FilePath -> Url.UrlInfo -> CommandPerform
performWeb addunlockedmatcher o url file urlinfo = ifAnnexed (toRawFilePath file) addurl geturl performWeb addunlockedmatcher o url file urlinfo = ifAnnexed (toRawFilePath file) addurl geturl
where where

View file

@ -40,6 +40,7 @@ import Logs.MetaData
import Annex.MetaData import Annex.MetaData
import Annex.FileMatcher import Annex.FileMatcher
import Command.AddUrl (addWorkTree) import Command.AddUrl (addWorkTree)
import Annex.UntrustedFilePath
cmd :: Command cmd :: Command
cmd = notBareRepo $ cmd = notBareRepo $

View file

@ -13,7 +13,7 @@ module Types.UrlContents (
) where ) where
import Utility.Url import Utility.Url
import Utility.Path import Annex.UntrustedFilePath
import System.FilePath import System.FilePath
@ -35,13 +35,7 @@ newtype SafeFilePath = SafeFilePath FilePath
mkSafeFilePath :: FilePath -> SafeFilePath mkSafeFilePath :: FilePath -> SafeFilePath
mkSafeFilePath p = SafeFilePath $ if null p' then "file" else p' mkSafeFilePath p = SafeFilePath $ if null p' then "file" else p'
where where
p' = joinPath $ filter safe $ map sanitizeFilePath $ splitDirectories p p' = joinPath $ map sanitizeFilePath $ splitDirectories p
safe s
| isDrive s = False
| s == ".." = False
| s == ".git" = False
| null s = False
| otherwise = True
fromSafeFilePath :: SafeFilePath -> FilePath fromSafeFilePath :: SafeFilePath -> FilePath
fromSafeFilePath (SafeFilePath p) = p fromSafeFilePath (SafeFilePath p) = p

View file

@ -1,6 +1,6 @@
{- path manipulation {- path manipulation
- -
- Copyright 2010-2014 Joey Hess <id@joeyh.name> - Copyright 2010-2020 Joey Hess <id@joeyh.name>
- -
- License: BSD-2-clause - License: BSD-2-clause
-} -}
@ -24,7 +24,6 @@ module Utility.Path (
inPath, inPath,
searchPath, searchPath,
dotfile, dotfile,
sanitizeFilePath,
splitShortExtensions, splitShortExtensions,
prop_upFrom_basics, prop_upFrom_basics,
@ -35,7 +34,6 @@ module Utility.Path (
import System.FilePath import System.FilePath
import Data.List import Data.List
import Data.Maybe import Data.Maybe
import Data.Char
import Control.Applicative import Control.Applicative
import Prelude import Prelude
@ -276,22 +274,6 @@ dotfile file
where where
f = takeFileName file f = takeFileName file
{- Given a string that we'd like to use as the basis for FilePath, but that
- was provided by a third party and is not to be trusted, returns the closest
- sane FilePath.
-
- All spaces and punctuation and other wacky stuff are replaced
- with '_', except for '.'
- "../" will thus turn into ".._", which is safe.
-}
sanitizeFilePath :: String -> FilePath
sanitizeFilePath = map sanitize
where
sanitize c
| c == '.' = c
| isSpace c || isPunctuation c || isSymbol c || isControl c || c == '/' = '_'
| otherwise = c
{- Similar to splitExtensions, but knows that some things in FilePaths {- Similar to splitExtensions, but knows that some things in FilePaths
- after a dot are too long to be extensions. -} - after a dot are too long to be extensions. -}
splitShortExtensions :: FilePath -> (FilePath, [String]) splitShortExtensions :: FilePath -> (FilePath, [String])

View file

@ -0,0 +1,18 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2020-05-08T19:56:27Z"
content="""
Implemented git-annex addurl --preserve-filename, which will do what you
want.
Leaving this bug open because I only implemented it for web urls, not yet
for torrents and other special remotes that have their own url scheme.
The sanitization for those is currently done at a lower level than addurl,
and so that will take a bit more work to implement.
(importfeed does not, I think, need to implement this option, because
the filenames are based on information from the rss feed, and it's
perfectly fine to sanitize eg a podcast episode title to get a reasonable
filename.)
"""]]

View file

@ -57,6 +57,16 @@ be used to get better filenames.
If the file already exists, addurl will record that it can be downloaded If the file already exists, addurl will record that it can be downloaded
from the specified url(s). from the specified url(s).
* `--preserve-filename`
When the web server (or torrent, etc) provides a filename, use it as-is,
avoiding sanitizing unusual characters, or truncating it to length, or any
other modifications.
git-annex will still check the filename for safety, and if the filename
has a security problem such as path traversal or an escape sequence,
it will refuse to add it.
* `--pathdepth=N` * `--pathdepth=N`
Rather than basing the filename on the whole url, this causes a path to Rather than basing the filename on the whole url, this causes a path to

View file

@ -661,6 +661,7 @@ Executable git-annex
Annex.TaggedPush Annex.TaggedPush
Annex.Tmp Annex.Tmp
Annex.Transfer Annex.Transfer
Annex.UntrustedFilePath
Annex.UpdateInstead Annex.UpdateInstead
Annex.UUID Annex.UUID
Annex.Url Annex.Url