make hashFile support paths with newlines
git hash-object --stdin-paths is a newline protocol so it cannot
support them.

It would help to not use absPath, when the problem is that the
repository itself is in a path with a newline. But, there's a reason
it used absPath, which is that git hash-object --stdin-paths actually
chdirs to the top of the repository on startup! That is not
documented, and I think is a bug in git.

I considered making the path relative to the top of the repo, but
then what if this is a git bug and gets fixed? git-annex would break
horribly.

So instead, keep the absPath, but when the path contains a newline,
fall back to running git hash-object once per file, which avoids the
problem with newlines and --stdin-paths. It will be slower, but this
is an edge case. (Similar slow code paths are already used elsewhere
when dealing with filenames with newlines and other parts of git that
use line-based protocols.)

Sponsored-by: Dartmouth College's Datalad project
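The reasoning in the commit message can be illustrated with a minimal, self-contained sketch. It does not use git-annex's code: `Strategy`, `asSeenByLineProtocol`, and `chooseStrategy` are hypothetical names made up for this example, and the example paths are invented. It only shows that a reader which splits its input on newlines sees a path containing a newline as two separate entries, and that checking for the newline byte is enough to decide when to fall back.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8

-- Hypothetical type for this sketch; not part of git-annex.
data Strategy = StdinPaths | OncePerFile
    deriving (Show, Eq)

-- What a newline-delimited reader would see when a single path
-- is written to it followed by a newline.
asSeenByLineProtocol :: S.ByteString -> [S.ByteString]
asSeenByLineProtocol = S8.lines

-- Pick the slow per-file strategy only when the path cannot be
-- represented in a newline-delimited protocol.
chooseStrategy :: S.ByteString -> Strategy
chooseStrategy path
    | S8.elem '\n' path = OncePerFile
    | otherwise = StdinPaths

main :: IO ()
main = do
    let plain = S8.pack "/repo/file.txt"
        tricky = S8.pack "/repo/dir\nname/file.txt"
    -- One path in, one entry out.
    print (asSeenByLineProtocol plain)
    -- One path in, two entries out: the path is split at the newline.
    print (asSeenByLineProtocol tricky)
    print (chooseStrategy plain, chooseStrategy tricky)
```

The check is made per path, so only the unusual paths pay the cost of starting a separate process.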
parent e7ed9b7cbb
commit a6bebe3c0f
2 changed files with 60 additions and 11 deletions
@@ -1,6 +1,6 @@
 {- git hash-object interface
  -
- - Copyright 2011-2019 Joey Hess <id@joeyh.name>
+ - Copyright 2011-2023 Joey Hess <id@joeyh.name>
  -
  - Licensed under the GNU AGPL version 3 or higher.
  -}
@@ -21,26 +21,47 @@ import qualified Data.ByteString as S
 import qualified Data.ByteString.Char8 as S8
 import qualified Data.ByteString.Lazy as L
 import Data.ByteString.Builder
+import Data.Char
 
-type HashObjectHandle = CoProcess.CoProcessHandle
+data HashObjectHandle = HashObjectHandle CoProcess.CoProcessHandle Repo [CommandParam]
 
 hashObjectStart :: Bool -> Repo -> IO HashObjectHandle
-hashObjectStart writeobject = gitCoProcessStart True $ catMaybes
-	[ Just (Param "hash-object")
-	, if writeobject then Just (Param "-w") else Nothing
-	, Just (Param "--stdin-paths")
-	, Just (Param "--no-filters")
-	]
+hashObjectStart writeobject repo = do
+	h <- gitCoProcessStart True (ps ++ [Param "--stdin-paths"]) repo
+	return (HashObjectHandle h repo ps)
+  where
+	ps = catMaybes
+		[ Just (Param "hash-object")
+		, if writeobject then Just (Param "-w") else Nothing
+		, Just (Param "--no-filters")
+		]
 
 hashObjectStop :: HashObjectHandle -> IO ()
-hashObjectStop = CoProcess.stop
+hashObjectStop (HashObjectHandle h _ _) = CoProcess.stop h
 
 {- Injects a file into git, returning the Sha of the object. -}
 hashFile :: HashObjectHandle -> RawFilePath -> IO Sha
-hashFile h file = CoProcess.query h send receive
+hashFile hdl@(HashObjectHandle h _ _) file = do
+	-- git hash-object chdirs to the top of the repository on
+	-- start, so if the filename is relative, it will
+	-- not work. This seems likely to be a git bug.
+	-- So, make the filename absolute, which will work now
+	-- and also if git's behavior later changes.
+	file' <- absPath file
+	if newline `S.elem` file'
+		then hashFile' hdl file
+		else CoProcess.query h (send file') receive
   where
-	send to = S8.hPutStrLn to =<< absPath file
+	send file' to = S8.hPutStrLn to file'
 	receive from = getSha "hash-object" $ S8.hGetLine from
+	newline = fromIntegral (ord '\n')
+
+{- Runs git hash-object once per call, rather than using a running
+ - one, so is slower. But, is able to handle newlines in the filepath,
+ - which --stdin-paths cannot. -}
+hashFile' :: HashObjectHandle -> RawFilePath -> IO Sha
+hashFile' (HashObjectHandle _ repo ps) file = getSha "hash-object" $
+	pipeReadStrict (ps ++ [File (fromRawFilePath file)]) repo
 
 class HashableBlob t where
 	hashableBlobToHandle :: Handle -> t -> IO ()
@@ -0,0 +1,28 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2023-03-13T16:20:54Z"
+ content="""
+Unfortunately, `git hash-object --stdin-paths` does not support
+-z or anything like that. It is a newline based protocol.
+
+Ok, made git-annex fall back to running git hash-object once
+per file when the filenames contain newlines to work around that.
+
+BTW, another problem I noticed is that the repository description
+written to uuid.log contains a newline, which prevents parsing that line of
+the log correctly. This can also be seen by passing a value
+with a newline to `git-annex describe`. It would also happen in the
+case with the newline directory if it didn't fail earlier.
+
+Another log file that has a similar problem BTW is config.log,
+which can get a newline in a value with eg
+`git annex config --set annex.largefiles "xxx\nyyy"`
+and the result is that reading the value back out omits
+the part after the newline.
+
+Also, a newline in a subdirectory inside the repository breaks
+adding files in that directory with `git-annex add`.
+
+Newlines in filenames seem to work ok though...
+"""]]
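The truncation described for config.log in the comment above can be reproduced with a small sketch. The "key value" line format below is an assumption made purely for illustration and is not git-annex's actual log format; `writeEntry` and `readEntry` are hypothetical helpers. The point is only that any reader which splits a log on newlines loses whatever follows an embedded newline in a value.

```haskell
import qualified Data.ByteString.Char8 as S8

-- Hypothetical "key value" line format, for illustration only;
-- this is not git-annex's actual log format.
writeEntry :: S8.ByteString -> S8.ByteString -> S8.ByteString
writeEntry key value = S8.unwords [key, value] <> S8.pack "\n"

-- Parse each line as "key value" and return the first value
-- recorded for the requested key.
readEntry :: S8.ByteString -> S8.ByteString -> Maybe S8.ByteString
readEntry key logdata = lookup key
    [ (k, S8.drop 1 v)
    | l <- S8.lines logdata
    , let (k, v) = S8.break (== ' ') l
    ]

main :: IO ()
main = do
    let key = S8.pack "annex.largefiles"
        logdata = writeEntry key (S8.pack "xxx\nyyy")
    -- The value read back stops at the embedded newline; the rest
    -- ends up on a line of its own that no longer belongs to the key.
    print (readEntry key logdata)
```

Rejecting or escaping newlines when the value is written is the usual way to avoid this in a line-based format; which of the two git-annex would want is not settled by this commit.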