only stage regular files from the journal

git-annex only writes regular files there, but other things may drop junk
like empty .DAV directories around the tree. Trying to hash such things
can have weird and hard-to-understand effects, so it seems best to do a
small amount of work statting each journal file to make sure it's a
regular file.

Sponsored-by: Jack Hill on Patreon
Joey Hess 2023-10-10 13:22:02 -04:00
parent 35206e32f2
commit c268dc5878
4 changed files with 43 additions and 2 deletions

@@ -49,6 +49,7 @@ import Data.ByteString.Builder
 import Control.Concurrent (threadDelay)
 import Control.Concurrent.MVar
 import qualified System.FilePath.ByteString as P
+import System.PosixCompat.Files (isRegularFile)
 
 import Annex.Common hiding (append)
 import Types.BranchState
@@ -726,13 +727,14 @@ stageJournal jl commitindex = withIndex $ withOtherTmp $ \tmpdir -> do
     genstream dir h jh jlogh streamer = readDirectory jh >>= \case
         Nothing -> return ()
         Just file -> do
-            unless (dirCruft file) $ do
-                let path = dir P.</> toRawFilePath file
+            let path = dir P.</> toRawFilePath file
+            unless (dirCruft file) $ whenM (isfile path) $ do
                 sha <- Git.HashObject.hashFile h path
                 hPutStrLn jlogh file
                 streamer $ Git.UpdateIndex.updateIndexLine
                     sha TreeFile (asTopFilePath $ fileJournal $ toRawFilePath file)
             genstream dir h jh jlogh streamer
+    isfile file = isRegularFile <$> R.getFileStatus file
     -- Clean up the staged files, as listed in the temp log file.
     -- The temp file is used to avoid needing to buffer all the
     -- filenames in memory.
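
For readers outside the git-annex codebase, the check added in the hunk above can be approximated with a minimal standalone sketch. It assumes a POSIX system and uses `System.Posix.Files` from the `unix` package in place of the `System.PosixCompat.Files` and `R.getFileStatus` wrappers seen in the diff. `isRegular` and `regularEntries` are hypothetical names chosen for illustration, not git-annex functions.

```haskell
import Control.Monad (filterM)
import System.Directory (listDirectory)
import System.FilePath ((</>))
import System.Posix.Files (getFileStatus, isRegularFile)

-- | True when the path refers to a regular file.
-- getFileStatus follows symlinks and throws an IOError if the
-- path does not exist.
isRegular :: FilePath -> IO Bool
isRegular path = isRegularFile <$> getFileStatus path

-- | Hypothetical helper: names of the regular files directly
-- inside dir. Directories and other non-regular entries
-- (for example an empty .DAV directory) are skipped.
regularEntries :: FilePath -> IO [FilePath]
regularEntries dir = do
    names <- listDirectory dir
    filterM (isRegular . (dir </>)) names
```

Given a directory holding one log file and an empty `.DAV` subdirectory, `regularEntries` should return only the log file's name. The cost is one extra stat call per entry, which is the small slowdown the commit message accepts.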

@@ -1,3 +1,9 @@
+git-annex (10.20230927) UNRELEASED; urgency=medium
+
+  * Ignore directories and other unusual files in .git/annex/journal/
+
+ -- Joey Hess <id@joeyh.name>  Tue, 10 Oct 2023 13:17:31 -0400
+
 git-annex (10.20230926) upstream; urgency=medium
 
   * Fix more breakage caused by git's fix for CVE-2022-24765, this time

@@ -93,3 +93,4 @@ git-annex: fd:17: Data.ByteString.hGetLine: end of file
 Yes, it's one of my favourite opensource tools.
+> [[fixed|done]] --[[Joey]]

@@ -0,0 +1,32 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 5"""
+ date="2023-10-10T17:00:37Z"
+ content="""
+Reproduced as follows:
+
+    joey@darkstar:~/tmp/bench>git init --bare dav
+    Initialized empty Git repository in /home/joey/tmp/bench/dav/
+    joey@darkstar:~/tmp/bench>cd dav
+    joey@darkstar:~/tmp/bench/dav>git-annex init --version=9
+    init ok
+    (recording state in git...)
+    joey@darkstar:~/tmp/bench/dav>for s in $(find -type d); do mkdir $s/.DAV;done
+    joey@darkstar:~/tmp/bench/dav>git-annex init --version=9
+    init fatal: Unable to add (null) to database
+
+So it's these empty directories indeed. (Empty .DAV files don't cause this.)
+
+In particular, it's any empty directory in .git/annex/journal. Which is
+supposed to only contain files that git-annex wrote there. Staging the journal
+is why git hash-object gets involved.
+
+    mkdir .DAV
+    echo .DAV | git hash-object -w --stdin-paths
+    fatal: Unable to add .DAV to database
+
+Still unclear why git ends up with "(null)" in the error message.
+
+While it will slow git-annex down a tiny bit to check if it's a regular file,
+it seems better for git-annex to be robust against this kind of pollution.
+"""]]