fix add overwrite race with git-annex add to annex

This is not a complete fix for all such races, only the one where a
large file gets changed while adding and gets added to git rather than
to the annex.

addLink needs to go away, any caller of it is probably subject to the
same kind of race. (Also, addLink itself fails to check gitignore when
symlinks are not supported.)

ingestAdd no longer checks gitignore. (It didn't check it consistently
before either, since there were cases where it did not run git add!)
When git-annex import calls it, it's already checked gitignore itself
earlier. When git-annex add calls it, it's usually on files found
by withFilesNotInGit, which handles checking ignores.

There was one other case, when git-annex add --batch calls it. In that
case, old git-annex behaved rather badly, it would seem to add the file,
but git add would later fail, leaving the file as an unstaged annex symlink.
That behavior has also been fixed.

Sponsored-by: Brett Eisenberg on Patreon
This commit is contained in:
Joey Hess 2022-06-14 13:20:42 -04:00
parent b471438c51
commit dd6dec4eb1
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
5 changed files with 47 additions and 16 deletions

View file

@ -145,19 +145,19 @@ checkLockedDownWritePerms file displayfile = checkContentWritePerm file >>= retu
{- Ingests a locked down file into the annex. Updates the work tree and {- Ingests a locked down file into the annex. Updates the work tree and
- index. -} - index. -}
ingestAdd :: CheckGitIgnore -> MeterUpdate -> Maybe LockedDown -> Annex (Maybe Key) ingestAdd :: MeterUpdate -> Maybe LockedDown -> Annex (Maybe Key)
ingestAdd ci meterupdate ld = ingestAdd' ci meterupdate ld Nothing ingestAdd meterupdate ld = ingestAdd' meterupdate ld Nothing
ingestAdd' :: CheckGitIgnore -> MeterUpdate -> Maybe LockedDown -> Maybe Key -> Annex (Maybe Key) ingestAdd' :: MeterUpdate -> Maybe LockedDown -> Maybe Key -> Annex (Maybe Key)
ingestAdd' _ _ Nothing _ = return Nothing ingestAdd' _ Nothing _ = return Nothing
ingestAdd' ci meterupdate ld@(Just (LockedDown cfg source)) mk = do ingestAdd' meterupdate ld@(Just (LockedDown cfg source)) mk = do
(mk', mic) <- ingest meterupdate ld mk (mk', mic) <- ingest meterupdate ld mk
case mk' of case mk' of
Nothing -> return Nothing Nothing -> return Nothing
Just k -> do Just k -> do
let f = keyFilename source let f = keyFilename source
if lockingFile cfg if lockingFile cfg
then addLink ci f k mic then addSymlink f k mic
else do else do
mode <- liftIO $ catchMaybeIO $ mode <- liftIO $ catchMaybeIO $
fileMode <$> R.getFileStatus (contentLocation source) fileMode <$> R.getFileStatus (contentLocation source)
@ -322,6 +322,11 @@ makeLink file key mcache = flip catchNonAsync (restoreFile file key) $ do
- and is faster (staging the symlink runs hash-object commands each time). - and is faster (staging the symlink runs hash-object commands each time).
- Also, using git add allows it to skip gitignored files, unless forced - Also, using git add allows it to skip gitignored files, unless forced
- to include them. - to include them.
-
- FIXME: Using git add opens a race where the file can be changed
- before git adds it, causing a large file to be added directly to git.
- addSymlink avoids that, but does not check git ignore. Things need to
- be converted to use it, first checking git ignore themselves.
-} -}
addLink :: CheckGitIgnore -> RawFilePath -> Key -> Maybe InodeCache -> Annex () addLink :: CheckGitIgnore -> RawFilePath -> Key -> Maybe InodeCache -> Annex ()
addLink ci file key mcache = ifM (coreSymlinks <$> Annex.getGitConfig) addLink ci file key mcache = ifM (coreSymlinks <$> Annex.getGitConfig)
@ -330,11 +335,14 @@ addLink ci file key mcache = ifM (coreSymlinks <$> Annex.getGitConfig)
ps <- gitAddParams ci ps <- gitAddParams ci
Annex.Queue.addCommand [] "add" (ps++[Param "--"]) Annex.Queue.addCommand [] "add" (ps++[Param "--"])
[fromRawFilePath file] [fromRawFilePath file]
, do , addSymlink file key mcache
l <- makeLink file key mcache
addAnnexLink l file
) )
addSymlink :: RawFilePath -> Key -> Maybe InodeCache -> Annex ()
addSymlink file key mcache = do
l <- makeLink file key mcache
addAnnexLink l file
{- Parameters to pass to git add, forcing addition of ignored files. {- Parameters to pass to git add, forcing addition of ignored files.
- -
- Note that, when git add is being run on an ignored file that is already - Note that, when git add is being run on an ignored file that is already

View file

@ -13,6 +13,12 @@ git-annex (10.20220526) UNRELEASED; urgency=medium
content, but where dropping failed due to eg a network problem, content, but where dropping failed due to eg a network problem,
in cases where numcopies checks prevented the resumed in cases where numcopies checks prevented the resumed
move from dropping the object from the source repository. move from dropping the object from the source repository.
* add: When several files are being added, replacing an annex symlink
of a file that was already processed with a new large file could
sometimes cause that large file to be added to git. This race has been
fixed.
* add --batch: Fix handling of a file that is skipped due to being
gitignored.
-- Joey Hess <id@joeyh.name> Wed, 01 Jun 2022 13:23:05 -0400 -- Joey Hess <id@joeyh.name> Wed, 01 Jun 2022 13:23:05 -0400

View file

@ -24,6 +24,7 @@ import Config.Smudge
import Utility.OptParse import Utility.OptParse
import Utility.InodeCache import Utility.InodeCache
import Annex.InodeSentinal import Annex.InodeSentinal
import Annex.CheckIgnore
import qualified Utility.RawFilePath as R import qualified Utility.RawFilePath as R
cmd :: Command cmd :: Command
@ -98,7 +99,11 @@ seek o = startConcurrency commandStages $ do
| updateOnly o -> | updateOnly o ->
giveup "--update --batch is not supported" giveup "--update --batch is not supported"
| otherwise -> batchOnly Nothing (addThese o) $ | otherwise -> batchOnly Nothing (addThese o) $
batchFiles fmt (gofile True) batchFiles fmt $ \v@(_si, file) ->
ifM (checkIgnored (checkGitIgnoreOption o) file)
( stop
, gofile True v
)
NoBatch -> do NoBatch -> do
-- Avoid git ls-files complaining about files that -- Avoid git ls-files complaining about files that
-- are not known to git yet, since this will add -- are not known to git yet, since this will add
@ -169,7 +174,7 @@ start o si file addunlockedmatcher = do
starting "add" (ActionItemTreeFile file) si $ starting "add" (ActionItemTreeFile file) si $
if isSymbolicLink s if isSymbolicLink s
then next $ addFile Small (checkGitIgnoreOption o) file then next $ addFile Small (checkGitIgnoreOption o) file
else perform o file addunlockedmatcher else perform file addunlockedmatcher
addpresent key = addpresent key =
liftIO (catchMaybeIO $ R.getSymbolicLinkStatus file) >>= \case liftIO (catchMaybeIO $ R.getSymbolicLinkStatus file) >>= \case
Just s | isSymbolicLink s -> fixuplink key Just s | isSymbolicLink s -> fixuplink key
@ -186,8 +191,8 @@ start o si file addunlockedmatcher = do
Database.Keys.addAssociatedFile key =<< inRepo (toTopFilePath file) Database.Keys.addAssociatedFile key =<< inRepo (toTopFilePath file)
next $ addFile Large (checkGitIgnoreOption o) file next $ addFile Large (checkGitIgnoreOption o) file
perform :: AddOptions -> RawFilePath -> AddUnlockedMatcher -> CommandPerform perform :: RawFilePath -> AddUnlockedMatcher -> CommandPerform
perform o file addunlockedmatcher = withOtherTmp $ \tmpdir -> do perform file addunlockedmatcher = withOtherTmp $ \tmpdir -> do
lockingfile <- not <$> addUnlocked addunlockedmatcher lockingfile <- not <$> addUnlocked addunlockedmatcher
(MatchingFile (FileInfo file file Nothing)) (MatchingFile (FileInfo file file Nothing))
True True
@ -199,7 +204,7 @@ perform o file addunlockedmatcher = withOtherTmp $ \tmpdir -> do
ld <- lockDown cfg (fromRawFilePath file) ld <- lockDown cfg (fromRawFilePath file)
let sizer = keySource <$> ld let sizer = keySource <$> ld
v <- metered Nothing sizer Nothing $ \_meter meterupdate -> v <- metered Nothing sizer Nothing $ \_meter meterupdate ->
ingestAdd (checkGitIgnoreOption o) meterupdate ld ingestAdd meterupdate ld
finish v finish v
where where
finish (Just key) = next $ cleanup key True finish (Just key) = next $ cleanup key True

View file

@ -246,7 +246,7 @@ startLocal o addunlockedmatcher largematcher mode (srcfile, destfile) =
} }
} }
ifM (checkFileMatcher largematcher destfile) ifM (checkFileMatcher largematcher destfile)
( ingestAdd' (checkGitIgnoreOption o) nullMeterUpdate (Just ld') (Just k) ( ingestAdd' nullMeterUpdate (Just ld') (Just k)
>>= maybe >>= maybe
stop stop
(\addedk -> next $ Command.Add.cleanup addedk True) (\addedk -> next $ Command.Add.cleanup addedk True)

View file

@ -29,6 +29,14 @@ be that it's faster. It also sometimes relies on git add to check gitignore,
although sometimes redundandly, some of the callers of it may rely on that although sometimes redundandly, some of the callers of it may rely on that
and have to be changed to check it first themselves. and have to be changed to check it first themselves.
Since adding a file to the annex also involves locking it down and
detecting modifications made while generating the key, update-index is
sufficient.
> Update: This is done for `git-annex add`, using addSymlink. But addLink
> is still in use elsewhere, and those other users might also be subject to
> similar races.
When it's adding a file unlocked, it already stages the pointer file using When it's adding a file unlocked, it already stages the pointer file using
update-index instead so there is no overwrite problem there. update-index instead so there is no overwrite problem there.
@ -44,5 +52,9 @@ added to the annex after all. Test case for this:
git-annex add git-annex add
git diff --cached 1 git diff --cached 1
Unsure how to fix this case yet? Unsure how to fix this case yet? Maybe it needs to cache the inode,
hash the file content, then verifiy the inode did not change during
hashing, and then also use update-index.
--[[Joey]] --[[Joey]]