avoid bad commits after interrupted direct mode sync (or merge)

It was possible for a interrupted sync or merge in direct mode to
leave the work tree out of sync with the last recorded commit.
This would result in the next commit seeing files missing from the work
tree, and committing their removal.

Now, a direct mode merge happens not only in a throwaway work tree, but using
a temporary index file, and without any commits or index changes
being made until the real work tree has been updated. If the merge is
interrupted, the work tree may have some updated files, but worst case a
commit will redundantly commit changes that come from the merge.

This commit was sponsored by Tony Cantor.
This commit is contained in:
Joey Hess 2014-06-09 18:01:30 -04:00
parent 193e702d73
commit d6711800ad
8 changed files with 142 additions and 47 deletions

View file

@ -17,7 +17,6 @@ import qualified Git.LsFiles as LsFiles
import qualified Git.UpdateIndex as UpdateIndex import qualified Git.UpdateIndex as UpdateIndex
import qualified Git.Merge import qualified Git.Merge
import qualified Git.Ref import qualified Git.Ref
import qualified Git.Sha
import qualified Git import qualified Git
import Git.Types (BlobType(..)) import Git.Types (BlobType(..))
import Config import Config
@ -38,12 +37,7 @@ autoMergeFrom branch currbranch = do
Just b -> go =<< inRepo (Git.Ref.sha b) Just b -> go =<< inRepo (Git.Ref.sha b)
where where
go old = ifM isDirect go old = ifM isDirect
( do ( mergeDirect currbranch old branch (resolveMerge old branch)
d <- fromRepo gitAnnexMergeDir
r <- inRepo (mergeDirect d branch)
<||> resolveMerge old branch
mergeDirectCleanup d (fromMaybe Git.Sha.emptyTree old) Git.Ref.headRef
return r
, inRepo (Git.Merge.mergeNonInteractive branch) , inRepo (Git.Merge.mergeNonInteractive branch)
<||> resolveMerge old branch <||> resolveMerge old branch
) )
@ -70,9 +64,11 @@ autoMergeFrom branch currbranch = do
- -
- In indirect mode, the merge is resolved in the work tree and files - In indirect mode, the merge is resolved in the work tree and files
- staged, to clean up from a conflicted merge that was run in the work - staged, to clean up from a conflicted merge that was run in the work
- tree. In direct mode, the work tree is not touched here; files are - tree. The resolution is committed.
- staged to the index, and written to the gitAnnexMergeDir, and later -
- mergeDirectCleanup handles updating the work tree. - In direct mode, the work tree is not touched here, and no commit is made;
- files are staged to the index, and written to the gitAnnexMergeDir, and
- later mergeDirectCleanup handles updating the work tree.
-} -}
resolveMerge :: Maybe Git.Ref -> Git.Ref -> Annex Bool resolveMerge :: Maybe Git.Ref -> Git.Ref -> Annex Bool
resolveMerge us them = do resolveMerge us them = do
@ -92,14 +88,13 @@ resolveMerge us them = do
unlessM isDirect $ unlessM isDirect $
cleanConflictCruft mergedfs top cleanConflictCruft mergedfs top
Annex.Queue.flush Annex.Queue.flush
whenM isDirect $ unlessM isDirect $ do
void preCommitDirect void $ inRepo $ Git.Command.runBool
void $ inRepo $ Git.Command.runBool [ Param "commit"
[ Param "commit" , Param "--no-verify"
, Param "--no-verify" , Param "-m"
, Param "-m" , Param "git-annex automatic merge conflict fix"
, Param "git-annex automatic merge conflict fix" ]
]
showLongNote "Merge conflict was automatically resolved; you may want to examine the result." showLongNote "Merge conflict was automatically resolved; you may want to examine the result."
return merged return merged

View file

@ -34,6 +34,8 @@ import Annex.Perms
import Annex.ReplaceFile import Annex.ReplaceFile
import Annex.Exception import Annex.Exception
import Annex.VariantFile import Annex.VariantFile
import Git.Index
import Annex.Index
{- Uses git ls-files to find files that need to be committed, and stages {- Uses git ls-files to find files that need to be committed, and stages
- them into the index. Returns True if some changes were staged. -} - them into the index. Returns True if some changes were staged. -}
@ -141,21 +143,63 @@ addDirect file cache = do
) )
{- In direct mode, git merge would usually refuse to do anything, since it {- In direct mode, git merge would usually refuse to do anything, since it
- sees present direct mode files as type changed files. To avoid this, - sees present direct mode files as type changed files.
- merge is run with the work tree set to a temp directory. -
- So, to handle a merge, it's run with the work tree set to a temp
- directory, and the merge is staged into a copy of the index.
- Then the work tree is updated to reflect the merge, and
- finally, the merge is committed and the real index updated.
-} -}
mergeDirect :: FilePath -> Git.Ref -> Git.Repo -> IO Bool mergeDirect :: Maybe Git.Ref -> Maybe Git.Ref -> Git.Branch -> Annex Bool -> Annex Bool
mergeDirect d branch g = do mergeDirect startbranch oldref branch resolvemerge = do
whenM (doesDirectoryExist d) $ -- Use the index lock file as the temp index file.
removeDirectoryRecursive d -- This is actually what git does when updating the index,
createDirectoryIfMissing True d -- and so it will prevent other git processes from making
let g' = g { location = Local { gitdir = Git.localGitDir g, worktree = Just d } } -- any changes to the index while our merge is in progress.
Git.Merge.mergeNonInteractive branch g' reali <- fromRepo indexFile
tmpi <- fromRepo indexFileLock
liftIO $ copyFile reali tmpi
{- Cleans up after a direct mode merge. The merge must have been committed, d <- fromRepo gitAnnexMergeDir
- and the commit sha passed in, along with the old sha of the tree liftIO $ do
- before the merge. Uses git diff-tree to find files that changed between whenM (doesDirectoryExist d) $
- the two shas, and applies those changes to the work tree. removeDirectoryRecursive d
createDirectoryIfMissing True d
withIndexFile tmpi $ do
r <- inRepo (mergein d) <||> resolvemerge
mergeDirectCleanup d (fromMaybe Git.Sha.emptyTree oldref)
mergeDirectCommit startbranch branch
liftIO $ rename tmpi reali
return r
where
mergein d g = Git.Merge.stageMerge branch $
g { location = Local { gitdir = Git.localGitDir g, worktree = Just d } }
{- Commits after a direct mode merge is complete, and after the work
- tree has been updated by mergeDirectCleanup.
-}
mergeDirectCommit :: Maybe Git.Ref -> Git.Branch -> Annex ()
mergeDirectCommit old branch = do
void preCommitDirect
gitdir <- fromRepo Git.localGitDir
let merge_head = gitdir </> "MERGE_HEAD"
let merge_msg = gitdir </> "MERGE_MSG"
let merge_mode = gitdir </> "MERGE_MODE"
ifM (maybe (return False) (\o -> inRepo $ Git.Branch.fastForwardable o branch) old)
( inRepo $ Git.Branch.update Git.Ref.headRef branch -- fast forward
, do
msg <- liftIO $
catchDefaultIO ("merge " ++ fromRef branch) $
readFile merge_msg
void $ inRepo $ Git.Branch.commit False msg
Git.Ref.headRef [Git.Ref.headRef, branch]
)
liftIO $ mapM_ nukeFile [merge_head, merge_msg, merge_mode]
{- Cleans up after a direct mode merge. The merge must have been staged
- in the index. Uses diff-index to compare the staged changes with
- the tree before the merge, and applies those changes to the work tree.
- -
- There are really only two types of changes: An old item can be deleted, - There are really only two types of changes: An old item can be deleted,
- or a new item added. Two passes are made, first deleting and then - or a new item added. Two passes are made, first deleting and then
@ -164,9 +208,9 @@ mergeDirect d branch g = do
- order, but we cannot add the directory until the file with the - order, but we cannot add the directory until the file with the
- same name is removed.) - same name is removed.)
-} -}
mergeDirectCleanup :: FilePath -> Git.Ref -> Git.Ref -> Annex () mergeDirectCleanup :: FilePath -> Git.Ref -> Annex ()
mergeDirectCleanup d oldsha newsha = do mergeDirectCleanup d oldref = do
(items, cleanup) <- inRepo $ DiffTree.diffTreeRecursive oldsha newsha (items, cleanup) <- inRepo $ DiffTree.diffIndex oldref
makeabs <- flip fromTopFilePath <$> gitRepo makeabs <- flip fromTopFilePath <$> gitRepo
let fsitems = zip (map (makeabs . DiffTree.file) items) items let fsitems = zip (map (makeabs . DiffTree.file) items) items
forM_ fsitems $ forM_ fsitems $
@ -194,12 +238,12 @@ mergeDirectCleanup d oldsha newsha = do
- key, it's left alone. - key, it's left alone.
- -
- If the file is already present, and does not exist in the - If the file is already present, and does not exist in the
- oldsha branch, preserve this local file. - oldref, preserve this local file.
- -
- Otherwise, create the symlink and then if possible, replace it - Otherwise, create the symlink and then if possible, replace it
- with the content. -} - with the content. -}
movein item makeabs k f = unlessM (goodContent k f) $ do movein item makeabs k f = unlessM (goodContent k f) $ do
preserveUnannexed item makeabs f oldsha preserveUnannexed item makeabs f oldref
l <- inRepo $ gitAnnexLink f k l <- inRepo $ gitAnnexLink f k
replaceFile f $ makeAnnexLink l replaceFile f $ makeAnnexLink l
toDirect k f toDirect k f
@ -207,13 +251,13 @@ mergeDirectCleanup d oldsha newsha = do
{- Any new, modified, or renamed files were written to the temp {- Any new, modified, or renamed files were written to the temp
- directory by the merge, and are moved to the real work tree. -} - directory by the merge, and are moved to the real work tree. -}
movein_raw item makeabs f = do movein_raw item makeabs f = do
preserveUnannexed item makeabs f oldsha preserveUnannexed item makeabs f oldref
liftIO $ do liftIO $ do
createDirectoryIfMissing True $ parentDir f createDirectoryIfMissing True $ parentDir f
void $ tryIO $ rename (d </> getTopFilePath (DiffTree.file item)) f void $ tryIO $ rename (d </> getTopFilePath (DiffTree.file item)) f
{- If the file that's being moved in is already present in the work {- If the file that's being moved in is already present in the work
- tree, but did not exist in the oldsha branch, preserve this - tree, but did not exist in the oldref, preserve this
- local, unannexed file (or directory), as "variant-local". - local, unannexed file (or directory), as "variant-local".
- -
- It's also possible that the file that's being moved in - It's also possible that the file that's being moved in
@ -221,7 +265,7 @@ mergeDirectCleanup d oldsha newsha = do
- file (not a directory), which should be preserved. - file (not a directory), which should be preserved.
-} -}
preserveUnannexed :: DiffTree.DiffTreeItem -> (TopFilePath -> FilePath) -> FilePath -> Ref -> Annex () preserveUnannexed :: DiffTree.DiffTreeItem -> (TopFilePath -> FilePath) -> FilePath -> Ref -> Annex ()
preserveUnannexed item makeabs absf oldsha = do preserveUnannexed item makeabs absf oldref = do
whenM (liftIO (collidingitem absf) <&&> unannexed absf) $ whenM (liftIO (collidingitem absf) <&&> unannexed absf) $
liftIO $ findnewname absf 0 liftIO $ findnewname absf 0
checkdirs (DiffTree.file item) checkdirs (DiffTree.file item)
@ -241,7 +285,7 @@ preserveUnannexed item makeabs absf oldsha = do
<$> catchMaybeIO (getSymbolicLinkStatus f) <$> catchMaybeIO (getSymbolicLinkStatus f)
unannexed f = (isNothing <$> isAnnexLink f) unannexed f = (isNothing <$> isAnnexLink f)
<&&> (isNothing <$> catFileDetails oldsha f) <&&> (isNothing <$> catFileDetails oldref f)
findnewname :: FilePath -> Int -> IO () findnewname :: FilePath -> Int -> IO ()
findnewname f n = do findnewname f n = do

View file

@ -52,7 +52,22 @@ changed origbranch newbranch repo
diffs = pipeReadStrict diffs = pipeReadStrict
[ Param "log" [ Param "log"
, Param (fromRef origbranch ++ ".." ++ fromRef newbranch) , Param (fromRef origbranch ++ ".." ++ fromRef newbranch)
, Params "--oneline -n1" , Param "-n1"
, Param "--pretty=%H"
] repo
{- Check if it's possible to fast-forward from the old
- ref to the new ref.
-
- This requires there to be a path from the old to the new. -}
fastForwardable :: Ref -> Ref -> Repo -> IO Bool
fastForwardable old new repo = not . null <$>
pipeReadStrict
[ Param "log"
, Param $ fromRef old ++ ".." ++ fromRef new
, Param "-n1"
, Param "--pretty=%H"
, Param "--ancestry-path"
] repo ] repo
{- Given a set of refs that are all known to have commits not {- Given a set of refs that are all known to have commits not
@ -74,7 +89,7 @@ fastForward branch (first:rest) repo =
where where
no_ff = return False no_ff = return False
do_ff to = do do_ff to = do
run [Param "update-ref", Param $ fromRef branch, Param $ fromRef to] repo update branch to repo
return True return True
findbest c [] = return $ Just c findbest c [] = return $ Just c
findbest c (r:rs) findbest c (r:rs)

View file

@ -49,7 +49,7 @@ diffIndex :: Ref -> Repo -> IO ([DiffTreeItem], IO Bool)
diffIndex ref = diffIndex' ref [Param "--cached"] diffIndex ref = diffIndex' ref [Param "--cached"]
{- Diffs between a tree and the working tree. Does nothing if there is not {- Diffs between a tree and the working tree. Does nothing if there is not
- yet a commit in the repository, of if the repository is bare. -} - yet a commit in the repository, or if the repository is bare. -}
diffWorkTree :: Ref -> Repo -> IO ([DiffTreeItem], IO Bool) diffWorkTree :: Ref -> Repo -> IO ([DiffTreeItem], IO Bool)
diffWorkTree ref repo = diffWorkTree ref repo =
ifM (Git.Ref.headExists repo) ifM (Git.Ref.headExists repo)

View file

@ -30,3 +30,7 @@ override index = do
indexFile :: Repo -> FilePath indexFile :: Repo -> FilePath
indexFile r = localGitDir r </> "index" indexFile r = localGitDir r </> "index"
{- Git locks the index by creating this file. -}
indexFileLock :: Repo -> FilePath
indexFileLock r = indexFile r ++ ".lock"

View file

@ -1,6 +1,6 @@
{- git merging {- git merging
- -
- Copyright 2012 Joey Hess <joey@kitenet.net> - Copyright 2012, 2014 Joey Hess <joey@kitenet.net>
- -
- Licensed under the GNU GPL version 3 or higher. - Licensed under the GNU GPL version 3 or higher.
-} -}
@ -19,3 +19,15 @@ mergeNonInteractive branch
| otherwise = merge [Param "--no-edit", Param $ fromRef branch] | otherwise = merge [Param "--no-edit", Param $ fromRef branch]
where where
merge ps = runBool $ Param "merge" : ps merge ps = runBool $ Param "merge" : ps
{- Stage the merge into the index, but do not commit it.-}
stageMerge :: Ref -> Repo -> IO Bool
stageMerge branch = runBool
[ Param "merge"
, Param "--quiet"
, Param "--no-commit"
-- Without this, a fast-forward merge is done, since it involves no
-- commit.
, Param "--no-ff"
, Param $ fromRef branch
]

5
debian/changelog vendored
View file

@ -3,6 +3,11 @@ git-annex (5.20140607) UNRELEASED; urgency=medium
* Ignore setsid failures. * Ignore setsid failures.
* Avoid leaving behind .tmp files when failing in some cases, including * Avoid leaving behind .tmp files when failing in some cases, including
importing files to a disk that is full. importing files to a disk that is full.
* direct mode: Avoid committing a merge until after the work tree is
updated. This avoids an interrupted merge leaving the work tree out
of sync with the last commit, which could result in the wrong thing
being committed later, and files appearing to get deleted.
(They could be recovered by reverting the bad commit.)
-- Joey Hess <joeyh@debian.org> Mon, 09 Jun 2014 14:44:09 -0400 -- Joey Hess <joeyh@debian.org> Mon, 09 Jun 2014 14:44:09 -0400

View file

@ -17,10 +17,30 @@ mode repo can get them back.
To fix this, direct mode merge would need to avoid updating the current To fix this, direct mode merge would need to avoid updating the current
branch when merging the remote branch into it (how?). It should first branch when merging the remote branch into it (how?). It should first
update the whole work tree, and only after it's updated should it update update the whole work tree, and only after it's updated should it update
the current branch to reflect the merge. (I assume this is how `git merge` the index and the current branch to reflect the merge.
normally works.) --[[Joey]]
This way, if the merge is interrupted, the work tree may have uncommitted
changed -- but it's fine if they get accidentially committed, since when
the merge is re-done, those changes will by the same ones made by the
merge. (I assume this is how `git merge` normally works.) --[[Joey]]
> Implemented that. And then realized that even updating the index > Implemented that. And then realized that even updating the index
> as part of a merge results in the work tree being out of sync with the > as part of a merge results in the work tree being out of sync with the
> index. Which will cause the next sync to again delete any files that > index. Which will cause the next sync to again delete any files that
> are in the index but not the work tree. Urgh. --[[Joey]] > are in the index but not the work tree. Urgh.
>
> Seems that a direct mode
> merge also needs to use a different index file to stage its changes?
> (Ugh)
> > [[done]] --[[Joey]]
>
> Or could perhaps use `git-merge-tree`
> and avoid staging the merge in the index until the work-tree is updated.
>
> Alternatively, could use another strategy.. Add a lock file which is held while
> the merge is in progress and contains the pre-merge sha.
> If the lock file is present but not held, state is inconsistent.
> `git-annex sync` and the SanityChecker should
> then run mergeDirectCleanup to recover, before any commits can be made
> from the inconsistent state. This approach seems to get complicated
> quickly.. --[[Joey]]