merge: Use fast-forward merges when possible.

Thanks Valentin Haenel for a test case showing how non-fast-forward merges
could result in an ongoing pull/merge/push cycle.

While the git-annex branch is fast-forwarded, git-annex's index file is still
updated using the union merge strategy as before. There's no other way to
update the index that would be any faster.

It is possible that a union merge and a fast-forward result in different file
contents: Files should have the same lines, but a union merge may change
their order. If this happens, the next commit made to the git-annex branch
will have some unnecessary changes to line orders, but the consistency
of data should be preserved.

Note that when the journal contains changes, a fast-forward is never attempted,
which is fine, because committing those changes would be vanishingly unlikely
to leave the git-annex branch at a commit that already exists in one of
the remotes.

The real difficulty is handling the case where multiple remotes have all
changed. git-annex does find the best (ie, newest) one and fast forwards
to it. If the remotes are diverged, no fast-forward is done at all. It would
be possible to pick one, fast forward to it, and make a merge commit to
the rest, I see no benefit to adding that complexity.

Determining the best of N changed remotes requires N*2+1 calls to git-log, but
these are fast git-log calls, and N is typically small. Also, typically
some or all of the remote refs will be the same, and git-log is not called to
compare those. In the real world I expect this will almost always add only
1 git-log call to the merge process. (Which already makes N anyway.)
This commit is contained in:
Joey Hess 2011-11-06 15:18:45 -04:00
parent bf07e2c921
commit c99fb58909
3 changed files with 79 additions and 23 deletions

View file

@ -117,28 +117,30 @@ commit message = whenM journalDirty $ lockJournal $ do
g <- gitRepo
withIndex $ liftIO $ Git.commit g message fullname [fullname]
{- Ensures that the branch is up-to-date; should be called before
- data is read from it. Runs only once per git-annex run.
{- Ensures that the branch is up-to-date; should be called before data is
- read from it. Runs only once per git-annex run.
-
- Before refs are merged into the index, it's
- important to first stage the journal into the
- index. Otherwise, any changes in the journal
- would later get staged, and might overwrite
- changes made during the merge.
- Before refs are merged into the index, it's important to first stage the
- journal into the index. Otherwise, any changes in the journal would
- later get staged, and might overwrite changes made during the merge.
-
- It would be cleaner to handle the merge by
- updating the journal, not the index, with changes
- from the branches.
- It would be cleaner to handle the merge by updating the journal, not the
- index, with changes from the branches.
-
- The index is always updated using a union merge, as that's the most
- efficient way to update it. However, if the branch can be
- fast-forwarded, that is then done, rather than adding an unnecessary
- commit to it.
-}
update :: Annex ()
update = onceonly $ do
g <- gitRepo
-- check what needs updating before taking the lock
dirty <- journalDirty
c <- filterM changedbranch =<< siblingBranches
c <- filterM (changedBranch name . snd) =<< siblingBranches
let (refs, branches) = unzip c
unless (not dirty && null refs) $ withIndex $ lockJournal $ do
when dirty stageJournalFiles
g <- gitRepo
unless (null branches) $ do
showSideAction $ "merging " ++
(unwords $ map Git.refDescribe branches) ++
@ -150,24 +152,64 @@ update = onceonly $ do
- modify the branch.
-}
liftIO $ Git.UnionMerge.merge_index g branches
liftIO $ Git.commit g "update" fullname (nub $ fullname:refs)
ff <- if dirty then return False else tryFastForwardTo refs
unless ff $
liftIO $ Git.commit g "update" fullname (nub $ fullname:refs)
invalidateCache
where
changedbranch (_, branch) = do
g <- gitRepo
-- checking with log to see if there have been changes
-- is less expensive than always merging
diffs <- liftIO $ Git.pipeRead g [
Param "log",
Param (name ++ ".." ++ branch),
Params "--oneline -n1"
]
return $ not $ L.null diffs
onceonly a = unlessM (branchUpdated <$> getState) $ do
r <- a
disableUpdate
return r
{- Checks if the second branch has any commits not present on the first
- branch. -}
changedBranch :: String -> String -> Annex Bool
changedBranch origbranch newbranch = do
g <- gitRepo
diffs <- liftIO $ Git.pipeRead g [
Param "log",
Param (origbranch ++ ".." ++ newbranch),
Params "--oneline -n1"
]
return $ not $ L.null diffs
{- Given a set of refs that are all known to have commits not
- on the git-annex branch, tries to update the branch by a
- fast-forward.
-
- In order for that to be possible, one of the refs must contain
- every commit present in all the other refs, as well as in the
- git-annex branch.
-}
tryFastForwardTo :: [String] -> Annex Bool
tryFastForwardTo [] = return True
tryFastForwardTo (first:rest) = do
-- First, check that the git-annex branch does not contain any
-- new commits that are in the first other branch. If it does,
-- cannot fast-forward.
diverged <- changedBranch first fullname
if diverged
then no_ff
else maybe no_ff do_ff =<< findbest first rest
where
no_ff = return False
do_ff branch = do
g <- gitRepo
liftIO $ Git.run g "update-ref" [Param fullname, Param branch]
return True
findbest c [] = return $ Just c
findbest c (r:rs)
| c == r = findbest c rs
| otherwise = do
better <- changedBranch c r
worse <- changedBranch r c
case (better, worse) of
(True, True) -> return Nothing -- divergent fail
(True, False) -> findbest r rs -- better
(False, True) -> findbest c rs -- worse
(False, False) -> findbest c rs -- same
{- Avoids updating the branch. A useful optimisation when the branch
- is known to have not changed, or git-annex won't be relying on info
- from it. -}

8
debian/changelog vendored
View file

@ -1,3 +1,11 @@
git-annex (3.20111106) UNRELEASED; urgency=low
* merge: Use fast-forward merges when possible.
Thanks Valentin Haenel for a test case showing how non-fast-forward
merges could result in an ongoing pull/merge/push cycle.
-- Joey Hess <joeyh@debian.org> Sun, 06 Nov 2011 14:57:57 -0400
git-annex (3.20111105) unstable; urgency=low
* The default backend used when adding files to the annex is changed

View file

@ -27,3 +27,9 @@ But as sometimes annex-merge takes time, it would probably be worth it
>
> Although, perhaps fast-forward merge would use slightly
> less space. --[[Joey]]
>> To avoid the ladder-merge between two repositories described at
>> <http://sprunge.us/LOMU>, seems a fast-forward should be detected and
>> written to git, even if the index is still updated the current way.
>> [[done]]
>> --[[Joey]]