merge: Use fast-forward merges when possible.
Thanks Valentin Haenel for a test case showing how non-fast-forward merges could result in an ongoing pull/merge/push cycle.

While the git-annex branch is fast-forwarded, git-annex's index file is still updated using the union merge strategy as before. There's no other way to update the index that would be any faster.

It is possible that a union merge and a fast-forward result in different file contents: files should have the same lines, but a union merge may change their order. If this happens, the next commit made to the git-annex branch will have some unnecessary changes to line orders, but the consistency of data should be preserved.

Note that when the journal contains changes, a fast-forward is never attempted, which is fine, because committing those changes would be vanishingly unlikely to leave the git-annex branch at a commit that already exists in one of the remotes.

The real difficulty is handling the case where multiple remotes have all changed. git-annex does find the best (i.e., newest) one and fast-forwards to it. If the remotes are diverged, no fast-forward is done at all. It would be possible to pick one, fast-forward to it, and make a merge commit to the rest, but I see no benefit to adding that complexity.

Determining the best of N changed remotes requires N*2+1 calls to git-log, but these are fast git-log calls, and N is typically small. Also, typically some or all of the remote refs will be the same, and git-log is not called to compare those. In the real world I expect this will almost always add only 1 git-log call to the merge process. (Which already makes N anyway.)
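The selection of the best remote described in the message can be sketched as a pure model. This is only an illustration under stated assumptions, not the code in this commit: each ref is modelled as the list of commit ids reachable from it (a hypothetical `Ref` type), and `changed` stands in for the `git log orig..new` check, so no git commands are run. The names `Ref`, `changed`, `findBest` and `fastForwardTarget` are made up for this sketch.

```haskell
-- Hypothetical model: a ref is the list of commit ids reachable from it.
type Ref = [Int]

-- Stand-in for the git-log check: does `new` have commits not on `orig`?
changed :: Ref -> Ref -> Bool
changed orig new = any (`notElem` orig) new

-- Walk the remaining refs, keeping the one that contains all the others.
-- Gives Nothing as soon as two refs have diverged.
findBest :: Ref -> [Ref] -> Maybe Ref
findBest c [] = Just c
findBest c (r:rs)
    | c == r = findBest c rs -- identical refs need no comparison
    | otherwise = case (changed c r, changed r c) of
        (True, True)  -> Nothing        -- diverged
        (True, False) -> findBest r rs  -- r is newer
        _             -> findBest c rs  -- r is older or the same

-- A fast-forward target exists only if the local branch has no commits
-- missing from the first ref, and the refs have a single newest member.
fastForwardTarget :: Ref -> [Ref] -> Maybe Ref
fastForwardTarget _ [] = Nothing
fastForwardTarget local (first:rest)
    | changed first local = Nothing
    | otherwise = findBest first rest
```

In this model, a local branch `[1]` with remotes `[1,2]` and `[1,2,3]` yields `Just [1,2,3]`, while remotes `[1,2]` and `[1,4]` have diverged and yield `Nothing`.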
parent bf07e2c921
commit c99fb58909
3 changed files with 79 additions and 23 deletions
@@ -117,28 +117,30 @@ commit message = whenM journalDirty $ lockJournal $ do
     g <- gitRepo
     withIndex $ liftIO $ Git.commit g message fullname [fullname]
 
-{- Ensures that the branch is up-to-date; should be called before
- - data is read from it. Runs only once per git-annex run.
+{- Ensures that the branch is up-to-date; should be called before data is
+ - read from it. Runs only once per git-annex run.
  -
- - Before refs are merged into the index, it's
- - important to first stage the journal into the
- - index. Otherwise, any changes in the journal
- - would later get staged, and might overwrite
- - changes made during the merge.
+ - Before refs are merged into the index, it's important to first stage the
+ - journal into the index. Otherwise, any changes in the journal would
+ - later get staged, and might overwrite changes made during the merge.
  -
- - It would be cleaner to handle the merge by
- - updating the journal, not the index, with changes
- - from the branches.
+ - It would be cleaner to handle the merge by updating the journal, not the
+ - index, with changes from the branches.
+ -
+ - The index is always updated using a union merge, as that's the most
+ - efficient way to update it. However, if the branch can be
+ - fast-forwarded, that is then done, rather than adding an unnecessary
+ - commit to it.
  -}
 update :: Annex ()
 update = onceonly $ do
+    g <- gitRepo
     -- check what needs updating before taking the lock
     dirty <- journalDirty
-    c <- filterM changedbranch =<< siblingBranches
+    c <- filterM (changedBranch name . snd) =<< siblingBranches
     let (refs, branches) = unzip c
     unless (not dirty && null refs) $ withIndex $ lockJournal $ do
         when dirty stageJournalFiles
-        g <- gitRepo
         unless (null branches) $ do
             showSideAction $ "merging " ++
                 (unwords $ map Git.refDescribe branches) ++
@@ -150,24 +152,64 @@ update = onceonly $ do
              - modify the branch.
              -}
             liftIO $ Git.UnionMerge.merge_index g branches
-        liftIO $ Git.commit g "update" fullname (nub $ fullname:refs)
+        ff <- if dirty then return False else tryFastForwardTo refs
+        unless ff $
+            liftIO $ Git.commit g "update" fullname (nub $ fullname:refs)
         invalidateCache
     where
-        changedbranch (_, branch) = do
-            g <- gitRepo
-            -- checking with log to see if there have been changes
-            -- is less expensive than always merging
-            diffs <- liftIO $ Git.pipeRead g [
-                Param "log",
-                Param (name ++ ".." ++ branch),
-                Params "--oneline -n1"
-                ]
-            return $ not $ L.null diffs
         onceonly a = unlessM (branchUpdated <$> getState) $ do
             r <- a
             disableUpdate
             return r
 
+{- Checks if the second branch has any commits not present on the first
+ - branch. -}
+changedBranch :: String -> String -> Annex Bool
+changedBranch origbranch newbranch = do
+    g <- gitRepo
+    diffs <- liftIO $ Git.pipeRead g [
+        Param "log",
+        Param (origbranch ++ ".." ++ newbranch),
+        Params "--oneline -n1"
+        ]
+    return $ not $ L.null diffs
+
+{- Given a set of refs that are all known to have commits not
+ - on the git-annex branch, tries to update the branch by a
+ - fast-forward.
+ -
+ - In order for that to be possible, one of the refs must contain
+ - every commit present in all the other refs, as well as in the
+ - git-annex branch.
+ -}
+tryFastForwardTo :: [String] -> Annex Bool
+tryFastForwardTo [] = return True
+tryFastForwardTo (first:rest) = do
+    -- First, check that the git-annex branch does not contain any
+    -- new commits that are in the first other branch. If it does,
+    -- cannot fast-forward.
+    diverged <- changedBranch first fullname
+    if diverged
+        then no_ff
+        else maybe no_ff do_ff =<< findbest first rest
+    where
+        no_ff = return False
+        do_ff branch = do
+            g <- gitRepo
+            liftIO $ Git.run g "update-ref" [Param fullname, Param branch]
+            return True
+        findbest c [] = return $ Just c
+        findbest c (r:rs)
+            | c == r = findbest c rs
+            | otherwise = do
+                better <- changedBranch c r
+                worse <- changedBranch r c
+                case (better, worse) of
+                    (True, True) -> return Nothing -- divergent fail
+                    (True, False) -> findbest r rs -- better
+                    (False, True) -> findbest c rs -- worse
+                    (False, False) -> findbest c rs -- same
+
 {- Avoids updating the branch. A useful optimisation when the branch
  - is known to have not changed, or git-annex won't be relying on info
  - from it. -}
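The comment on `update` notes that the index is still updated by a union merge even when the branch itself is fast-forwarded, and the commit message warns that the two can disagree on line order. A minimal sketch of why, assuming a union merge behaves like "concatenate both sides and drop duplicate lines" (an assumption made for illustration; `unionMerge` and `sameLines` are hypothetical helpers, not functions from `Git.UnionMerge`):

```haskell
import Data.List (nub, sort)

-- Assumed model of a union merge of one file: our lines first, then
-- any lines of theirs that we do not already have.
unionMerge :: [String] -> [String] -> [String]
unionMerge ours theirs = nub (ours ++ theirs)

-- Two versions carry the same data if they hold the same set of lines,
-- regardless of order.
sameLines :: [String] -> [String] -> Bool
sameLines a b = sort (nub a) == sort (nub b)
```

Under this model, if the local file holds `["y"]` and the fast-forward target holds `["x", "y"]`, then `unionMerge ["y"] ["x", "y"]` puts `"y"` before `"x"`. The index and the fast-forwarded branch then differ in line order only, which is the harmless reordering the commit message describes.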
8
debian/changelog
vendored
@@ -1,3 +1,11 @@
+git-annex (3.20111106) UNRELEASED; urgency=low
+
+  * merge: Use fast-forward merges when possible.
+    Thanks Valentin Haenel for a test case showing how non-fast-forward
+    merges could result in an ongoing pull/merge/push cycle.
+
+ -- Joey Hess <joeyh@debian.org> Sun, 06 Nov 2011 14:57:57 -0400
+
 git-annex (3.20111105) unstable; urgency=low
 
   * The default backend used when adding files to the annex is changed
@@ -27,3 +27,9 @@ But as sometimes annex-merge takes time, it would probably be worth it
 >
 > Although, perhaps fast-forward merge would use slightly
 > less space. --[[Joey]]
+
+>> To avoid the ladder-merge between two repositories described at
+>> <http://sprunge.us/LOMU>, seems a fast-forward should be detected and
+>> written to git, even if the index is still updated the current way.
+>> [[done]]
+>> --[[Joey]]