merge: Use fast-forward merges when possible.
Thanks Valentin Haenel for a test case showing how non-fast-forward merges could result in an ongoing pull/merge/push cycle.

While the git-annex branch is fast-forwarded, git-annex's index file is still updated using the union merge strategy as before. There's no other way to update the index that would be any faster.

It is possible that a union merge and a fast-forward result in different file contents: files should have the same lines, but a union merge may change their order. If this happens, the next commit made to the git-annex branch will have some unnecessary changes to line orders, but the consistency of data should be preserved.

Note that when the journal contains changes, a fast-forward is never attempted. That is fine, because committing those changes would be vanishingly unlikely to leave the git-annex branch at a commit that already exists in one of the remotes.

The real difficulty is handling the case where multiple remotes have all changed. git-annex does find the best (i.e., newest) one and fast-forwards to it. If the remotes are diverged, no fast-forward is done at all. It would be possible to pick one, fast-forward to it, and make a merge commit to the rest, but I see no benefit to adding that complexity.

Determining the best of N changed remotes requires N*2+1 calls to git-log, but these are fast git-log calls, and N is typically small. Also, typically some or all of the remote refs will be the same, and git-log is not called to compare those. In the real world I expect this will almost always add only 1 git-log call to the merge process (which already makes N anyway).
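The point about line order can be illustrated with a small, self-contained Haskell sketch. It is only a model under stated assumptions: `unionLines`, `sameLines` and the sample lines are hypothetical names and data invented for this illustration, and the ordering rule (our lines first, then new lines from the other side) is an assumption, not a description of how git-annex's `Git.UnionMerge` actually orders lines.

```haskell
import Data.List (nub, sort)

-- Hypothetical model of a union merge on one log file: keep every line
-- from both versions and drop exact duplicates. The ordering rule used
-- here (ours first, then new lines from theirs) is an assumption.
unionLines :: [String] -> [String] -> [String]
unionLines ours theirs = nub (ours ++ theirs)

-- Two versions carry the same data when they hold the same set of lines,
-- whatever the order.
sameLines :: [String] -> [String] -> Bool
sameLines a b = sort (nub a) == sort (nub b)

-- Made-up example data: the remote version descends from ours and has
-- one extra line in the middle.
ours, theirs :: [String]
ours   = ["line-a", "line-c"]
theirs = ["line-a", "line-b", "line-c"]

-- What the index would hold after the (modelled) union merge.
indexContent :: [String]
indexContent = unionLines ours theirs

-- What the branch holds after a fast-forward to the remote.
branchContent :: [String]
branchContent = theirs
```

In this model `indexContent` and `branchContent` differ only in the position of `line-b`, while `sameLines indexContent branchContent` holds. That is the situation described above: the next commit would reorder lines without changing the data.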
parent bf07e2c921
commit c99fb58909
3 changed files with 79 additions and 23 deletions
@@ -117,28 +117,30 @@ commit message = whenM journalDirty $ lockJournal $ do
 	g <- gitRepo
 	withIndex $ liftIO $ Git.commit g message fullname [fullname]
 
-{- Ensures that the branch is up-to-date; should be called before
- - data is read from it. Runs only once per git-annex run.
+{- Ensures that the branch is up-to-date; should be called before data is
+ - read from it. Runs only once per git-annex run.
  -
- - Before refs are merged into the index, it's
- - important to first stage the journal into the
- - index. Otherwise, any changes in the journal
- - would later get staged, and might overwrite
- - changes made during the merge.
+ - Before refs are merged into the index, it's important to first stage the
+ - journal into the index. Otherwise, any changes in the journal would
+ - later get staged, and might overwrite changes made during the merge.
  -
- - It would be cleaner to handle the merge by
- - updating the journal, not the index, with changes
- - from the branches.
+ - It would be cleaner to handle the merge by updating the journal, not the
+ - index, with changes from the branches.
+ -
+ - The index is always updated using a union merge, as that's the most
+ - efficient way to update it. However, if the branch can be
+ - fast-forwarded, that is then done, rather than adding an unnecessary
+ - commit to it.
  -}
 update :: Annex ()
 update = onceonly $ do
+	g <- gitRepo
 	-- check what needs updating before taking the lock
 	dirty <- journalDirty
-	c <- filterM changedbranch =<< siblingBranches
+	c <- filterM (changedBranch name . snd) =<< siblingBranches
 	let (refs, branches) = unzip c
 	unless (not dirty && null refs) $ withIndex $ lockJournal $ do
 		when dirty stageJournalFiles
-		g <- gitRepo
 		unless (null branches) $ do
 			showSideAction $ "merging " ++
 				(unwords $ map Git.refDescribe branches) ++
@@ -150,24 +152,64 @@ update = onceonly $ do
 			 - modify the branch.
 			 -}
 			liftIO $ Git.UnionMerge.merge_index g branches
-		liftIO $ Git.commit g "update" fullname (nub $ fullname:refs)
+		ff <- if dirty then return False else tryFastForwardTo refs
+		unless ff $
+			liftIO $ Git.commit g "update" fullname (nub $ fullname:refs)
 		invalidateCache
 	where
-		changedbranch (_, branch) = do
-			g <- gitRepo
-			-- checking with log to see if there have been changes
-			-- is less expensive than always merging
-			diffs <- liftIO $ Git.pipeRead g [
-				Param "log",
-				Param (name ++ ".." ++ branch),
-				Params "--oneline -n1"
-				]
-			return $ not $ L.null diffs
 		onceonly a = unlessM (branchUpdated <$> getState) $ do
 			r <- a
 			disableUpdate
 			return r
 
+{- Checks if the second branch has any commits not present on the first
+ - branch. -}
+changedBranch :: String -> String -> Annex Bool
+changedBranch origbranch newbranch = do
+	g <- gitRepo
+	diffs <- liftIO $ Git.pipeRead g [
+		Param "log",
+		Param (origbranch ++ ".." ++ newbranch),
+		Params "--oneline -n1"
+		]
+	return $ not $ L.null diffs
+
+{- Given a set of refs that are all known to have commits not
+ - on the git-annex branch, tries to update the branch by a
+ - fast-forward.
+ -
+ - In order for that to be possible, one of the refs must contain
+ - every commit present in all the other refs, as well as in the
+ - git-annex branch.
+ -}
+tryFastForwardTo :: [String] -> Annex Bool
+tryFastForwardTo [] = return True
+tryFastForwardTo (first:rest) = do
+	-- First, check that the git-annex branch does not contain any
+	-- new commits that are not in the first other branch. If it does,
+	-- cannot fast-forward.
+	diverged <- changedBranch first fullname
+	if diverged
+		then no_ff
+		else maybe no_ff do_ff =<< findbest first rest
+	where
+		no_ff = return False
+		do_ff branch = do
+			g <- gitRepo
+			liftIO $ Git.run g "update-ref" [Param fullname, Param branch]
+			return True
+		findbest c [] = return $ Just c
+		findbest c (r:rs)
+			| c == r = findbest c rs
+			| otherwise = do
+				better <- changedBranch c r
+				worse <- changedBranch r c
+				case (better, worse) of
+					(True, True) -> return Nothing -- divergent fail
+					(True, False) -> findbest r rs -- better
+					(False, True) -> findbest c rs -- worse
+					(False, False) -> findbest c rs -- same
+
 {- Avoids updating the branch. A useful optimisation when the branch
  - is known to have not changed, or git-annex won't be relying on info
  - from it. -}
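The selection logic of `tryFastForwardTo` and `findbest` can be tried out without a git repository by modelling each ref as the list of commits reachable from it. The following is a minimal sketch under that assumption: the `Ref` type, the integer commit ids, `changed`, `findBest`, `fastForwardTarget` and the sample refs are all hypothetical and exist only for this illustration. The real code asks `git log a..b` instead of comparing lists, and performs the fast-forward with `update-ref`.

```haskell
-- Hypothetical model: a ref is a name plus the commits reachable from it.
data Ref = Ref { refName :: String, refCommits :: [Int] }
  deriving (Eq, Show)

-- True if the second ref has any commit not present on the first.
-- Stands in for the "git log orig..new" check.
changed :: Ref -> Ref -> Bool
changed orig new = any (`notElem` refCommits orig) (refCommits new)

-- Walks the candidates, keeping the one that contains all the others.
-- Returns Nothing as soon as two candidates have diverged.
findBest :: Ref -> [Ref] -> Maybe Ref
findBest c [] = Just c
findBest c (r:rs)
  | c == r = findBest c rs
  | otherwise = case (changed c r, changed r c) of
      (True, True)  -> Nothing        -- diverged
      (True, False) -> findBest r rs  -- r is newer
      _             -> findBest c rs  -- r is older or the same

-- The ref the local branch could be fast-forwarded to, if any.
fastForwardTarget :: Ref -> Ref -> [Ref] -> Maybe Ref
fastForwardTarget local first rest
  | changed first local = Nothing     -- local has commits of its own
  | otherwise = findBest first rest

-- Made-up sample refs.
local, origin, backup, other, localAhead :: Ref
local      = Ref "git-annex"        [1, 2]
origin     = Ref "origin/git-annex" [1, 2, 3]
backup     = Ref "backup/git-annex" [1, 2, 3, 4]
other      = Ref "other/git-annex"  [1, 2, 5]
localAhead = Ref "git-annex"        [1, 2, 9]
```

With these sample refs, `fastForwardTarget local origin [backup]` selects `backup`, since it contains everything in `origin`. Replacing `backup` with `other` yields `Nothing` because the two remotes have diverged, and starting from `localAhead` also yields `Nothing` because the local branch has a commit that no remote contains.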
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,11 @@
+git-annex (3.20111106) UNRELEASED; urgency=low
+
+  * merge: Use fast-forward merges when possible.
+    Thanks Valentin Haenel for a test case showing how non-fast-forward
+    merges could result in an ongoing pull/merge/push cycle.
+
+ -- Joey Hess <joeyh@debian.org>  Sun, 06 Nov 2011 14:57:57 -0400
+
 git-annex (3.20111105) unstable; urgency=low
 
   * The default backend used when adding files to the annex is changed
@@ -27,3 +27,9 @@ But as sometimes annex-merge takes time, it would probably be worth it
 >
 > Although, perhaps fast-forward merge would use slightly
 > less space. --[[Joey]]
+
+>> To avoid the ladder-merge between two repositories described at
+>> <http://sprunge.us/LOMU>, seems a fast-forward should be detected and
+>> written to git, even if the index is still updated the current way.
+>> [[done]]
+>> --[[Joey]]