correctly update keys db in merge conflict

This is quite a subtle edge case, see the bug report for full details.

The second git diff is needed only when there's a merge conflict.
It would be possible to speed it up marginally by using
--diff-filter=Unmerged, but probably not enough to bother with.

Sponsored-by: Graham Spencer on Patreon
This commit is contained in:
Joey Hess 2021-06-07 12:52:36 -04:00
parent da24034331
commit 0101363eb8
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
2 changed files with 87 additions and 23 deletions

View file

@ -245,7 +245,8 @@ reconcileStaged qh = do
go cur indexcache (Just newtree) = do go cur indexcache (Just newtree) = do
oldtree <- getoldtree oldtree <- getoldtree
when (oldtree /= newtree) $ do when (oldtree /= newtree) $ do
updatetodiff (fromRef oldtree) (fromRef newtree) updatetodiff (Just (fromRef oldtree)) (fromRef newtree) procdiff
>>= flushdb . fst
liftIO $ writeFile indexcache $ showInodeCache cur liftIO $ writeFile indexcache $ showInodeCache cur
-- Storing the tree in a ref makes sure it does not -- Storing the tree in a ref makes sure it does not
-- get garbage collected, and is available to diff -- get garbage collected, and is available to diff
@ -253,22 +254,34 @@ reconcileStaged qh = do
inRepo $ update' lastindexref newtree inRepo $ update' lastindexref newtree
-- git write-tree will fail if the index is locked or when there is -- git write-tree will fail if the index is locked or when there is
-- a merge conflict. To get up-to-date with the current index, -- a merge conflict. To get up-to-date with the current index,
-- diff --cached with the old index tree. The current index tree -- diff --staged with the old index tree. The current index tree
-- is not known, so not recorded, and the inode cache is not updated, -- is not known, so not recorded, and the inode cache is not updated,
-- so the next time git-annex runs, it will diff again, even -- so the next time git-annex runs, it will diff again, even
-- if the index is unchanged. -- if the index is unchanged.
--
-- When there is a merge conflict, that will not see the new local
-- version of the files that are conflicted. So a second diff
-- is done, with --staged but no old tree.
go _ _ Nothing = do go _ _ Nothing = do
oldtree <- getoldtree oldtree <- getoldtree
updatetodiff (fromRef oldtree) "--cached" (changed, conflicted) <- updatetodiff
(Just (fromRef oldtree)) "--staged" procdiff
changed' <- if conflicted
then fst <$> updatetodiff Nothing "--staged"
procmergeconflictdiff
else pure False
flushdb (changed || changed')
updatetodiff old new = do updatetodiff old new processor = do
(l, cleanup) <- inRepo $ pipeNullSplit' $ diff old new (l, cleanup) <- inRepo $ pipeNullSplit' $ diff old new
changed <- procdiff l False processor l False False
void $ liftIO cleanup `finally` void (liftIO cleanup)
-- Flush database changes immediately
-- so other processes can see them. -- Flush database changes immediately
when changed $ -- so other processes can see them.
liftIO $ H.flushDbQueue qh flushdb changed
| changed = liftIO $ H.flushDbQueue qh
| otherwise = noop
-- Avoid running smudge clean filter, which would block trying to -- Avoid running smudge clean filter, which would block trying to
-- access the locked database. git write-tree sometimes calls it, -- access the locked database. git write-tree sometimes calls it,
@ -288,8 +301,8 @@ reconcileStaged qh = do
-- (The -G option may make it be used otherwise.) -- (The -G option may make it be used otherwise.)
[ Param "-c", Param "diff.external=" [ Param "-c", Param "diff.external="
, Param "diff" , Param "diff"
, Param old ] ++ maybeToList (Param <$> old) ++
, Param new [ Param new
, Param "--raw" , Param "--raw"
, Param "-z" , Param "-z"
, Param "--no-abbrev" , Param "--no-abbrev"
@ -307,12 +320,13 @@ reconcileStaged qh = do
, Param "--no-ext-diff" , Param "--no-ext-diff"
] ]
procdiff (info:file:rest) changed procdiff (info:file:rest) changed conflicted
| ":" `S.isPrefixOf` info = case S8.words info of | ":" `S.isPrefixOf` info = case S8.words info of
(_colonsrcmode:dstmode:srcsha:dstsha:status:[]) -> do (_colonsrcmode:dstmode:srcsha:dstsha:status:[]) -> do
let conflicted' = status == "U"
-- avoid removing associated file when -- avoid removing associated file when
-- there is a merge conflict -- there is a merge conflict
removed <- if status /= "U" removed <- if not conflicted'
then catKey (Ref srcsha) >>= \case then catKey (Ref srcsha) >>= \case
Just oldkey -> do Just oldkey -> do
liftIO $ SQL.removeAssociatedFile oldkey liftIO $ SQL.removeAssociatedFile oldkey
@ -330,9 +344,32 @@ reconcileStaged qh = do
reconcilerace (asTopFilePath file) key reconcilerace (asTopFilePath file) key
return True return True
Nothing -> return False Nothing -> return False
procdiff rest (changed || removed || added) procdiff rest
_ -> return changed -- parse failed (changed || removed || added)
procdiff _ changed = return changed (conflicted || conflicted')
_ -> return (changed, conflicted) -- parse failed
procdiff _ changed conflicted = return (changed, conflicted)
-- Processing a diff --index when there is a merge conflict.
-- This diff will have the new local version of a file as the
-- first sha, and a null sha as the second sha, and we only
-- care about files that are in conflict.
procmergeconflictdiff (info:file:rest) changed conflicted
| ":" `S.isPrefixOf` info = case S8.words info of
(_colonmode:_mode:sha:_sha:status:[]) -> do
let conflicted' = status == "U"
added <- catKey (Ref sha) >>= \case
Just key -> do
liftIO $ SQL.addAssociatedFile key
(asTopFilePath file)
(SQL.WriteHandle qh)
return True
Nothing -> return False
procmergeconflictdiff rest
(changed || added)
(conflicted || conflicted')
_ -> return (changed, conflicted) -- parse failed
procmergeconflictdiff _ changed conflicted = return (changed, conflicted)
reconcilerace file key = do reconcilerace file key = do
caches <- liftIO $ SQL.getInodeCaches key (SQL.ReadHandle qh) caches <- liftIO $ SQL.getInodeCaches key (SQL.ReadHandle qh)

View file

@ -1,10 +1,10 @@
Found a case where the associated files in the keys db end up out-of-date. Found a case where the associated files in the keys db end up out-of-date.
Make a repo with an unlocked file, clone it to a second repo, and set up a Make a repo with an locked file, clone it to a second repo, and set up a
conflict involving that file in both repos, using git-annex add to add the conflict involving that file in both repos, using git-annex add to add the
conflicting version, and not running other git-annex commands after that, conflicting version, committing, and not running other git-annex commands
before pulling the conflicting branch. When the associated files db after that, before pulling the conflicting branch. When the associated
gets updated in the conflict situation, only 1 key has the conflicting file files db gets updated in the conflict situation, only 1 key has the
associated with it, rather than 2 or 3. conflicting file associated with it, rather than 2 or 3.
The original key before the conflict has the file associated with it, but The original key before the conflict has the file associated with it, but
the new local key and new remote key do not. the new local key and new remote key do not.
@ -21,8 +21,35 @@ git-annex updates the keys db. So, one solution to this bug will be for
git-annex to also update the keys db when staging locked files. git-annex to also update the keys db when staging locked files.
(Unfortunately this would make mass adds somewhat slower.) (Unfortunately this would make mass adds somewhat slower.)
Or, possibly, for reconcileStaged to not use git diff --index in this case, Or, possibly, for reconcileStaged to not use git diff --cached in this case,
but git diff with -1 and -3. That lets both sides of the merge conflict be but git diff with -1 and -3. That lets both sides of the merge conflict be
accessed, and it could then add the file to both keys. As well as not accessed, and it could then add the file to both keys. As well as not
slowing down git-annex add, this would let it honor the preferred content slowing down git-annex add, this would let it honor the preferred content
of the conflicting file for all 3 keys. --[[Joey]] of the conflicting file for all 3 keys. --[[Joey]]
> On second thought, it's not really necessary that all 3 keys have the
> conflicted file associated with them. The original key doesn't because
> the user has already changed the file to use the new key. The new remote
> key does not really need to, and there might not even be any effect if it
> did. The new local key is the one that this bug is really about.
>
> Consider that checkDrop uses catKeyFile to double-check the associated
> files. And that will see the file pointing to the new local key. So
> if the original key or new remote key are also associated with the file,
> it will ignore them and drop anyway. And that's ok, from the user's
> perspective the one it needs to retain is the one that the file in the
> working tree uses, which is the new local key.
>
> > Hmm, -1 and -3 are not what's needed to get the new local key.
> > It's using `git diff oldtree --cached`, and the code preserves the old
> > key when it sees a merge conflict. Using instead
> > `git diff HEAD --cached` has the new key as the src sha, and nullsha as
> > the dst sha.
> >
> > However, the diff with the old tree is needed to incrementally
> > update when it's not in the middle of a merge conflict.
> > So what can be done is do the diff as now; when it sees a merge
> > conflict, run diff a second time with `HEAD --cached` to get the new
> > key.
> >
> > > [[done]] --[[Joey]]