run reconcileStaged even in smudge clean filter, using alternate code path
Improved workaround for git 2.50 bug, avoding an occasional test suite failure, as well as some situations where an unlocked file did not get populated when adding another file to the repository with the same content. This uses the alternate code path that was already using when there was a conflict. Since that code path is not able to record its work, it will redo the same work next time. If the only way reconcileStaged is getting run is via the smudge clean filter, that could result in more and more changes getting processed redundantly each time. Once some other git-annex command runs and calls reconcileStaged, it will stop redoing that work. I don't think the extra work will be a problem.
This commit is contained in:
parent
65a1cf54ce
commit
cf449837ea
3 changed files with 101 additions and 22 deletions
|
@ -12,6 +12,10 @@ git-annex (10.20250631) UNRELEASED; urgency=medium
|
||||||
branch in a linked worktree on a filesystem not supporting symlinks.
|
branch in a linked worktree on a filesystem not supporting symlinks.
|
||||||
* Add --url option and url= preferred content expression, to match
|
* Add --url option and url= preferred content expression, to match
|
||||||
content that is recorded as present in an url.
|
content that is recorded as present in an url.
|
||||||
|
* Improved workaround for git 2.50 bug, avoding an occasional test suite
|
||||||
|
failure, as well as some situations where an unlocked file did not get
|
||||||
|
populated when adding another file to the repository with the same
|
||||||
|
content.
|
||||||
|
|
||||||
-- Joey Hess <id@joeyh.name> Mon, 07 Jul 2025 15:59:42 -0400
|
-- Joey Hess <id@joeyh.name> Mon, 07 Jul 2025 15:59:42 -0400
|
||||||
|
|
||||||
|
|
|
@ -260,7 +260,7 @@ isInodeKnown i s = or <$> runReaderIO ContentTable
|
||||||
- is an associated file.
|
- is an associated file.
|
||||||
-}
|
-}
|
||||||
reconcileStaged :: Bool -> H.DbQueue -> Annex DbTablesChanged
|
reconcileStaged :: Bool -> H.DbQueue -> Annex DbTablesChanged
|
||||||
reconcileStaged dbisnew qh = ifM notneeded
|
reconcileStaged dbisnew qh = ifM isBareRepo
|
||||||
( return mempty
|
( return mempty
|
||||||
, do
|
, do
|
||||||
gitindex <- inRepo currentIndexFile
|
gitindex <- inRepo currentIndexFile
|
||||||
|
@ -299,12 +299,12 @@ reconcileStaged dbisnew qh = ifM notneeded
|
||||||
inRepo $ update' lastindexref newtree
|
inRepo $ update' lastindexref newtree
|
||||||
fastDebug "Database.Keys" "reconcileStaged end"
|
fastDebug "Database.Keys" "reconcileStaged end"
|
||||||
return (DbTablesChanged True True)
|
return (DbTablesChanged True True)
|
||||||
-- git write-tree will fail if the index is locked or when there is
|
-- Was not able to run git write-tree, or it failed due to the
|
||||||
-- a merge conflict. To get up-to-date with the current index,
|
-- index being locked or a merge conflict. To get up-to-date with
|
||||||
-- diff --staged with the old index tree. The current index tree
|
-- the current index, diff --staged with the old index tree. The
|
||||||
-- is not known, so not recorded, and the inode cache is not updated,
|
-- current index tree is not known, so not recorded, and the inode
|
||||||
-- so the next time git-annex runs, it will diff again, even
|
-- cache is not updated, so the next time git-annex runs, it will
|
||||||
-- if the index is unchanged.
|
-- diff again, even if the index is unchanged.
|
||||||
--
|
--
|
||||||
-- When there is a merge conflict, that will not see the new local
|
-- When there is a merge conflict, that will not see the new local
|
||||||
-- version of the files that are conflicted. So a second diff
|
-- version of the files that are conflicted. So a second diff
|
||||||
|
@ -327,21 +327,22 @@ reconcileStaged dbisnew qh = ifM notneeded
|
||||||
processor l False
|
processor l False
|
||||||
`finally` void cleanup
|
`finally` void cleanup
|
||||||
|
|
||||||
-- Avoid running smudge clean filter, which would block trying to
|
-- This avoids running git write-tree when run by the smudge clean
|
||||||
-- access the locked database. git write-tree sometimes calls it,
|
-- filter, in order to work around a bug in git. That causes
|
||||||
-- even though it is not adding work tree files to the index,
|
-- git merge to fail with an internal error when git write-tree is
|
||||||
-- and so the filter cannot have an effect on the contents of the
|
-- run by the smudge clean filter in conflicted merge situation.
|
||||||
-- index or on the tree that gets written from it.
|
--
|
||||||
getindextree = inRepo $ \r -> writeTreeQuiet $ r
|
-- When running git write-tree, avoid it running the smudge clean
|
||||||
{ gitGlobalOpts = gitGlobalOpts r ++ bypassSmudgeConfig }
|
-- filter, which would block trying to access the locked database.
|
||||||
|
-- git write-tree sometimes calls it, even though it is not adding
|
||||||
notneeded = isBareRepo
|
-- work tree files to the index, and so the filter cannot have an
|
||||||
-- Avoid doing anything when run by the
|
-- effect on the contents of the index or on the tree that gets
|
||||||
-- smudge clean filter. When that happens in a conflicted
|
-- written from it.
|
||||||
-- merge situation, running git write-tree
|
getindextree = ifM (Annex.getState Annex.insmudgecleanfilter)
|
||||||
-- here would cause git merge to fail with an internal
|
( return Nothing
|
||||||
-- error. This works around around that bug in git.
|
, inRepo $ \r -> writeTreeQuiet $ r
|
||||||
<||> Annex.getState Annex.insmudgecleanfilter
|
{ gitGlobalOpts = gitGlobalOpts r ++ bypassSmudgeConfig }
|
||||||
|
)
|
||||||
|
|
||||||
diff old new =
|
diff old new =
|
||||||
-- Avoid running smudge clean filter, since we want the
|
-- Avoid running smudge clean filter, since we want the
|
||||||
|
|
74
doc/bugs/flaky_test_failure_add_dup.mdwn
Normal file
74
doc/bugs/flaky_test_failure_add_dup.mdwn
Normal file
|
@ -0,0 +1,74 @@
|
||||||
|
Repo Tests v10 unlocked
|
||||||
|
[...]
|
||||||
|
add dup: FAIL (0.25s)
|
||||||
|
./Test/Framework.hs:393:
|
||||||
|
checkcontent foo
|
||||||
|
expected: "annexed file content"
|
||||||
|
but got: "/annex/objects/SHA256E-s20--e394a389d787383843decc5d3d99b6d184ffa5fddeec23b911f9ee7fc8b9ea77\n"
|
||||||
|
Use -p '/add dup/' to rerun this test only.
|
||||||
|
|
||||||
|
I am able to produce this failure after about a minute of running the
|
||||||
|
test in a loop with:
|
||||||
|
|
||||||
|
while git-annex test -p '/add dup/' ; do :;done
|
||||||
|
|
||||||
|
Inside the test repo, file "foo" indeed is an unpopulated pointer file,
|
||||||
|
despite the file "foodup", which has the same git-annex key, being populated.
|
||||||
|
|
||||||
|
Reverting [[!commit fb155b1e3e59cc1f9cf8a4fe7d47cba49d1c81af]] avoids
|
||||||
|
this test suite failure. (Or at least if it is flaky, it's much mess likely to
|
||||||
|
fail. I ran the loop for 10 minutes.)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
What the test suite is doing is a `git add` and is using the smudge filter
|
||||||
|
to add the new file as an unlocked annexed file.
|
||||||
|
|
||||||
|
I have reproduced the problem doing the same outside the test suite.
|
||||||
|
|
||||||
|
----
|
||||||
|
|
||||||
|
It seems that the keys database does not always get updated to indicate the
|
||||||
|
key used by file "foo". As shown here looking at the keys db in a failed
|
||||||
|
testcase dir:
|
||||||
|
|
||||||
|
sqlite> select * from associated;
|
||||||
|
1|SHA256E-s20--e394a389d787383843decc5d3d99b6d184ffa5fddeec23b911f9ee7fc8b9ea77|foodup
|
||||||
|
sqlite>
|
||||||
|
|
||||||
|
That happens in a fresh clone of the repository. So, `reconcileStaged` never
|
||||||
|
gets a chance to do anything, because it's only ever called from inside the
|
||||||
|
smudge filter. And [[!commit fb155b1e3e59cc1f9cf8a4fe7d47cba49d1c81af]] made it
|
||||||
|
not run in the smudge filter.
|
||||||
|
|
||||||
|
This particular failure could be avoided if `git-annex init` called
|
||||||
|
`reconcileStaged`. Then it would learn about pointer files in the tree.
|
||||||
|
|
||||||
|
But would that be a complete fix for all situations? If the user is
|
||||||
|
only running git-annex via `git add` (the smudge clean filter),
|
||||||
|
but is making other changes to the tree too, I don't think
|
||||||
|
it would. Consider for example:
|
||||||
|
|
||||||
|
git clone r r2
|
||||||
|
cd r2
|
||||||
|
git-annex fsck
|
||||||
|
git mv foo bar
|
||||||
|
git config annex.largefiles anything
|
||||||
|
echo hi > baz
|
||||||
|
git add baz
|
||||||
|
|
||||||
|
In the above example, the `git-annex fsck` updates the associated files,
|
||||||
|
so it know that the file foo has the key. But then foo is renamed to bar
|
||||||
|
and when `git add` is run on a file, generating the same key,
|
||||||
|
`reconcileStaged` does not update the associated files, so it does not
|
||||||
|
know about the rename to baz. So it leaves bar unpopulated.
|
||||||
|
|
||||||
|
Conclusion: `reconcileStaged` needs to run even in the smudge clean filter.
|
||||||
|
But to avoid the git bug worked around by
|
||||||
|
[[!commit fb155b1e3e59cc1f9cf8a4fe7d47cba49d1c81af]], it must avoid
|
||||||
|
running `git write-tree` when called in smudge clean, at least when
|
||||||
|
in a conflicted merge situation. Luckily, `reconcileStaged` does contain
|
||||||
|
code to update things when it is unable to run `git write-tree`, so that
|
||||||
|
only needs to be used when in the smudge clean filter.
|
||||||
|
|
||||||
|
> [[fixed|done]] and confirmed the fix works with git 2.50. --[[Joey]]
|
Loading…
Add table
Add a link
Reference in a new issue