assistant: Avoid unncessary git repository repair

In a situation where git fsck gets confused about a commit that is made
while it's running.

Sponsored-by: Graham Spencer on Patreon
This commit is contained in:
Joey Hess 2021-06-30 17:57:49 -04:00
parent ee2e7c28a6
commit d2c48404a8
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
6 changed files with 62 additions and 17 deletions

View file

@ -185,7 +185,7 @@ runActivity' urlrenderer (ScheduledSelfFsck _ d) = do
g <- liftAnnex gitRepo
fsckresults <- showFscking urlrenderer Nothing $ tryNonAsync $ do
void $ batchCommand program (Param "fsck" : annexFsckParams d)
Git.Fsck.findBroken True g
Git.Fsck.findBroken True False g
u <- liftAnnex getUUID
void $ repairWhenNecessary urlrenderer u Nothing fsckresults
mapM_ reget =<< liftAnnex (dirKeys gitAnnexBadDir)
@ -214,7 +214,7 @@ runActivity' urlrenderer (ScheduledRemoteFsck u s d) = dispatch =<< liftAnnex (r
fsckresults <- showFscking urlrenderer (Just rmt) $ tryNonAsync $ do
void annexfscker
if Git.repoIsLocal repo && not (Git.repoIsLocalUnknown repo)
then Just <$> Git.Fsck.findBroken True repo
then Just <$> Git.Fsck.findBroken True True repo
else pure Nothing
maybe noop (void . repairWhenNecessary urlrenderer u (Just rmt)) fsckresults

View file

@ -60,7 +60,7 @@ handleRemoteProblem' repo urlrenderer rmt
( do
fixedlocks <- repairStaleGitLocks repo
fsckresults <- showFscking urlrenderer (Just rmt) $ tryNonAsync $
Git.Fsck.findBroken True repo
Git.Fsck.findBroken True True repo
repaired <- repairWhenNecessary urlrenderer (Remote.uuid rmt) (Just rmt) fsckresults
return $ fixedlocks || repaired
, return False

View file

@ -1,3 +1,10 @@
git-annex (8.20210631) UNRELEASED; urgency=medium
* assistant: Avoid unncessary git repository repair in a situation where
git fsck gets confused about a commit that is made while it's running.
-- Joey Hess <id@joeyh.name> Wed, 30 Jun 2021 17:55:10 -0400
git-annex (8.20210630) upstream; urgency=medium
* Fixed bug that interrupting git-annex repair (or assistant) while

View file

@ -1,4 +1,5 @@
{- git fsck interface
i it is not fully repoducibleI repeated the same steps
-
- Copyright 2013 Joey Hess <id@joeyh.name>
-
@ -69,9 +70,17 @@ instance Monoid FsckOutput where
- look for anything in its output (both stdout and stderr) that appears
- to be a git sha. Not all such shas are of broken objects, so ask git
- to try to cat the object, and see if it fails.
-
- Note that there is a possible false positive: When changes are being
- made to the repo while this is running, fsck might complain about a
- missing object that has not made it to disk yet. Catting the object
- then succeeds, so it's not included in the FsckResults. But, fsck then
- exits nonzero, and so FsckFailed is returned. Set ignorenonzeroexit
- to avoid this false positive, at the risk of perhaps missing a problem
- so bad that fsck crashes without outputting any missing shas.
-}
findBroken :: Bool -> Repo -> IO FsckResults
findBroken batchmode r = do
findBroken :: Bool -> Bool -> Repo -> IO FsckResults
findBroken batchmode ignorenonzeroexit r = do
let (command, params) = ("git", fsckParams r)
(command', params') <- if batchmode
then toBatchCommand (command, params)
@ -90,10 +99,10 @@ findBroken batchmode r = do
fsckok <- checkSuccessProcess pid
case mappend o1 o2 of
FsckOutput badobjs truncated
| S.null badobjs && not fsckok -> return FsckFailed
| S.null badobjs && not fsckok -> return fsckfailed
| otherwise -> return $ FsckFoundMissing badobjs truncated
NoFsckOutput
| not fsckok -> return FsckFailed
| not fsckok -> return fsckfailed
| otherwise -> return noproblem
-- If all fsck output was duplicateEntries warnings,
-- the repository is not broken, it just has some
@ -104,6 +113,9 @@ findBroken batchmode r = do
maxobjs = 10000
noproblem = FsckFoundMissing S.empty False
fsckfailed
| ignorenonzeroexit = noproblem
| otherwise = FsckFailed
foundBroken :: FsckResults -> Bool
foundBroken FsckFailed = True

View file

@ -476,7 +476,7 @@ runRepair :: (Ref -> Bool) -> Bool -> Repo -> IO (Bool, [Branch])
runRepair removablebranch forced g = do
preRepair g
putStrLn "Running git fsck ..."
fsckresult <- findBroken False g
fsckresult <- findBroken False False g
if foundBroken fsckresult
then do
putStrLn "Fsck found problems, attempting repair."
@ -500,7 +500,7 @@ runRepairOf fsckresult removablebranch forced referencerepo g = do
runRepair' :: (Ref -> Bool) -> FsckResults -> Bool -> Maybe FilePath -> Repo -> IO (Bool, [Branch])
runRepair' removablebranch fsckresult forced referencerepo g = do
cleanCorruptObjects fsckresult g
missing <- findBroken False g
missing <- findBroken False False g
stillmissing <- retrieveMissingObjects missing referencerepo g
case stillmissing of
FsckFoundMissing s t
@ -529,7 +529,7 @@ runRepair' removablebranch fsckresult forced referencerepo g = do
| forced -> ifM (pure (repoIsLocalBare g) <||> checkIndex g)
( do
cleanCorruptObjects FsckFailed g
stillmissing' <- findBroken False g
stillmissing' <- findBroken False False g
case stillmissing' of
FsckFailed -> return (False, [])
FsckFoundMissing s t -> forcerepair s t
@ -575,7 +575,7 @@ runRepair' removablebranch fsckresult forced referencerepo g = do
-- the repair process.
if fscktruncated
then do
fsckresult' <- findBroken False g
fsckresult' <- findBroken False False g
case fsckresult' of
FsckFailed -> do
putStrLn "git fsck is failing"
@ -597,7 +597,7 @@ runRepair' removablebranch fsckresult forced referencerepo g = do
removeWhenExistsWith R.removeLink (indexFile g)
-- The corrupted index can prevent fsck from finding other
-- problems, so re-run repair.
fsckresult' <- findBroken False g
fsckresult' <- findBroken False False g
result <- runRepairOf fsckresult' removablebranch forced referencerepo g
putStrLn "Removed the corrupted index file. You should look at what files are present in your working tree and git add them back to the index when appropriate."
return result

View file

@ -14,13 +14,39 @@ repair. git fsck reported no problems at all. --[[Joey]]
> the fsck exited nonzero for some reason, but did not detect
> any misssing shas.
>
> Reproed on my own laptop, with the family annex. However,
> it is not fully repoducible, I repeated the same steps
> again and it didn't happen. Those were: Clone;
> Reproed on my own laptop, with the family annex. This reproduces it about
> 50% of the time: Clone over ssh; git-annex init;
> git remote rm origin; git annex schedule here 'fsck self 30m every day at
> any time'; git annex assistant
> any time'; git annex assistant; kill git-annex fsck process
>
> Confirmed that it's getting FsckFailed.
>
> Hypothesis: Maybe fsck is failing due to some other change
> that is being made to the git repo at the same time it's running?
> that is being made to the git repo by the assistant
> at the same time it's running?
> I noticed some files being downloaded from the web at the same
> time the failed fsck was running.
>
> Fsck output to stdout is empty, stderr is:
>
> missing commit 4da14c19140e4c240358af4518d83661713ab044
>
> Intriguingly, that commit is present. It is a commit on the git-annex
> branch. And fscking again succeeds. So, fsck found a reference to a
> commit object that had not yet been written to disk. This feels like a
> bug in git, because if it were interrupted there the repo would be left
> in a bad state.
>
> Anyway, git-annex verifies that the commit is present, to double-check
> it understood fsck correctly. And it is. So it is not considered a
> problem. But, fsck still exits nonzero because it thinks there was a
> problem. And that's the problem.
>
> Fixed by making git-annex assistant ignore fsck nonzero
> exit status when it does not find any missing objects.
> Since any actual failure that makes fsck do that can't
> be distinguished from a false positive. I left git-annex repair
> unchanged, because if the user knows the repo is badly broken and explictly
> runs it, they would be surprised if it didn't repair.
>
> [[fixed|done]] --[[Joey]]