assistant: Avoid unncessary git repository repair

In a situation where git fsck gets confused about a commit that is made
while it's running.

Sponsored-by: Graham Spencer on Patreon
This commit is contained in:
Joey Hess 2021-06-30 17:57:49 -04:00
parent ee2e7c28a6
commit d2c48404a8
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
6 changed files with 62 additions and 17 deletions

View file

@ -185,7 +185,7 @@ runActivity' urlrenderer (ScheduledSelfFsck _ d) = do
g <- liftAnnex gitRepo g <- liftAnnex gitRepo
fsckresults <- showFscking urlrenderer Nothing $ tryNonAsync $ do fsckresults <- showFscking urlrenderer Nothing $ tryNonAsync $ do
void $ batchCommand program (Param "fsck" : annexFsckParams d) void $ batchCommand program (Param "fsck" : annexFsckParams d)
Git.Fsck.findBroken True g Git.Fsck.findBroken True False g
u <- liftAnnex getUUID u <- liftAnnex getUUID
void $ repairWhenNecessary urlrenderer u Nothing fsckresults void $ repairWhenNecessary urlrenderer u Nothing fsckresults
mapM_ reget =<< liftAnnex (dirKeys gitAnnexBadDir) mapM_ reget =<< liftAnnex (dirKeys gitAnnexBadDir)
@ -214,7 +214,7 @@ runActivity' urlrenderer (ScheduledRemoteFsck u s d) = dispatch =<< liftAnnex (r
fsckresults <- showFscking urlrenderer (Just rmt) $ tryNonAsync $ do fsckresults <- showFscking urlrenderer (Just rmt) $ tryNonAsync $ do
void annexfscker void annexfscker
if Git.repoIsLocal repo && not (Git.repoIsLocalUnknown repo) if Git.repoIsLocal repo && not (Git.repoIsLocalUnknown repo)
then Just <$> Git.Fsck.findBroken True repo then Just <$> Git.Fsck.findBroken True True repo
else pure Nothing else pure Nothing
maybe noop (void . repairWhenNecessary urlrenderer u (Just rmt)) fsckresults maybe noop (void . repairWhenNecessary urlrenderer u (Just rmt)) fsckresults

View file

@ -60,7 +60,7 @@ handleRemoteProblem' repo urlrenderer rmt
( do ( do
fixedlocks <- repairStaleGitLocks repo fixedlocks <- repairStaleGitLocks repo
fsckresults <- showFscking urlrenderer (Just rmt) $ tryNonAsync $ fsckresults <- showFscking urlrenderer (Just rmt) $ tryNonAsync $
Git.Fsck.findBroken True repo Git.Fsck.findBroken True True repo
repaired <- repairWhenNecessary urlrenderer (Remote.uuid rmt) (Just rmt) fsckresults repaired <- repairWhenNecessary urlrenderer (Remote.uuid rmt) (Just rmt) fsckresults
return $ fixedlocks || repaired return $ fixedlocks || repaired
, return False , return False

View file

@ -1,3 +1,10 @@
git-annex (8.20210631) UNRELEASED; urgency=medium
* assistant: Avoid unncessary git repository repair in a situation where
git fsck gets confused about a commit that is made while it's running.
-- Joey Hess <id@joeyh.name> Wed, 30 Jun 2021 17:55:10 -0400
git-annex (8.20210630) upstream; urgency=medium git-annex (8.20210630) upstream; urgency=medium
* Fixed bug that interrupting git-annex repair (or assistant) while * Fixed bug that interrupting git-annex repair (or assistant) while

View file

@ -1,4 +1,5 @@
{- git fsck interface {- git fsck interface
i it is not fully repoducibleI repeated the same steps
- -
- Copyright 2013 Joey Hess <id@joeyh.name> - Copyright 2013 Joey Hess <id@joeyh.name>
- -
@ -69,9 +70,17 @@ instance Monoid FsckOutput where
- look for anything in its output (both stdout and stderr) that appears - look for anything in its output (both stdout and stderr) that appears
- to be a git sha. Not all such shas are of broken objects, so ask git - to be a git sha. Not all such shas are of broken objects, so ask git
- to try to cat the object, and see if it fails. - to try to cat the object, and see if it fails.
-
- Note that there is a possible false positive: When changes are being
- made to the repo while this is running, fsck might complain about a
- missing object that has not made it to disk yet. Catting the object
- then succeeds, so it's not included in the FsckResults. But, fsck then
- exits nonzero, and so FsckFailed is returned. Set ignorenonzeroexit
- to avoid this false positive, at the risk of perhaps missing a problem
- so bad that fsck crashes without outputting any missing shas.
-} -}
findBroken :: Bool -> Repo -> IO FsckResults findBroken :: Bool -> Bool -> Repo -> IO FsckResults
findBroken batchmode r = do findBroken batchmode ignorenonzeroexit r = do
let (command, params) = ("git", fsckParams r) let (command, params) = ("git", fsckParams r)
(command', params') <- if batchmode (command', params') <- if batchmode
then toBatchCommand (command, params) then toBatchCommand (command, params)
@ -90,10 +99,10 @@ findBroken batchmode r = do
fsckok <- checkSuccessProcess pid fsckok <- checkSuccessProcess pid
case mappend o1 o2 of case mappend o1 o2 of
FsckOutput badobjs truncated FsckOutput badobjs truncated
| S.null badobjs && not fsckok -> return FsckFailed | S.null badobjs && not fsckok -> return fsckfailed
| otherwise -> return $ FsckFoundMissing badobjs truncated | otherwise -> return $ FsckFoundMissing badobjs truncated
NoFsckOutput NoFsckOutput
| not fsckok -> return FsckFailed | not fsckok -> return fsckfailed
| otherwise -> return noproblem | otherwise -> return noproblem
-- If all fsck output was duplicateEntries warnings, -- If all fsck output was duplicateEntries warnings,
-- the repository is not broken, it just has some -- the repository is not broken, it just has some
@ -104,6 +113,9 @@ findBroken batchmode r = do
maxobjs = 10000 maxobjs = 10000
noproblem = FsckFoundMissing S.empty False noproblem = FsckFoundMissing S.empty False
fsckfailed
| ignorenonzeroexit = noproblem
| otherwise = FsckFailed
foundBroken :: FsckResults -> Bool foundBroken :: FsckResults -> Bool
foundBroken FsckFailed = True foundBroken FsckFailed = True

View file

@ -476,7 +476,7 @@ runRepair :: (Ref -> Bool) -> Bool -> Repo -> IO (Bool, [Branch])
runRepair removablebranch forced g = do runRepair removablebranch forced g = do
preRepair g preRepair g
putStrLn "Running git fsck ..." putStrLn "Running git fsck ..."
fsckresult <- findBroken False g fsckresult <- findBroken False False g
if foundBroken fsckresult if foundBroken fsckresult
then do then do
putStrLn "Fsck found problems, attempting repair." putStrLn "Fsck found problems, attempting repair."
@ -500,7 +500,7 @@ runRepairOf fsckresult removablebranch forced referencerepo g = do
runRepair' :: (Ref -> Bool) -> FsckResults -> Bool -> Maybe FilePath -> Repo -> IO (Bool, [Branch]) runRepair' :: (Ref -> Bool) -> FsckResults -> Bool -> Maybe FilePath -> Repo -> IO (Bool, [Branch])
runRepair' removablebranch fsckresult forced referencerepo g = do runRepair' removablebranch fsckresult forced referencerepo g = do
cleanCorruptObjects fsckresult g cleanCorruptObjects fsckresult g
missing <- findBroken False g missing <- findBroken False False g
stillmissing <- retrieveMissingObjects missing referencerepo g stillmissing <- retrieveMissingObjects missing referencerepo g
case stillmissing of case stillmissing of
FsckFoundMissing s t FsckFoundMissing s t
@ -529,7 +529,7 @@ runRepair' removablebranch fsckresult forced referencerepo g = do
| forced -> ifM (pure (repoIsLocalBare g) <||> checkIndex g) | forced -> ifM (pure (repoIsLocalBare g) <||> checkIndex g)
( do ( do
cleanCorruptObjects FsckFailed g cleanCorruptObjects FsckFailed g
stillmissing' <- findBroken False g stillmissing' <- findBroken False False g
case stillmissing' of case stillmissing' of
FsckFailed -> return (False, []) FsckFailed -> return (False, [])
FsckFoundMissing s t -> forcerepair s t FsckFoundMissing s t -> forcerepair s t
@ -575,7 +575,7 @@ runRepair' removablebranch fsckresult forced referencerepo g = do
-- the repair process. -- the repair process.
if fscktruncated if fscktruncated
then do then do
fsckresult' <- findBroken False g fsckresult' <- findBroken False False g
case fsckresult' of case fsckresult' of
FsckFailed -> do FsckFailed -> do
putStrLn "git fsck is failing" putStrLn "git fsck is failing"
@ -597,7 +597,7 @@ runRepair' removablebranch fsckresult forced referencerepo g = do
removeWhenExistsWith R.removeLink (indexFile g) removeWhenExistsWith R.removeLink (indexFile g)
-- The corrupted index can prevent fsck from finding other -- The corrupted index can prevent fsck from finding other
-- problems, so re-run repair. -- problems, so re-run repair.
fsckresult' <- findBroken False g fsckresult' <- findBroken False False g
result <- runRepairOf fsckresult' removablebranch forced referencerepo g result <- runRepairOf fsckresult' removablebranch forced referencerepo g
putStrLn "Removed the corrupted index file. You should look at what files are present in your working tree and git add them back to the index when appropriate." putStrLn "Removed the corrupted index file. You should look at what files are present in your working tree and git add them back to the index when appropriate."
return result return result

View file

@ -14,13 +14,39 @@ repair. git fsck reported no problems at all. --[[Joey]]
> the fsck exited nonzero for some reason, but did not detect > the fsck exited nonzero for some reason, but did not detect
> any misssing shas. > any misssing shas.
> >
> Reproed on my own laptop, with the family annex. However, > Reproed on my own laptop, with the family annex. This reproduces it about
> it is not fully repoducible, I repeated the same steps > 50% of the time: Clone over ssh; git-annex init;
> again and it didn't happen. Those were: Clone;
> git remote rm origin; git annex schedule here 'fsck self 30m every day at > git remote rm origin; git annex schedule here 'fsck self 30m every day at
> any time'; git annex assistant > any time'; git annex assistant; kill git-annex fsck process
>
> Confirmed that it's getting FsckFailed.
> >
> Hypothesis: Maybe fsck is failing due to some other change > Hypothesis: Maybe fsck is failing due to some other change
> that is being made to the git repo at the same time it's running? > that is being made to the git repo by the assistant
> at the same time it's running?
> I noticed some files being downloaded from the web at the same > I noticed some files being downloaded from the web at the same
> time the failed fsck was running. > time the failed fsck was running.
>
> Fsck output to stdout is empty, stderr is:
>
> missing commit 4da14c19140e4c240358af4518d83661713ab044
>
> Intriguingly, that commit is present. It is a commit on the git-annex
> branch. And fscking again succeeds. So, fsck found a reference to a
> commit object that had not yet been written to disk. This feels like a
> bug in git, because if it were interrupted there the repo would be left
> in a bad state.
>
> Anyway, git-annex verifies that the commit is present, to double-check
> it understood fsck correctly. And it is. So it is not considered a
> problem. But, fsck still exits nonzero because it thinks there was a
> problem. And that's the problem.
>
> Fixed by making git-annex assistant ignore fsck nonzero
> exit status when it does not find any missing objects.
> Since any actual failure that makes fsck do that can't
> be distinguished from a false positive. I left git-annex repair
> unchanged, because if the user knows the repo is badly broken and explictly
> runs it, they would be surprised if it didn't repair.
>
> [[fixed|done]] --[[Joey]]