git-annex/doc/bugs/assistant_repair_misfires.mdwn
Joey Hess d2c48404a8
assistant: Avoid unncessary git repository repair
In a situation where git fsck gets confused about a commit that is made
while it's running.

Sponsored-by: Graham Spencer on Patreon
2021-06-30 18:00:16 -04:00

52 lines
2.3 KiB
Markdown

The assistant's git-annex repair sometimes happens when git fsck does not
actually detect any problems.
See [[Git_repos_corrupt_themselves]] of which this was part of the cause,
although the data loss part of that was solved.
I saw this happen on my sister's laptop, in a freshly cloned repo,
with a git-annex version that fixed that data loss.
assistant was set up to fsck and on the very first fsck it started git
repair. git fsck reported no problems at all. --[[Joey]]
> .git/annex/fsckresults/$uuid was empty, which means that
> writeFsckResults was called with FsckFailed. So apparently
> the fsck exited nonzero for some reason, but did not detect
> any misssing shas.
>
> Reproed on my own laptop, with the family annex. This reproduces it about
> 50% of the time: Clone over ssh; git-annex init;
> git remote rm origin; git annex schedule here 'fsck self 30m every day at
> any time'; git annex assistant; kill git-annex fsck process
>
> Confirmed that it's getting FsckFailed.
>
> Hypothesis: Maybe fsck is failing due to some other change
> that is being made to the git repo by the assistant
> at the same time it's running?
> I noticed some files being downloaded from the web at the same
> time the failed fsck was running.
>
> Fsck output to stdout is empty, stderr is:
>
> missing commit 4da14c19140e4c240358af4518d83661713ab044
>
> Intriguingly, that commit is present. It is a commit on the git-annex
> branch. And fscking again succeeds. So, fsck found a reference to a
> commit object that had not yet been written to disk. This feels like a
> bug in git, because if it were interrupted there the repo would be left
> in a bad state.
>
> Anyway, git-annex verifies that the commit is present, to double-check
> it understood fsck correctly. And it is. So it is not considered a
> problem. But, fsck still exits nonzero because it thinks there was a
> problem. And that's the problem.
>
> Fixed by making git-annex assistant ignore fsck nonzero
> exit status when it does not find any missing objects.
> Since any actual failure that makes fsck do that can't
> be distinguished from a false positive. I left git-annex repair
> unchanged, because if the user knows the repo is badly broken and explictly
> runs it, they would be surprised if it didn't repair.
>
> [[fixed|done]] --[[Joey]]