record fscked files in fsck db by default

Remember the files that are checked, so a later run with --more will
skip them, without needing to use --incremental.
Joey Hess 2025-03-17 15:34:08 -04:00
parent f775c9643f
commit 2d60ce4803
GPG key ID: DB12DB0FF05F8F38
5 changed files with 61 additions and 7 deletions

@@ -9,3 +9,6 @@ I actually don't see much reason to not make use of an incremental fsck either u
On that note: There also does not appear to be a documented method to figure out whether a fsck was interrupted before. You could infer existence and date from the annex internal directory structure but seeing the progress requires manual sql.
Perhaps there could be a `fsck --info` flag for showing both interrupted fsck progress and perhaps also the progress of the current fsck.
> I've implemented the default recording to the fsck database. [[done]]
> --[[Joey]]

@@ -0,0 +1,35 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2025-03-17T18:34:20Z"
content="""
I think it could make sense, when --incremental/--more are not passed, to
initialize a new fsck database if there is not already one, and
add each fscked key to the fsck database.
That way, the user could run any combination of fscks, interrupted or not,
and then use --more to fsck only new files. When the user wants to start
a new fsck pass, they would use --incremental.
It would need to avoid recording an incremental fsck pass start time,
to avoid interfering with --incremental-schedule.
The only problem I see with this is, someone might have a long-term
incremental fsck they're running that is doing full checksumming.
If they then do a quick fsck --fast for other reasons, it would
record that every key has been fscked, and so lose their place.
So it seems --fast should disable this new behavior. (Also incremental
--fast fsck is not likely to be very useful anyway.)
> I actually don't see much reason to not make use of an incremental fsck
> either unless it's *really* old
That's a hard judgement call for a program to make... someone might think
10 minutes is really old, and someone else that a month is.
As to figuring out whether a fsck was interrupted before, surely what
matters is you remembering that? All git-annex has is a timestamp when
the last fsck pass started, which is available in
`.git/annex/fsck/*/state`, and a list of the keys that were fscked,
which is not very useful as far as determining the progress of that fsck.
"""]]
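
The recording rules discussed in the comment above can be modelled roughly as follows. This is a small Python sketch for illustration only, not git-annex's actual Haskell implementation: `FsckDb`, `run_fsck`, and their parameters are hypothetical names, the database is an in-memory set, and the choice that an explicit `--incremental` or `--more` still records when combined with `--fast` is an assumption.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class FsckDb:
    """Hypothetical in-memory stand-in for the fsck database."""
    keys: Set[str] = field(default_factory=set)
    # Only an explicit --incremental run sets this, so that a plain fsck
    # does not interfere with --incremental-schedule.
    pass_started: Optional[float] = None


def run_fsck(db: FsckDb, keys: Iterable[str], *, incremental: bool = False,
             more: bool = False, fast: bool = False,
             now: float = 0.0) -> List[str]:
    """Return the keys that this run actually checks."""
    if incremental:
        # Start a new pass: forget earlier progress and note the start time.
        db.keys.clear()
        db.pass_started = now

    # A plain --fast run does not record, so it cannot overwrite the place
    # of a long-running full-checksum pass (assumption: explicit incremental
    # flags still record).
    record = incremental or more or not fast

    checked = []
    for key in keys:
        if more and key in db.keys:
            continue  # already fscked in an earlier run
        checked.append(key)
        if record:
            db.keys.add(key)
    return checked
```

In this model, a plain run followed by a `more=True` run checks only keys the first run did not see, and `pass_started` stays unset until a run with `incremental=True`.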