record fscked files in fsck db by default

Remember the files that are checked, so a later run with --more will
skip them, without needing to use --incremental.
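
As a rough model of the behaviour this commit describes, the following self-contained Haskell sketch treats the fsck database as a plain list of keys. The names `Key`, `FsckDb`, `Mode` and `runFsck` are hypothetical and are not git-annex's actual types or functions. The rules for `--fast` and `--incremental` follow the code comment and man page changes in this commit; failed checks and checkpoints are left out.

```haskell
import Data.List (nub)

-- Hypothetical stand-ins, not git-annex's real types.
type Key = String
type FsckDb = [Key] -- keys recorded as already fscked

-- Which fsck invocation is being modelled:
--   Plain   ~ git annex fsck
--   Fast    ~ git annex fsck --fast
--   More    ~ git annex fsck --more
--   NewPass ~ git annex fsck --incremental
data Mode = Plain | Fast | More | NewPass
  deriving (Eq, Show)

-- | Model one fsck run. Returns the keys that were actually checked
-- and the database contents after the run.
runFsck :: Mode -> FsckDb -> [Key] -> ([Key], FsckDb)
runFsck mode db keys = (checked, db')
  where
    -- Starting a new incremental pass clears the old records.
    base
      | mode == NewPass = []
      | otherwise = db
    -- Only --more skips keys that are already recorded.
    checked
      | mode == More = filter (`notElem` base) keys
      | otherwise = keys
    -- --fast leaves the database untouched; every other mode
    -- records the keys it checked.
    db'
      | mode == Fast = base
      | otherwise = nub (base ++ checked)
```

Loaded in GHCi, `runFsck More ["a","b"] ["a","b","c"]` evaluates to `(["c"],["a","b","c"])`: only the key that was not recorded by an earlier plain run gets checked.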
Joey Hess 2025-03-17 15:34:08 -04:00
parent f775c9643f
commit 2d60ce4803
5 changed files with 61 additions and 7 deletions


@@ -10,6 +10,8 @@ git-annex (10.20250116) UNRELEASED; urgency=medium
   * Added OsPath build flag, which speeds up git-annex's operations on files.
   * git-lfs: Added an optional apiurl parameter.
     (This needs version 1.2.5 of the haskell git-lfs library to be used.)
+  * fsck: Remember the files that are checked, so a later run with --more
+    will skip them, without needing to use --incremental.

  -- Joey Hess <id@joeyh.name>  Mon, 20 Jan 2025 10:24:51 -0400


@@ -713,13 +713,12 @@ getStartTime u = do
 #endif

 data Incremental
-	= NonIncremental
+	= NonIncremental (Maybe FsckDb.FsckHandle)
 	| ScheduleIncremental Duration UUID Incremental
 	| StartIncremental FsckDb.FsckHandle
 	| ContIncremental FsckDb.FsckHandle

 prepIncremental :: UUID -> Maybe IncrementalOpt -> Annex Incremental
-prepIncremental _ Nothing = pure NonIncremental
 prepIncremental u (Just StartIncrementalO) = do
 	recordStartTime u
 	ifM (FsckDb.newPass u)
@@ -734,6 +733,14 @@ prepIncremental u (Just (ScheduleIncrementalO delta)) = do
 		Nothing -> StartIncrementalO
 		Just _ -> MoreIncrementalO
 	return (ScheduleIncremental delta u i)
+prepIncremental u Nothing =
+	ifM (Annex.getRead Annex.fast)
+		-- Avoid recording fscked files in --fast mode,
+		-- since that can interfere with a non-fast incremental
+		-- fsck.
+		( pure (NonIncremental Nothing)
+		, (NonIncremental . Just) <$> openFsckDb u
+		)

 cleanupIncremental :: Incremental -> Annex ()
 cleanupIncremental (ScheduleIncremental delta u i) = do
@@ -757,6 +764,6 @@ openFsckDb u = do
 withFsckDb :: Incremental -> (FsckDb.FsckHandle -> Annex ()) -> Annex ()
 withFsckDb (ContIncremental h) a = a h
 withFsckDb (StartIncremental h) a = a h
-withFsckDb NonIncremental _ = noop
+withFsckDb (NonIncremental mh) a = maybe noop a mh
 withFsckDb (ScheduleIncremental _ _ i) a = withFsckDb i a


@@ -37,17 +37,24 @@ better format.
 * `--incremental`

-  Start a new incremental fsck pass. An incremental fsck can be interrupted
-  at any time, with eg ctrl-c.
+  Start a new incremental fsck pass, clearing records of all files that
+  were checked in the previous incremental fsck pass.

 * `--more`

-  Resume the last incremental fsck pass, where it left off.
+  Skip files that were checked since the last incremental fsck pass
+  was started.
+
+  Note that before `--incremental` is used to start an incremental fsck
+  pass, files that are checked are still recorded, and using this option
+  will skip checking those files again.

   Resuming may redundantly check some files that were checked
   before. Any files that fsck found problems with before will be re-checked
   on resume. Also, checkpoints are made every 1000 files or every 5 minutes
-  during a fsck, and it resumes from the last checkpoint.
+  during a fsck, and it resumes from the last checkpoint, so if an
+  incremental fsck is interrupted using eg ctrl-c, it will recheck files
+  that didn't get into the last checkpoint.

 * `--incremental-schedule=time`


@@ -9,3 +9,6 @@ I actually don't see much reason to not make use of an incremental fsck either u
 On that note: There also does not appear to be a documented method to figure out whether a fsck was interrupted before. You could infer existence and date from the annex internal directory structure but seeing the progress requires manual sql.
 Perhaps there could be a `fsck --info` flag for showing both interrupted fsck progress and perhaps also the progress of the current fsck.
+> I've implemented the default recording to the fsck database. [[done]]
+> --[[Joey]]


@@ -0,0 +1,35 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2025-03-17T18:34:20Z"
+ content="""
+I think it could make sense, when --incremental/--more are not passed, to
+initialize a new fsck database if there is not already one, and
+add each fscked key to the fsck database.
+That way, the user could run any combination of fscks, interrupted or not,
+and then use --more to fsck only new files. When the user wants to start
+a new fsck pass, they would use --incremental.
+It would need to avoid recording an incremental fsck pass start time,
+to avoid interfering with --incremental-schedule.
+The only problem I see with this is, someone might have a long-term
+incremental fsck they're running that is doing full checksumming.
+If they then do a quick fsck --fast for other reasons, it would
+record that every key has been fscked, and so lose their place.
+So it seems --fast should disable this new behavior. (Also incremental
+--fast fsck is not likely to be very useful anyway.)
+> I actually don't see much reason to not make use of an incremental fsck
+> either unless it's *really* old
+That's a hard judgement call for a program to make... someone might think
+10 minutes is really old, and someone else that a month is.
+As to figuring out whether a fsck was interrupted before, surely what
+matters is you remembering that? All git-annex has is a timestamp when
+the last fsck pass started, which is available in
+`.git/annex/fsck/*/state`, and a list of the keys that were fscked,
+which is not very useful as far as determining the progress of that fsck.
+"""]]