From 2d60ce48037393d2e972b3edac52d65b4b326c25 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Mon, 17 Mar 2025 15:34:08 -0400
Subject: [PATCH] record fscked files in fsck db by default

Remember the files that are checked, so a later run with --more will
skip them, without needing to use --incremental.
---
 CHANGELOG                                     |  2 ++
 Command/Fsck.hs                               | 13 +++++--
 doc/git-annex-fsck.mdwn                       | 15 +++++---
 doc/todo/Incremental_fsck_by_default.mdwn     |  3 ++
 ..._5f35afc17e865899f72a62bff8ff30e9._comment | 35 +++++++++++++++++++
 5 files changed, 61 insertions(+), 7 deletions(-)
 create mode 100644 doc/todo/Incremental_fsck_by_default/comment_1_5f35afc17e865899f72a62bff8ff30e9._comment

diff --git a/CHANGELOG b/CHANGELOG
index 8c944a4bfb..83df038ec3 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -10,6 +10,8 @@ git-annex (10.20250116) UNRELEASED; urgency=medium
   * Added OsPath build flag, which speeds up git-annex's operations on files.
   * git-lfs: Added an optional apiurl parameter.
     (This needs version 1.2.5 of the haskell git-lfs library to be used.)
+  * fsck: Remember the files that are checked, so a later run with --more
+    will skip them, without needing to use --incremental.
 
  -- Joey Hess Mon, 20 Jan 2025 10:24:51 -0400
 
diff --git a/Command/Fsck.hs b/Command/Fsck.hs
index 4e66755c02..a6b6e54875 100644
--- a/Command/Fsck.hs
+++ b/Command/Fsck.hs
@@ -713,13 +713,12 @@ getStartTime u = do
 #endif
 
 data Incremental
-	= NonIncremental
+	= NonIncremental (Maybe FsckDb.FsckHandle)
 	| ScheduleIncremental Duration UUID Incremental
 	| StartIncremental FsckDb.FsckHandle
 	| ContIncremental FsckDb.FsckHandle
 
 prepIncremental :: UUID -> Maybe IncrementalOpt -> Annex Incremental
-prepIncremental _ Nothing = pure NonIncremental
 prepIncremental u (Just StartIncrementalO) = do
 	recordStartTime u
 	ifM (FsckDb.newPass u)
@@ -734,6 +733,14 @@ prepIncremental u (Just (ScheduleIncrementalO delta)) = do
 		Nothing -> StartIncrementalO
 		Just _ -> MoreIncrementalO
 	return (ScheduleIncremental delta u i)
+prepIncremental u Nothing =
+	ifM (Annex.getRead Annex.fast)
+		-- Avoid recording fscked files in --fast mode,
+		-- since that can interfere with a non-fast incremental
+		-- fsck.
+		( pure (NonIncremental Nothing)
+		, (NonIncremental . Just) <$> openFsckDb u
+		)
 
 cleanupIncremental :: Incremental -> Annex ()
 cleanupIncremental (ScheduleIncremental delta u i) = do
@@ -757,6 +764,6 @@ openFsckDb u = do
 withFsckDb :: Incremental -> (FsckDb.FsckHandle -> Annex ()) -> Annex ()
 withFsckDb (ContIncremental h) a = a h
 withFsckDb (StartIncremental h) a = a h
-withFsckDb NonIncremental _ = noop
+withFsckDb (NonIncremental mh) a = maybe noop a mh
 withFsckDb (ScheduleIncremental _ _ i) a = withFsckDb i a
 
diff --git a/doc/git-annex-fsck.mdwn b/doc/git-annex-fsck.mdwn
index 4083ba4bf1..89760119d8 100644
--- a/doc/git-annex-fsck.mdwn
+++ b/doc/git-annex-fsck.mdwn
@@ -37,17 +37,24 @@ better format.
 
 * `--incremental`
 
-  Start a new incremental fsck pass. An incremental fsck can be interrupted
-  at any time, with eg ctrl-c.
+  Start a new incremental fsck pass, clearing records of all files that
+  were checked in the previous incremental fsck pass.
 
 * `--more`
 
-  Resume the last incremental fsck pass, where it left off.
+  Skip files that were checked since the last incremental fsck pass
+  was started.
+
+  Note that before `--incremental` is used to start an incremental fsck
+  pass, files that are checked are still recorded, and using this option
+  will skip checking those files again.
 
   Resuming may redundantly check some files that were checked before.
   Any files that fsck found problems with before will be re-checked on
   resume. Also, checkpoints are made every 1000 files or every 5 minutes
-  during a fsck, and it resumes from the last checkpoint.
+  during a fsck, and it resumes from the last checkpoint, so if an
+  incremental fsck is interrupted using eg ctrl-c, it will recheck files
+  that didn't get into the last checkpoint.
 
 * `--incremental-schedule=time`
 
diff --git a/doc/todo/Incremental_fsck_by_default.mdwn b/doc/todo/Incremental_fsck_by_default.mdwn
index f662549e63..169e02c6be 100644
--- a/doc/todo/Incremental_fsck_by_default.mdwn
+++ b/doc/todo/Incremental_fsck_by_default.mdwn
@@ -9,3 +9,6 @@ I actually don't see much reason to not make use of an incremental fsck either u
 On that note: There also does not appear to be a documented method to figure out whether a fsck was interrupted before. You could infer existence and date from the annex internal directory structure but seeing the progress requires manual sql.
 
 Perhaps there could be a `fsck --info` flag for showing both interrupted fsck progress and perhaps also the progress of the current fsck.
+
+> I've implemented the default recording to the fsck database. [[done]]
+> --[[Joey]]
diff --git a/doc/todo/Incremental_fsck_by_default/comment_1_5f35afc17e865899f72a62bff8ff30e9._comment b/doc/todo/Incremental_fsck_by_default/comment_1_5f35afc17e865899f72a62bff8ff30e9._comment
new file mode 100644
index 0000000000..2cfeabf04c
--- /dev/null
+++ b/doc/todo/Incremental_fsck_by_default/comment_1_5f35afc17e865899f72a62bff8ff30e9._comment
@@ -0,0 +1,35 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2025-03-17T18:34:20Z"
+ content="""
+I think it could make sense, when --incremental/--more are not passed, to
+initialize a new fsck database if there is not already one, and
+add each fscked key to the fsck database.
+
+That way, the user could run any combination of fscks, interrupted or not,
+and then use --more to fsck only new files. When the user wants to start
+a new fsck pass, they would use --incremental.
+
+It would need to avoid recording an incremental fsck pass start time,
+to avoid interfering with --incremental-schedule.
+
+The only problem I see with this is, someone might have a long-term
+incremental fsck they're running that is doing full checksumming.
+If they then do a quick fsck --fast for other reasons, it would
+record that every key has been fscked, and so lose their place.
+So it seems --fast should disable this new behavior. (Also incremental
+--fast fsck is not likely to be very useful anyway.)
+
+> I actually don't see much reason to not make use of an incremental fsck
+> either unless it's *really* old
+
+That's a hard judgement call for a program to make... someone might think
+10 minutes is really old, and someone else that a month is.
+
+As to figuring out whether a fsck was interrupted before, surely what
+matters is you remembering that? All git-annex has is a timestamp when
+the last fsck pass started, which is available in
+`.git/annex/fsck/*/state`, and a list of the keys that were fscked,
+which is not very useful as far as determining the progress of that fsck.
+"""]]
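
Reviewer note: the core of the Command/Fsck.hs change is that `NonIncremental` now optionally carries a database handle, and `withFsckDb` runs its action only when a handle is present. The following is a minimal standalone sketch of that dispatch, not the patch's actual code. It assumes a stand-in `FsckHandle` wrapping a string, a plain `Bool` in place of `Annex.getRead Annex.fast`, `IO` in place of the `Annex` monad, and a hypothetical `prepDefault` helper and `"fsck.db"` name; the `ScheduleIncremental` constructor is left out for brevity.

```haskell
module Main where

-- Stand-in for FsckDb.FsckHandle (assumption: the real handle is opaque).
newtype FsckHandle = FsckHandle String

-- Simplified model of the patched type; ScheduleIncremental is omitted.
data Incremental
    = NonIncremental (Maybe FsckHandle)
    | StartIncremental FsckHandle
    | ContIncremental FsckHandle

-- Hypothetical helper modelling the new `prepIncremental u Nothing` case:
-- with --fast, no handle is opened, so nothing gets recorded.
prepDefault :: Bool -> Incremental
prepDefault fast
    | fast = NonIncremental Nothing
    | otherwise = NonIncremental (Just (FsckHandle "fsck.db"))

-- Same shape as the patched withFsckDb, with IO standing in for Annex
-- and `pure ()` standing in for `noop`.
withFsckDb :: Incremental -> (FsckHandle -> IO ()) -> IO ()
withFsckDb (ContIncremental h) a = a h
withFsckDb (StartIncremental h) a = a h
withFsckDb (NonIncremental mh) a = maybe (pure ()) a mh

main :: IO ()
main = do
    -- Default run: a handle exists, so the recording action runs.
    withFsckDb (prepDefault False) $ \(FsckHandle name) ->
        putStrLn ("recording to " ++ name)
    -- --fast run: no handle, so the action is skipped entirely.
    withFsckDb (prepDefault True) $ \_ ->
        putStrLn "recording in fast mode"
```

In this model only the first call prints anything, which matches the intent stated in the patch comment: a `--fast` run leaves the records of a non-fast incremental fsck untouched.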