Future proof activity log parsing

When the log has an activity that is not known, eg added by a future version of git-annex, it used to be treated as no activity at all, which would make git-annex expire think it should expire the repository, despite it having some kind of recent activity.

Hopefully there will be no reason to add a new activity until enough time has passed that this commit is in use everywhere.

Sponsored-by: Jake Vosloo on Patreon
2021-06-14 14:18:06 -04:00
commit 78da00c7a6
parent 372ace599a
6 changed files with 89 additions and 12 deletions
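
The round trip described in the commit message can be sketched as follows. This is a minimal standalone sketch, not the git-annex code: it assumes `readMaybe` from `Text.Read` in place of `readish`, and plain `ByteString` functions in place of the attoparsec parser and `Builder`, purely to stay self-contained.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Maybe (fromMaybe)
import Text.Read (readMaybe)

data Activity
    = Fsck
    -- Holds the raw log value of an activity this version does not know.
    | UnknownActivity B.ByteString
    deriving (Eq, Read, Show)

-- Assumption: readMaybe stands in for git-annex's readish helper.
-- Anything that does not parse as a known activity is kept verbatim.
parseActivity :: B.ByteString -> Activity
parseActivity b = fromMaybe (UnknownActivity b) (readMaybe (B.unpack b))

-- Unknown activities are written back byte for byte, so a log entry
-- added by a newer version survives a rewrite by an older one.
buildActivity :: Activity -> B.ByteString
buildActivity (UnknownActivity b) = b
buildActivity a = B.pack (show a)
```

Loaded in GHCi, `parseActivity (B.pack "FsckAll")` yields an `UnknownActivity` value rather than a parse failure, and `buildActivity` turns it back into the original bytes.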
--- a/1
+++ b/1
@@ -29,6 +29,7 @@ git-annex (8.20210429) UNRELEASED; urgency=medium
    that creates the git-annex branch.
  * Added annex.adviceNoSshCaching config.
  * Added --size-limit option.
+  * Future proof activity log parsing.

 -- Joey Hess <id@joeyh.name>  Mon, 03 May 2021 10:33:10 -0400

--- a/Command/Expire.hs
+++ b/Command/Expire.hs
@@ -111,6 +111,6 @@ parseExpire ps = do
 parseActivity :: MonadFail m => String -> m Activity
 parseActivity s = case readish s of
 	Nothing -> Fail.fail $ "Unknown activity. Choose from: " ++ 
-		unwords (map show [minBound..maxBound :: Activity])
+		unwords (map show allActivities)
 	Just v -> return v

--- a/Logs/Activity.hs
+++ b/Logs/Activity.hs
@@ -1,6 +1,6 @@
 {- git-annex activity log
 -
- - Copyright 2015-2019 Joey Hess <id@joeyh.name>
+ - Copyright 2015-2021 Joey Hess <id@joeyh.name>
 -
 - Licensed under the GNU AGPL version 3 or higher.
 -}
@@ -8,6 +8,7 @@
 module Logs.Activity (
 	Log,
 	Activity(..),
+	allActivities,
 	recordActivity,
 	lastActivities,
 ) where
@@ -23,30 +24,38 @@ import Data.ByteString.Builder

 data Activity 
 	= Fsck
-	deriving (Eq, Read, Show, Enum, Bounded)
+	-- Allow for unknown activities to be added later.
+	| UnknownActivity S.ByteString
+	deriving (Eq, Read, Show)

+allActivities :: [Activity]
+allActivities = [Fsck]
+
+-- Record an activity. This takes the place of previously recorded activity
+-- for the UUID.
 recordActivity :: Activity -> UUID -> Annex ()
 recordActivity act uuid = do
 	c <- currentVectorClock
 	Annex.Branch.change (Annex.Branch.RegardingUUID [uuid]) activityLog $
 		buildLogOld buildActivity
-			. changeLog c uuid (Right act)
+			. changeLog c uuid act
 			. parseLogOld parseActivity

+-- Most recent activity for each UUID.
 lastActivities :: Maybe Activity -> Annex (Log Activity)
 lastActivities wantact = parseLogOld (onlywanted =<< parseActivity)
 	<$> Annex.Branch.get activityLog
  where
-	onlywanted (Right a) | wanted a = pure a
-	onlywanted _ = fail "unwanted activity"
+	onlywanted a 
+		| wanted a = pure a
+		| otherwise = fail "unwanted activity"
 	wanted a = maybe True (a ==) wantact

-buildActivity :: Either S.ByteString Activity -> Builder
-buildActivity (Right a) = byteString $ encodeBS $ show a
-buildActivity (Left b) = byteString b
+buildActivity :: Activity -> Builder
+buildActivity (UnknownActivity b) = byteString b
+buildActivity a = byteString $ encodeBS $ show a

--- Allow for unknown activities to be added later by preserving them.
-parseActivity :: A.Parser (Either S.ByteString Activity)
+parseActivity :: A.Parser Activity
 parseActivity = go <$> A.takeByteString
  where
-	go b = maybe (Left b) Right $ readish $ decodeBS b
+	go b = fromMaybe (UnknownActivity b) (readish $ decodeBS b)
--- a/doc/todo/trust_based_on_time_since_last_fsck/comment_1_3805e8dd9e6dd986c097c6f1b78ab244._comment
+++ b/doc/todo/trust_based_on_time_since_last_fsck/comment_1_3805e8dd9e6dd986c097c6f1b78ab244._comment
@@ -0,0 +1,32 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2021-06-14T17:14:44Z"
+ content="""
+You can query for repositories that have not been fscked
+for some amount of time:
+
+	git annex expire 10d --no-act --activity=Fsck
+
+From there, it's a simple script to set the unfscked ones to untrusted, or
+whatever.
+
+	| grep '^expire' | awk '{print $2}' | xargs git-annex untrust
+
+I suppose `git-annex expire` could have an option added, like `--untrust`
+to specify *how* to expire, rather than the default of marking the repo
+dead.
+
+I suppose you'd want a way to also go the other way, to stop untrusting a
+repo once it's been fscked.. There is not currently a way to do that.
+
+Note that a fsck that is interrupted does not count as a fsck activity,
+and it's not keeping track of what files were fscked. That would bloat the
+git-annex branch. On the other hand, if you `git annex fsck onefile`
+that counts as a fsck activity, even though other files in the repo didn't get
+fscked. So you would have to limit the ways you use fsck to ones that
+generate the activity you want, perhaps to `git annex fsck --all`. 
+
+Perhaps fsck should also have a way to control whether it records an
+activity or not..
+"""]]
--- a/doc/todo/trust_based_on_time_since_last_fsck/comment_2_ec1b87b389dc06440df04c9a719e0cbc._comment
+++ b/doc/todo/trust_based_on_time_since_last_fsck/comment_2_ec1b87b389dc06440df04c9a719e0cbc._comment
@@ -0,0 +1,13 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 2"""
+ date="2021-06-14T17:29:29Z"
+ content="""
+What if `git annex fsck --all` recorded an additional activity, eg FsckAll.
+Then there could be a command, or a config that untrusts repos that do not
+have a FsckAll activity that happened recently enough.
+
+A git config would be simplest, eg:
+
+	git config annex.untrustLastFscked 10d
+"""]]
--- a/doc/todo/trust_based_on_time_since_last_fsck/comment_3_23f37b9d8b877b829e34e6c8ea6b40c4._comment
+++ b/doc/todo/trust_based_on_time_since_last_fsck/comment_3_23f37b9d8b877b829e34e6c8ea6b40c4._comment
@@ -0,0 +1,22 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 3"""
+ date="2021-06-14T17:56:23Z"
+ content="""
+Tried to implement this, but ran into a problem adding FsckAll:
+If it only logs FsckAll and not also Fsck, then old git-annex expire
+will see the FsckAll and not understand it, and treats it as no activity,
+so expires. (I did fix git-annex now so an unknown activity is not treated
+as no activity.)
+
+And, the way recordActivity is implemented, it
+removes previous activities, and adds the current activity. So a FsckAll
+followed by a Fsck would remove the FsckAll activity.
+
+That could be fixed, and both be logged, but old git-annex would probably
+not be able to parse the result. And if old git-annex is then used to do a
+fsck, it would log Fsck and remove the previously added FsckAll.
+
+So, it seems this will need to use some log other than activity.log
+to keep track of fsck --all.
+"""]]
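
The replace-on-record behavior that comment 3 runs into can be illustrated with a small model. This is a sketch under simplifying assumptions, not the real implementation: the log is a plain `Map` with one activity per UUID, vector clocks are left out, `UUID` is a `String`, and `FsckAll` is the hypothetical activity from the comments, which does not exist in git-annex.

```haskell
import qualified Data.Map.Strict as M

-- FsckAll is hypothetical; it is only proposed in the comments above.
data Activity = Fsck | FsckAll
    deriving (Eq, Show)

-- Assumption: a String stands in for git-annex's UUID type.
type UUID = String

-- Simplified log: one activity per UUID, no vector clocks.
type Log = M.Map UUID Activity

-- Recording an activity takes the place of whatever was recorded
-- before for that UUID.
recordActivity :: Activity -> UUID -> Log -> Log
recordActivity act uuid = M.insert uuid act

-- A FsckAll followed by a plain Fsck on the same repository.
example :: Log
example = recordActivity Fsck "repo1" (recordActivity FsckAll "repo1" M.empty)
```

In this model `M.lookup "repo1" example` is `Just Fsck`: the earlier `FsckAll` entry is gone, which is why the comment concludes that tracking `fsck --all` needs a log other than activity.log.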