Future proof activity log parsing

When the log contains an activity that is not known, e.g. one added by a
future version of git-annex, it used to be treated as no activity at all.
That would make git-annex expire think it should expire the repository,
despite it having some kind of recent activity.

Hopefully there will be no reason to add a new activity until enough
time has passed that this commit is in use everywhere.

Sponsored-by: Jake Vosloo on Patreon
Joey Hess 2021-06-14 14:18:06 -04:00
parent 372ace599a
commit 78da00c7a6
GPG key ID: DB12DB0FF05F8F38
6 changed files with 89 additions and 12 deletions


@@ -29,6 +29,7 @@ git-annex (8.20210429) UNRELEASED; urgency=medium
     that creates the git-annex branch.
   * Added annex.adviceNoSshCaching config.
   * Added --size-limit option.
+  * Future proof activity log parsing.

  -- Joey Hess <id@joeyh.name>  Mon, 03 May 2021 10:33:10 -0400


@@ -111,6 +111,6 @@ parseExpire ps = do
 parseActivity :: MonadFail m => String -> m Activity
 parseActivity s = case readish s of
     Nothing -> Fail.fail $ "Unknown activity. Choose from: " ++
-        unwords (map show [minBound..maxBound :: Activity])
+        unwords (map show allActivities)
     Just v -> return v


@@ -1,6 +1,6 @@
 {- git-annex activity log
  -
- - Copyright 2015-2019 Joey Hess <id@joeyh.name>
+ - Copyright 2015-2021 Joey Hess <id@joeyh.name>
  -
  - Licensed under the GNU AGPL version 3 or higher.
  -}
@@ -8,6 +8,7 @@
 module Logs.Activity (
     Log,
     Activity(..),
+    allActivities,
     recordActivity,
     lastActivities,
 ) where
@@ -23,30 +24,38 @@ import Data.ByteString.Builder
 data Activity
     = Fsck
-    deriving (Eq, Read, Show, Enum, Bounded)
+    -- Allow for unknown activities to be added later.
+    | UnknownActivity S.ByteString
+    deriving (Eq, Read, Show)
+
+allActivities :: [Activity]
+allActivities = [Fsck]

 -- Record an activity. This takes the place of previously recorded activity
 -- for the UUID.
 recordActivity :: Activity -> UUID -> Annex ()
 recordActivity act uuid = do
     c <- currentVectorClock
     Annex.Branch.change (Annex.Branch.RegardingUUID [uuid]) activityLog $
         buildLogOld buildActivity
-            . changeLog c uuid (Right act)
+            . changeLog c uuid act
             . parseLogOld parseActivity

 -- Most recent activity for each UUID.
 lastActivities :: Maybe Activity -> Annex (Log Activity)
 lastActivities wantact = parseLogOld (onlywanted =<< parseActivity)
     <$> Annex.Branch.get activityLog
   where
-    onlywanted (Right a) | wanted a = pure a
-    onlywanted _ = fail "unwanted activity"
+    onlywanted a
+        | wanted a = pure a
+        | otherwise = fail "unwanted activity"
     wanted a = maybe True (a ==) wantact

-buildActivity :: Either S.ByteString Activity -> Builder
-buildActivity (Right a) = byteString $ encodeBS $ show a
-buildActivity (Left b) = byteString b
+buildActivity :: Activity -> Builder
+buildActivity (UnknownActivity b) = byteString b
+buildActivity a = byteString $ encodeBS $ show a

--- Allow for unknown activities to be added later by preserving them.
-parseActivity :: A.Parser (Either S.ByteString Activity)
+parseActivity :: A.Parser Activity
 parseActivity = go <$> A.takeByteString
   where
-    go b = maybe (Left b) Right $ readish $ decodeBS b
+    go b = fromMaybe (UnknownActivity b) (readish $ decodeBS b)
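
The round trip that this hunk relies on can be sketched in isolation. This is a minimal stand-alone sketch, not the git-annex code: it uses `String` where the real module uses `S.ByteString`, and it looks known names up in a table built from `allActivities` instead of calling git-annex's `readish` helper. Both substitutions are assumptions made to keep the example within `base`.

```haskell
import Data.Maybe (fromMaybe)

-- Simplified stand-in for the Activity type in the hunk above.
-- Assumption: the payload is a String here; the real code uses ByteString.
data Activity
    = Fsck
    | UnknownActivity String
    deriving (Eq, Show)

allActivities :: [Activity]
allActivities = [Fsck]

-- Parse a logged value. Text that names no known activity is kept
-- verbatim inside UnknownActivity rather than being rejected.
parseActivity :: String -> Activity
parseActivity s = fromMaybe (UnknownActivity s) (lookup s known)
  where
    known = [(show a, a) | a <- allActivities]

-- Serialise an activity. Unknown activities are written back unchanged,
-- so rewriting the log does not lose them.
buildActivity :: Activity -> String
buildActivity (UnknownActivity s) = s
buildActivity a = show a
```

With these definitions, `parseActivity "Fsck"` is `Fsck`, and `buildActivity (parseActivity "FsckAll")` gives back `"FsckAll"`, which is the property that lets an older version rewrite a log line it does not understand without dropping it.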


@@ -0,0 +1,32 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2021-06-14T17:14:44Z"
+ content="""
+You can query for repositories that have not been fscked
+for some amount of time:
+
+    git annex expire 10d --no-act --activity=Fsck
+
+From there, it's a simple script to set the unfscked ones to untrusted, or
+whatever.
+
+    | grep '^expire' | awk '{print $2}' | xargs git-annex untrust
+
+I suppose `git-annex expire` could have an option added, like `--untrust`
+to specify *how* to expire, rather than the default of marking the repo
+dead.
+
+I suppose you'd want a way to also go the other way, to stop untrusting a
+repo once it's been fscked.. There is not currently a way to do that.
+
+Note that a fsck that is interrupted does not count as a fsck activity,
+and it's not keeping track of what files were fscked. That would bloat the
+git-annex branch. On the other hand, if you `git annex fsck onefile`
+that counts as a fsck activity, even though other files in the repo didn't get
+fscked. So you would have to limit the ways you use fsck to ones that
+generate the activity you want, perhaps to `git annex fsck --all`.
+
+Perhaps fsck should also have a way to control whether it records an
+activity or not..
+"""]]
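
The `grep`/`awk` stage of the pipeline in this comment can also be written as a small pure function, which may be easier to test than a shell one-liner. This is only a sketch: the name `expiredRepos` is hypothetical, and the one thing it assumes about the output of `git annex expire --no-act` is what the pipeline itself implies, namely that relevant lines start with `expire` and carry the repository in the second whitespace-separated field.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical helper mirroring: grep '^expire' | awk '{print $2}'
-- Keeps lines that start with "expire" and returns their second field.
-- Lines with fewer than two fields are skipped.
expiredRepos :: String -> [String]
expiredRepos out =
    [ repo
    | l <- lines out
    , "expire" `isPrefixOf` l
    , (_ : repo : _) <- [words l]
    ]
```

The resulting list would then be handed to `git-annex untrust`, as the `xargs` stage does in the comment.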


@@ -0,0 +1,13 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 2"""
+ date="2021-06-14T17:29:29Z"
+ content="""
+What if `git annex fsck --all` recorded an additional activity, eg FsckAll.
+Then there could be a command, or a config that untrusts repos that do not
+have a FsckAll activity that happened recently enough.
+
+A git config would be simplest, eg:
+
+    git config annex.untrustLastFscked 10d
+"""]]


@@ -0,0 +1,22 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 3"""
+ date="2021-06-14T17:56:23Z"
+ content="""
+Tried to implement this, but ran into a problem adding FsckAll:
+If it only logs FsckAll and not also Fsck, then old git-annex expire
+will see the FsckAll and not understand it, and treats it as no activity,
+so expires. (I did fix git-annex now so an unknown activity is not treated
+as no activity.)
+
+And, the way recordActivity is implemented, it
+removes previous activities, and adds the current activity. So a FsckAll
+followed by a Fsck would remove the FsckAll activity.
+
+That could be fixed, and both be logged, but old git-annex would probably
+not be able to parse the result. And if old git-annex is then used to do a
+fsck, it would log Fsck and remove the previously added FsckAll.
+
+So, it seems this will need to use some log other than activity.log
+to keep track of fsck --all.
+"""]]
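
The replacement problem described in this comment can be illustrated with a toy model. This sketch makes several assumptions: the log is an association list with one entry per UUID, `FsckAll` is the activity proposed in the comments (it does not exist in git-annex), and this `recordActivity` is a pure stand-in that ignores vector clocks and the git-annex branch.

```haskell
type UUID = String

-- FsckAll is hypothetical; it is only the activity proposed above.
data Activity = Fsck | FsckAll
    deriving (Eq, Show)

-- Toy model of the activity log: at most one activity per UUID.
type Log = [(UUID, Activity)]

-- Record an activity, replacing whatever was logged for that UUID before.
recordActivity :: Activity -> UUID -> Log -> Log
recordActivity act u l = (u, act) : filter ((/= u) . fst) l

-- A full fsck followed by a fsck of a single file.
afterBoth :: Log
afterBoth = recordActivity Fsck "repo1" (recordActivity FsckAll "repo1" [])
```

Here `lookup "repo1" afterBoth` is `Just Fsck`: the later `Fsck` has displaced the `FsckAll` entry, which is why the comment concludes that a separate log would be needed to track `fsck --all`.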