git-annex/Logs/Activity.hs

{- git-annex activity log
 -
 - Copyright 2015-2019 Joey Hess <id@joeyh.name>
 -
 - Licensed under the GNU AGPL version 3 or higher.
 -}

module Logs.Activity (
	Log,
	Activity(..),
	recordActivity,
	lastActivities,
) where

import Annex.Common
import qualified Annex.Branch
import Logs
import Logs.UUIDBased

import qualified Data.ByteString as S
import qualified Data.Attoparsec.ByteString as A
import Data.ByteString.Builder

-- The kinds of activity that can be recorded for a repository.
data Activity
	= Fsck
	deriving (Eq, Read, Show, Enum, Bounded)

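-- Records in the activity log that the activity was performed on the
-- repository with the given UUID, stamped with the current vector clock.
recordActivity :: Activity -> UUID -> Annex ()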
recordActivity act uuid = do
	c <- currentVectorClock
	Annex.Branch.change (Annex.Branch.RegardingUUID [uuid]) activityLog $
		buildLogOld buildActivity
			. changeLog c uuid (Right act)
			. parseLogOld parseActivity

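-- Reads the activity log. Given Just an activity, entries for any other
-- activity are rejected by the parser; given Nothing, every known activity
-- is accepted. Unrecognised (Left) entries are always rejected.
lastActivities :: Maybe Activity -> Annex (Log Activity)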
lastActivities wantact = parseLogOld (onlywanted =<< parseActivity)
	<$> Annex.Branch.get activityLog
  where
	onlywanted (Right a) | wanted a = pure a
	onlywanted _ = fail "unwanted activity"
	wanted a = maybe True (a ==) wantact

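-- Serializes a known activity using its Show instance; an unrecognised
-- value is written back out byte for byte.
buildActivity :: Either S.ByteString Activity -> Builder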
buildActivity (Right a) = byteString $ encodeBS $ show a
buildActivity (Left b) = byteString b

-- Allow for unknown activities to be added later by preserving them.
parseActivity :: A.Parser (Either S.ByteString Activity)
parseActivity = go <$> A.takeByteString
  where
	go b = maybe (Left b) Right $ readish $ decodeBS b