git-annex/Logs/Multicast.hs

{- git-annex multicast fingerprint log
 -
 - Copyright 2017, 2019 Joey Hess <id@joeyh.name>
 -
 - Licensed under the GNU GPL version 3 or higher.
 -}

module Logs.Multicast (
	Fingerprint(..),
	recordFingerprint,
	knownFingerPrints,
) where

import Annex.Common
import qualified Annex.Branch
import Logs
import Logs.UUIDBased

import qualified Data.Map as M
import qualified Data.Attoparsec.ByteString.Lazy as A
import Data.ByteString.Builder

newtype Fingerprint = Fingerprint String
	deriving (Eq, Read, Show)

recordFingerprint :: Fingerprint -> UUID -> Annex ()
recordFingerprint fp uuid = do
	c <- liftIO currentVectorClock
	Annex.Branch.change multicastLog $
		buildLog buildFindgerPrint
			. changeLog c uuid fp
			. parseLog fingerprintParser

knownFingerPrints :: Annex (M.Map UUID Fingerprint)
knownFingerPrints = simpleMap . parseLog fingerprintParser
	<$> Annex.Branch.get activityLog

fingerprintParser :: A.Parser Fingerprint
fingerprintParser = maybe (fail "fingerprint parse failed") pure 
	. readish . decodeBS =<< A.takeByteString

buildFindgerPrint :: Fingerprint -> Builder
buildFindgerPrint = byteString . encodeBS . show
multicast: New command, uses uftp to multicast annexed files, for eg a classroom setting. This commit was supported by the NSF-funded DataLad project. 2017-03-30 23:32:58 +00:00			`{- git-annex multicast fingerprint log`
			`-`
convert old uuid-based log parsers to attoparsec This preserves the workaround for the old bug that caused NoUUID items to be stored in the log, prefixing log lines with " ". It's now handled implicitly, by using takeWhile1 (/= ' ') to get the uuid. There is a behavior change from the old parser, which split the value into words and then recombined it. That meant that "foo bar" and "foo\tbar" came out as "foo bar". That behavior was not documented, and seems surprising; it meant that after a git-annex describe here "foo bar", you wouldn't get that same string back out when git-annex displayed repo descriptions. Otoh, some other parsers relied on the old behavior, and the attoparsec rewrites had to deal with the issue themselves... For group.log, there are some edge cases around the user providing a group name with a leading or trailing space. The old parser would ignore such excess whitespace. The new parser does too, because the alternative is to refuse to parse something like " group1 group2 " due to excess whitespace, which would be even more confusing behavior. The only git-annex branch log file that is not converted to attoparsec and bytestring-builder now is transitions.log. 2019-01-10 18:39:36 +00:00			`- Copyright 2017, 2019 Joey Hess <id@joeyh.name>`
multicast: New command, uses uftp to multicast annexed files, for eg a classroom setting. This commit was supported by the NSF-funded DataLad project. 2017-03-30 23:32:58 +00:00			`-`
			`- Licensed under the GNU GPL version 3 or higher.`
			`-}`

			`module Logs.Multicast (`
			`Fingerprint(..),`
			`recordFingerprint,`
			`knownFingerPrints,`
			`) where`

			`import Annex.Common`
			`import qualified Annex.Branch`
			`import Logs`
			`import Logs.UUIDBased`

			`import qualified Data.Map as M`
convert old uuid-based log parsers to attoparsec This preserves the workaround for the old bug that caused NoUUID items to be stored in the log, prefixing log lines with " ". It's now handled implicitly, by using takeWhile1 (/= ' ') to get the uuid. There is a behavior change from the old parser, which split the value into words and then recombined it. That meant that "foo bar" and "foo\tbar" came out as "foo bar". That behavior was not documented, and seems surprising; it meant that after a git-annex describe here "foo bar", you wouldn't get that same string back out when git-annex displayed repo descriptions. Otoh, some other parsers relied on the old behavior, and the attoparsec rewrites had to deal with the issue themselves... For group.log, there are some edge cases around the user providing a group name with a leading or trailing space. The old parser would ignore such excess whitespace. The new parser does too, because the alternative is to refuse to parse something like " group1 group2 " due to excess whitespace, which would be even more confusing behavior. The only git-annex branch log file that is not converted to attoparsec and bytestring-builder now is transitions.log. 2019-01-10 18:39:36 +00:00			`import qualified Data.Attoparsec.ByteString.Lazy as A`
convert all per-uuid log files to use Builder Mostly didn't push the ByteStrings down very deep, but all of these log files are not written to frequently at all, so slight remaining innefficiency doesn't matter. In Logs.UUID, removed the fixBadUUID code that cleaned up after a bug in git-annex versions 3.20111105-3.20111110. In the unlikely event that a repo was last touched by that ancient git-annex version, the descriptions of remotes would appear missing when used with this version of git-annex. That is such minor breakage, and so unlikely to still be a problem for any repos, that it was not worth forward-porting that code to ByteString. 2019-01-09 18:00:35 +00:00			`import Data.ByteString.Builder`
multicast: New command, uses uftp to multicast annexed files, for eg a classroom setting. This commit was supported by the NSF-funded DataLad project. 2017-03-30 23:32:58 +00:00
			`newtype Fingerprint = Fingerprint String`
			`deriving (Eq, Read, Show)`

			`recordFingerprint :: Fingerprint -> UUID -> Annex ()`
			`recordFingerprint fp uuid = do`
Added GIT_ANNEX_VECTOR_CLOCK environment variable Can be used to override the default timestamps used in log files in the git-annex branch. This is a dangerous environment variable; use with caution. Note that this only affects writing to the logs on the git-annex branch. It is not used for metadata in git commits (other env vars can be set for that). There are many other places where timestamps are still used, that don't get committed to git, but do touch disk. Including regular timestamps of files, and timestamps embedded in some files in .git/annex/, including the last fsck timestamp and timestamps in transfer log files. A good way to find such things in git-annex is to get for getPOSIXTime and getCurrentTime, although some of the results are of course false positives that never hit disk (unless git-annex gets swapped out..) So this commit does NOT necessarily make git-annex comply with some HIPPA privacy regulations; it's up to the user to determine if they can use it in a way compliant with such regulations. Benchmarking: It takes 0.00114 milliseconds to call getEnv "GIT_ANNEX_VECTOR_CLOCK" when that env var is not set. So, 100 thousand log files can be written with an added overhead of only 0.114 seconds. That should be by far swamped by the actual overhead of writing the log files and making the commit containing them. This commit was supported by the NSF-funded DataLad project. 2017-08-14 17:55:38 +00:00			`c <- liftIO currentVectorClock`
multicast: New command, uses uftp to multicast annexed files, for eg a classroom setting. This commit was supported by the NSF-funded DataLad project. 2017-03-30 23:32:58 +00:00			`Annex.Branch.change multicastLog $`
convert old uuid-based log parsers to attoparsec This preserves the workaround for the old bug that caused NoUUID items to be stored in the log, prefixing log lines with " ". It's now handled implicitly, by using takeWhile1 (/= ' ') to get the uuid. There is a behavior change from the old parser, which split the value into words and then recombined it. That meant that "foo bar" and "foo\tbar" came out as "foo bar". That behavior was not documented, and seems surprising; it meant that after a git-annex describe here "foo bar", you wouldn't get that same string back out when git-annex displayed repo descriptions. Otoh, some other parsers relied on the old behavior, and the attoparsec rewrites had to deal with the issue themselves... For group.log, there are some edge cases around the user providing a group name with a leading or trailing space. The old parser would ignore such excess whitespace. The new parser does too, because the alternative is to refuse to parse something like " group1 group2 " due to excess whitespace, which would be even more confusing behavior. The only git-annex branch log file that is not converted to attoparsec and bytestring-builder now is transitions.log. 2019-01-10 18:39:36 +00:00			`buildLog buildFindgerPrint`
convert all per-uuid log files to use Builder Mostly didn't push the ByteStrings down very deep, but all of these log files are not written to frequently at all, so slight remaining innefficiency doesn't matter. In Logs.UUID, removed the fixBadUUID code that cleaned up after a bug in git-annex versions 3.20111105-3.20111110. In the unlikely event that a repo was last touched by that ancient git-annex version, the descriptions of remotes would appear missing when used with this version of git-annex. That is such minor breakage, and so unlikely to still be a problem for any repos, that it was not worth forward-porting that code to ByteString. 2019-01-09 18:00:35 +00:00			`. changeLog c uuid fp`
convert old uuid-based log parsers to attoparsec This preserves the workaround for the old bug that caused NoUUID items to be stored in the log, prefixing log lines with " ". It's now handled implicitly, by using takeWhile1 (/= ' ') to get the uuid. There is a behavior change from the old parser, which split the value into words and then recombined it. That meant that "foo bar" and "foo\tbar" came out as "foo bar". That behavior was not documented, and seems surprising; it meant that after a git-annex describe here "foo bar", you wouldn't get that same string back out when git-annex displayed repo descriptions. Otoh, some other parsers relied on the old behavior, and the attoparsec rewrites had to deal with the issue themselves... For group.log, there are some edge cases around the user providing a group name with a leading or trailing space. The old parser would ignore such excess whitespace. The new parser does too, because the alternative is to refuse to parse something like " group1 group2 " due to excess whitespace, which would be even more confusing behavior. The only git-annex branch log file that is not converted to attoparsec and bytestring-builder now is transitions.log. 2019-01-10 18:39:36 +00:00			`. parseLog fingerprintParser`
multicast: New command, uses uftp to multicast annexed files, for eg a classroom setting. This commit was supported by the NSF-funded DataLad project. 2017-03-30 23:32:58 +00:00
			`knownFingerPrints :: Annex (M.Map UUID Fingerprint)`
convert old uuid-based log parsers to attoparsec This preserves the workaround for the old bug that caused NoUUID items to be stored in the log, prefixing log lines with " ". It's now handled implicitly, by using takeWhile1 (/= ' ') to get the uuid. There is a behavior change from the old parser, which split the value into words and then recombined it. That meant that "foo bar" and "foo\tbar" came out as "foo bar". That behavior was not documented, and seems surprising; it meant that after a git-annex describe here "foo bar", you wouldn't get that same string back out when git-annex displayed repo descriptions. Otoh, some other parsers relied on the old behavior, and the attoparsec rewrites had to deal with the issue themselves... For group.log, there are some edge cases around the user providing a group name with a leading or trailing space. The old parser would ignore such excess whitespace. The new parser does too, because the alternative is to refuse to parse something like " group1 group2 " due to excess whitespace, which would be even more confusing behavior. The only git-annex branch log file that is not converted to attoparsec and bytestring-builder now is transitions.log. 2019-01-10 18:39:36 +00:00			`knownFingerPrints = simpleMap . parseLog fingerprintParser`
			`<$> Annex.Branch.get activityLog`

			`fingerprintParser :: A.Parser Fingerprint`
			`fingerprintParser = maybe (fail "fingerprint parse failed") pure`
			`. readish . decodeBS =<< A.takeByteString`

			`buildFindgerPrint :: Fingerprint -> Builder`
			`buildFindgerPrint = byteString . encodeBS . show`