git-annex/Types/UUID.hs
Joey Hess 38e9ea8497
one-way escaping of newlines in uuid.log
A repository can have a newline in its description due to being in a
directory containing a newline, or due to git-annex describe being
passed a string with a newline in it for some reason. Putting that
newline in uuid.log breaks its format.

So, escape the newline when it enters uuid.log, to \n

This is a one-way escaping, it is not converted back to a newline
when reading the log. If it were, commands like git-annex info and
whereis would display a multi-line description, which could be confusing
to read.

And, implementing roundtripping would necessarily cause problems if an
old version of git-annex were used to set a description that contained
whatever special character is used to escape the \n. Eg, a \ or if
it used the ! prefix before base64 data that is used in some other logs,
the ! character. Then the description set by the old git-annex would not
roundtrip.

There just doesn't seem to be any benefit of roundtripping newlines through,
so why bother? And, git often displays \n for newline when a filename
contains a newline, so git-annex doing it in this case seems sorta ok
by analogy to git.

(Some other git-annex logs can also have newlines put into them if the
user really wants to break git-annex. For example:
git-annex config annex.largefiles "foo
bar"
The full list is probably config.log, remote.log, group.log,
preferred-content.log, required-content.log,
group-preferred-content.log, schedule.log. Probably there is no
good reason to use a newline in any of these, and the breakage is
probably limited to the bad data the user put in not coming back out.
And users can write any garbage to log files themselves manually in any
case. So, I am not going to address all of those at this time. If a
problem such as this one with the newline in the repository path comes
up, it can be dealt with on a case by case basis.)

Sponsored-by: Dartmouth College's Datalad project
2023-03-13 14:19:32 -04:00

95 lines
2.3 KiB
Haskell

{- git-annex UUID type
-
- Copyright 2011-2019 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
{-# LANGUAGE TypeSynonymInstances, FlexibleInstances, GeneralizedNewtypeDeriving #-}
module Types.UUID where
import qualified Data.ByteString as B
import qualified Data.Map as M
import qualified Data.UUID as U
import Data.Maybe
import Data.String
import Data.ByteString.Builder
import qualified Data.Semigroup as Sem
import Git.Types (ConfigValue(..))
import Utility.FileSystemEncoding
import Utility.QuickCheck
import qualified Utility.SimpleProtocol as Proto
-- A UUID is either an arbitrary opaque string, or UUID info may be missing.
data UUID = NoUUID | UUID B.ByteString
deriving (Eq, Ord, Show, Read)
class FromUUID a where
fromUUID :: UUID -> a
class ToUUID a where
toUUID :: a -> UUID
instance FromUUID UUID where
fromUUID = id
instance ToUUID UUID where
toUUID = id
instance FromUUID B.ByteString where
fromUUID (UUID u) = u
fromUUID NoUUID = B.empty
instance ToUUID B.ByteString where
toUUID b
| B.null b = NoUUID
| otherwise = UUID b
instance FromUUID String where
fromUUID s = decodeBS (fromUUID s)
instance ToUUID String where
toUUID s = toUUID (encodeBS s)
instance FromUUID ConfigValue where
fromUUID s = (ConfigValue (fromUUID s))
instance ToUUID ConfigValue where
toUUID (ConfigValue v) = toUUID v
toUUID NoConfigValue = NoUUID
-- There is no matching FromUUID U.UUID because a git-annex UUID may
-- be NoUUID or perhaps contain something not allowed in a canonical UUID.
instance ToUUID U.UUID where
toUUID = toUUID . U.toASCIIBytes
buildUUID :: UUID -> Builder
buildUUID (UUID b) = byteString b
buildUUID NoUUID = mempty
isUUID :: String -> Bool
isUUID = isJust . U.fromString
-- A description of a UUID.
newtype UUIDDesc = UUIDDesc B.ByteString
deriving (Eq, Sem.Semigroup, Monoid, IsString)
fromUUIDDesc :: UUIDDesc -> String
fromUUIDDesc (UUIDDesc d) = decodeBS d
toUUIDDesc :: String -> UUIDDesc
toUUIDDesc = UUIDDesc . encodeBS
type UUIDDescMap = M.Map UUID UUIDDesc
instance Proto.Serializable UUID where
serialize = fromUUID
deserialize = Just . toUUID
instance Arbitrary UUID where
arbitrary = frequency [(1, return NoUUID), (3, UUID <$> arb)]
where
arb = encodeBS <$> listOf1 (elements uuidchars)
uuidchars = '-' : ['a'..'z'] ++ ['A'..'Z'] ++ ['0'..'9']