Fix mangling of --json output of utf-8 characters when not running in a utf-8 locale
As long as all code imports Utility.Aeson rather than Data.Aeson,
and no Strings that may contain utf-8 characters are used for, e.g., object
keys via T.pack, this is guaranteed to fix the problem everywhere that
git-annex generates json.
It's kind of annoying to need to wrap ToJSON with a ToJSON', especially
since every data type that has a ToJSON instance has to be ported over.
However, that only took 50 lines of code, which is worth it to ensure full
coverage. I initially tried an alternative approach of a newtype FileEncoded,
which had to be used everywhere a String was fed into aeson, and chasing
down all the sites would have been far too hard. I did consider creating an
intentionally overlapping instance ToJSON String, and letting ghc fail
to build anything that passed in a String, but was not sure that it wouldn't
pollute some library that git-annex depends on that happens to use ToJSON
String internally.
This commit was supported by the NSF-funded DataLad project.
2018-04-16 19:42:45 +00:00
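The commit message's point about T.pack mangling Strings can be seen in a small, self-contained sketch. It rests on a simplifying assumption: the non-UTF-8 filesystem encoding is modelled as char8-style, meaning one Char per byte. GHC's actual encoding and git-annex's encodeBS from Utility.FileSystemEncoding are not reproduced here, and encodeBSChar8 is a hypothetical stand-in. The sketch needs only the text and bytestring packages that ship with GHC.

```haskell
-- Sketch of packString's re-encode-then-decode logic.
-- ASSUMPTION: the String was decoded one byte per Char (char8-style), so
-- Data.ByteString.Char8.pack recovers the original bytes. encodeBSChar8 is
-- a hypothetical stand-in for git-annex's encodeBS.
import qualified Data.ByteString.Char8 as B8
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- | Hypothetical char8-style stand-in for encodeBS.
encodeBSChar8 :: String -> B8.ByteString
encodeBSChar8 = B8.pack

-- | Turn the String back into its raw bytes and decode them as UTF-8,
-- falling back to T.pack when the bytes are not valid UTF-8.
packString :: String -> T.Text
packString s = case TE.decodeUtf8' (encodeBSChar8 s) of
  Right t -> t
  Left _ -> T.pack s

main :: IO ()
main = do
  -- "é" is the two bytes 0xC3 0xA9 in UTF-8; a char8-style decoder
  -- turns them into two separate Chars.
  let s = "\xC3\xA9"
  print (T.length (T.pack s))                 -- 2: one Char per byte (mangled)
  print (T.length (packString s))             -- 1: the single character U+00E9
  print (packString "\xFF" == T.pack "\xFF")  -- True: invalid UTF-8 falls back
```

Plain T.pack keeps the byte-valued Chars, which is the mangling described above. Re-encoding to bytes and decoding as UTF-8 recovers the intended character.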

{- GHC File system encoding support for Aeson.
 -
 - Import instead of Data.Aeson
 -
 - Copyright 2018-2019 Joey Hess <id@joeyh.name>
 -
 - License: BSD-2-clause
 -}

{-# LANGUAGE FlexibleInstances, TypeSynonymInstances #-}

module Utility.Aeson (
	module X,
	ToJSON'(..),
	encode,
	packString,
	packByteString,
) where

import Data.Aeson as X hiding (ToJSON, toJSON, encode)
import Data.Aeson hiding (encode)
import qualified Data.Aeson
import qualified Data.Text as T
import qualified Data.Text.Encoding as T
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString as S
import qualified Data.Set
import qualified Data.Vector
import Prelude

import Utility.FileSystemEncoding

-- | Use this instead of Data.Aeson.encode so that the ToJSON' String
-- instance below is used.
encode :: ToJSON' a => a -> L.ByteString
encode = Data.Aeson.encode . toJSON'

-- | Aeson has an unfortunate ToJSON instance for Char and [Char]
-- which does not support Strings containing UTF8 characters
-- encoded using the filesystem encoding when run in a non-utf8 locale.
--
-- Since we can't replace that with an instance that does the right
-- thing, instead here's a new class that handles String correctly.
class ToJSON' a where
	toJSON' :: a -> Value

instance ToJSON' T.Text where
	toJSON' = toJSON

instance ToJSON' String where
	toJSON' = toJSON . packString

-- | Aeson does not have a ToJSON instance for ByteString;
-- this one assumes that the ByteString contains text, and will
-- have the same effect as toJSON' . decodeBS, but with a more efficient
-- implementation.
instance ToJSON' S.ByteString where
	toJSON' = toJSON . packByteString

-- | Pack a String to Text, correctly handling the filesystem encoding.
--
-- Use this instead of Data.Text.pack.
--
-- Note that if the string contains invalid UTF8 characters not using
-- the FileSystemEncoding, this is the same as Data.Text.pack.
packString :: String -> T.Text
packString s = case T.decodeUtf8' (encodeBS s) of
	Right t -> t
	Left _ -> T.pack s

-- | The same as packString . decodeBS, but more efficient in the usual
-- case.
packByteString :: S.ByteString -> T.Text
packByteString b = case T.decodeUtf8' b of
	Right t -> t
	Left _ -> T.pack (decodeBS b)

-- | An instance for lists cannot be included as it would overlap with
-- the String instance. Instead, you can use a Vector.
instance ToJSON' s => ToJSON' (Data.Vector.Vector s) where
	toJSON' = toJSON . map toJSON' . Data.Vector.toList

-- Aeson generates the same JSON for a Set as for a list.
instance ToJSON' s => ToJSON' (Data.Set.Set s) where
	toJSON' = toJSON . map toJSON' . Data.Set.toList

instance (ToJSON' a, ToJSON a) => ToJSON' (Maybe a) where
	toJSON' (Just a) = toJSON (Just (toJSON' a))
	toJSON' v@Nothing = toJSON v

instance (ToJSON' a, ToJSON a, ToJSON' b, ToJSON b) => ToJSON' (a, b) where
	toJSON' (a, b) = toJSON (toJSON' a, toJSON' b)

instance ToJSON' Bool where
	toJSON' = toJSON

instance ToJSON' Integer where
	toJSON' = toJSON

instance ToJSON' Object where
	toJSON' = toJSON

instance ToJSON' Value where
	toJSON' = toJSON
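The parallel-class approach can be illustrated without aeson. Here, ToJSON' wraps ToJSON so that every String passes through packString. The sketch below is not the module's code but an illustration under stated assumptions. Value, VString, VBool, VArray, ToValue' and encodeBSChar8 are hypothetical names, and encodeBS is modelled as a char8-style, one-Char-per-byte conversion. The sketch also shows why the module gives Vector and Set their own instances instead of a list instance. It needs only the text, bytestring and containers packages.

```haskell
{-# LANGUAGE FlexibleInstances #-}
-- Toy illustration of the ToJSON' pattern. All names here are hypothetical;
-- the real module wraps aeson's Value and uses git-annex's encodeBS.
import qualified Data.ByteString.Char8 as B8
import qualified Data.Set as Set
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- | Toy stand-in for aeson's Value.
data Value = VString T.Text | VBool Bool | VArray [Value]
  deriving (Eq, Show)

-- | ASSUMPTION: char8-style stand-in for encodeBS (one Char per byte).
encodeBSChar8 :: String -> B8.ByteString
encodeBSChar8 = B8.pack

packString :: String -> T.Text
packString s = either (const (T.pack s)) id (TE.decodeUtf8' (encodeBSChar8 s))

-- | The parallel class: a String can only be serialised through the
-- instance below, so it always goes through packString.
class ToValue' a where
  toValue' :: a -> Value

instance ToValue' String where
  toValue' = VString . packString

instance ToValue' Bool where
  toValue' = VBool

-- A generic "instance ToValue' a => ToValue' [a]" would overlap with the
-- String ([Char]) instance above, so containers get their own instances.
instance ToValue' a => ToValue' (Set.Set a) where
  toValue' = VArray . map toValue' . Set.toList

main :: IO ()
main = do
  print (toValue' "\xC3\xA9")                     -- VString "\233"
  print (toValue' (Set.fromList [True, False]))   -- VArray [VBool False,VBool True]
```

As in the real module, the cost is that every serialisable type needs an instance of the new class. In return, a String can never reach the encoder without going through packString.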