2011-12-13 19:22:43 +00:00
{- Some git commands output encoded filenames, in a rather annoyingly complex
- C-style encoding.
-
2015-01-21 16:50:09 +00:00
- Copyright 2010, 2011 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU GPL version 3 or higher.
-}
module Git.Filename where
fix failing quickcheck properties

QuickCheck 2.10 found a counterexample, e.g. "\929184" broke the property.

As far as I can tell, Git.Filename is matching how git handles encoding
of strange high unicode characters in filenames for display. Git does
not display high unicode characters, and instead displays the C-style
escaped form of each byte. This is ambiguous, but since git is not
unicode aware, it doesn't need to roundtrip parse it.

So, making Git.Filename's roundtrip test only chars < 256 seems fine.

Utility.Format.format uses encode_c, in order to mimic git, so that's
ok.

Utility.Format.gen uses decode_c, but only so that stuff like "\n"
in the format string is handled. If the format string contains C-style
octal escapes, they will be converted to ascii characters, and not
combined into unicode characters, but that should not be a problem.
If the user wants unicode characters, they can include them in the
format string, without escaping them.

Finally, decode_c is used by Utility.Gpg.secretKeys, because gpg
--with-colons hex-escapes some characters, in particular ':' and '\\'.
gpg passes unicode through, so this use of decode_c is not a problem.

This commit was sponsored by Henrik Riomar on Patreon.
2017-06-17 20:17:09 +00:00
import Common
2011-12-23 00:14:35 +00:00
import Utility.Format (decode_c, encode_c)
import Data.Char
2011-12-20 18:37:53 +00:00
decode :: String -> FilePath
decode [] = []
decode f@(c:s)
    -- encoded strings will be inside double quotes
    | c == '"' && end s == ['"'] = decode_c $ beginning s
    | otherwise = f
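
The quote check in `decode` can be tried in isolation. The following is only a minimal sketch: `unquote` is a hypothetical helper that does not exist in the module, and it assumes that `end` yields the last element as a singleton list and `beginning` everything except the last element, as the guard above suggests. It leaves out the `decode_c` step entirely.

```haskell
-- Hypothetical stand-in for the quote check in decode; not part of Git.Filename.
-- Returns the text between the surrounding double quotes, if there are any.
unquote :: String -> Maybe String
unquote ('"':s@(_:_))
  | last s == '"' = Just (init s)
unquote _ = Nothing

main :: IO ()
main = do
  print (unquote "\"a b\"")  -- prints Just "a b"
  print (unquote "plain")    -- prints Nothing
  print (unquote "\"")       -- prints Nothing (a lone quote is not a quoted string)
```

Input that is not wrapped in double quotes corresponds to the `otherwise` branch, where `decode` returns the filename unchanged.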
{- Should not need to use this, except for testing decode. -}
encode :: FilePath -> String
encode s = "\"" ++ encode_c s ++ "\""
{- For quickcheck.
-
- See comment on Utility.Format.prop_encode_c_decode_c_roundtrip for
- why this only tests chars < 256 -}
prop_encode_decode_roundtrip :: String -> Bool
prop_encode_decode_roundtrip s = s' == decode (encode s')
  where
    s' = filter (\c -> ord c < 256) s
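
Why the property is restricted to chars < 256 can be illustrated with a small stand-alone sketch. The functions below (`utf8Bytes`, `octal`, `sketchEncode`, `sketchDecode`) are hypothetical simplifications written for this example. They are not the real `encode_c` and `decode_c` from Utility.Format, and they only assume the behaviour described in the commit message: a high unicode character is written as the octal escape of each of its UTF-8 bytes, and each escape is decoded back to a single char.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, digitToInt, isOctDigit, ord)
import Text.Printf (printf)

-- Hypothetical helper: the UTF-8 bytes of a code point, as Ints.
utf8Bytes :: Char -> [Int]
utf8Bytes c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont 0]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont 6, cont 0]
  | otherwise   = [0xF0 .|. (n `shiftR` 18), cont 12, cont 6, cont 0]
  where
    n = ord c
    cont s = 0x80 .|. ((n `shiftR` s) .&. 0x3F)

-- A three-digit C-style octal escape such as \316.
octal :: Int -> String
octal i = printf "\\%03o" i

-- Simplified encoder (assumption: octal escapes only, no \n style shortcuts).
sketchEncode :: String -> String
sketchEncode = concatMap enc
  where
    enc '\\' = "\\\\"
    enc '"'  = "\\\""
    enc c
      | n >= 256             = concatMap octal (utf8Bytes c)  -- one escape per byte
      | n < 0x20 || n > 0x7E = octal n
      | otherwise            = [c]
      where n = ord c

-- Simplified decoder: every octal escape becomes exactly one char.
sketchDecode :: String -> String
sketchDecode ('\\':a:b:c:rest)
  | all isOctDigit [a, b, c] =
      chr (foldl (\acc d -> acc * 8 + digitToInt d) 0 [a, b, c]) : sketchDecode rest
sketchDecode ('\\':c:rest) = c : sketchDecode rest
sketchDecode (c:rest) = c : sketchDecode rest
sketchDecode [] = []

main :: IO ()
main = do
  putStrLn (sketchEncode "\955")                              -- prints \316\273
  print (sketchDecode (sketchEncode "\955") == "\955")        -- prints False
  print (sketchDecode (sketchEncode "caf\233") == "caf\233")  -- prints True
```

In this sketch the char U+03BB decodes to two chars, one per byte, so the round trip fails, while every char below 256 maps to a single escape or to itself and survives.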