Fix setting/getting/viewing metadata that contains unicode or other special characters, when in a non-unicode locale.
Oh boy, not again. So, another place that the filesystem encoding needs to be applied. Yay.

In passing, I changed decodeBS so if a NUL is embedded in the input, the resulting FilePath doesn't get truncated at that NUL. This was needed to make prop_b64_roundtrips pass, and on reviewing the callers of decodeBS, I didn't see any where this wouldn't make sense. When a FilePath is used to operate on the filesystem, it'll get truncated at a NUL anyway, whereas if a String is being used for something else, it might conceivably have a NUL in it, and we wouldn't want it to get truncated when going through decodeBS.

(NB: There may be a speed impact from this change.)
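The NUL behaviour described above can be illustrated with a small sketch. This is not the git-annex code: `decodeTruncating`, `splitOnNUL`, and `decodeKeepingNUL` are hypothetical names, and the sketch assumes a POSIX system where bytes are decoded with GHC's filesystem encoding through `GHC.Foreign.peekCString`. The first function decodes via a NUL-terminated C string and so loses everything after an embedded NUL; the second splits on NUL, decodes each segment, and rejoins them.

```haskell
import Data.List (intercalate)
import Data.Word (Word8)
import Foreign.Marshal.Array (withArray0)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (getFileSystemEncoding)

-- Hypothetical helper: decode bytes by way of a NUL-terminated C string,
-- using the filesystem encoding. peekCString stops at the first NUL,
-- so any bytes after an embedded NUL are dropped.
decodeTruncating :: [Word8] -> IO String
decodeTruncating ws = do
    enc <- getFileSystemEncoding
    withArray0 0 ws (GHC.peekCString enc . castPtr)

-- Split a byte list into the segments between NUL bytes.
splitOnNUL :: [Word8] -> [[Word8]]
splitOnNUL ws = case break (== 0) ws of
    (seg, []) -> [seg]
    (seg, _ : rest) -> seg : splitOnNUL rest

-- Hypothetical helper: decode each NUL-free segment separately and put
-- the NULs back, so nothing is truncated.
decodeKeepingNUL :: [Word8] -> IO String
decodeKeepingNUL ws =
    intercalate "\NUL" <$> mapM decodeTruncating (splitOnNUL ws)

main :: IO ()
main = do
    let input = [0x61, 0x00, 0x62] :: [Word8] -- 'a', NUL, 'b'
    decodeTruncating input >>= print -- prints "a"
    decodeKeepingNUL input >>= print -- prints "a\NULb"
```

The split-and-rejoin variant makes extra passes over the input, which is one plausible source of the speed impact mentioned in the NB.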
parent 1e40bfda49
commit 23e9d3bb77
5 changed files with 45 additions and 4 deletions
@@ -45,7 +45,6 @@ fromTaggedBranch b = case split "/" $ Git.fromRef b of
     ("refs":"synced":u:_base) ->
         Just (toUUID u, Nothing)
     _ -> Nothing
-  where
 
 taggedPush :: UUID -> Maybe String -> Git.Ref -> Remote -> Git.Repo -> IO Bool
 taggedPush u info branch remote = Git.Command.runBool

@@ -1,4 +1,7 @@
 {- Simple Base64 encoding of Strings
+ -
+ - Note that this uses the FileSystemEncoding, so it can be used on Strings
+ - that represent filepaths containing arbitrarily encoded characters.
  -
  - Copyright 2011, 2015 Joey Hess <id@joeyh.name>
  -
@@ -9,13 +12,15 @@ module Utility.Base64 (toB64, fromB64Maybe, fromB64, prop_b64_roundtrips) where
 
 import qualified "sandi" Codec.Binary.Base64 as B64
 import Data.Maybe
+import qualified Data.ByteString.Lazy as L
 import Data.ByteString.UTF8 (fromString, toString)
+import Utility.FileSystemEncoding
 
 toB64 :: String -> String
-toB64 = toString . B64.encode . fromString
+toB64 = toString . B64.encode . L.toStrict . encodeBS
 
 fromB64Maybe :: String -> Maybe String
-fromB64Maybe s = either (const Nothing) (Just . toString)
+fromB64Maybe s = either (const Nothing) (Just . decodeBS . L.fromStrict)
     (B64.decode $ fromString s)
 
 fromB64 :: String -> String

@@ -13,6 +13,7 @@ module Utility.FileSystemEncoding (
     withFilePath,
     md5FilePath,
     decodeBS,
+    encodeBS,
     decodeW8,
     encodeW8,
     encodeW8NUL,
@@ -81,13 +82,21 @@ md5FilePath = MD5.Str . _encodeFilePath
 {- Decodes a ByteString into a FilePath, applying the filesystem encoding. -}
 decodeBS :: L.ByteString -> FilePath
 #ifndef mingw32_HOST_OS
-decodeBS = encodeW8 . L.unpack
+decodeBS = encodeW8NUL . L.unpack
 #else
 {- On Windows, we assume that the ByteString is utf-8, since Windows
  - only uses unicode for filenames. -}
 decodeBS = L8.toString
 #endif
 
+{- Encodes a FilePath into a ByteString, applying the filesystem encoding. -}
+encodeBS :: FilePath -> L.ByteString
+#ifndef mingw32_HOST_OS
+encodeBS = L.pack . decodeW8NUL
+#else
+encodeBS = L8.fromString
+#endif
+
 {- Converts a [Word8] to a FilePath, encoding using the filesystem encoding.
  -
  - w82c produces a String, which may contain Chars that are invalid

debian/changelog
@@ -36,6 +36,8 @@ git-annex (5.20150732) UNRELEASED; urgency=medium
     Thanks, Magnus Therning.
   * metadata: Fix reversion introduced in 5.20150727 that caused display
     of metadata to not work.
+  * Fix setting/getting/viewing metadata that contains unicode or other
+    special characters, when in a non-unicode locale.
 
  -- Joey Hess <id@joeyh.name>  Fri, 31 Jul 2015 12:31:39 -0400
 

@@ -28,3 +28,29 @@ local repository version: 5
 supported repository version: 5
 upgrade supported from repository versions: 0 1 2 4
 """]]
+
+> I'm assuming the setlocale part of this is a misconfigured system locale;
+> as also seen by an arch linux user in
+> <http://git-annex.branchable.com/bugs/cannot_change_locale___40__en__95__US.UTF-8__41__/>
+>
+> So, disregarding that part of the bug report, we still have the actual
+> failure.
+>
+> With LANG=C, setting and getting metadata like "Rondò Veneziano" fails,
+> as does generating views of that metadata.
+>
+> In all cases, it's an IO encoding failure, "commitBuffer: invalid argument (invalid character)"
+>
+> This only occurs when there's a space in the metadata; in this case the
+
+> value is base64ed. While the 'ò' comes back out as "\242", which is the right
+> character, it's not encoded using the filesystem encoding. This means that
+> the IO layer can't handle it, when not in a unicode locale. Instead, it
+> needs to come back out as "\56515\56498".
+>
+> Apparently this is a reversion; it worked in an earlier version of
+> git-annex. Commits such as 9b93278e8abe1163d53fbf56909d0fe6d7de69e9
+> or the conversion to Sandi may have caused the reversion, unsure.
+>
+> Fix is to apply the filesystem encoding when decoding base64ed values.
+> [[done]] --[[Joey]]
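
The difference between "\242" and "\56515\56498" in the note above is plain arithmetic, sketched here under one assumption: that in a non-unicode locale such as LANG=C, GHC's filesystem encoding represents each byte it cannot decode as the lone surrogate code point 0xDC00 plus the byte value. The function name `escapeByte` is hypothetical and is not part of git-annex. The character 'ò' is U+00F2 (decimal 242), and its UTF-8 encoding is the two bytes C3 B2.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical model of surrogate escaping: ASCII bytes map to themselves,
-- every other byte b maps to the code point 0xDC00 + b.
escapeByte :: Word8 -> Char
escapeByte b
    | b < 0x80 = chr (fromIntegral b)
    | otherwise = chr (0xDC00 + fromIntegral b)

main :: IO ()
main = do
    -- Decoding the UTF-8 bytes as unicode gives the single Char U+00F2.
    print "\242" -- prints "\242"
    -- Escaping the same two bytes one at a time gives two surrogate Chars:
    -- 0xDC00 + 0xC3 = 56515 and 0xDC00 + 0xB2 = 56498.
    print (map escapeByte [0xC3, 0xB2]) -- prints "\56515\56498"
```

Under that assumption, the escaped form is the one that can be written back out as the original bytes C3 B2 in a non-unicode locale, which matches the value the note says is needed.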