argh
parent c85e52fd85
commit a8d6481c0a
2 changed files with 11 additions and 56 deletions
@@ -18,34 +18,17 @@ it for 5 years!)
export LANG=C
git-annex adjust --unlock

What seems to be happening is that catCommit gets:
Err... I thought I had reproduced this with something like the above,
but now that is not working for me. I get:

commitName = Just "F\56515\56489lix"
commit 50fedeefa3ece65ed4866fe7a1e0c1fe9cc90d78 (HEAD -> adjusted/master(unlocked))
Author: Félix <joeyh@joeyh.name>
Date: Fri Sep 22 15:23:18 2023 -0400

Which is I think ok, that's a utf-8 surrogate in the filesystem encoding.
Then that's passed into commitWithMetaData, which sets the environment
variable to its content. And apparently it fails to be converted back to
the right bytes.
git-annex adjusted branch

One fix would be to keep it a ByteString all the way through, using
`System.Posix.Env.ByteString`. I tried converting all environment handling in
git-annex to use that, but CreateProcess uses String for env, so that is
not really possible. Also it's pretty intrusive, and is problematic for
Windows since it would have to decode the ByteString back to String.
So while this would be best -- it would ensure that any environment
variable that for some reason needs to get set by git-annex would
not incur mojibake -- it doesn't seem possible with the current library
ecosystem.
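
For illustration only, here is a minimal POSIX-only sketch of what keeping the value a ByteString could look like. It is not git-annex code, and it takes a different route from the one described above: instead of passing `env` to CreateProcess, it sets the variable in the current process with `setEnv` from `System.Posix.Env.ByteString` and lets the child inherit it. The helper name `runWithRawEnv` and the variable name `V` are made up for the example.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified System.Posix.Env.ByteString as EnvB
import System.Process (callProcess)

-- Hypothetical helper (not from git-annex): set a variable to raw bytes
-- in this process's own environment, then run a command that inherits it.
runWithRawEnv :: B.ByteString -> B.ByteString -> FilePath -> [String] -> IO ()
runWithRawEnv name value cmd args = do
    EnvB.setEnv name value True  -- True: overwrite any existing value
    callProcess cmd args         -- no explicit env, so the child inherits ours

main :: IO ()
main = do
    -- "Félix" as UTF-8 bytes; no String decoding happens on the way in.
    let felix = B.pack [0x46, 0xC3, 0xA9, 0x6C, 0x69, 0x78]
    runWithRawEnv (BC.pack "V") felix "sh" ["-c", "echo test $V"]
```

The obvious cost is that this mutates the parent's environment, so it sidesteps the String-typed `env` field rather than solving it, and it does nothing for Windows.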

I tried making commitWithMetaData set the env var to a String that
had the filesystem encoding applied. Eg `w82s (S.unpack (encodeBS v))`.
Interestingly, that failed:

git-annex: git: recoverEncode: invalid argument (cannot encode character '\195')

Which looks like the filesystem encoding is being applied after all?
And in System.Process.Posix it does look that way:
withCEnvironment uses withFilePath on the contents of env.

So huh, why then does the value not roundtrip?
I've tried several other combinations of locale settings, LANG=C from the
beginning, etc, and all seem to work ok. I also looked at the values coming
into git-annex with LANG=C and going out, and it roundtrips unicode fine
even in non-unicode locales.
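
A rough way to check this kind of roundtrip outside git-annex is sketched below. It decodes raw bytes to a String with a `//ROUNDTRIP` encoding and encodes them back. The assumptions are mine: the encoding name `ASCII//ROUNDTRIP` stands in for what a LANG=C filesystem encoding would be, and the helper `roundTrip` is hypothetical. GHC documents the roundtrip mode as escaping undecodable bytes to lone surrogates, and 0xDC00 + 0xC3 = 56515 and 0xDC00 + 0xA9 = 56489 match the `"F\56515\56489lix"` seen above.

```haskell
import qualified Data.ByteString as B
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (mkTextEncoding)

-- Hypothetical helper: decode raw bytes to a String using the named
-- encoding, then encode that String back to bytes with the same encoding.
roundTrip :: String -> B.ByteString -> IO (String, B.ByteString)
roundTrip encName bytes = do
    enc <- mkTextEncoding encName
    s <- B.useAsCStringLen bytes (GHC.peekCStringLen enc)
    bytes' <- GHC.withCStringLen enc s B.packCStringLen
    return (s, bytes')

main :: IO ()
main = do
    -- "Félix" as UTF-8 bytes.
    let felix = B.pack [0x46, 0xC3, 0xA9, 0x6C, 0x69, 0x78]
    -- Assumption: approximates the filesystem encoding under LANG=C.
    (s, bytes') <- roundTrip "ASCII//ROUNDTRIP" felix
    print s                  -- the decoded String, escapes and all
    print (bytes' == felix)  -- whether the original bytes came back
```

If the second line prints `False` for some encoding name, that would point at the encoding step rather than at git or the shell.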
"""]]
@@ -1,28 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2023-09-22T19:13:32Z"
content="""
joey@darkstar:~>cat f
Félix
joey@darkstar:~>cat foo.hs
import System.Process
import qualified GHC.IO.Encoding as Encoding

main = do
    e <- Encoding.getFileSystemEncoding
    Encoding.setLocaleEncoding e
    v <- readFile "f"
    print v
    (_, _, _, p) <- createProcess (proc "sh" ["-c", "echo test $V"])
        { env = Just [("V", v)] }
    waitForProcess p
    return ()
joey@darkstar:~>LANG=C runghc foo.hs
"F\56515\56489lix\n"
test Félix

Interesting! This confirms that "F\56515\56489lix" is the correctly
encoded value. And yet here, the environment variable gets set correctly
as well, and it round-trips.
"""]]