argh
parent
c85e52fd85
commit
a8d6481c0a
2 changed files with 11 additions and 56 deletions
@@ -18,34 +18,17 @@ it for 5 years!)
 export LANG=C
 git-annex adjust --unlock
 
-What seems to be happening is that catCommit gets:
+Err... I thought I had reproduced this with something like the above,
+but now that is not working for me. I get:
 
-commitName = Just "F\56515\56489lix"
+commit 50fedeefa3ece65ed4866fe7a1e0c1fe9cc90d78 (HEAD -> adjusted/master(unlocked))
+Author: Félix <joeyh@joeyh.name>
+Date: Fri Sep 22 15:23:18 2023 -0400
+
+git-annex adjusted branch
 
-Which is I think ok, that's a utf-8 surrogate in the filesystem encoding.
-Then that's passed into commitWithMetaData, which sets the environment
-variable to its content. And apparently it fails to be converted back to
-the right bytes.
-
-One fix would be to keep it a ByteString all the way though, using
-`System.Posix.Env.ByteString`. I tried converting all environment in
-git-annex to use that, but CreateProcess uses String for env, so that is
-not really possible. Also it's pretty intrusive, and is problimatic for
-Windows since it would have to decode the ByteString back to String.
-So while this would be best -- it would ensure that any environment
-variable that for some reason needs to get set by git-annex would
-not incur mojibake -- it doesn't seem possible with the current library
-ecosystem.
-
-I tried making commitWithMetaData set the env var to a String that
-had the filesystem encoding applied. Eg `w82s (S.unpack (encodeBS v))`.
-Interestingly, that failed:
-
-git-annex: git: recoverEncode: invalid argument (cannot encode character '\195')
-
-Which looks like the filesystem encoding is being applied after all?
-And in System.Process.Posix, it does look like it does,
-withCEnvironment uses withFilePath on the contents of env.
-
-So huh, why then does the value not roundtrip?
+I've tried several other combinations of locale settings, LANG=C from the
+beginning, etc, and all seem to work ok. I also looked at the values coming
+into git-annex with LANG=C and going out, and it roundtrips unicode fine
+even in non-unicode locales.
 """]]

@@ -1,28 +0,0 @@
-[[!comment format=mdwn
-username="joey"
-subject="""comment 3"""
-date="2023-09-22T19:13:32Z"
-content="""
-joey@darkstar:~>cat f
-Félix
-joey@darkstar:~>cat foo.hs
-import System.Process
-import qualified GHC.IO.Encoding as Encoding
-
-main = do
-    e <- Encoding.getFileSystemEncoding
-    Encoding.setLocaleEncoding e
-    v <- readFile "f"
-    print v
-    (_, _, _, p) <- createProcess (proc "sh" ["-c", "echo test $V"])
-        { env = Just [("V", v)] }
-    waitForProcess p
-    return ()
-joey@darkstar:~>LANG=C runghc foo.hs
-"F\56515\56489lix\n"
-test Félix
-
-Interesting! This confirms that "F\56515\56489lix" is the correctly
-encoded value. And yet here, the environment variable gets set correctly
-as well, and it round-trips.
-"""]]
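
The removed comments read `"F\56515\56489lix"` as the filesystem-encoding form of `Félix`. A minimal sketch of why that reading is plausible, assuming a simplified model of surrogate escaping in which each byte above 0x7F is mapped to the code point 0xDC00 plus the byte: the UTF-8 bytes of `é` are 0xC3 0xA9, which land on 56515 and 56489. The helper names `escapeByte` and `unescapeChar` are hypothetical and are not part of GHC or git-annex; the real logic lives in GHC's encoding machinery and may differ in detail.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper: decode one byte as an ASCII locale with
-- surrogate escaping might. ASCII bytes map to themselves; any
-- other byte becomes the lone surrogate U+DC00 + byte.
escapeByte :: Word8 -> Char
escapeByte b
  | b < 0x80  = chr (fromIntegral b)
  | otherwise = chr (0xDC00 + fromIntegral b)

-- Hypothetical inverse: recover the original byte, or Nothing for a
-- character that cannot be represented in this model.
unescapeChar :: Char -> Maybe Word8
unescapeChar c
  | n < 0x80                   = Just (fromIntegral n)
  | n >= 0xDC80 && n <= 0xDCFF = Just (fromIntegral (n - 0xDC00))
  | otherwise                  = Nothing
  where
    n = ord c

main :: IO ()
main = do
  -- UTF-8 bytes of "Félix": F, 0xC3 0xA9 (é), l, i, x
  let bytes = [0x46, 0xC3, 0xA9, 0x6C, 0x69, 0x78] :: [Word8]
      s     = map escapeByte bytes
  print s                                    -- "F\56515\56489lix"
  print (mapM unescapeChar s == Just bytes)  -- True
```

Under this model the escaped String converts back to the original bytes exactly, which is consistent with the `foo.hs` run in the deleted comment where the value round-trips through the environment.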