Joey Hess 2023-09-22 15:34:30 -04:00
parent c85e52fd85
commit a8d6481c0a
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
2 changed files with 11 additions and 56 deletions

@@ -18,34 +18,17 @@ it for 5 years!)
export LANG=C
git-annex adjust --unlock
What seems to be happening is that catCommit gets:
Err... I thought I had reproduced this with something like the above,
but now that is not working for me. I get:
commitName = Just "F\56515\56489lix"
commit 50fedeefa3ece65ed4866fe7a1e0c1fe9cc90d78 (HEAD -> adjusted/master(unlocked))
Author: Félix <joeyh@joeyh.name>
Date: Fri Sep 22 15:23:18 2023 -0400
git-annex adjusted branch
Which I think is ok: those are the surrogate escapes that the filesystem
encoding uses for the two UTF-8 bytes of "é".
Then that's passed into commitWithMetaData, which sets the environment
variable to its content. And apparently it fails to be converted back to
the right bytes.
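
To make the escaping concrete, here is a minimal sketch of the surrogate-escape scheme, assuming a C locale where every byte >= 0x80 is undecodable. `escapeByte` and `unescapeChar` are hypothetical helper names for illustration, not functions from git-annex or base:

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper: an undecodable byte b becomes the lone
-- surrogate 0xDC00 + b; ASCII bytes pass through unchanged.
escapeByte :: Word8 -> Char
escapeByte b
    | b < 0x80 = chr (fromIntegral b)
    | otherwise = chr (0xDC00 + fromIntegral b)

-- Hypothetical inverse: recover the original byte, or Nothing for a
-- Char that is neither ASCII nor a surrogate escape.
unescapeChar :: Char -> Maybe Word8
unescapeChar c
    | n < 0x80 = Just (fromIntegral n)
    | n >= 0xDC80 && n <= 0xDCFF = Just (fromIntegral (n - 0xDC00))
    | otherwise = Nothing
  where
    n = ord c

main :: IO ()
main = do
    -- The UTF-8 bytes of "Félix"; é is 0xC3 0xA9.
    let bytes = [0x46, 0xC3, 0xA9, 0x6C, 0x69, 0x78] :: [Word8]
        escaped = map escapeByte bytes
    print escaped                                   -- "F\56515\56489lix"
    print (mapM unescapeChar escaped == Just bytes) -- True
```

So 0xC3 maps to 56515 and 0xA9 to 56489, which matches the commitName value above, and the mapping is reversible in principle.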
One fix would be to keep it a ByteString all the way through, using
`System.Posix.Env.ByteString`. I tried converting all environment in
git-annex to use that, but CreateProcess uses String for env, so that is
not really possible. Also it's pretty intrusive, and is problematic for
Windows since it would have to decode the ByteString back to String.
So while this would be best -- it would ensure that any environment
variable that for some reason needs to get set by git-annex would
not incur mojibake -- it doesn't seem possible with the current library
ecosystem.
I tried making commitWithMetaData set the env var to a String that
had the filesystem encoding applied. Eg `w82s (S.unpack (encodeBS v))`.
Interestingly, that failed:
git-annex: git: recoverEncode: invalid argument (cannot encode character '\195')
Which looks like the filesystem encoding is being applied after all?
And System.Process.Posix does appear to do that:
withCEnvironment uses withFilePath on the contents of env.
So huh, why then does the value not roundtrip?
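
A rough sketch of why that attempt would hit the recoverEncode error, assuming `w82s` maps each byte to the Char with the same code point, and modelling the C locale's encoder as "ASCII plus surrogate escapes". Both `bytesToChars` and `encodable` are hypothetical stand-ins, not the real implementations:

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Assumed behaviour of w82s: byte n becomes the Char with code point n.
bytesToChars :: [Word8] -> String
bytesToChars = map (chr . fromIntegral)

-- Simplified model of an ASCII encoder with surrogate-escape
-- roundtripping: only ASCII and 0xDC80..0xDCFF can be written out.
encodable :: Char -> Bool
encodable c = n < 0x80 || (n >= 0xDC80 && n <= 0xDCFF)
  where
    n = ord c

main :: IO ()
main = do
    -- The UTF-8 bytes of "Félix", converted byte-for-byte to Chars.
    let s = bytesToChars [0x46, 0xC3, 0xA9, 0x6C, 0x69, 0x78]
    print s                            -- "F\195\169lix"
    print (filter (not . encodable) s) -- "\195\169"
```

Under this model the first character that cannot be encoded is '\195', the same one named in the error message.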
I've tried several other combinations of locale settings, LANG=C from the
beginning, etc, and all seem to work ok. I also looked at the values coming
into git-annex with LANG=C and going out, and it roundtrips unicode fine
even in non-unicode locales.
"""]]

@@ -1,28 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2023-09-22T19:13:32Z"
content="""
joey@darkstar:~>cat f
Félix
joey@darkstar:~>cat foo.hs
import System.Process
import qualified GHC.IO.Encoding as Encoding
main = do
    e <- Encoding.getFileSystemEncoding
    Encoding.setLocaleEncoding e
    v <- readFile "f"
    print v
    (_, _, _, p) <- createProcess (proc "sh" ["-c", "echo test $V"])
        { env = Just [("V", v)] }
    waitForProcess p
    return ()
joey@darkstar:~>LANG=C runghc foo.hs
"F\56515\56489lix\n"
test Félix
Interesting! This confirms that "F\56515\56489lix" is the correctly
encoded value. And yet here, the environment variable gets set correctly
as well, and it round-trips.
"""]]