From a8d6481c0a7932a712a36a18edbe82bf526dfb61 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Fri, 22 Sep 2023 15:34:30 -0400
Subject: [PATCH] argh

---
 ..._d28894bc233987f68159e8d1a7a97096._comment | 39 ++++++-------------
 ..._22810372eecd3c567817623de9eb47c6._comment | 28 -------------
 2 files changed, 11 insertions(+), 56 deletions(-)
 delete mode 100644 doc/bugs/__34__git_annex_adjust__34___does_not_respect_utf8_in_the_commit_author_field/comment_3_22810372eecd3c567817623de9eb47c6._comment

diff --git a/doc/bugs/__34__git_annex_adjust__34___does_not_respect_utf8_in_the_commit_author_field/comment_2_d28894bc233987f68159e8d1a7a97096._comment b/doc/bugs/__34__git_annex_adjust__34___does_not_respect_utf8_in_the_commit_author_field/comment_2_d28894bc233987f68159e8d1a7a97096._comment
index f6221211e0..b7842401c6 100644
--- a/doc/bugs/__34__git_annex_adjust__34___does_not_respect_utf8_in_the_commit_author_field/comment_2_d28894bc233987f68159e8d1a7a97096._comment
+++ b/doc/bugs/__34__git_annex_adjust__34___does_not_respect_utf8_in_the_commit_author_field/comment_2_d28894bc233987f68159e8d1a7a97096._comment
@@ -18,34 +18,17 @@ it for 5 years!)
     export LANG=C
     git-annex adjust --unlock
 
-What seems to be happening is that catCommit gets:
+Err... I thought I had reproduced this with something like the above,
+but now that is not working for me. I get:
 
-    commitName = Just "F\56515\56489lix"
+    commit 50fedeefa3ece65ed4866fe7a1e0c1fe9cc90d78 (HEAD -> adjusted/master(unlocked))
+    Author: Félix
+    Date:   Fri Sep 22 15:23:18 2023 -0400
+
+        git-annex adjusted branch
 
-Which is I think ok, that's a utf-8 surrogate in the filesystem encoding.
-Then that's passed into commitWithMetaData, which sets the environment
-variable to its content. And apparently it fails to be converted back to
-the right bytes.
-
-One fix would be to keep it a ByteString all the way though, using
-`System.Posix.Env.ByteString`. I tried converting all environment in
-git-annex to use that, but CreateProcess uses String for env, so that is
-not really possible. Also it's pretty intrusive, and is problimatic for
-Windows since it would have to decode the ByteString back to String.
-So while this would be best -- it would ensure that any environment
-variable that for some reason needs to get set by git-annex would
-not incur mojibake -- it doesn't seem possible with the current library
-ecosystem.
-
-I tried making commitWithMetaData set the env var to a String that
-had the filesystem encoding applied. Eg `w82s (S.unpack (encodeBS v))`.
-Interestingly, that failed:
-
-git-annex: git: recoverEncode: invalid argument (cannot encode character '\195')
-
-Which looks like the filesystem encoding is being applied after all?
-And in System.Process.Posix, it does look like it does,
-withCEnvironment uses withFilePath on the contents of env.
-
-So huh, why then does the value not roundtrip?
+I've tried several other combinations of locale settings, LANG=C from the
+beginning, etc, and all seem to work ok. I also looked at the values coming
+into git-annex with LANG=C and going out, and it roundtrips unicode fine
+even in non-unicode locales.
 """]]
diff --git a/doc/bugs/__34__git_annex_adjust__34___does_not_respect_utf8_in_the_commit_author_field/comment_3_22810372eecd3c567817623de9eb47c6._comment b/doc/bugs/__34__git_annex_adjust__34___does_not_respect_utf8_in_the_commit_author_field/comment_3_22810372eecd3c567817623de9eb47c6._comment
deleted file mode 100644
index 82df9c673a..0000000000
--- a/doc/bugs/__34__git_annex_adjust__34___does_not_respect_utf8_in_the_commit_author_field/comment_3_22810372eecd3c567817623de9eb47c6._comment
+++ /dev/null
@@ -1,28 +0,0 @@
-[[!comment format=mdwn
- username="joey"
- subject="""comment 3"""
- date="2023-09-22T19:13:32Z"
- content="""
-    joey@darkstar:~>cat f
-    Félix
-    joey@darkstar:~>cat foo.hs
-    import System.Process
-    import qualified GHC.IO.Encoding as Encoding
-
-    main = do
-        e <- Encoding.getFileSystemEncoding
-        Encoding.setLocaleEncoding e
-        v <- readFile "f"
-        print v
-        (_, _, _, p) <- createProcess (proc "sh" ["-c", "echo test $V"])
-            { env = Just [("V", v)] }
-        waitForProcess p
-        return ()
-    joey@darkstar:~>LANG=C runghc foo.hs
-    "F\56515\56489lix\n"
-    test Félix
-
-Interesting! This confirms that "F\56515\56489lix" is the correctly
-encoded value. And yet here, the environment variable gets set correctly
-as well, and it round-trips.
-"""]]