26544de946
Haskell's IO layer crashes on characters > 255 when in a non-unicode (latin1) locale. Until Haskell gets better behavior, put in an admittedly ugly workaround for that: git-annex forces utf8 output mode no matter what locale is selected. So if you use a non-utf8 locale, your filenames with characters > 127 will not be displayed as you'd expect. But at least it won't crash.
41 lines
1.4 KiB
Markdown
41 lines
1.4 KiB
Markdown
Try unsetting LANG and passing git-annex unicode filenames.
|
|
|
|
joey@gnu:~/tmp/aa>git annex add ./Üa
|
|
add add add add git-annex: <stdout>: commitAndReleaseBuffer: invalid
|
|
argument (Invalid or incomplete multibyte or wide character)
|
|
|
|
> Interestingly, I can get the same crash in the de_DE.UTF-8 locale
|
|
> with certian input filenames, while in en_US.UTF-8, it's ok.
|
|
> The workaround below avoided the problem in de_DE.UTF-8. --[[Joey]]
|
|
|
|
> Put in the utf-8 forcing workaround for now. [[done]] --[[Joey]]
|
|
|
|
## underlying haskell problem and workaround
|
|
|
|
The same problem can be seen with a simple haskell program:
|
|
|
|
import System.Environment
|
|
import Codec.Binary.UTF8.String
|
|
main = do
|
|
args <- getArgs
|
|
putStrLn $ decodeString $ args !! 0
|
|
|
|
joey@gnu:~/src/git-annex>LANG= runghc ~/foo.hs Ü
|
|
foo.hs: <stdout>: hPutChar: invalid argument (Invalid or incomplete multibyte or wide character)
|
|
|
|
(The call to `decodeString` is necessary to make the input
|
|
unicode string be displayed properly in a utf8 locale, but
|
|
does not contribute to this problem.)
|
|
|
|
I guess that haskell is setting the IO encoding to latin1, which
|
|
is [documented](http://haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html#v:latin1)
|
|
to error out on characters > 255.
|
|
|
|
So this program doesn't have the problem -- but may output garbage
|
|
on non-utf-8 capable terminals:
|
|
|
|
import System.IO
|
|
main = do
|
|
hSetEncoding stdout utf8
|
|
args <- getArgs
|
|
putStrLn $ decodeString $ args !! 0
|