diff --git a/doc/bugs/multibyte_characters_broken.mdwn b/doc/bugs/multibyte_characters_broken.mdwn
new file mode 100644
index 0000000000..9eedac0c7e
--- /dev/null
+++ b/doc/bugs/multibyte_characters_broken.mdwn
@@ -0,0 +1,30 @@
+### Please describe the problem.
+git annex add is not fully compatible with multibyte characters in filenames and may generate filenames with invalid character sequences.
+
+### What steps will reproduce the problem?
+$ git init test; cd test
+$ git annex init test
+$ echo bla > 01-06\ 三石琴乃・富沢美智恵・久川綾・篠原恵美・深見梨加\ -\ Tuxedo\ Mirage.flac
+$ git annex add 01*
+
+The last command generates a temporary filename containing an invalid character sequence, which, depending on the filesystem, may cause an error:
+
+Example output:
+add "01-06 \344\270\211\347\237\263\347\220\264\344\271\203\343\203\273\345\257\214\346\262\242\347\276\216\346\231\272\346\201\265\343\203\273\344\271\205\345\267\235\347\266\276\343\203\273\347\257\240\345\216\237\346\201\265\347\276\216\343\203\273\346\267\261\350\246\213\346\242\250\345\212\240 - Tuxedo Mirage.flac"
+ .git/annex/othertmp/: openTempFile template ingest-01-06 三石琴乃・富沢美智恵・久川綾・篠原恵美・深見梨�: invalid argument (Invalid or incomplete multibyte or wide character)
+
+failed
+add: 1 failed
+
+
+### What version of git-annex are you using? On what operating system?
+git annex 10.20250630
+NixOS 25.11pre851350.3b9f00d7a7bf
+
+### Please provide any additional information below.
+
+Creation of the file fails because zfs is set to accept only valid UTF-8 filenames (utf8only=on, normalization=formD), which greatly helps me detect encoding issues in filenames.
+The original file obviously has a correct encoding, but it seems that git annex generates a new filename by just cutting off the filename after a specific byte, instead of taking character lengths into account.
+
+### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
+I use git annex to manage my whole music collection successfully.
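
The failure mode the report suspects (cutting the name at a fixed byte offset rather than at a character boundary) can be illustrated with a small sketch. This is not git-annex's code (git-annex is written in Haskell), and the function names and the 77-byte limit are hypothetical, chosen only so that the cut lands inside the three-byte encoding of 加, as in the error message above.

```python
def truncate_bytes_naive(name: str, limit: int) -> bytes:
    """Cut the UTF-8 encoding at a fixed byte offset.

    This may split a multibyte character and leave an invalid sequence.
    """
    return name.encode("utf-8")[:limit]


def truncate_utf8_safe(name: str, limit: int) -> bytes:
    """Cut at the last character boundary at or before `limit` bytes."""
    raw = name.encode("utf-8")
    if len(raw) <= limit:
        return raw
    cut = limit
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx. If the first
    # excluded byte is a continuation byte, the cut falls inside a character,
    # so step back until it falls on a lead byte.
    while cut > 0 and (raw[cut] & 0xC0) == 0x80:
        cut -= 1
    return raw[:cut]


if __name__ == "__main__":
    name = "01-06 三石琴乃・富沢美智恵・久川綾・篠原恵美・深見梨加 - Tuxedo Mirage.flac"
    limit = 77  # hypothetical limit, for illustration only

    try:
        truncate_bytes_naive(name, limit).decode("utf-8")
    except UnicodeDecodeError as err:
        print("naive cut is not valid UTF-8:", err)

    print("safe cut:", truncate_utf8_safe(name, limit).decode("utf-8"))
```

A filesystem with `utf8only=on` would reject the name produced by the first function and accept the one produced by the second. Note that this sketch only respects code point boundaries; it does not try to keep combining sequences together, which may matter for names that are already in decomposed form.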