reproduced

This commit is contained in:
Joey Hess 2023-03-14 13:36:40 -04:00
parent 47c010155f
commit 1f124103dc
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38

View file

@ -0,0 +1,78 @@
[[!comment format=mdwn
username="joey"
subject="""comment 7"""
date="2023-03-14T17:00:06Z"
content="""
Oho, I reproduced it!
First I made a git-annex repo named "fooé", and added a file "foo" to it,
and that is unlocked.
joey@darkstar:/tmp>git clone fooé bar
Cloning into 'bar'...
done.
joey@darkstar:/tmp>cd bar
joey@darkstar:/tmp/bar>git-annex move --from origin
move foo (from origin...)
(recording state in git...)
ok
(recording state in git...)
joey@darkstar:/tmp/bar>git-annex copy --to origin
copy foo (to origin...)
(recording state in git...)
ok
(recording state in git...)
joey@darkstar:/tmp/bar>git-annex move --from origin
move foo (from origin...) (recording state in git...)
ok
(recording state in git...)
joey@darkstar:/tmp/bar>LANG=C git-annex copy --to origin
copy foo (to origin...)
sqlite worker thread crashed: SQLite3 returned ErrorCan'tOpen while attempting to perform open "/tmp/foo\65533\65533/.git/annex/keysdb/db".
CallStack (from HasCallStack):
error, called at ./Database/Handle.hs:87:25 in main:Database.Handle
ok
Note that using git-annex in repo fooé with LANG=C works. The problem
seems limited to a remote with a unicode character in its name when not in a
unicode locale.
In strace I see this:
3451696 openat(AT_FDCWD, "/tmp/foo\357\277\275\357\277\275/.git/annex/keysdb/db", O_RDWR|O_CREAT|O_NOFOLLOW|O_CLOEXEC, 0644 <unfinished ...>
3451694 <... close resumed>) = 0
3451696 <... openat resumed>) = -1 ENOENT (No such file or directory)
That doesn't look like the correct encoding for "é" does it?
`"/tmp/foo\303\251"` would be correct and is what git-annex otherwise uses
when accessing that repo.
I think the reason for this is simply that persistent-sqlite uses Text
for the location of the database. And Text is unicode encoded. So when the non
unicode locale results in a FilePath that is encoded using the filesystem encoding,
with surrogate characters, and that gets fed to Data.Text.pack, it replaces the
"invalid scalar values" with "\65533".
The same thing would happen in a unicode locale if the remote's path was not
valid unicode.
Filed an issue to get persistent-sqlite to not use Text for the FilePath
<https://github.com/yesodweb/persistent/issues/1481>
I don't think that non-unicode FilePaths can be generally squeezed into a Text,
so we may need to wait persistent getting fixed. Although it should be possible,
in a non-unicode locale, to convert a non-unicode FilePath like "fooé"
to a Text.
----
Also notice that git-annex succeeds despite this error. Which is reasonable
since it was only unable to update the remote's keys db, and the remote
can and will just update it itself next time git-annex is used over there.
Which will work since that git-annex will be running in the directory
and will use relative paths.
So perhaps the error message could just be suppressed?
"""]]