Log.Remote.prop_parse_show_Config failed on an input of fromList [("\28162","")]
in LANG=C, encodeBS "\28162" == "\STX=", while in UTF-8 locale,
encodeBS "\28162" == "\230\184\130". So in the C locale, the String
that's the parsed Map key ends up being encoded differently than it was
in the input Map.
Logs.Presence.Pure.prop_parse_build_log was failing in LANG=C because
the Arbitrary LogLine for some reason sometimes generated LogInfo values
containing \n or \r, despite using suchThat to prevent that. I don't
understand why at all, but switching the suchThat to filter the
ByteString instead of the String before conversion with encodeBS
somehow avoids the problem.
Both of these suggest something wonky with encodeBS in LANG=C, but
I *think* it's not a problem except for with test data generated by
Arbitrary.
The object is supposed to be present on the readonly remote; have to
assume the location log is right about that, so the presence check
should succeed.
* Switch to using .git/annex/othertmp for tmp files other than partial
downloads, and make stale files left in that directory when git-annex
is interrupted be cleaned up promptly by subsequent git-annex processes.
* The .git/annex/misctmp directory is no longer used and git-annex will
delete anything lingering in there after it's 1 week old.
Also, in Annex.Ingest, made the filename it uses in the tmp dir be
prefixed with "ingest-" to avoid potentially using a filename used by
some other code.
Adding that field broke the Read/Show serialization back-compat,
and also the Eq and Ord instances were not blinded to it, which broke
git annex fsck and probably more.
I think that the new approach used in formatKeyVariety will be nearly
as fast, but have not benchmarked it.
This reverts commit 4536c93bb2.
That broke Read/Show of a Key, and unfortunately Key is read in at least
one place; the GitAnnexDistribution data type.
It would be worth bringing this optimisation back, but it would need
either a custom Read/Show instance that preserves back-compat, or
wrapping Key in a data type that contains the serialization, or changing
how GitAnnexDistribution is serialized.
Also, the Eq instance would need to compare keys with and without a
cached seralization the same.
It used to, but that was lost in the bytestring conversion recently.
20 * 4 = 80, but I only increased it to 64, which would be up to 16
4-byte unicode characters.
I think I ran the original benchmark in some subdir of my big repo,
which is not a good test case. Updated with value from newly created
repo of 1000 files.
The builder produces a lazy ByteString, and L.toStrict has to copy it,
but needing to use the builder is no longer to common case; the
serialization will normally be cached already as a strict ByteString,
and this avoids keyFile' needing to use L.toStrict . serializeKey'
This will speed up the common case where a Key is deserialized from
disk, but is then serialized to build eg, the path to the annex object.
It means that every place a Key has any of its fields changed, the cache
has to be dropped. I've grepped and found them all. But, it would be
better to avoid that gotcha somehow..
Now there's a ByteString used all the way from disk to Key.
The main complication in this conversion was the use of fromInternalGitPath
in several places to munge things on Windows. The things that used that
were changed to parse the ByteString using either path separator.
Also some code that had read from files to a String lazily was changed
to read a minimal strict ByteString.