git-annex/doc/todo/optimize_by_converting_String_to_ByteString.mdwn
Joey Hess 3d38ec9585
fix fileJournal
My ByteString rewrite oversimplified it, resulting in any _ in a journal
file turning into a / in the git-annex branch, which was often the wrong
filename, or sometimes (//) an invalid filename that git
refused to add.
2019-12-18 11:29:34 -04:00


git-annex uses FilePath (String) extensively. That's a slow data type.
Converting to ByteString, and RawFilePath, should speed it up
significantly, according to [[/profiling]].
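
To make the two representations concrete, here is a minimal sketch of converting between a `FilePath` and a `RawFilePath` using only `base` and `bytestring`. The names `encodePath` and `decodePath` are hypothetical and are not git-annex's own conversion functions; the sketch also assumes the filesystem encoding reported by GHC is the right one for the path. The `RawFilePath` synonym is defined locally so the example does not depend on the `unix` package.

```haskell
import qualified Data.ByteString as B
import GHC.Foreign (peekCStringLen, withCStringLen)
import GHC.IO.Encoding (getFileSystemEncoding)

-- Local synonym mirroring the one in the unix package, so this sketch
-- needs only base and bytestring.
type RawFilePath = B.ByteString

-- Hypothetical helper: encode a String path into a packed byte array
-- using the current filesystem encoding.
encodePath :: FilePath -> IO RawFilePath
encodePath fp = do
    enc <- getFileSystemEncoding
    -- withCStringLen marshals the String into a temporary buffer;
    -- packCStringLen copies that buffer into a ByteString.
    withCStringLen enc fp B.packCStringLen

-- Hypothetical helper: decode the bytes back into a String path.
decodePath :: RawFilePath -> IO FilePath
decodePath raw = do
    enc <- getFileSystemEncoding
    B.useAsCStringLen raw (peekCStringLen enc)
```

Each direction walks the whole path and allocates a fresh copy, so code that converts the same value back and forth pays that cost every time.
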

I've made a test branch, `bs`, to see what kind of performance improvement
to expect.

Benchmarking `git-annex find` shows speedups ranging from 28% to 66%. The
files fly by much more snappily. Other commands likely also speed up, but
they do more work than `find`, so the improvement is not as large.

The `bs` branch is in a mergeable state now, but still needs work:

* Profile various commands and look for hot spots.
* ByteString.Char8.putStrLn may truncate?
* Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
  decodeBS conversions. Or at least most of them. There are likely
  some places where a value is converted back and forth several times.
* Use versions of IO actions like getFileStatus that take a RawFilePath,
  avoiding a conversion. Note that these are only available on unix, not
  windows, so a compatibility shim will be needed.
  (I can't seem to find any library that provides one.)
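
One possible shape for such a shim, as a rough sketch only: a single function that takes a `RawFilePath` everywhere, calls the ByteString-based `getFileStatus` from the `unix` package where it exists, and falls back to decoding the path and using the `directory` package on Windows. The name `getFileSizeRaw` is hypothetical, and the Windows branch assumes that decoding with GHC's filesystem encoding gives a usable `FilePath`. Only a file size is exposed here, to sidestep the question of what a portable `FileStatus` type would look like.

```haskell
{-# LANGUAGE CPP #-}

#if defined(mingw32_HOST_OS)
import qualified Data.ByteString as B
import GHC.Foreign (peekCStringLen)
import GHC.IO.Encoding (getFileSystemEncoding)
import System.Directory (getFileSize)
#else
import System.Posix.ByteString.FilePath (RawFilePath)
import System.Posix.Files.ByteString (fileSize, getFileStatus)
#endif

#if defined(mingw32_HOST_OS)
-- The unix package is not available on Windows, so define the synonym here.
type RawFilePath = B.ByteString

-- Windows fallback (assumption): decode to a String path and use the
-- String-based API from the directory package.
getFileSizeRaw :: RawFilePath -> IO Integer
getFileSizeRaw path = do
    enc <- getFileSystemEncoding
    decoded <- B.useAsCStringLen path (peekCStringLen enc)
    getFileSize decoded
#else
-- Unix: the bytes go straight to the ByteString variant of getFileStatus,
-- with no conversion to String.
getFileSizeRaw :: RawFilePath -> IO Integer
getFileSizeRaw path = fromIntegral . fileSize <$> getFileStatus path
#endif
```

Callers then use `getFileSizeRaw` unconditionally, and the conversion cost is paid only on the platform that has no RawFilePath API.
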