This was already optimised before, but profiling found that delEntry was
around 1.5% of the total runtime of git-annex whereis. It was being
called once per environment variable per file processed.
Fixed by better caching. Since withIndexFile is almost always run with
the same .git/annex/index file, it can cache the modified environment,
rather than re-modifying it each time called.
(cherry picked from commit 6535aea49a)
This was already optimised before, but profiling found that delEntry was
around 1.5% of the total runtime of git-annex whereis. It was being
called once per environment variable per file processed.
Fixed by better caching. Since withIndexFile is almost always run with
the same .git/annex/index file, it can cache the modified environment,
rather than re-modifying it each time called.
The parser and looking up config keys in the map should both be faster
due to using ByteString.
I had hoped this would speed up startup time, but any improvement to
that was too small to measure. Seems worth keeping though.
Note that the parser breaks up the ByteString, but a config map ends up
pointing to the config as read, which is retained in memory until every
value from it is no longer used. This can change memory usage
patterns marginally, but won't affect git-annex.
While L.toStrict copies, profiling showed it was only around 0.3% of
git-annex find runtime. Does not seem worth optimising that, which would
probably involve either a major refactoring, or a use of
UnsafeInterleaveIO.
Also, it seems to me that the latter would need to read chunks, and
preappend the leftover part to the next chunk. But a strict ByteString
append itself is a copy, so I'm not convinced that would be faster than
L.toStrict.
Finally builds (oh the agoncy of making it build), but still very
unmergable, only Command.Find is included and lots of stuff is badly
hacked to make it compile.
Benchmarking vs master, this git-annex find is significantly faster!
Specifically:
num files old new speedup
48500 4.77 3.73 28%
12500 1.36 1.02 66%
20 0.075 0.074 0% (so startup time is unchanged)
That's without really finishing the optimization. Things still to do:
* Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
decodeBS conversions.
* Use versions of IO actions like getFileStatus that take a RawFilePath.
* Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy.
* Use ByteString for parsing git config to speed up startup.
It's likely several of those will speed up git-annex find further.
And other commands will certianly benefit even more.
Goal is to make git-annex faster by using ByteString for all the
worktree traversal. For now, this is focusing on Command.Find,
in order to benchmark how much it helps. (All other commands are
temporarily disabled)
Currently in a very bad unbuildable in-between state.