todo for bs branch

Joey Hess 2019-11-26 16:11:55 -04:00
parent 1f035c0d66
commit 3361edfb61
2 changed files with 78 additions and 0 deletions


@@ -0,0 +1,34 @@
git-annex uses FilePath (String) extensively. That's a slow data type.
Converting to ByteString, and RawFilePath, should speed it up
significantly, according to [[/profiling]].
I've made a test branch, `bs`, to see what kind of performance improvement
to expect. Most commands don't build yet in that branch, but `git annex
find` does. Speedups range from 28-66%. The files fly by much more
snappily.
As well as adding back all the code that was disabled to get it to build,
the `bs` branch has quite a lot of things still needing work, including:
* Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
decodeBS conversions. Or at least most of them. There are likely
quite a few places where a value is converted back and forth several times.
It would be good to instrument them with Debug.Trace and find out which
are the hot ones that get called, and focus on those.
* System.FilePath is not available for RawFilePath, and many of the
conversions are to get a FilePath in order to use that library.
It should be entirely straightforward to make a version of System.FilePath
that can operate on RawFilePath, except possibly there could be some
complications due to Windows.
* Use versions of IO actions like getFileStatus that take a RawFilePath,
avoiding a conversion. Note that these are only available on Unix, not
Windows, so a compatibility shim will be needed.
(I can't seem to find any library that provides one.)
* Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy.
* Use ByteString for parsing git config to speed up startup.
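
As a rough illustration of the System.FilePath item above, here is a minimal
sketch of two such functions operating directly on bytes. It assumes
POSIX-style `/` separators only (no Windows handling), and the names
`takeFileNameRaw` and `takeExtensionRaw` are hypothetical, not existing
git-annex or library functions.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import Data.Word (Word8)

-- Local alias for illustration; the unix package exports the same synonym.
type RawFilePath = B.ByteString

pathSeparator :: Word8
pathSeparator = 47 -- '/'

extSeparator :: Word8
extSeparator = 46 -- '.'

-- Everything after the last '/', or the whole path if there is none.
takeFileNameRaw :: RawFilePath -> RawFilePath
takeFileNameRaw = snd . B.breakEnd (== pathSeparator)

-- The last extension of the file name, including the leading '.',
-- or empty if the file name contains no '.'.
takeExtensionRaw :: RawFilePath -> RawFilePath
takeExtensionRaw p =
  let (base, ext) = B.breakEnd (== extSeparator) (takeFileNameRaw p)
  in if B.null base then B.empty else B.cons extSeparator ext

main :: IO ()
main = do
  let p = B8.pack "annex/objects/foo.tar.gz"
  B8.putStrLn (takeFileNameRaw p)
  B8.putStrLn (takeExtensionRaw p)
```

Neither function ever builds a String, so a caller that already holds a
RawFilePath would not need a fromRawFilePath/toRawFilePath round trip just to
split a path.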


@@ -0,0 +1,44 @@
[[!comment format=mdwn
username="joey"
subject="""profiling"""
date="2019-11-26T20:05:28Z"
content="""
Profiling the early version of the `bs` branch.

    Tue Nov 26 16:05 2019 Time and Allocation Profiling Report  (Final)

    git-annex +RTS -p -RTS find

    total time  =        2.75 secs   (2749 ticks @ 1000 us, 1 processor)
    total alloc = 1,642,607,120 bytes  (excludes profiling overheads)

    COST CENTRE                      MODULE                         SRC                                                %time  %alloc

    inAnnex'.\                       Annex.Content                  Annex/Content.hs:(103,61)-(118,31)                  31.2   46.8
    keyFile'                         Annex.Locations                Annex/Locations.hs:(567,1)-(577,30)                  5.3    6.2
    encodeW8                         Utility.FileSystemEncoding     Utility/FileSystemEncoding.hs:(189,1)-(191,70)       3.3    4.2
    >>=.\                            Data.Attoparsec.Internal.Types Data/Attoparsec/Internal/Types.hs:(146,9)-(147,44)   2.9    0.8
    >>=.\.succ'                      Data.Attoparsec.Internal.Types Data/Attoparsec/Internal/Types.hs:146:13-76          2.6    0.3
    keyFile'.esc                     Annex.Locations                Annex/Locations.hs:(573,9)-(577,30)                  2.5    5.5
    parseLinkTarget                  Annex.Link                     Annex/Link.hs:(254,1)-(262,25)                       2.4    4.4
    getAnnexLinkTarget'.probesymlink Annex.Link                     Annex/Link.hs:78:9-46                                2.4    2.8
    w82s                             Utility.FileSystemEncoding     Utility/FileSystemEncoding.hs:217:1-15               2.3    6.0
    keyPath                          Annex.Locations                Annex/Locations.hs:(606,1)-(608,23)                  1.9    4.0
    parseKeyVariety                  Types.Key                      Types/Key.hs:(323,1)-(371,42)                        1.8    0.0
    getState                         Annex                          Annex.hs:(251,1)-(254,27)                            1.7    0.4
    fileKey'.go                      Annex.Locations                Annex/Locations.hs:588:9-55                          1.4    0.8
    fileKey'                         Annex.Locations                Annex/Locations.hs:(586,1)-(596,41)                  1.4    1.7
    hashUpdates.\.\.\                Crypto.Hash                    Crypto/Hash.hs:85:48-99                              1.3    0.0
    parseLinkTargetOrPointer         Annex.Link                     Annex/Link.hs:(239,1)-(243,25)                       1.2    0.1
    withPtr                          Basement.Block.Base            Basement/Block/Base.hs:(395,1)-(404,31)              1.2    0.6
    primitive                        Basement.Monad                 Basement/Monad.hs:72:5-18                            1.0    0.1
    decodeBS'                        Utility.FileSystemEncoding     Utility/FileSystemEncoding.hs:151:1-31               1.0    2.8
    mkKeySerialization               Types.Key                      Types/Key.hs:(115,1)-(117,22)                        0.7    1.1
    w82c                             Utility.FileSystemEncoding     Utility/FileSystemEncoding.hs:211:1-28               0.6    1.1

Comparing with [[/profiling]] results, the alloc is down significantly.
And the main IO actions are getting a larger share of the runtime.
There is still significant conversion going on: encodeW8, w82s,
decodeBS', and w82c. Likely another 5% or so speedup if that's eliminated.
"""]]