git-annex/doc/benchmarking/comment_8_c1f99493f5e5c362d5c39f048280b11b._comment
2017-10-31 13:13:40 -04:00


[[!comment format=mdwn
username="joey"
subject="""profiling"""
date="2016-09-26T19:20:36Z"
content="""
Built git-annex with profiling, using `stack build --profile`.

(For reproducibility, running git-annex in a clone of the git-annex repo
https://github.com/RichiH/conference_proceedings with rev
2797a49023fc24aff6fcaec55421572e1eddcfa2 checked out. It has 9496 annexed
objects.)

Profiling `git-annex find +RTS -p`:

    total time  =          3.53 secs  (3530 ticks @ 1000 us, 1 processor)
    total alloc = 3,772,700,720 bytes (excludes profiling overheads)

    COST CENTRE            MODULE                 %time %alloc

    spanList               Data.List.Utils         32.6   37.7
    startswith             Data.List.Utils         14.3    8.1
    md5                    Data.Hash.MD5           12.4   18.2
    join                   Data.List.Utils          6.9   13.7
    catchIO                Utility.Exception        5.9    6.0
    catches                Control.Monad.Catch      5.0    2.8
    inAnnex'.checkindirect Annex.Content            4.6    1.8
    readish                Utility.PartialPrelude   3.0    1.4
    isAnnexLink            Annex.Link               2.6    4.0
    split                  Data.List.Utils          1.5    0.8
    keyPath                Annex.Locations          1.2    1.7

This is interesting!

Over half of CPU time and allocations are in list (really String) processing:
`spanList`, `startswith` and `join` together account for 53.8% of time and
59.5% of allocations. The details of the profiling report show that all three
are coming from calls to `replace` in `keyFile` and `fileKey`.
Both functions nest several calls to `replace`, so perhaps that could be unwound
into a single pass and/or a ByteString used to do it more efficiently.
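
To make the single pass idea concrete, here is a rough sketch of what unwinding the nested `replace` calls could look like. The names `escapeKey` and `unescapeKey` are made up for illustration, and the escaping table (`&` to `&a`, `%` to `&s`, `:` to `&c`, `/` to `%`) is my assumption about what `keyFile` does, not copied from the source. It still works on String; a ByteString version would follow the same shape.

```haskell
-- Hypothetical single-pass replacement for a chain of nested `replace` calls.
-- The escaping table below is an assumption made for this sketch.

-- | Escape a serialized key in one traversal of the String.
escapeKey :: String -> String
escapeKey = concatMap esc
  where
    esc '&' = "&a"
    esc '%' = "&s"
    esc ':' = "&c"
    esc '/' = "%"
    esc c   = [c]

-- | Inverse of 'escapeKey', also one traversal.
-- Unknown "&x" sequences are passed through unchanged.
unescapeKey :: String -> String
unescapeKey ('%':cs)     = '/' : unescapeKey cs
unescapeKey ('&':'c':cs) = ':' : unescapeKey cs
unescapeKey ('&':'s':cs) = '%' : unescapeKey cs
unescapeKey ('&':'a':cs) = '&' : unescapeKey cs
unescapeKey (c:cs)       = c : unescapeKey cs
unescapeKey []           = []
```

Each character is looked at once, so there is no repeated `spanList`/`startswith` scanning and no intermediate Strings from `split` and `join`.
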

12% of run time is spent calculating the md5 hashes for the hash
directories for .git/annex/objects. Data.Hash.MD5 is from MissingH, and
it is probably a quite unoptimised version. Switching to the version
in cryptonite would probably speed it up a lot.
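
For reference, a minimal sketch of computing an md5 with cryptonite's `Crypto.Hash`. The helper name `md5Hex` is hypothetical, and this only shows the hashing call itself, not how the hash directory names are derived from the digest. It assumes the input is ASCII or Latin-1, since `B8.pack` truncates wider characters.

```haskell
import Crypto.Hash (Digest, MD5, hash)
import qualified Data.ByteString.Char8 as B8

-- | Hex-encoded md5 of a String, using cryptonite instead of Data.Hash.MD5.
-- Hypothetical helper; B8.pack keeps only the low 8 bits of each Char.
md5Hex :: String -> String
md5Hex s = show (hash (B8.pack s) :: Digest MD5)
```

Whether this is actually faster for short inputs like keys would need to be confirmed with another profiling run.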
"""]]