Merge branch 'bs'

This commit is contained in:
Joey Hess 2019-12-19 13:12:39 -04:00
commit 37db1fa5a0
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
230 changed files with 2045 additions and 1413 deletions

View file

@ -0,0 +1,40 @@
[[!comment format=mdwn
username="joey"
subject="""comment 7"""
date="2019-12-18T19:18:04Z"
content="""
Updated profiling. git-annex find is now ByteString end-to-end!
Note the massive reduction in alloc, and improved runtime.
Wed Dec 11 14:41 2019 Time and Allocation Profiling Report (Final)
git-annex +RTS -p -RTS find
total time = 1.51 secs (1515 ticks @ 1000 us, 1 processor)
total alloc = 608,475,328 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc
keyFile' Annex.Locations Annex/Locations.hs:(590,1)-(600,30) 8.2 16.6
>>=.\.succ' Data.Attoparsec.Internal.Types Data/Attoparsec/Internal/Types.hs:146:13-76 4.7 0.7
getAnnexLinkTarget'.probesymlink Annex.Link Annex/Link.hs:79:9-46 4.2 7.6
>>=.\ Data.Attoparsec.Internal.Types Data/Attoparsec/Internal/Types.hs:(146,9)-(147,44) 3.9 2.3
parseLinkTarget Annex.Link Annex/Link.hs:(255,1)-(263,25) 3.9 11.8
doesPathExist Utility.RawFilePath Utility/RawFilePath.hs:30:1-25 3.4 0.6
keyFile'.esc Annex.Locations Annex/Locations.hs:(596,9)-(600,30) 3.2 14.7
fileKey' Annex.Locations Annex/Locations.hs:(609,1)-(619,41) 3.0 4.7
parseLinkTargetOrPointer Annex.Link Annex/Link.hs:(240,1)-(244,25) 2.8 0.2
hashUpdates.\.\.\ Crypto.Hash Crypto/Hash.hs:85:48-99 2.5 0.1
combineAlways System.FilePath.Posix.ByteString System/FilePath/Posix/../Internal.hs:(698,1)-(704,67) 2.0 3.3
getState Annex Annex.hs:(251,1)-(254,27) 2.0 1.1
withPtr.makeTrampoline Basement.Block.Base Basement/Block/Base.hs:(401,5)-(404,31) 1.9 1.7
withMutablePtrHint Basement.Block.Base Basement/Block/Base.hs:(468,1)-(482,50) 1.8 1.2
parseKeyVariety Types.Key Types/Key.hs:(323,1)-(371,42) 1.8 0.0
fileKey'.go Annex.Locations Annex/Locations.hs:611:9-55 1.7 2.2
isLinkToAnnex Annex.Link Annex/Link.hs:(299,1)-(305,47) 1.7 1.0
hashDirMixed Annex.DirHashes Annex/DirHashes.hs:(82,1)-(90,27) 1.7 1.3
primitive Basement.Monad Basement/Monad.hs:72:5-18 1.6 0.1
withPtr Basement.Block.Base Basement/Block/Base.hs:(395,1)-(404,31) 1.5 1.6
mkKeySerialization Types.Key Types/Key.hs:(115,1)-(117,22) 1.1 2.8
decimal.step Data.Attoparsec.ByteString.Char8 Data/Attoparsec/ByteString/Char8.hs:448:9-49 0.8 1.2
"""]]

View file

@ -0,0 +1,3 @@
Profiling of `git annex find --not --in web` suggests that converting Ref
to contain a ByteString, rather than a String, would eliminate a
fromRawFilePath that uses about 1% of runtime.

View file

@ -0,0 +1,21 @@
Often a command will need to read a number of files from the git-annex
branch, and it uses getJournalFile for each to check for any journalled
change that has not reached the branch. But typically, the journal is empty
and in such a case, that's a lot of time spent trying to open journal files
that DNE.
Profiling eg, `git annex find --in web` shows things called by getJournalFile
use around 5% of runtime.
What if, once at startup, it checked if the journal was entirely empty.
If so, it can remember that, and avoid reading journal files.
Perhaps paired with staging the journal if it's not empty.
This could lead to behavior changes in some cases where one command is
writing changes and another command used to read them from the journal and
may no longer do so. But any such behavior change is of a behavior that
used to involve a race; the reader could just as well be ahead of the
writer and it would have already behaved as it would after the change.
But: When a process writes to the journal, it will need to update its state
to remember it's no longer empty. --[[Joey]]

View file

@ -9,29 +9,9 @@ Benchmarking `git-annex find`, speedups range from 28-66%. The files fly by
much more snappily. Other commands likely also speed up, but do more work
than find so the improvement is not as large.
The `bs` branch is in a mergeable state now, but still needs work:
The `bs` branch is in a mergeable state now. [[done]]
* Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
decodeBS conversions. Or at least most of them. There are likely
quite a few places where a value is converted back and forth several times.
Stuff not entirely finished:
As a first step, profile and look for the hot spots. Known hot spots:
* keyFile uses fromRawFilePath and that adds around 3% overhead in `git-annex find`.
Converting it to a RawFilePath needs a version of `</>` for RawFilePaths.
* getJournalFileStale uses fromRawFilePath, and adds 3-5% overhead in
`git-annex whereis`. Converting it to RawFilePath needs a version
of `</>` for RawFilePaths. It also needs a ByteString.readFile
for RawFilePath.
* System.FilePath is not available for RawFilePath, and many of the
conversions are to get a FilePath in order to use that library.
It should be entirely straightforward to make a version of System.FilePath
that can operate on RawFilePath, except possibly there could be some
complications due to Windows.
* Use versions of IO actions like getFileStatus that take a RawFilePath,
avoiding a conversion. Note that these are only available on unix, not
windows, so a compatability shim will be needed.
(I can't seem to find any library that provides one.)
* Profile various commands and look for hot spots involving conversion
between RawFilePath and FilePath.

View file

@ -0,0 +1,40 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2019-12-11T18:16:13Z"
content="""
Updated profiling. git-annex find is now ByteString end-to-end!
Note the massive reduction in alloc, and improved runtime.
Wed Dec 11 14:41 2019 Time and Allocation Profiling Report (Final)
git-annex +RTS -p -RTS find
total time = 1.51 secs (1515 ticks @ 1000 us, 1 processor)
total alloc = 608,475,328 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc
keyFile' Annex.Locations Annex/Locations.hs:(590,1)-(600,30) 8.2 16.6
>>=.\.succ' Data.Attoparsec.Internal.Types Data/Attoparsec/Internal/Types.hs:146:13-76 4.7 0.7
getAnnexLinkTarget'.probesymlink Annex.Link Annex/Link.hs:79:9-46 4.2 7.6
>>=.\ Data.Attoparsec.Internal.Types Data/Attoparsec/Internal/Types.hs:(146,9)-(147,44) 3.9 2.3
parseLinkTarget Annex.Link Annex/Link.hs:(255,1)-(263,25) 3.9 11.8
doesPathExist Utility.RawFilePath Utility/RawFilePath.hs:30:1-25 3.4 0.6
keyFile'.esc Annex.Locations Annex/Locations.hs:(596,9)-(600,30) 3.2 14.7
fileKey' Annex.Locations Annex/Locations.hs:(609,1)-(619,41) 3.0 4.7
parseLinkTargetOrPointer Annex.Link Annex/Link.hs:(240,1)-(244,25) 2.8 0.2
hashUpdates.\.\.\ Crypto.Hash Crypto/Hash.hs:85:48-99 2.5 0.1
combineAlways System.FilePath.Posix.ByteString System/FilePath/Posix/../Internal.hs:(698,1)-(704,67) 2.0 3.3
getState Annex Annex.hs:(251,1)-(254,27) 2.0 1.1
withPtr.makeTrampoline Basement.Block.Base Basement/Block/Base.hs:(401,5)-(404,31) 1.9 1.7
withMutablePtrHint Basement.Block.Base Basement/Block/Base.hs:(468,1)-(482,50) 1.8 1.2
parseKeyVariety Types.Key Types/Key.hs:(323,1)-(371,42) 1.8 0.0
fileKey'.go Annex.Locations Annex/Locations.hs:611:9-55 1.7 2.2
isLinkToAnnex Annex.Link Annex/Link.hs:(299,1)-(305,47) 1.7 1.0
hashDirMixed Annex.DirHashes Annex/DirHashes.hs:(82,1)-(90,27) 1.7 1.3
primitive Basement.Monad Basement/Monad.hs:72:5-18 1.6 0.1
withPtr Basement.Block.Base Basement/Block/Base.hs:(395,1)-(404,31) 1.5 1.6
mkKeySerialization Types.Key Types/Key.hs:(115,1)-(117,22) 1.1 2.8
decimal.step Data.Attoparsec.ByteString.Char8 Data/Attoparsec/ByteString/Char8.hs:448:9-49 0.8 1.2
"""]]