Merge branch 'bs' into sqlite-bs

Joey Hess 2019-12-18 14:51:03 -04:00
commit d5628a16b8
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
137 changed files with 827 additions and 516 deletions


@ -0,0 +1,12 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 2"
date="2019-12-10T22:01:12Z"
content="""
> An external remote could also do its own checksum checking and then set `remote.<name>.annex-verify=false`
That is an interesting idea, thanks! Not sure that makes it easy for mass consumption though, since it is a feature of an external remote, not sure why it should be in the config. Ideally it should be a property of a remote.
Joey, what do you think regarding built-in remotes?
"""]]


@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="annex-verify"
date="2019-12-11T18:13:48Z"
content="""
\"it is a feature of an external remote, not sure why it should be in the config\" -- because the user might not trust an external remote's implementation of this feature. Besides bugs, there might be [[security exploits|security/CVE-2018-10857_and_CVE-2018-10859]] if external remotes could single-handedly disable verification.
"""]]


@ -9,29 +9,9 @@ Benchmarking `git-annex find`, speedups range from 28-66%. The files fly by
much more snappily. Other commands likely also speed up, but do more work
than find so the improvement is not as large.
The `bs` branch is in a mergeable state now, but still needs work:
The `bs` branch is in a mergeable state now.
* Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
decodeBS conversions. Or at least most of them. There are likely
quite a few places where a value is converted back and forth several times.
Stuff not entirely finished:
As a first step, profile and look for the hot spots. Known hot spots:
* keyFile uses fromRawFilePath and that adds around 3% overhead in `git-annex find`.
Converting it to a RawFilePath needs a version of `</>` for RawFilePaths.
* getJournalFileStale uses fromRawFilePath, and adds 3-5% overhead in
`git-annex whereis`. Converting it to RawFilePath needs a version
of `</>` for RawFilePaths. It also needs a ByteString.readFile
for RawFilePath.
* System.FilePath is not available for RawFilePath, and many of the
conversions are to get a FilePath in order to use that library.
It should be entirely straightforward to make a version of System.FilePath
that can operate on RawFilePath, except possibly there could be some
complications due to Windows.
* Use versions of IO actions like getFileStatus that take a RawFilePath,
avoiding a conversion. Note that these are only available on unix, not
windows, so a compatibility shim will be needed.
(I can't seem to find any library that provides one.)
* Profile various commands and look for hot spots involving conversion
between RawFilePath and FilePath.
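
For illustration only, here is a minimal sketch of what a `</>` for RawFilePath could look like. It assumes POSIX separators and ignores the Windows complications mentioned above. `combineRaw` is a hypothetical name and the `RawFilePath` alias is redefined locally so the example is self-contained; this is not the code git-annex uses, and it does not claim to match System.FilePath in every corner case.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as B

-- Local alias for the sketch; a RawFilePath is a plain strict ByteString.
type RawFilePath = B.ByteString

-- Hypothetical helper: join two raw paths without converting
-- to FilePath (String) and back. POSIX-only assumption: '/' is
-- the only separator and an absolute second path replaces the first.
combineRaw :: RawFilePath -> RawFilePath -> RawFilePath
combineRaw a b
  | B.null a = b
  | B.null b = a
  | "/" `B.isPrefixOf` b = b
  | "/" `B.isSuffixOf` a = a <> b
  | otherwise = B.concat [a, "/", b]
```

With something along these lines, a function that builds a path from ByteString parts can stay in ByteString throughout, which is the point of removing the fromRawFilePath round trips.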


@ -0,0 +1,40 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2019-12-11T18:16:13Z"
content="""
Updated profiling. git-annex find is now ByteString end-to-end!
Note the massive reduction in alloc, and improved runtime.
Wed Dec 11 14:41 2019 Time and Allocation Profiling Report (Final)
git-annex +RTS -p -RTS find
total time = 1.51 secs (1515 ticks @ 1000 us, 1 processor)
total alloc = 608,475,328 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc
keyFile' Annex.Locations Annex/Locations.hs:(590,1)-(600,30) 8.2 16.6
>>=.\.succ' Data.Attoparsec.Internal.Types Data/Attoparsec/Internal/Types.hs:146:13-76 4.7 0.7
getAnnexLinkTarget'.probesymlink Annex.Link Annex/Link.hs:79:9-46 4.2 7.6
>>=.\ Data.Attoparsec.Internal.Types Data/Attoparsec/Internal/Types.hs:(146,9)-(147,44) 3.9 2.3
parseLinkTarget Annex.Link Annex/Link.hs:(255,1)-(263,25) 3.9 11.8
doesPathExist Utility.RawFilePath Utility/RawFilePath.hs:30:1-25 3.4 0.6
keyFile'.esc Annex.Locations Annex/Locations.hs:(596,9)-(600,30) 3.2 14.7
fileKey' Annex.Locations Annex/Locations.hs:(609,1)-(619,41) 3.0 4.7
parseLinkTargetOrPointer Annex.Link Annex/Link.hs:(240,1)-(244,25) 2.8 0.2
hashUpdates.\.\.\ Crypto.Hash Crypto/Hash.hs:85:48-99 2.5 0.1
combineAlways System.FilePath.Posix.ByteString System/FilePath/Posix/../Internal.hs:(698,1)-(704,67) 2.0 3.3
getState Annex Annex.hs:(251,1)-(254,27) 2.0 1.1
withPtr.makeTrampoline Basement.Block.Base Basement/Block/Base.hs:(401,5)-(404,31) 1.9 1.7
withMutablePtrHint Basement.Block.Base Basement/Block/Base.hs:(468,1)-(482,50) 1.8 1.2
parseKeyVariety Types.Key Types/Key.hs:(323,1)-(371,42) 1.8 0.0
fileKey'.go Annex.Locations Annex/Locations.hs:611:9-55 1.7 2.2
isLinkToAnnex Annex.Link Annex/Link.hs:(299,1)-(305,47) 1.7 1.0
hashDirMixed Annex.DirHashes Annex/DirHashes.hs:(82,1)-(90,27) 1.7 1.3
primitive Basement.Monad Basement/Monad.hs:72:5-18 1.6 0.1
withPtr Basement.Block.Base Basement/Block/Base.hs:(395,1)-(404,31) 1.5 1.6
mkKeySerialization Types.Key Types/Key.hs:(115,1)-(117,22) 1.1 2.8
decimal.step Data.Attoparsec.ByteString.Char8 Data/Attoparsec/ByteString/Char8.hs:448:9-49 0.8 1.2
"""]]