wip RawFilePath 2x git-annex find speedup
Finally builds (oh the agony of making it build), but still very
unmergeable: only Command.Find is included, and lots of stuff is badly
hacked to make it compile.

Benchmarking vs master, this git-annex find is significantly faster!
Specifically:

    num files   old     new     speedup
    48500       4.77    3.73    28%
    12500       1.36    1.02    66%
    20          0.075   0.074   0% (so startup time is unchanged)

That's without really finishing the optimization. Things still to do:

* Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
  decodeBS conversions.
* Use versions of IO actions like getFileStatus that take a RawFilePath.
* Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy.
* Use ByteString for parsing git config to speed up startup.

It's likely several of those will speed up git-annex find further.
And other commands will certainly benefit even more.
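As an illustration of the second to-do item: the unix package's System.Posix.Files.ByteString module exports a getFileStatus that takes a RawFilePath directly, so no String decoding is needed before the system call. This is a minimal sketch assuming a POSIX system. isDirRaw is a hypothetical helper name, not something from git-annex.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch only: isDirRaw is a hypothetical helper, not git-annex code.
import System.Posix.ByteString.FilePath (RawFilePath)
import System.Posix.Files.ByteString (getFileStatus, isDirectory)

-- Stat a path that is already a ByteString. The bytes are handed to
-- the system call as-is, avoiding a round trip through String.
isDirRaw :: RawFilePath -> IO Bool
isDirRaw p = isDirectory <$> getFileStatus p
```

Calling isDirRaw "." in a program's working directory would be expected to return True. Like the String version, getFileStatus throws an IOError when the path does not exist.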
parent 6a97ff6b3a
commit 067aabdd48
61 changed files with 380 additions and 296 deletions
@@ -43,6 +43,7 @@ import Utility.Monad
 import Utility.UserInfo
 import Utility.Directory
 import Utility.Split
+import Utility.FileSystemEncoding
 
 {- Simplifies a path, removing any "." component, collapsing "dir/..",
  - and removing the trailing path separator.
@@ -200,20 +201,21 @@ prop_relPathDirToFile_regressionTest = same_dir_shortcurcuits_at_difference
  - we stop preserving ordering at that point. Presumably a user passing
  - that many paths in doesn't care too much about order of the later ones.
  -}
-segmentPaths :: [FilePath] -> [FilePath] -> [[FilePath]]
+segmentPaths :: [RawFilePath] -> [RawFilePath] -> [[RawFilePath]]
 segmentPaths [] new = [new]
 segmentPaths [_] new = [new] -- optimisation
 segmentPaths (l:ls) new = found : segmentPaths ls rest
   where
     (found, rest) = if length ls < 100
-        then partition (l `dirContains`) new
-        else break (\p -> not (l `dirContains` p)) new
+        then partition inl new
+        else break (not . inl) new
+    inl f = fromRawFilePath l `dirContains` fromRawFilePath f
 
 {- This assumes that it's cheaper to call segmentPaths on the result,
  - than it would be to run the action separately with each path. In
  - the case of git file list commands, that assumption tends to hold.
  -}
-runSegmentPaths :: ([FilePath] -> IO [FilePath]) -> [FilePath] -> IO [[FilePath]]
+runSegmentPaths :: ([RawFilePath] -> IO [RawFilePath]) -> [RawFilePath] -> IO [[RawFilePath]]
 runSegmentPaths a paths = segmentPaths paths <$> a paths
 
 {- Converts paths in the home directory to use ~/ -}
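The new segmentPaths still converts both paths back to FilePath on every comparison, which is one of the conversions the commit message wants to eliminate. Below is a self-contained sketch of what a ByteString-only version might look like. dirContainsRaw is a hypothetical, much simplified stand-in for dirContains: it assumes both paths are already normalized and relative to the same base, and does none of the simplification the real function performs.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as B
import Data.List (partition)

-- Local alias so the sketch needs only the bytestring package.
type RawFilePath = B.ByteString

-- Hypothetical simplified containment test (NOT git-annex's dirContains):
-- a directory contains itself and anything under "dir/".
dirContainsRaw :: RawFilePath -> RawFilePath -> Bool
dirContainsRaw dir f = dir == f || dirSlash `B.isPrefixOf` f
  where
    dirSlash
        | "/" `B.isSuffixOf` dir = dir
        | otherwise = dir `B.snoc` '/'

-- Same structure as in the diff, but comparing ByteStrings directly.
segmentPaths :: [RawFilePath] -> [RawFilePath] -> [[RawFilePath]]
segmentPaths [] new = [new]
segmentPaths [_] new = [new] -- optimisation
segmentPaths (l:ls) new = found : segmentPaths ls rest
  where
    -- For short input lists, partition keeps results grouped by input
    -- path; for long ones, break is cheaper but stops preserving order.
    (found, rest) = if length ls < 100
        then partition inl new
        else break (not . inl) new
    inl = dirContainsRaw l
```

For example, segmentPaths ["a", "b"] ["a/1", "b/2", "a/3"] groups the results as [["a/1", "a/3"], ["b/2"]], one segment per input path.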