claw back annexed file scan speedup

Following commit c941ab6f5b, this avoids
the second, redundant scan when annex.thin is not set.

The benchmark now runs in 35.5 seconds, down from 40 seconds.

Note that the inode cache of the annex object has to be passed to
addInodeCaches now, because it might not already be in the inode caches,
unlike previously.

Sponsored-by: Dartmouth College's Datalad project
This commit is contained in:
Joey Hess 2021-06-08 11:09:15 -04:00
parent ec1f2f246b
commit 7f742589f9
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
2 changed files with 24 additions and 15 deletions

View file

@ -86,15 +86,16 @@ scanAnnexedFiles = whenM (inRepo Git.Ref.headExists <&&> not <$> isBareRepo) $ d
-- The above tries to populate pointer files, but one thing it
-- is not able to handle is populating a pointer file when the
-- annex object file already exists, but its inode is not yet
-- cached. So, the rest of this makes another pass over the
-- tree to do that.
g <- Annex.gitRepo
(l, cleanup) <- inRepo $ Git.LsTree.lsTree
Git.LsTree.LsTreeRecursive
(Git.LsTree.LsTreeLong True)
Git.Ref.headRef
catObjectStreamLsTree l want g go
liftIO $ void cleanup
-- cached and annex.thin is set. So, the rest of this makes
-- another pass over the tree to do that.
whenM (annexThin <$> Annex.getGitConfig) $ do
g <- Annex.gitRepo
(l, cleanup) <- inRepo $ Git.LsTree.lsTree
Git.LsTree.LsTreeRecursive
(Git.LsTree.LsTreeLong True)
Git.Ref.headRef
catObjectStreamLsTree l want g go
liftIO $ void cleanup
where
-- Want to process symlinks, and regular files.
want i = case Git.Types.toTreeItemType (Git.LsTree.mode i) of