build git trees using ContentIdentifier to speed up import

This gets the trees built, but it does not use them yet. The next step will be
to remember the tree for the next time an import is done, and diff the
old and new trees to find the files that have changed.
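The planned diff step could look roughly like the sketch below. It is not git-annex code. It assumes a tree can be modelled as a map from file path to the sha recorded for that file (for example, one derived from its ContentIdentifier). `Tree`, `Change` and `diffTrees` are hypothetical names.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical model: a tree maps each file path to the sha recorded
-- for it. git-annex's real tree types are different.
type Sha = String
type Tree = M.Map FilePath Sha

data Change = Added FilePath | Deleted FilePath | Modified FilePath
  deriving (Show, Eq)

-- Compare the tree remembered from the previous import with the new
-- one, reporting only the files that changed.
diffTrees :: Tree -> Tree -> [Change]
diffTrees old new = added ++ deleted ++ modified
  where
    -- paths present only in the new tree
    added = map Added (M.keys (M.difference new old))
    -- paths present only in the old tree
    deleted = map Deleted (M.keys (M.difference old new))
    -- paths present in both trees whose sha differs
    modified =
      [ Modified f
      | (f, (o, n)) <- M.toList (M.intersectionWith (,) old new)
      , o /= n
      ]

main :: IO ()
main = do
  let old = M.fromList [("a", "111"), ("b", "222"), ("c", "333")]
      new = M.fromList [("a", "111"), ("b", "999"), ("d", "444")]
  -- prints: Added "d", Deleted "c", Modified "b" (one per line)
  mapM_ print (diffTrees old new)
```

Files whose sha is unchanged never show up, so only the changed files would need to be examined during import.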

Added --missing to the mktree parameters. It only disables a check (that
each tree entry's sha identifies an existing object), so it's ok to use it
everywhere mktree is used. Disabling the check probably also speeds up mktree.
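For reference, `git mktree` reads entries in the same format that `git ls-tree` prints: `<mode> <type> <sha>`, a tab, then the path. With `-z` each record ends in a NUL. The sketch below builds one such record. `TreeEntry` and `mktreeRecord` are hypothetical names, not git-annex functions. With `--missing`, the sha in the record does not need to name an object in the repository.

```haskell
-- Hypothetical representation of one tree entry fed to `git mktree -z`.
data TreeEntry = TreeEntry
  { entryMode :: String   -- e.g. "100644" for a regular file
  , entryType :: String   -- "blob" or "tree"
  , entrySha  :: String   -- object sha; with --missing it need not exist
  , entryPath :: FilePath
  }

-- Render an entry in ls-tree format, NUL-terminated as -z expects.
mktreeRecord :: TreeEntry -> String
mktreeRecord e =
  entryMode e ++ " " ++ entryType e ++ " " ++ entrySha e
    ++ "\t" ++ entryPath e ++ "\0"

main :: IO ()
main = putStr (mktreeRecord (TreeEntry "100644" "blob" (replicate 40 '0') "foo"))
```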

Note that git fsck does not complain about the resulting tree objects
that point to shas that are not in the repository, even with --strict.

In a quick benchmark importing 10000 files, this slowed the import down
from 2:04.06 to 2:04.28. So it will more than pay for itself.
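Reading the timings as minutes:seconds (an assumption about the benchmark output), the added cost works out as:

```latex
\Delta t = 124.28\,\mathrm{s} - 124.06\,\mathrm{s} = 0.22\,\mathrm{s},
\qquad
\frac{0.22\,\mathrm{s}}{10000\ \text{files}} = 22\,\mu\mathrm{s}\ \text{per file},
\qquad
\frac{0.22}{124.06} \approx 0.18\%
```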

Sponsored-by: Luke Shumaker on Patreon
Joey Hess 2023-05-31 12:31:14 -04:00
parent 51319f8558
commit 7298123520
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
3 changed files with 124 additions and 69 deletions


@@ -77,7 +77,7 @@ withMkTreeHandle :: (MonadIO m, MonadMask m) => Repo -> (MkTreeHandle -> m a) ->
 withMkTreeHandle repo a = bracketIO setup cleanup (a . MkTreeHandle)
   where
     setup = gitCoProcessStart False ps repo
-    ps = [Param "mktree", Param "--batch", Param "-z"]
+    ps = [Param "mktree", Param "--missing", Param "--batch", Param "-z"]
     cleanup = CoProcess.stop
 
 {- Records a Tree in the Repo, returning its Sha.