avoid double work in git-annex init

reconcileStaged was doing a scan redundant with the one in scanAnnexedFiles.

It would probably make sense to move the body of scanAnnexedFiles
into reconcileStaged; the separation does not really serve any purpose.

Sponsored-by: Dartmouth College's Datalad project
Joey Hess 2021-06-07 16:50:14 -04:00
parent 6ceb31a30a
commit 0f10f208a7
5 changed files with 93 additions and 38 deletions

@@ -0,0 +1,16 @@
[[!comment format=mdwn
username="joey"
subject="""comment 20"""
date="2021-06-07T19:22:03Z"
content="""
Turns out git-annex init was running both scanAnnexedFiles and
reconcileStaged, which, after recent changes to the latter, do
close to the same scan when run in a fresh clone. So double work!
Benchmarking with 100,000 files, git-annex init took 88 seconds.
Fixed to not use reconcileStaged, it took 37 seconds.
(Keeping reconcileStaged and removing scanAnnexedFiles, it took 47 seconds.
That makes sense; reconcileStaged is an incremental updater and is not
able to use SQL as efficiently as scanAnnexedFiles.)
"""]]
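
The double work described in the comment can be pictured with a toy model. This is only a sketch in Python, not git-annex's Haskell code: the function bodies, the dictionary standing in for the keys database, and the `also_reconcile` flag are all invented for illustration. It shows only that running an incremental pass right after a bulk scan visits every file a second time without changing the result.

```python
# Toy model only: the names echo the discussion above, but none of this
# is git-annex's real implementation.

def scan_annexed_files(tree, db):
    """Bulk scan: record every annexed file in a single pass."""
    db.update(tree)
    return len(tree)  # number of files visited

def reconcile_staged(tree, db):
    """Incremental updater: visit each file and fix entries that differ."""
    visited = 0
    for path, key in tree.items():
        visited += 1
        if db.get(path) != key:
            db[path] = key
    return visited

def init(tree, also_reconcile):
    """Hypothetical init: always bulk scan, optionally reconcile afterwards."""
    db = {}
    visited = scan_annexed_files(tree, db)
    if also_reconcile:
        # In a fresh clone the bulk scan has already covered everything,
        # so this pass finds nothing to fix.
        visited += reconcile_staged(tree, db)
    return db, visited

tree = {f"file{i}": f"key{i}" for i in range(1000)}
db_double, visits_double = init(tree, also_reconcile=True)
db_single, visits_single = init(tree, also_reconcile=False)
```

Both calls end with the same database contents; the only difference is how many files were visited, which is the cost the commit removes from init.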