speed up initial scanning for annexed files

Streaming through git this way speeds it up by around 25%. This is
similar to the optimisations made to seeking annexed files.
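The commit message does not show the code. Here is a minimal sketch of what "streaming through git" can look like. It assumes the scan feeds object ids to one long-running `git cat-file --batch` process and reads the answers back as a single stream, rather than making a separate query per file. This is not the change in this commit, which is Haskell. The names `read_batch`, `annexed_keys` and `ANNEX_POINTER_PREFIX` are hypothetical.

The sketch parses the `--batch` output format, which is `<sha> <type> <size>`, a newline, the content and another newline, or `<name> missing` for an absent object. It then picks out blobs whose content looks like an annex pointer file.

```python
import io
from typing import BinaryIO, Iterator, Optional, Tuple

# Assumption: an unlocked annexed file is stored in git as a small pointer
# file whose content starts with this prefix followed by the key.
ANNEX_POINTER_PREFIX = b"/annex/objects/"


def read_batch(stream: BinaryIO) -> Iterator[Tuple[bytes, Optional[bytes], Optional[bytes]]]:
    """Parse `git cat-file --batch` output one object at a time.

    Yields (sha, type, content). type and content are None when git
    reports the object as missing.
    """
    while True:
        header = stream.readline()
        if not header:
            return  # end of stream: git has answered every request
        fields = header.rstrip(b"\n").split(b" ")
        if len(fields) == 2 and fields[1] == b"missing":
            yield fields[0], None, None
            continue
        sha, objtype, size = fields
        content = stream.read(int(size))
        stream.read(1)  # consume the LF that terminates the content
        yield sha, objtype, content


def annexed_keys(stream: BinaryIO) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (sha, key) for each blob whose content looks like an annex pointer."""
    for sha, objtype, content in read_batch(stream):
        if objtype != b"blob" or content is None:
            continue
        if content.startswith(ANNEX_POINTER_PREFIX):
            first_line = content.split(b"\n", 1)[0]
            yield sha, first_line[len(ANNEX_POINTER_PREFIX):]


if __name__ == "__main__":
    # Hand-built sample of --batch output: one pointer file, one ordinary
    # file, and one missing object (the shas are placeholders).
    pointer = b"/annex/objects/SHA256E-s3--abc\n"
    plain = b"hello\n"
    sample = (
        b"aaa blob " + str(len(pointer)).encode() + b"\n" + pointer + b"\n"
        + b"bbb blob " + str(len(plain)).encode() + b"\n" + plain + b"\n"
        + b"ccc missing\n"
    )
    for sha, key in annexed_keys(io.BytesIO(sample)):
        print(sha.decode(), key.decode())
```

In real use, the stream would be `proc.stdout` from `subprocess.Popen(["git", "cat-file", "--batch"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)`. Object ids (for example from `git ls-tree -r HEAD`) would be written to `proc.stdin` from a separate thread, so that neither pipe fills up and deadlocks. The gain comes from a single git process answering for every file.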

Sponsored-by: Dartmouth College's Datalad project
Joey Hess 2021-05-31 13:40:42 -04:00
parent aa00e171cb
commit 0f54e5e0ae
GPG key ID: DB12DB0FF05F8F38
2 changed files with 46 additions and 18 deletions


@ -0,0 +1,16 @@
[[!comment format=mdwn
username="joey"
subject="""comment 12"""
date="2021-05-31T16:30:11Z"
content="""
Implemented streaming through git. In a repo with 100000 unlocked files,
version 8.20210429 took 46 seconds, now reduced to 36 seconds.
When the files are locked, the old version was of course faster, taking 2 seconds,
because it could skip all the symlinks. The new version takes 35 seconds,
slightly less than it takes for unlocked files.
Now the git query and processing account for only a few seconds of the total run time;
writing information about all the files to sqlite takes most of the rest,
and that may also be possible to speed up.
"""]]