Joey Hess 2021-06-07 16:58:35 -04:00
parent 0f10f208a7
commit 2467de4f9b
GPG key ID: DB12DB0FF05F8F38
2 changed files with 48 additions and 0 deletions

@@ -13,4 +13,16 @@ Fixed not to use reconcileStaged it took 37 seconds.
(Keeping reconcileStaged and removing scanAnnexedFiles, it took 47 seconds.
That makes sense; reconcileStaged is an incremental updater and is not
able to use SQL as efficiently as scanAnnexedFiles.)
---
Also the git clone of that 100,000 file repo itself, from another repo on
the same SSD, takes 9 seconds. git-annex init taking 4x as long as
a fast local git clone to do a scan is not bad.
This is EOT for me, but I will accept patches if someone wants to make
git-annex faster.
(Also see
[[todo/display_when_reconcileStaged_is_taking_a_long_time]])
"""]]

@@ -0,0 +1,36 @@
Consider this, where branch foo has ten to a hundred thousand files
not in the master branch:
    git checkout foo
    touch newfile
    git annex add newfile
After recent changes to reconcileStaged, the result can be:
    add newfile 0b 100% # cursor sits here for several seconds
This is because it has to look in the keys db to see if there's an
associated file that's unlocked and needs populating with the content of
this newly available key, so it does reconcileStaged, which can take some
time.
One fix would be, if reconcileStaged is taking a long time, make it display
a note about what it's doing:
    add newfile 0b 100% (scanning annexed files...)
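
The "only show a note when it is slow" behaviour can be sketched roughly as below. This is an illustration in Python, not git-annex code (git-annex is written in Haskell); the name `run_with_slow_note`, the `delay` default, and the `emit` callback are all made up for the example.

```python
import threading
import time


def run_with_slow_note(action, note, delay=1.0, emit=print):
    """Run action() and return its result.

    If the action is still running after `delay` seconds, call emit(note)
    once. Fast actions finish before the timer fires and stay silent.
    (Hypothetical helper; names and defaults are assumptions.)
    """
    timer = threading.Timer(delay, emit, args=(note,))
    timer.start()
    try:
        return action()
    finally:
        # Cancelling is harmless if the note has already been emitted.
        timer.cancel()
```

For instance, `run_with_slow_note(lambda: time.sleep(3), "(scanning annexed files...)")` would emit the note after about a second, while an action that returns immediately emits nothing.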
It would also be possible to do the scan before starting to add files,
which would look more consistent and would avoid it getting stuck
with the progress display in view:
    (scanning annexed files...)
    add newfile ok
It might also be possible to make reconcileStaged run a less expensive
scan in this case, e.g. the scan it did before
[[!commit 428c91606b434512d1986622e751c795edf4df44]]. In this case, it
only really cares about associated files that are unlocked, and so
diffing from HEAD to the index is sufficient, because the git checkout
will have run the smudge filter on all the unlocked ones in HEAD and so it
will already know about those associated files. However, I can't say I like
this idea much because it complicates using the keys db significantly.
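
To make the "diff from HEAD to the index" idea concrete, here is a rough Python sketch that filters the raw output of `git diff-index --cached --raw HEAD` down to staged regular files, which are the only entries that could be unlocked annexed files (locked ones are symlinks, mode 120000). It is not how git-annex implements anything; the function name is invented, and the hashes in the sample are placeholders.

```python
def staged_regular_files(raw_diff):
    """Return paths whose staged (index) side is a regular file.

    `raw_diff` is text in git's raw diff format, one entry per line:
        :<src mode> <dst mode> <src sha> <dst sha> <status>\t<path>
    Assumes no rename detection and no -z, so each line has one path.
    (Hypothetical helper for illustration only.)
    """
    paths = []
    for line in raw_diff.splitlines():
        if not line.startswith(":"):
            continue
        meta, _, path = line.partition("\t")
        _src_mode, dst_mode, _src_sha, _dst_sha, _status = meta[1:].split(" ")
        # 100644/100755 are regular files; 120000 (symlink) and
        # 000000 (deleted) cannot be unlocked pointer files.
        if dst_mode in ("100644", "100755"):
            paths.append(path)
    return paths


# Placeholder sample, not real repository output.
sample = (
    ":000000 100644 0000000 1111111 A\tnewfile\n"
    ":120000 120000 2222222 3333333 M\tlocked.bin\n"
    ":100644 000000 4444444 0000000 D\tgone.txt\n"
)
```

Each returned path would still need its content checked to see whether it is really an annex pointer file; the sketch only narrows the candidates.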