todo

2021-06-07 16:58:35 -04:00
commit 2467de4f9b
parent 0f10f208a7
2 changed files with 48 additions and 0 deletions
--- a/doc/todo/Avoid_lengthy_34Scanning_for_unlocked_files_...34/comment_20_e9a36e9600561201969c4d21499833af._comment
+++ b/doc/todo/Avoid_lengthy_34Scanning_for_unlocked_files_...34/comment_20_e9a36e9600561201969c4d21499833af._comment
@@ -13,4 +13,16 @@ Fixed not to use reconcileStaged it took 37 seconds.
 (Keeping reconcileStaged and removing scanAnnexedFiles it took 47 seconds.
 That makes sense; reconcileStaged is an incremental updater and is not
 able to use SQL as efficiently as scanAnnexedFiles.)
+
+---
+
+Also the git clone of that 100,000 file repo itself, from another repo on
+the same SSD, takes 9 seconds. git-annex init taking 4x as long as
+a fast local git clone to do a scan is not bad.
+
+This is EOT for me, but I will accept patches if someone wants to make
+git-annex faster. 
+
+(Also see
+[[todo/display_when_reconcileStaged_is_taking_a_long_time]])
 """]]
--- a/doc/todo/display_when_reconcileStaged_is_taking_a_long_time.mdwn
+++ b/doc/todo/display_when_reconcileStaged_is_taking_a_long_time.mdwn
@@ -0,0 +1,36 @@
+Consider this, where branch foo has ten to a hundred thousand files
+not in the master branch:
+	
+	git checkout foo
+	touch newfile
+	git annex add newfile
+
+After recent changes to reconcileStaged, the result can be:
+
+	add newfile 0b 100% # cursor sits here for several seconds
+
+This is because it has to look in the keys db to see if there's an
+associated file that's unlocked and needs populating with the content of
+this newly available key, so it does reconcileStaged, which can take some
+time.
+
+One fix would be, if reconcileStaged is taking a long time, make it display
+a note about what it's doing:
+
+	add newfile 0b 100% (scanning annexed files...)
+
+It would also be possible to do the scan before starting to add files,
+which would look more consistent and would avoid it getting stuck
+with the progress display in view:
+
+	(scanning annexed files...)
+	add newfile ok
+
+It might also be possible to make reconcileStaged run a less expensive
+scan in this case, eg the scan it did before
+[[!commit 428c91606b434512d1986622e751c795edf4df44]]. In this case, it
+only really cares about associated files that are unlocked, and so
+diffing from HEAD to the index is sufficient, because the git checkout
+will have run the smudge filter on all the unlocked ones in HEAD and so it
+will already know about those associated files. However, I can't say I like
+this idea much because it complicates using the keys db significantly.