From 2467de4f9b1bece15e10fa736c4712d9f024315a Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Mon, 7 Jun 2021 16:58:35 -0400
Subject: [PATCH] todo

---
 ..._e9a36e9600561201969c4d21499833af._comment | 12 +++++++
 ...reconcileStaged_is_taking_a_long_time.mdwn | 36 +++++++++++++++++++
 2 files changed, 48 insertions(+)
 create mode 100644 doc/todo/display_when_reconcileStaged_is_taking_a_long_time.mdwn

diff --git a/doc/todo/Avoid_lengthy___34__Scanning_for_unlocked_files_...__34__/comment_20_e9a36e9600561201969c4d21499833af._comment b/doc/todo/Avoid_lengthy___34__Scanning_for_unlocked_files_...__34__/comment_20_e9a36e9600561201969c4d21499833af._comment
index 8b8ffef1fc..ea193d0c2d 100644
--- a/doc/todo/Avoid_lengthy___34__Scanning_for_unlocked_files_...__34__/comment_20_e9a36e9600561201969c4d21499833af._comment
+++ b/doc/todo/Avoid_lengthy___34__Scanning_for_unlocked_files_...__34__/comment_20_e9a36e9600561201969c4d21499833af._comment
@@ -13,4 +13,16 @@ Fixed not to use reconcileStaged it took 37 seconds.
 (Keeping reconcileStaged and removing scanAnnexedFiles it took 47 seconds.
 That makes sense; reconcileStaged is an incremental updater and is not
 able to use SQL as efficiently as scanAnnexedFiles.)
+
+---
+
+Also the git clone of that 100,000 file repo itself, from another repo on
+the same SSD, takes 9 seconds. git-annex init taking 4x as long as
+a fast local git clone to do a scan is not bad.
+
+This is EOT for me, but I will accept patches if someone wants to make
+git-annex faster.
+
+(Also see
+[[todo/display_when_reconcileStaged_is_taking_a_long_time]])
 """]]
diff --git a/doc/todo/display_when_reconcileStaged_is_taking_a_long_time.mdwn b/doc/todo/display_when_reconcileStaged_is_taking_a_long_time.mdwn
new file mode 100644
index 0000000000..20b645d6e5
--- /dev/null
+++ b/doc/todo/display_when_reconcileStaged_is_taking_a_long_time.mdwn
@@ -0,0 +1,36 @@
+Consider this, where branch foo has ten to a hundred thousand files
+not in the master branch:
+
+    git checkout foo
+    touch newfile
+    git annex add newfile
+
+After recent changes to reconcileStaged, the result can be:
+
+    add newfile 0b 100% # cursor sits here for several seconds
+
+This is because it has to look in the keys db to see if there's an
+associated file that's unlocked and needs populating with the content of
+this newly available key, so it does reconcileStaged, which can take some
+time.
+
+One fix would be, if reconcileStaged is taking a long time, make it display
+a note about what it's doing:
+
+    add newfile 0b 100% (scanning annexed files...)
+
+It would also be possible to do the scan before starting to add files,
+which would look more consistent and would avoid it getting stuck
+with the progress display in view:
+
+    (scanning annexed files...)
+    add newfile ok
+
+It might also be possible to make reconcileStaged run a less expensive
+scan in this case, eg the scan it did before
+[[!commit 428c91606b434512d1986622e751c795edf4df44]]. In this case, it
+only really cares about associated files that are unlocked, and so
+diffing from HEAD to the index is sufficient, because the git checkout
+will have run the smudge filter on all the unlocked ones in HEAD and so it
+will already know about those associated files. However, I can't say I like
+this idea much because it complicates using the keys db significantly.
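
As a rough illustration of the first fix proposed in the todo (display a note only when the scan turns out to be slow), here is a minimal Python sketch. It is not git-annex code, which is written in Haskell. The helper name `run_with_slow_note`, its parameters, and the one second threshold in the demo are all hypothetical, and `time.sleep` merely stands in for a long-running reconcileStaged.

```python
import threading
import time


def run_with_slow_note(action, delay_seconds, note, emit=print):
    """Run action() and return its result.

    If action() is still running after delay_seconds, emit(note) is
    called once from a background timer thread. (Hypothetical helper,
    not part of git-annex.)
    """
    timer = threading.Timer(delay_seconds, emit, args=(note,))
    timer.daemon = True  # never keep the process alive just for the note
    timer.start()
    try:
        return action()
    finally:
        # A fast action cancels the timer before it fires, so no note
        # is shown; cancelling after it has fired is harmless.
        timer.cancel()


if __name__ == "__main__":
    def pretend_scan():
        # Stand-in for a slow scan (assumption for the demo only).
        time.sleep(2)
        return "ok"

    result = run_with_slow_note(pretend_scan, 1.0, "(scanning annexed files...)")
    print("add newfile", result)
```

The timer approach keeps the fast path silent, which matches the goal of only showing the note when the user would otherwise be staring at a stalled progress display. The alternative in the todo, scanning before any files are added, would not need a timer at all.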