From 424b1912d6a6e78db044518a7b5fa31a058785ab Mon Sep 17 00:00:00 2001
From: Joey Hess <joeyh@joeyh.name>
Date: Wed, 1 Jul 2020 12:28:44 -0400
Subject: [PATCH] followup and add link

---
 doc/design/caching_database.mdwn              |  4 +---
 ..._690c0dcbfc112f6abd94d02c248ce68b._comment | 24 +++++++++++++++++++
 2 files changed, 25 insertions(+), 3 deletions(-)
 create mode 100644 doc/todo/speed_up_git_annex_sync_--content_--all/comment_4_690c0dcbfc112f6abd94d02c248ce68b._comment

diff --git a/doc/design/caching_database.mdwn b/doc/design/caching_database.mdwn
index f53753a18d..be0dd17fca 100644
--- a/doc/design/caching_database.mdwn
+++ b/doc/design/caching_database.mdwn
@@ -1,6 +1,7 @@
 * [[metadata]] for views
 * [[todo/cache_key_info]]
 * [[bugs/indeterminite_preferred_content_state_for_duplicated_file]]
+* [[todo/speed_up_git_annex_sync_--content_--all]]
 
 What do all these have in common? They could all be improved by
 using some kind of database to locally store the information in an
@@ -11,9 +12,6 @@ generated and updated by looking at the git repository.
 
 * Metadata can be updated by looking at the git-annex branch,
   either its current state, or the diff between the old and new versions
-* Direct mode mappings can be updated by looking at the current branch,
-  to see which files map to which key. Or the diff between the old
-  and new versions of the branch.
 * Incremental fsck information is not stored in git, but can be
   "regenerated" by running fsck again.  
   (Perhaps doesn't quite fit, but let it slide..)
diff --git a/doc/todo/speed_up_git_annex_sync_--content_--all/comment_4_690c0dcbfc112f6abd94d02c248ce68b._comment b/doc/todo/speed_up_git_annex_sync_--content_--all/comment_4_690c0dcbfc112f6abd94d02c248ce68b._comment
new file mode 100644
index 0000000000..8ac734863f
--- /dev/null
+++ b/doc/todo/speed_up_git_annex_sync_--content_--all/comment_4_690c0dcbfc112f6abd94d02c248ce68b._comment
@@ -0,0 +1,24 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 4"""
+ date="2020-07-01T16:13:23Z"
+ content="""
+It's 80s in my big repo. But of course it would also have to be read back
+in and parsed, so seems it would take 160s or so. (It's going to be a dozen
+or so GB of data anywhere the speed of git-annex sync --all is a problem.)
+
+Cross-referencing it with `git ls-tree -r git-annex` to get filenames  
+would mean git-annex would take more memory the more keys are stored in it.
+Which is something I have been careful to avoid.
+
+An sqlite database could surely be faster, especially if it's designed so
+it can be queried for things like "all keys in repo A that are not in repo
+B". But a sqlite database shouldn't only benefit --all, so it also needs to
+be able to do queries like "all keys that have files in HEAD, that are in
+repo A and not in repo B". With that, `git annex get` etc could also get
+faster.
+
+Anyway, it seems like --all is not really the problem for you; I guess
+you would see similar runtime if you ran git-annex sync --content with the
+larger of your two branches checked out than you do with --all.
+"""]]