followup and add link

2020-07-01 12:28:44 -04:00 · 2020-07-01 12:28:44 -04:00 · 424b1912d6
commit 424b1912d6
parent a496ab602d
2 changed files with 25 additions and 3 deletions
--- a/doc/design/caching_database.mdwn
+++ b/doc/design/caching_database.mdwn
@@ -1,6 +1,7 @@
 * [[metadata]] for views
 * [[todo/cache_key_info]]
 * [[bugs/indeterminite_preferred_content_state_for_duplicated_file]]
+* [[todo/speed_up_git_annex_sync_--content_--all]]

 What do all these have in common? They could all be improved by
 using some kind of database to locally store the information in an
@@ -11,9 +12,6 @@ generated and updated by looking at the git repository.

 * Metadata can be updated by looking at the git-annex branch,
  either its current state, or the diff between the old and new versions
-* Direct mode mappings can be updated by looking at the current branch,
-  to see which files map to which key. Or the diff between the old
-  and new versions of the branch.
 * Incremental fsck information is not stored in git, but can be
  "regenerated" by running fsck again.  
  (Perhaps doesn't quite fit, but let it slide..)
--- a/doc/todo/speed_up_git_annex_sync_--content_--all/comment_4_690c0dcbfc112f6abd94d02c248ce68b._comment
+++ b/doc/todo/speed_up_git_annex_sync_--content_--all/comment_4_690c0dcbfc112f6abd94d02c248ce68b._comment
@@ -0,0 +1,24 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 4"""
+ date="2020-07-01T16:13:23Z"
+ content="""
+It's 80s in my big repo. But of course it would also have to be read back
+in and parsed, so it seems it would take 160s or so. (It's going to be a dozen
+or so GB of data anywhere the speed of git-annex sync --all is a problem.)
+
+Cross-referencing it with `git ls-tree -r git-annex` to get filenames  
+would mean git-annex would take more memory the more keys are stored in it.
+Which is something I have been careful to avoid.
+
+An sqlite database could surely be faster, especially if it's designed so
+it can be queried for things like "all keys in repo A that are not in repo
+B". But a sqlite database shouldn't only benefit --all, so it also needs to
+be able to do queries like "all keys that have files in HEAD, that are in
+repo A and not in repo B". With that, `git annex get` etc could also get
+faster.
+
+Anyway, it seems like --all is not really the problem for you; I guess
+you would see a similar runtime if you ran git-annex sync --content with the
+larger of your two branches checked out as you do with --all.
+"""]]
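
The two queries named in the comment above ("all keys in repo A that are not in repo B", and the same restricted to keys that have files in HEAD) can be sketched against sqlite as follows. This is only an illustration: the table names `locations` and `head_files`, the column names, and the toy data are all assumptions made up for the example, not git-annex's actual schema or any design the comment commits to.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical schema and toy data."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        -- Hypothetical: one row per (key, repository that has the key).
        CREATE TABLE locations (
            annex_key TEXT NOT NULL,
            repo      TEXT NOT NULL,
            PRIMARY KEY (annex_key, repo)
        );
        CREATE INDEX locations_by_repo ON locations (repo, annex_key);

        -- Hypothetical: one row per file in HEAD and the key it points to.
        CREATE TABLE head_files (
            file      TEXT PRIMARY KEY,
            annex_key TEXT NOT NULL
        );
        CREATE INDEX head_files_by_key ON head_files (annex_key);
        """
    )
    db.executemany(
        "INSERT INTO locations (annex_key, repo) VALUES (?, ?)",
        [("k1", "A"), ("k2", "A"), ("k2", "B"), ("k3", "A"), ("k4", "B")],
    )
    db.executemany(
        "INSERT INTO head_files (file, annex_key) VALUES (?, ?)",
        [("foo.txt", "k1"), ("bar.txt", "k2")],
    )
    db.commit()
    return db


def keys_in_a_not_b(db, a, b):
    """All keys in repo `a` that are not in repo `b` (the --all case)."""
    rows = db.execute(
        """
        SELECT annex_key FROM locations WHERE repo = ?
        EXCEPT
        SELECT annex_key FROM locations WHERE repo = ?
        """,
        (a, b),
    )
    return sorted(key for (key,) in rows)


def keys_in_head_in_a_not_b(db, a, b):
    """All keys that have files in HEAD, are in repo `a`, and not in repo `b`."""
    rows = db.execute(
        """
        SELECT DISTINCT h.annex_key
        FROM head_files AS h
        JOIN locations AS la
          ON la.annex_key = h.annex_key AND la.repo = ?
        WHERE NOT EXISTS (
            SELECT 1 FROM locations AS lb
            WHERE lb.annex_key = h.annex_key AND lb.repo = ?
        )
        """,
        (a, b),
    )
    return sorted(key for (key,) in rows)


if __name__ == "__main__":
    db = build_demo_db()
    print(keys_in_a_not_b(db, "A", "B"))
    print(keys_in_head_in_a_not_b(db, "A", "B"))
```

The helpers collect results into sorted lists only to keep the demo easy to inspect. A caller concerned about memory growing with the number of keys could instead iterate over the cursor returned by `db.execute`, which yields rows one at a time.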