followup and add link

2020-07-01 12:28:44 -04:00 · 2020-07-01 12:28:44 -04:00 · 424b1912d6
commit 424b1912d6
parent a496ab602d
2 changed files with 25 additions and 3 deletions
--- a/doc/design/caching_database.mdwn
+++ b/doc/design/caching_database.mdwn
@@ -1,6 +1,7 @@
 * [[metadata]] for views
 * [[todo/cache_key_info]]
 * [[bugs/indeterminite_preferred_content_state_for_duplicated_file]]
 * [[todo/speed_up_git_annex_sync_--content_--all]]
 What do all these have in common? They could all be improved by
 using some kind of database to locally store the information in an
@@ -11,9 +12,6 @@ generated and updated by looking at the git repository.
 * Metadata can be updated by looking at the git-annex branch,
  either its current state, or the diff between the old and new versions
 * Direct mode mappings can be updated by looking at the current branch,
  to see which files map to which key. Or the diff between the old
  and new versions of the branch.
 * Incremental fsck information is not stored in git, but can be
  "regenerated" by running fsck again.  
  (Perhaps doesn't quite fit, but let it slide..)
--- a/doc/todo/speed_up_git_annex_sync_--content_--all/comment_4_690c0dcbfc112f6abd94d02c248ce68b._comment
+++ b/doc/todo/speed_up_git_annex_sync_--content_--all/comment_4_690c0dcbfc112f6abd94d02c248ce68b._comment
@@ -0,0 +1,24 @@
 [[!comment format=mdwn
 username="joey"
 subject="""comment 4"""
 date="2020-07-01T16:13:23Z"
 content="""
 It's 80s in my big repo. But of course it would also have to be read back
 in and parsed, so it seems it would take 160s or so. (It's going to be a dozen
 or so GB of data anywhere the speed of `git-annex sync --all` is a problem.)
 Cross-referencing it with `git ls-tree -r git-annex` to get filenames  
 would mean git-annex would take more memory the more keys are stored in it.
 Which is something I have been careful to avoid.
 An sqlite database could surely be faster, especially if it's designed so
 it can be queried for things like "all keys in repo A that are not in repo
 B". But an sqlite database shouldn't only benefit --all, so it also needs to
 be able to do queries like "all keys that have files in HEAD, that are in
 repo A and not in repo B". With that, `git annex get` etc could also get
 faster.
 Anyway, it seems like --all is not really the problem for you; I guess
 you would see a similar runtime if you ran `git-annex sync --content` with the
 larger of your two branches checked out as you do with --all.
 """]]
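
The two queries described in the comment above can be sketched with Python's standard `sqlite3` module. This is only an illustration under stated assumptions: the table names (`locations`, `head_files`), the column names, and the helper functions are all hypothetical and are not git-annex's actual database layout or code.

```python
import sqlite3

# Hypothetical schema, for illustration only (not git-annex's real layout):
#   locations  - which repository holds which key
#   head_files - which files in HEAD point at which key
SCHEMA = """
CREATE TABLE locations (
    key_name TEXT NOT NULL,
    repo     TEXT NOT NULL,
    PRIMARY KEY (key_name, repo)
);
CREATE INDEX locations_by_repo ON locations (repo, key_name);
CREATE TABLE head_files (
    file     TEXT PRIMARY KEY,
    key_name TEXT NOT NULL
);
CREATE INDEX head_files_by_key ON head_files (key_name);
"""


def keys_in_a_not_b(db, repo_a, repo_b):
    """All keys in repo A that are not in repo B."""
    sql = """
        SELECT a.key_name FROM locations AS a
        WHERE a.repo = ?
          AND NOT EXISTS (
              SELECT 1 FROM locations AS b
              WHERE b.repo = ? AND b.key_name = a.key_name)
        ORDER BY a.key_name
    """
    return [row[0] for row in db.execute(sql, (repo_a, repo_b))]


def head_keys_in_a_not_b(db, repo_a, repo_b):
    """All keys that have files in HEAD, that are in repo A and not in repo B."""
    sql = """
        SELECT DISTINCT a.key_name FROM locations AS a
        JOIN head_files AS h ON h.key_name = a.key_name
        WHERE a.repo = ?
          AND NOT EXISTS (
              SELECT 1 FROM locations AS b
              WHERE b.repo = ? AND b.key_name = a.key_name)
        ORDER BY a.key_name
    """
    return [row[0] for row in db.execute(sql, (repo_a, repo_b))]


def demo_db():
    """Build a tiny in-memory database with made-up keys and repos."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    with db:  # commit the inserts as one transaction
        db.executemany(
            "INSERT INTO locations (key_name, repo) VALUES (?, ?)",
            [("K1", "A"), ("K2", "A"), ("K2", "B"), ("K3", "B"), ("K4", "A")],
        )
        db.executemany(
            "INSERT INTO head_files (file, key_name) VALUES (?, ?)",
            [("foo.txt", "K1"), ("bar.txt", "K2")],
        )
    return db
```

With the made-up data in `demo_db()`, the first query yields `K1` and `K4` for repos `A` and `B`, and the second yields only `K1`, since `K4` has no file in HEAD. The helpers return lists for brevity; iterating over the cursor returned by `db.execute` instead would avoid holding every key in memory at once, which is closer to the memory concern raised in the comment.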