followup and add link

Joey Hess 2020-07-01 12:28:44 -04:00
parent a496ab602d
commit 424b1912d6
GPG key ID: DB12DB0FF05F8F38
2 changed files with 25 additions and 3 deletions


@@ -1,6 +1,7 @@
* [[metadata]] for views
* [[todo/cache_key_info]]
* [[bugs/indeterminite_preferred_content_state_for_duplicated_file]]
* [[todo/speed_up_git_annex_sync_--content_--all]]
What do all these have in common? They could all be improved by
using some kind of database to locally store the information in an
@@ -11,9 +12,6 @@ generated and updated by looking at the git repository.
* Metadata can be updated by looking at the git-annex branch,
either its current state, or the diff between the old and new versions
* Direct mode mappings can be updated by looking at the current branch,
to see which files map to which key. Or the diff between the old
and new versions of the branch.
* Incremental fsck information is not stored in git, but can be
"regenerated" by running fsck again.
(Perhaps doesn't quite fit, but let it slide..)


@@ -0,0 +1,24 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2020-07-01T16:13:23Z"
content="""
It's 80 seconds in my big repo. But of course it would also have to be read
back in and parsed, so it seems it would take 160 seconds or so. (It's going
to be a dozen or so GB of data anywhere the speed of `git-annex sync --all`
is a problem.)

Cross-referencing it with `git ls-tree -r git-annex` to get filenames
would mean git-annex would take more memory the more keys are stored in it,
which is something I have been careful to avoid.

An sqlite database could surely be faster, especially if it's designed so
it can be queried for things like "all keys in repo A that are not in repo
B". But an sqlite database shouldn't only benefit --all, so it also needs to
be able to do queries like "all keys that have files in HEAD, that are in
repo A and not in repo B". With that, `git annex get` etc. could also get
faster.

Anyway, it seems like --all is not really the problem for you; I guess
you would see a runtime similar to what you see with --all if you ran
`git-annex sync --content` with the larger of your two branches checked out.
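
A minimal sketch of the two kinds of query mentioned above, written in Python
against an in-memory sqlite database. Everything about the schema is an
assumption made for illustration: the tables `locations` and `head_files`,
the column names, and the helper functions are hypothetical and are not
git-annex's actual database layout.

```python
import sqlite3


def open_db():
    # Hypothetical schema, for illustration only:
    #   locations:  which repo has a copy of which key
    #   head_files: which key each file in HEAD points to
    db = sqlite3.connect(":memory:")
    db.executescript(
        "CREATE TABLE locations (annex_key TEXT NOT NULL, repo TEXT NOT NULL,"
        " PRIMARY KEY (annex_key, repo));"
        "CREATE TABLE head_files (file TEXT PRIMARY KEY,"
        " annex_key TEXT NOT NULL);"
        "CREATE INDEX head_files_key ON head_files (annex_key);"
    )
    return db


def keys_in_a_not_b(db, a, b):
    # "all keys in repo A that are not in repo B"
    rows = db.execute(
        "SELECT annex_key FROM locations WHERE repo = ? "
        "EXCEPT SELECT annex_key FROM locations WHERE repo = ? "
        "ORDER BY annex_key",
        (a, b),
    )
    return [r[0] for r in rows]


def head_keys_in_a_not_b(db, a, b):
    # "all keys that have files in HEAD, that are in repo A and not in repo B"
    rows = db.execute(
        "SELECT DISTINCT h.annex_key FROM head_files AS h "
        "WHERE EXISTS (SELECT 1 FROM locations AS l "
        "              WHERE l.annex_key = h.annex_key AND l.repo = ?) "
        "AND NOT EXISTS (SELECT 1 FROM locations AS l "
        "                WHERE l.annex_key = h.annex_key AND l.repo = ?) "
        "ORDER BY h.annex_key",
        (a, b),
    )
    return [r[0] for r in rows]


if __name__ == "__main__":
    db = open_db()
    db.executemany(
        "INSERT INTO locations VALUES (?, ?)",
        [("K1", "A"), ("K1", "B"), ("K2", "A"), ("K3", "A")],
    )
    db.executemany(
        "INSERT INTO head_files VALUES (?, ?)",
        [("a.txt", "K2"), ("b.txt", "K1")],
    )
    print(keys_in_a_not_b(db, "A", "B"))
    print(head_keys_in_a_not_b(db, "A", "B"))
```

In both queries the filtering happens inside sqlite, so the caller only ever
holds the matching keys. A real implementation would probably iterate over
the cursor rather than build a list, so that memory use stays flat as the
number of keys grows.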
"""]]