comment

2023-10-24 13:54:31 -04:00 · aaeadc422a
commit aaeadc422a
parent 0da1d40cd4
2 changed files with 29 additions and 2 deletions
--- a/doc/todo/Incremental_git_annex_sync_--content_--all/comment_4_c232e1e1cfcc47f70079f2d32c2b4633._comment
+++ b/doc/todo/Incremental_git_annex_sync_--content_--all/comment_4_c232e1e1cfcc47f70079f2d32c2b4633._comment
@@ -6,6 +6,6 @@
 My recent optimisations of `git-annex sync` with importtree remotes uses a
 similar diffing approach.

-A transition is underway to making `--content` be enabled by default, and
-faster syncing with it would be a nice thing to do before then.
+`git-annex satisfy` syncs `--content` by default, so this optimisation would
+be especially nice to have for it.
 """]]
--- a/doc/todo/Incremental_git_annex_sync_--content_--all/comment_5_e81719f23565579674249db5d0a883da._comment
+++ b/doc/todo/Incremental_git_annex_sync_--content_--all/comment_5_e81719f23565579674249db5d0a883da._comment
@@ -0,0 +1,27 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 5"""
+ date="2023-10-24T17:26:53Z"
+ content="""
+To implement this optimisation for a non-all sync, when 
+the tree being synced has changed, it ought to diff from the old
+tree to the current tree, and sync those files. Preferred
+content can vary depending on filename, and diffing like that will avoid
+scanning every file in the whole tree.
+
+And when there are location log changes, it needs to also sync files in the
+tree that use keys whose location log changed, using the git-annex branch
+diff to find those keys. (And presumably then using the keys database to get
+back to the filenames.)
+
+So, implementing an optimisation like this for a non-all sync has two
+separate diffs which would have to be combined together somehow.
+
+Doing that in constant memory would be hard. It seems that a bloom filter
+cannot be used to check if a file was processed in the first diff and avoid
+processing it again in the second diff, because a false positive would
+skip a file whose location log did change. I think it would probably
+need to use an on-disk structure instead (eg sqlite).
+
+None of which should prevent implementing this nice optimisation for --all.
+"""]]
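
The on-disk idea in comment 5 can be sketched roughly as follows. This is only an illustration in Python, not git-annex code: `combined_sync_candidates`, `tree_diff_files`, `changed_keys` and `files_for_key` are hypothetical names standing in for the tree diff, the git-annex branch diff, and the keys database lookup. An exact sqlite table replaces the bloom filter, so a file is skipped in the second diff only if it really was handled in the first.

```python
import sqlite3
from typing import Callable, Iterable, Iterator


def combined_sync_candidates(
    tree_diff_files: Iterable[str],
    changed_keys: Iterable[str],
    files_for_key: Callable[[str], Iterable[str]],
    db_path: str = ":memory:",
) -> Iterator[str]:
    """Yield each file that needs syncing exactly once across both diffs.

    All parameters are hypothetical stand-ins: tree_diff_files for the
    old-tree to current-tree diff, changed_keys for keys whose location
    log changed, files_for_key for the keys database lookup.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS processed (file TEXT PRIMARY KEY)"
        )

        def first_visit(path: str) -> bool:
            # INSERT OR IGNORE modifies a row only when the path is new,
            # so rowcount is 1 on the first visit and 0 on repeats.
            cur = conn.execute(
                "INSERT OR IGNORE INTO processed (file) VALUES (?)", (path,)
            )
            return cur.rowcount == 1

        # First diff: files that changed in the tree.
        for path in tree_diff_files:
            if first_visit(path):
                yield path

        # Second diff: files using keys whose location log changed.
        # Unlike a bloom filter, the table gives no false positives.
        for key in changed_keys:
            for path in files_for_key(key):
                if first_visit(path):
                    yield path
        conn.commit()
    finally:
        conn.close()


if __name__ == "__main__":
    key_to_files = {"K1": ["b", "c"], "K2": ["c", "d"]}
    for f in combined_sync_candidates(["a", "b"], ["K1", "K2"], key_to_files.get):
        print(f)  # prints a, b, c, d on separate lines
```

The default `:memory:` path is only for trying the sketch out; to keep memory use bounded on a large repository, `db_path` would presumably point at a temporary file on disk.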