comment
parent 0da1d40cd4
commit aaeadc422a
2 changed files with 29 additions and 2 deletions
@@ -6,6 +6,6 @@
 My recent optimisations of `git-annex sync` with importtree remotes uses a
 similar diffing approach.
 
-A transition is underway to making `--content` be enabled by default, and
-faster syncing with it would be a nice thing to do before then.
+`git-annex satisfy` syncs `--content` by default, so this optimisation would
+be especially nice to have for it.
 """]]

@@ -0,0 +1,27 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 5"""
+ date="2023-10-24T17:26:53Z"
+ content="""
+To implement this optimisation for a non-all sync, when
+the tree being synced has changed, it ought to diff from the old
+tree to the current tree, and sync those files. Preferred
+content can vary depending on filename, and diffing like that will avoid
+scanning every file in the whole tree.
+
+And when there are location log changes, it needs to also sync files in the
+tree that use keys whose location log changed, using the git-annex branch
+diff to find those keys. (And presumably then using the keys database to get
+back to the filenames.)
+
+So, implementing an optimisation like this for a non-all sync has two
+separate diffs which would have to be combined together somehow.
+
+Doing that in constant memory would be hard. It seems that a bloom filter
+cannot be used to check if a file was processed in the first diff and avoid
+processing it again in the second diff. Because a false positive would
+avoid processing a file whose location log did change. I think it would
+need to use an on-disk structure maybe (eg sqlite)?
+
+None of which should prevent implementing this nice optimisation for --all.
+"""]]
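The added comment leaves open how the two diffs would be combined. As one possible reading of its "on-disk structure (eg sqlite)" remark, here is a minimal Python sketch, not git-annex's actual implementation (which is written in Haskell). Every name in it (`files_to_sync`, `tree_diff_files`, `changed_keys`, `files_for_key`, the `seen` table) is hypothetical. It assumes the tree diff yields file paths, the git-annex branch diff yields keys, and some lookup (standing in for the keys database) maps a key back to the files that use it.

```python
import os
import sqlite3
import tempfile
from typing import Callable, Iterable, Iterator


def files_to_sync(
    tree_diff_files: Iterable[str],
    changed_keys: Iterable[str],
    files_for_key: Callable[[str], Iterable[str]],
    db_path: str,
) -> Iterator[str]:
    """Yield each file that needs syncing exactly once.

    The set of already-yielded files lives in an on-disk sqlite table,
    so memory use does not grow with the number of files. Membership is
    exact, so (unlike a bloom filter) no file is ever skipped by mistake.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS seen (file TEXT PRIMARY KEY)")

        def first_time(path: str) -> bool:
            # Exact membership test against the on-disk table.
            row = conn.execute(
                "SELECT 1 FROM seen WHERE file = ?", (path,)
            ).fetchone()
            if row is not None:
                return False
            conn.execute("INSERT INTO seen (file) VALUES (?)", (path,))
            return True

        # First diff: files changed between the old tree and the current tree.
        for path in tree_diff_files:
            if first_time(path):
                yield path

        # Second diff: keys whose location log changed, mapped back to files.
        for key in changed_keys:
            for path in files_for_key(key):
                if first_time(path):
                    yield path

        conn.commit()
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical stand-in for the keys database lookup.
    key_files = {"KEY-a": ["docs/a.txt"], "KEY-b": ["b.bin", "copy-of-b.bin"]}
    with tempfile.TemporaryDirectory() as tmp:
        result = list(
            files_to_sync(
                ["docs/a.txt", "new.txt"],
                ["KEY-a", "KEY-b"],
                lambda key: key_files.get(key, []),
                os.path.join(tmp, "seen.db"),
            )
        )
    print(result)
```

In the example, `docs/a.txt` appears in both diffs but is yielded only once. The trade-off is a disk lookup per file, which is the price of avoiding the false positives that rule out a bloom filter here.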