possible optimisation idea

2015-09-03 13:52:43 -07:00 · b5c0fb1c48
commit b5c0fb1c48
parent d570b5c0ff
1 changed file with 29 additions and 0 deletions
--- a/doc/todo/use_git-mktree_rather_than_index_file.mdwn
+++ b/doc/todo/use_git-mktree_rather_than_index_file.mdwn
@@ -0,0 +1,29 @@
+When git-annex is updating the git-annex branch, it currently
+uses a separate index file. This adds overhead and complexity to the code.
+Especially when there are many files, the index file gets large and writing
+it gets slow.
+
+It might be an improvement to use `git mktree --batch` to inject a
+tree object into git, without using the index file. `git hash-object`
+is already used to add the files to git. All that would be needed is to
+generate an updated tree containing the new file(s), and then update each
+parent tree up to the root tree. This new tree can then be committed using
+`git commit-tree`.
+
+The only thing I can see that might make this slow at all is reading the old
+trees in order to update them. This would need a `git ls-tree` for each
+tree; `git ls-tree` has no batch mode, so 4 processes would need to be
+spawned when generating a tree that changes 1 file. For any repo that's not
+very small, that's probably still faster than rewriting the index file.
+
+Notes:
+
+* The union merge code currently uses the index. No particular reason
+  it needs to; that's just how the code is written, and it might be a large
+  rewrite to change it.
+* A new git-annex branch can be pushed into the repository at any time.
+  The current code uses the index to detect when this happens, and
+  union merges the new branch head into the index. Would need something
+  like a `GIT_ANNEX_HEAD` ref to do the same if the index is removed.
+
+Thanks to sm for indirectly suggesting this. --[[Joey]]
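
The tree update described in the todo item can be modelled without touching a repository. What follows is a minimal Python sketch, not git-annex code. It assumes SHA-1 object IDs and uses a hypothetical in-memory dictionary (`store`) in place of git's object database, with `read_tree` standing in for `git ls-tree` and `write_tree` for `git mktree`. All helper names are made up for illustration. It shows that changing one file only requires reading and rewriting the trees along that file's path, from the leaf up to the root.

```python
import hashlib

# Hypothetical stand-in for git's object database: sha -> (type, raw data).
store = {}

TREE_MODE = "40000"    # mode of a subtree entry inside a raw tree object
FILE_MODE = "100644"   # mode of a regular file entry


def write_object(kind, data):
    """Hash and store an object the way git does: sha1("<type> <size>\\0" + data)."""
    header = f"{kind} {len(data)}".encode() + b"\0"
    sha = hashlib.sha1(header + data).hexdigest()
    store[sha] = (kind, data)
    return sha


def write_tree(entries):
    """Serialise {name: (mode, sha)} into a tree object (the role of `git mktree`)."""
    def sort_key(item):
        name, (mode, _sha) = item
        # git orders entries by name, comparing subtrees as if they ended in "/".
        return (name + "/").encode() if mode == TREE_MODE else name.encode()

    body = b""
    for name, (mode, sha) in sorted(entries.items(), key=sort_key):
        body += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(sha)
    return write_object("tree", body)


def read_tree(sha):
    """Parse a stored tree object back into {name: (mode, sha)} (the role of `git ls-tree`)."""
    kind, data = store[sha]
    if kind != "tree":
        raise ValueError(f"{sha} is a {kind}, not a tree")
    entries = {}
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)
        nul = data.index(b"\0", space)
        mode = data[pos:space].decode()
        name = data[space + 1:nul].decode()
        entries[name] = (mode, data[nul + 1:nul + 21].hex())
        pos = nul + 21  # skip the 20-byte binary sha
    return entries


def update_path(root_sha, path, blob_sha, mode=FILE_MODE):
    """Return the sha of a new root tree in which `path` points at `blob_sha`.

    Only the trees along `path` are read and rewritten; every other entry
    is carried over unchanged by its existing sha.
    """
    def rebuild(tree_sha, parts):
        entries = read_tree(tree_sha) if tree_sha else {}
        name = parts[0]
        if len(parts) == 1:
            entries[name] = (mode, blob_sha)
        else:
            old = entries.get(name)
            child = old[1] if old and old[0] == TREE_MODE else None
            entries[name] = (TREE_MODE, rebuild(child, parts[1:]))
        return write_tree(entries)

    return rebuild(root_sha, path.split("/"))
```

For a file nested two directories deep, `update_path` reads at most three existing trees and writes three new ones. In a real implementation the resulting root tree would then be handed to `git commit-tree`, as the todo item suggests.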