more thoughts

Joey Hess 2018-10-17 13:26:54 -04:00
parent 558520d27a
commit fc7fe2b19d
GPG key ID: DB12DB0FF05F8F38
2 changed files with 44 additions and 11 deletions


@ -0,0 +1,32 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2018-10-17T15:50:46Z"
content="""
This will need some state to be maintained, to allow efficiently querying for
worktree files that have gained/lost content since the last sync.
At the least, it needs to maintain a map of all keys that were gained/lost
since last time.

It would be easy to loop through `git ls-tree` of the master branch,
read each file's key with `git cat-file`, and check for it in the map.
But that would be slow...

Better would be to maintain an additional map from filename to key.
The keys database already maintains a map from key to worktree file
(and back), but only in v6 mode, and only for unlocked files, so it is
not useful for this.

That would need anything that changes annex pointers
(fix/unlock/lock/pre-commit) to update the map. It would also need to make
sure that the map gets updated with any changes to the checked out branch
made by git commit or git-annex sync. Doable, but complicated.

Or, the map could be keyed by the sha1s of the annex pointers; then
looping through `git ls-files --stage` and looking up each sha1 in the map
would not be too slow. On my laptop, with 85000 files in the tree,
that command takes 0.13s. The map would still need to be updated whenever
annex pointers are changed, though.
"""]]


@ -25,17 +25,6 @@ git-annex should use smudge/clean filters. v6 mode
(My enhanced smudge/clean patch set also fixed this problem, in a much
nicer way...)
## other warts
* There are several v6 bugs that are edge cases and
need more info or analysis. None of these seem like blockers
to keep v6 experimental or to replacing direct mode with v6.
- <http://git-annex.branchable.com/bugs/assistant_crashes_in_TransferScanner/>
- <http://git-annex.branchable.com/bugs/v6_appears_to_not_thin/>
  - <http://git-annex.branchable.com/bugs/Metadata_views_in_v6_repo_upgraded_from_direct_mode_act_strangely/>
- <http://git-annex.branchable.com/bugs/git-annex-sync_sometimes_fails_in_submodule_in_V6_adjusted_branch/>
* When git runs the smudge filter, it buffers all its output in RAM before
  writing it to a file. So, checking out a branch with large v6 unlocked files
  can cause git to use a lot of memory.
@ -51,6 +40,18 @@ git-annex should use smudge/clean filters. v6 mode
The annex.thin idea above could work around this problem.
## other warts
* There are several v6 bugs that are edge cases and
need more info or analysis. None of these seem like blockers
to keep v6 experimental or to replacing direct mode with v6.
- <http://git-annex.branchable.com/bugs/assistant_crashes_in_TransferScanner/>
- <http://git-annex.branchable.com/bugs/v6_appears_to_not_thin/>
- <http://git-annex.branchable.com/bugs/Metadata_views_in_v6_repo_upgraded_from_direct_mode_act_strangely/>
- <http://git-annex.branchable.com/bugs/git-annex-sync_sometimes_fails_in_submodule_in_V6_adjusted_branch/>
### long term todos
* Potentially: Use git's new `filter.<driver>.process` interface, which will