update

Commit ad966e5e7b (parent 1115fb1f9b)
2 changed files with 41 additions and 12 deletions
@@ -21,11 +21,7 @@ repos that had a maxsize recorded, essentially for free.
 
 But 8 seconds is rather a long time to block a `git-annex push`
 type command. Which would be needed if any remote's preferred content
-expression used `balanced_amoung`.
-
-It would help some to cache the calculated sizes in eg a sqlite db, update
-the cache after sending or dropping content, and invalidate the cache when
-git-annex branch update merges in a git-annex branch from elsewhere.
+expression used the free space information.
 
 Would it be possible to update incrementally from the previous git-annex
 branch to the current one? That's essentially what `git-annex log
@@ -39,13 +35,46 @@ particular git-annex branch commit. We don't care about sizes at
 intermediate points in time, which that command does calculate.
 
 See [[todo/info_--size-history]] for the subtleties that had to be handled.
-In particular, diffing from the previous git-annex branch commit to current may
+In particular, comparing the previous git-annex branch commit to current may
 yield lines that seem to indicate content was added to a repo, but in fact
-that repo already had that content at the previous git-annex branch commit.
-So it seems it would have to look up the location log's value at the
-previous commit, either querying the git-annex branch or cached state.
+that repo already had that content at the previous git-annex branch commit
+and another log line was recorded elsewhere redundantly.
+So it needs to look at the location log's value at the
+previous commit in order to determine if a change to a log should be
+counted.
 
 Worst case, that's queries of the location log file for every single key.
 If queried from git, that would be slow -- slower than `git-annex info`'s
 streaming approach. If they were all cached in a sqlite database, it might
 manage to be faster?
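
A minimal sketch of the check described above, in Python rather than git-annex's Haskell. It assumes a simplified location log format of `timestamp status uuid` per line, with status `1` meaning present; the names `present_uuids` and `apply_size_delta` are hypothetical. A repo's size only changes when its presence for the key differs between the old and new log, so a redundant log line counts for nothing.

```python
def present_uuids(log_text):
    """Return the set of repo UUIDs that hold the key, per a location log.

    Assumed line format: "<timestamp>s <status> <uuid>". For each UUID the
    line with the newest timestamp wins (later lines win ties).
    """
    latest = {}  # uuid -> (timestamp, status)
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        ts = float(parts[0].rstrip("s"))
        status, uuid = parts[1], parts[2]
        if uuid not in latest or ts >= latest[uuid][0]:
            latest[uuid] = (ts, status)
    return {u for u, (_, status) in latest.items() if status == "1"}


def apply_size_delta(sizes, key_size, old_log, new_log):
    """Update per-repo sizes for one key whose location log changed."""
    old = present_uuids(old_log)
    new = present_uuids(new_log)
    for uuid in new - old:   # content really was added to this repo
        sizes[uuid] = sizes.get(uuid, 0) + key_size
    for uuid in old - new:   # content really was dropped from this repo
        sizes[uuid] = sizes.get(uuid, 0) - key_size
    return sizes
```

A repo that appears in both `old` and `new` is skipped, which is exactly the redundant-line case.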
+
+## incremental update via git diff
+
+Could `git diff -U1000000` be used and the patch parsed to get the complete
+old and new location log? (Assuming no log file ever reaches a million
+lines.) I tried this in my big repo, and even diffing from the first
+git-annex branch commit to the last took 7.54 seconds.
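
The patch parsing idea could look roughly like this. It is only a sketch: `split_patch` is a hypothetical name, it assumes simple paths with no spaces or quoting, and it only gives complete file contents when the context was large enough that each file is covered by one hunk.

```python
def split_patch(patch_text):
    """Rebuild the old and new content of each file from a unified diff.

    Returns {path: (old_lines, new_lines)}. Context lines go to both
    sides, "-" lines to the old side only, "+" lines to the new side only.
    """
    files = {}
    current = None
    in_hunk = False
    for line in patch_text.splitlines():
        if line.startswith("diff --git "):
            # Assumes a header of the form "diff --git a/PATH b/PATH"
            path = line.split(" b/", 1)[1]
            current = files[path] = ([], [])
            in_hunk = False
        elif current is None:
            continue
        elif line.startswith("@@"):
            in_hunk = True
        elif not in_hunk:
            continue  # "index", "---" and "+++" header lines
        elif line.startswith(" "):
            current[0].append(line[1:])
            current[1].append(line[1:])
        elif line.startswith("-"):
            current[0].append(line[1:])
        elif line.startswith("+"):
            current[1].append(line[1:])
        # "\ No newline at end of file" markers are ignored
    return files
```

Header lines are told apart from content by position (before the first `@@` of a file), not by prefix, so a removed log line that happens to start with `--` is still treated as content.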
+
+Compare that with the method used by `git-annex info`'s size gathering, of
+dumping out the content of all files on the branch with `git ls-tree -r
+git-annex |awk '{print $3}'|git cat-file --batch --buffer`, which only
+takes 3 seconds. So, this is not ideal when diffing to too old a point.
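
For reference, the stream that pipeline produces can be consumed like this. A sketch only, not how git-annex does it: `read_batch` is a hypothetical helper, and it relies on the `git cat-file --batch` output layout of a `<oid> <type> <size>` header line, then the content, then a newline.

```python
def read_batch(stream):
    """Yield (oid, type, content) for each object in --batch output.

    `stream` is a binary file object, such as the stdout pipe of a
    `git cat-file --batch --buffer` process.
    """
    while True:
        header = stream.readline()
        if not header:
            return  # end of stream
        fields = header.split()
        if len(fields) == 2 and fields[1] == b"missing":
            continue  # "<object> missing" has no content after it
        oid, objtype, size = fields[0], fields[1], int(fields[2])
        content = stream.read(size)
        stream.read(1)  # the newline that follows each object's content
        yield oid.decode(), objtype.decode(), content
```

Everything is read in one pass over one pipe, which is what makes the streaming approach cheap compared to one query per file.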
+
+Diffing in my big repo to the git-annex branch from 2020 takes 4 seconds.
+... from 3 months ago takes 2 seconds.
+... from 1 week ago takes 1 second.
+
+## incremental update when merging git-annex branch
+
+When merging git-annex branch changes into .git/annex/index,
+it already diffs between the branch and the index and uses `git cat-file`
+to get both versions of the file in order to union merge them.
+
+That's essentially the same information needed to do the incremental update
+of the repo sizes. So could update sizes at the same time as merging the
+git-annex branch. That would be essentially free!
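
Roughly how that piggybacking might look, as a sketch. It assumes a union merge simply keeps every distinct line from both versions, and `on_merge` is a hypothetical hook standing in for whatever would update the recorded repo sizes; none of these names come from git-annex itself.

```python
def union_merge(ours, theirs):
    """Keep every distinct line from either version, in first-seen order."""
    merged = []
    seen = set()
    for line in ours.splitlines() + theirs.splitlines():
        if line not in seen:
            seen.add(line)
            merged.append(line)
    return "\n".join(merged) + "\n" if merged else ""


def merge_file(path, index_version, branch_version, on_merge):
    """Union merge one log file and report its before/after content.

    Both versions are already in hand for the merge, so handing them to
    the hook costs no extra git queries.
    """
    merged = union_merge(index_version, branch_version)
    on_merge(path, index_version, merged)
    return merged
```

The hook sees the index version as the old state and the merged result as the new state, which is the pair of values an incremental size update needs.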
+
+Note that the use of `git cat-file` in union merge is not --buffer
+streaming, so is slower than the patch parsing method that was discussed in
+the previous section. So it might be possible to speed up git-annex branch
+merging using patch parsing.