add todo for tracking free space in repos via git-annex branch
For balanced preferred content perhaps, or just for git-annex info display.

Sponsored-by: unqueued on Patreon
parent 3ff6eec9bc
commit 3874b7364f
2 changed files with 53 additions and 0 deletions
@@ -62,6 +62,8 @@ a manual/scripted process.
> This would need only a single one-time write to the git-annex branch,
> to record the repo size. Then update a local counter for each repository
> from the git-annex branch location log changes.
> There is a todo about doing this,
> [[todo/track_free_space_in_repos_via_git-annex_branch]].
>
> Of course, in the time after the git-annex branch was updated and before
> it reaches the local repo, a repo can be full without us knowing about

doc/todo/track_free_space_in_repos_via_git-annex_branch.mdwn (Normal file, 51 lines)
@@ -0,0 +1,51 @@
If the total space available in a repository for annex objects is recorded
on the git-annex branch (probably by the user running a command, or perhaps
automatically), then it is possible to examine the git-annex branch and
tell how much free space a remote has available.
|
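Once both numbers are known, the arithmetic is simple. Here is a minimal, illustrative Python sketch, not git-annex code. It assumes that the recorded total and the summed size of the repository's annex objects are both in bytes. `free_space` is a hypothetical name. For example, a recorded total of 200 GB holding 150 GB of annex objects leaves 50 GB free.

```python
from typing import Optional


def free_space(maxsize: Optional[int], repo_size: int) -> Optional[int]:
    """Hypothetical helper: free space (bytes) for annex objects in one repo.

    maxsize   -- total space recorded on the git-annex branch, or None if
                 nothing has been recorded for this repository
    repo_size -- bytes of annex objects the location logs say it contains
    """
    if maxsize is None:
        return None  # no recorded total, so free space is unknown
    # Clamp at zero: location logs can lag behind, or the recorded total
    # may have been lowered below what the repository already holds.
    return max(0, maxsize - repo_size)
```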
|
One use case is just to display it in `git-annex info`. But a more
compelling use case is [[design/balanced_preferred_content]], which needs a
way to tell when an object is too large to store on a repository, so that
it can be redirected to be stored on another repository in the same group.
|
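Here is one way the redirection could work, as an illustrative Python sketch. It filters the group's repositories down to those whose free space is known and large enough for the object, then picks one of them. Choosing the repository with the most free space is an assumption made only for this example. The real balancing rule belongs to [[design/balanced_preferred_content]], and `pick_repo` is a hypothetical name.

```python
from typing import Dict, Optional


def pick_repo(object_size: int, free: Dict[str, Optional[int]]) -> Optional[str]:
    """Hypothetical: choose a repository in a group that can hold an object.

    free maps each repository in the group to its free space in bytes,
    or to None when no total space has been recorded for it.
    """
    # Only repositories with known free space big enough for the object qualify.
    fits = {r: f for r, f in free.items() if f is not None and f >= object_size}
    if not fits:
        return None  # nowhere in the group has room
    # Illustrative policy only: most free space wins. Sorting first makes ties
    # go to the alphabetically first repository, since max() keeps the first.
    return max(sorted(fits), key=lambda r: fits[r])
```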
|
This was actually a fairly common feature request early on in git-annex,
and I probably should have thought about it more back then!
|
|
`git-annex info` has recently started summing up the sizes of repositories
from location logs, and is well optimised. In my big repository, that takes
8.54 seconds of its total runtime.
|
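Roughly, summing repository sizes from location logs works like this. For each key, find the repositories whose newest log entry says they have the content, then add the key's size to each of them. This is a simplified Python sketch, not git-annex's streaming implementation. It assumes location log lines of the form `<timestamp>s <1|0> <uuid>`, where 1 means present and 0 means not present. It also assumes keys carry a `-s<bytes>` size field, as in `SHA256E-s100--aa`. Keys without a size field simply count as 0 here.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Assumed key layout: BACKEND-s<size>[-other fields]--<name or hash>
SIZE_FIELD = re.compile(r"-s(\d+)-")


def key_size(key: str) -> int:
    """Size in bytes encoded in the key, or 0 if it has no size field."""
    m = SIZE_FIELD.search(key)
    return int(m.group(1)) if m else 0


def present_uuids(log_lines: Iterable[str]) -> Set[str]:
    """UUIDs whose most recent log entry says the content is present."""
    latest: Dict[str, Tuple[float, str]] = {}
    for line in log_lines:
        timestamp, status, uuid = line.split()
        t = float(timestamp.rstrip("s"))
        # Keep only the newest entry seen for each repository UUID.
        if uuid not in latest or t >= latest[uuid][0]:
            latest[uuid] = (t, status)
    return {uuid for uuid, (_, status) in latest.items() if status == "1"}


def repo_sizes(logs: Dict[str, Iterable[str]]) -> Dict[str, int]:
    """Sum key sizes per repository UUID, given one location log per key."""
    sizes: Dict[str, int] = defaultdict(int)
    for key, lines in logs.items():
        size = key_size(key)
        for uuid in present_uuids(lines):
            sizes[uuid] += size
    return dict(sizes)
```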
|
Since info already knows the repo sizes, just adding a
`git-annex maxsize here 200gb` type of command would let it display the
free space of all repos that had a maxsize recorded, essentially for free.
|
|
But 8 seconds is rather a long time to block a `git-annex push` type
command, and that calculation would be needed there if any remote's
preferred content expression used `balanced_amoung`.
|
|
It would help some to cache the calculated sizes in, e.g., a sqlite db,
update the cache after sending or dropping content, and invalidate the cache
when git-annex branch update merges in a git-annex branch from elsewhere.
|
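Here is a minimal Python sketch of such a cache, built on the standard `sqlite3` module. The table layout and the `SizeCache` class are assumptions for illustration, not anything in git-annex. Note that `adjust` only touches repositories that are already cached. After `invalidate`, the next lookup therefore misses and forces a full recount, rather than building on a stale number. For the same reason, `fill` is meant to be given every known repository, including ones holding 0 bytes.

```python
import sqlite3
from typing import Dict, Optional


class SizeCache:
    """Hypothetical sqlite cache of per-repository annex object sizes."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        with self.db:
            self.db.execute(
                "CREATE TABLE IF NOT EXISTS sizes"
                " (repo TEXT PRIMARY KEY, bytes INTEGER NOT NULL)"
            )

    def get(self, repo: str) -> Optional[int]:
        """Cached size, or None on a miss (the caller then recomputes)."""
        row = self.db.execute(
            "SELECT bytes FROM sizes WHERE repo = ?", (repo,)
        ).fetchone()
        return row[0] if row else None

    def fill(self, sizes: Dict[str, int]) -> None:
        """Store freshly calculated sizes for every known repository."""
        with self.db:  # one transaction
            self.db.execute("DELETE FROM sizes")
            self.db.executemany(
                "INSERT INTO sizes (repo, bytes) VALUES (?, ?)", sizes.items()
            )

    def adjust(self, repo: str, delta: int) -> None:
        """After sending (+size) or dropping (-size) a key."""
        with self.db:
            self.db.execute(
                "UPDATE sizes SET bytes = bytes + ? WHERE repo = ?", (delta, repo)
            )

    def invalidate(self) -> None:
        """When merging in a git-annex branch from elsewhere."""
        with self.db:
            self.db.execute("DELETE FROM sizes")
```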
|
Would it be possible to update incrementally from the previous git-annex
branch to the current one? That's essentially what
`git-annex log --sizesof` does for each commit on the git-annex branch, so
I could imagine adapting that to store its state on disk, so it can resume
at a new git-annex branch commit.
|
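Storing the state on disk so the calculation can resume might look like the following Python sketch, which is not git-annex code. It assumes a JSON state file holding two things: the last processed git-annex branch commit, and the per-repository sizes at that commit. `deltas_since` is a hypothetical callback. It returns the size change per repository between two commits, computing from scratch when the old commit is `None`. That callback is where the hard part lives.

```python
import json
import os
from typing import Callable, Dict, Optional, Tuple

Sizes = Dict[str, int]


def load_state(path: str) -> Tuple[Optional[str], Sizes]:
    """Last processed branch commit and sizes, or (None, {}) if no state."""
    if not os.path.exists(path):
        return None, {}
    with open(path) as f:
        data = json.load(f)
    return data["commit"], data["sizes"]


def save_state(path: str, commit: str, sizes: Sizes) -> None:
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"commit": commit, "sizes": sizes}, f)
    os.replace(tmp, path)  # rename over the old file (atomic on POSIX)


def resume(
    path: str,
    new_commit: str,
    deltas_since: Callable[[Optional[str], str], Sizes],
) -> Sizes:
    """Bring the stored sizes up to new_commit and persist them."""
    old_commit, sizes = load_state(path)
    if old_commit != new_commit:
        for repo, delta in deltas_since(old_commit, new_commit).items():
            sizes[repo] = sizes.get(repo, 0) + delta
        save_state(path, new_commit, sizes)
    return sizes
```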
|
Perhaps a less expensive implementation than `git-annex log --sizesof`
is possible, to get only the current sizes, if the past sizes are known at a
particular git-annex branch commit. We don't care about sizes at
intermediate points in time, which that command does calculate.
|
|
See [[todo/info_--size-history]] for the subtleties that had to be handled.
In particular, diffing from the previous git-annex branch commit to the
current one may yield lines that seem to indicate content was added to a
repo, but in fact that repo already had that content at the previous
git-annex branch commit. So it seems it would have to look up the location
log's value at the previous commit, either by querying the git-annex branch
or from cached state.
|
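Put differently, the change for a key should come from comparing which repositories had it before and after, not from the diff lines alone. Here is an illustrative Python sketch of that comparison. `present_before`, `present_after` and `key_size` are hypothetical lookups. The first stands for querying the git-annex branch or cached state at the previous commit.

```python
from typing import Callable, Dict, Iterable, Set


def size_deltas(
    changed_keys: Iterable[str],
    present_before: Callable[[str], Set[str]],
    present_after: Callable[[str], Set[str]],
    key_size: Callable[[str], int],
) -> Dict[str, int]:
    """Per-repository size change over the keys whose location logs changed."""
    deltas: Dict[str, int] = {}
    for key in changed_keys:
        before, after = present_before(key), present_after(key)
        size = key_size(key)
        # A diff line that looks like "added to repo X" only counts when X
        # did not already have the key at the previous commit.
        for uuid in after - before:
            deltas[uuid] = deltas.get(uuid, 0) + size
        # Likewise, only a real removal reduces a repository's size.
        for uuid in before - after:
            deltas[uuid] = deltas.get(uuid, 0) - size
    return deltas
```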
|
In the worst case, that means querying the location log file for every
single key. If queried from git, that would be slow, slower than
`git-annex info`'s streaming approach. If they were all cached in a sqlite
database, it might manage to be faster?
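
Here is a sketch of what such a cache could hold, in Python with the standard `sqlite3` module. It stores one row per key and repository that had the content as of the last processed git-annex branch commit. Looking up a key's previous value is then an indexed query rather than a query to git. The schema and the `LocationCache` class are assumptions. Whether this actually beats the streaming approach is the open question above.

```python
import sqlite3
from typing import Iterable, Set


class LocationCache:
    """Hypothetical per-key location cache at the last processed commit."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        with self.db:
            # The composite primary key also indexes lookups by annex_key.
            self.db.execute(
                "CREATE TABLE IF NOT EXISTS loc (annex_key TEXT NOT NULL,"
                " repo TEXT NOT NULL, PRIMARY KEY (annex_key, repo))"
            )

    def present(self, key: str) -> Set[str]:
        """Repositories that had the key at the cached commit."""
        rows = self.db.execute("SELECT repo FROM loc WHERE annex_key = ?", (key,))
        return {repo for (repo,) in rows}

    def update(self, key: str, repos: Iterable[str]) -> None:
        """Replace the cached locations of one key after processing it."""
        with self.db:
            self.db.execute("DELETE FROM loc WHERE annex_key = ?", (key,))
            self.db.executemany(
                "INSERT INTO loc (annex_key, repo) VALUES (?, ?)",
                ((key, repo) for repo in set(repos)),  # set() drops duplicates
            )
```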