size based rebalancing design
parent 99514f9d18, commit 68a99a8f48
2 changed files with 35 additions and 6 deletions
@@ -58,6 +58,17 @@ If the maximum size of some but not others is known, what then?

Balancing this way would fall back to the method above when several repos
are equally good candidates to hold a key.

The problem with size balancing is that in a split brain situation,
the known sizes are not accurate, and so one repository will end up more
full than others. Consider, for example, a group of 2 repositories of the
same size, where one repository is 50% full and the other is 75%. Sending
files to that group will put them all in the 50% repository until it gets
to 75%. But if another clone is doing the same thing and sending different
files, the 50% full repository will end up 100% full.
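
The failure mode in this example can be sketched with a toy simulation (hypothetical code, not part of git-annex): two clones start from the same stale view of repository sizes and each sends different files to whichever repository looks least full.

```python
# Toy simulation of the split brain problem with size based balancing.
# Repository names, capacities, and units are illustrative.

def pick_repo(known_sizes):
    """Size based balancing: send new files to the least-full repository."""
    return min(known_sizes, key=known_sizes.get)

actual = {"A": 50, "B": 75}           # real fullness, in units of a 100-unit capacity
views = [dict(actual), dict(actual)]  # each clone's stale copy of the sizes

# Each clone sends 25 units of *different* files, updating only its own view.
for view in views:
    for _ in range(25):
        repo = pick_repo(view)
        view[repo] += 1
        actual[repo] += 1

print(actual)  # {'A': 100, 'B': 75}: repo A ends up 100% full
```

Each clone believes it stopped at 75%, but because neither sees the other's sends, the less-full repository absorbs both batches.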

Rebalancing could fix that, but it seems better generally to use `N mod M`
balancing among the repositories known/believed to have enough free space.
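
A minimal sketch of what `N mod M` balancing could look like (hypothetical names and hashing scheme, not git-annex's actual implementation), with candidates restricted to the repositories believed to have enough free space:

```python
import hashlib

def wanted_repos(key, candidates, n):
    """Assign a key to n of the m candidate repos, stably, via a hash of the key."""
    repos = sorted(candidates)   # stable order regardless of input order
    m = len(repos)
    start = int(hashlib.sha256(key.encode()).hexdigest(), 16) % m
    return [repos[(start + i) % m] for i in range(n)]

# Only repos known/believed to have enough free space are candidates.
repos = ["r1", "r2", "r3", "r4", "r5"]
believed_full = {"r2"}
candidates = [r for r in repos if r not in believed_full]

print(wanted_repos("SHA256E-s1024--abc123", candidates, 3))
```

Because the assignment depends only on the key and the candidate set, every clone that agrees on which repositories have free space picks the same repositories, with no size bookkeeping to go stale.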

## stability

Note that this preferred content expression will not be stable. A change in

@@ -90,10 +101,11 @@ key.

However, once 3 of those 5 repos get full, new keys will only be able to be
stored on 2 of them. At that point one or more new repos will need to be
added to reach the goal of each key being stored in 3 of them.

It would be possible to rebalance the 3 full repos by moving some keys from
them to the other 2 repos, and eke out more storage before needing to add
new repositories. A separate rebalancing pass, that does not use preferred
content alone, could be implemented to handle this (see below).

## use case: geographically distinct datacenters

@@ -183,4 +195,20 @@ users who want it, then

`balanced=group:N == (fullybalanced=group:N and not copies=group:N) or present`
usually, and when --rebalance is used, `balanced=group:N == fullybalanced=group:N`
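
Read as booleans, the two forms of the expression could be modeled like this (a hypothetical sketch of the semantics, not git-annex's matcher code; the parameter names are illustrative):

```python
def balanced(fullybalanced, has_n_copies_in_group, present, rebalance=False):
    """Model of balanced=group:N in terms of fullybalanced=group:N.

    has_n_copies_in_group models copies=group:N; present models whether the
    content is already in the repository being matched.
    """
    if rebalance:
        # with --rebalance: balanced=group:N == fullybalanced=group:N
        return fullybalanced
    # usually: (fullybalanced=group:N and not copies=group:N) or present
    return (fullybalanced and not has_n_copies_in_group) or present

# Content already present stays wanted, avoiding churn outside of --rebalance:
print(balanced(fullybalanced=False, has_n_copies_in_group=True, present=True))  # True
```

The `or present` term is what makes the usual expression sticky: content is only dropped when `--rebalance` switches to the strict form.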

In the balanced=group:3 example above, some content needs to be moved from
the 3 full repos to the 2 less full repos. To handle this,
fullybalanced=group:N needs to look at how full the repositories in
the group are. What could be done is make it use size based balancing
when rebalancing `group:N` with N>1.

While size based balancing generally has problems as described above with
split brain, rebalancing is probably run in a single repository, so split
brain won't be an issue.

Note that size based rebalancing will need to take into account what the
sizes will be after the content is moved from one of the repositories that
contains it to the candidate repository. For example, if one repository is
75% full and the other is 60% full, and the annex object in the 75% full
repo is 20% of the size of the repositories, then it doesn't make sense to
make the repo that currently contains it not want it any more, because the
other repo would end up more full.
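
The accounting in that example can be sketched as a simple check (a hypothetical helper, assuming equal-sized repositories with fullness expressed as fractions of capacity):

```python
def should_move(src_full, dst_full, object_frac):
    """Only move the object if the destination would end up
    less full than the source currently is."""
    return dst_full + object_frac < src_full

# The example from the text: 75% vs 60% full, object is 20% of capacity.
# Moving it would leave the destination 80% full, worse than the source's 75%.
print(should_move(0.75, 0.60, 0.20))  # False
```

Comparing post-move fullness rather than current fullness is what prevents rebalancing from shuffling a large object into a repository that would then be the new worst case.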

@@ -78,8 +78,9 @@ Planned schedule of work:

not occur. Users wanting 2 copies can have 2 groups which are each
balanced, although that would mean more repositories on more drives.
Size based rebalancing may offer a solution; see design.

* "fullybalanced=foo:2" is not currently actually implemented!

* `git-annex info` in the limitedcalc path in cachedAllRepoData
  double-counts redundant information from the journal due to using