more formal documentation of balancing
parent
bd5affa362
commit
3019b21c40
2 changed files with 24 additions and 15 deletions
|
@ -15,22 +15,31 @@ that entirely:
|
||||||
|
|
||||||
So, let's add a new expression: `balanced=group`
|
So, let's add a new expression: `balanced=group`
|
||||||
|
|
||||||
## implementation
|
## how it works
|
||||||
|
|
||||||
This would work by taking the list of uuids of all repositories in the
|
To decide which repository wants key K:
|
||||||
group that have enough free space to store a key, and sorting them,
|
|
||||||
which yields a list from 0..M-1 repositories.
|
|
||||||
|
|
||||||
(To know if a repo has enough free space to store a key
|
A is the list of UUIDs of all the repositories in the group,
|
||||||
will need [[todo/track_free_space_in_repos_via_git-annex_branch]]
|
in ascending order.
|
||||||
|
|
||||||
|
B is A filtered to repositories that have enough free space to store key K.
|
||||||
|
(Needs [[todo/track_free_space_in_repos_via_git-annex_branch]]
|
||||||
to be implemented.)
|
to be implemented.)
|
||||||
|
|
||||||
To decide which repository wants key K, convert K to a number N in some
|
S is the concatenation of each UUID in A.
|
||||||
stable way and then `N mod M` yields the number of the repository that
|
|
||||||
wants it, while all the rest don't.
|
|
||||||
|
|
||||||
(Since git-annex keys can be pretty long and not all of them are random
|
N is the HMAC-SHA256 of K and S, with S being the "secret key" and K being
|
||||||
hashes, let's md5sum the key and then use the md5 as a number.)
|
the message.
|
||||||
|
|
||||||
|
M is the number of repositories in B.
|
||||||
|
|
||||||
|
Then `N mod M` is the index of the repository in B that wants key K.
|
||||||
|
|
||||||
|
The purpose of using HMAC-SHA256 here is mostly to evenly distribute keys
|
||||||
|
among the repositories, since git-annex keys can be pretty long and do not
|
||||||
|
always contain hashes. Also, including the concatenation of all the UUIDs
|
||||||
|
of repositories in the group makes it harder to generate a combination of
|
||||||
|
key and repository UUID that makes that repository want to contain the key.
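
The selection steps described above (A, B, S, N, M) can be sketched roughly as follows. This is only an illustration, not git-annex's actual code: the function name `balanced_pick` and the `has_space` callback are hypothetical, and the doc does not specify how K and S are encoded to bytes or how the HMAC digest is turned into a number, so UTF-8 encoding, joining the UUIDs with no separator, and a big-endian reading of the digest are all assumptions here.

```python
import hashlib
import hmac


def balanced_pick(key, group_uuids, has_space):
    """Return the UUID of the repository that wants `key`, or None.

    key         -- the git-annex key K, as a string
    group_uuids -- UUIDs of all repositories in the group, in any order
    has_space   -- hypothetical callback: has_space(uuid) -> bool
    """
    # A: all UUIDs in the group, in ascending order.
    a = sorted(group_uuids)
    # B: A filtered to repositories with enough free space for K.
    b = [u for u in a if has_space(u)]
    if not b:
        return None  # no repository in the group can hold the key
    # S: concatenation of each UUID in A (assumed: no separator, UTF-8).
    s = "".join(a).encode("utf-8")
    # N: HMAC-SHA256 with S as the "secret key" and K as the message,
    # read as a big-endian integer (assumed byte order).
    digest = hmac.new(s, key.encode("utf-8"), hashlib.sha256).digest()
    n = int.from_bytes(digest, "big")
    # M is the number of repositories in B; N mod M indexes into B.
    return b[n % len(b)]
```

Because A is sorted before use, the result does not depend on the order in which the group members are listed. Note that S is built from A rather than B, so a repository filling up changes M but not the HMAC value itself.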
|
||||||
|
|
||||||
## stability
|
## stability
|
||||||
|
|
||||||
|
@ -39,8 +48,8 @@ the members of the group will change which repository is selected. And
|
||||||
changes in how full repositories are will also change which repo is
|
changes in how full repositories are will also change which repo is
|
||||||
selected.
|
selected.
|
||||||
|
|
||||||
Without stability, when another repo is added to the group, all data will
|
Without stability, when another repo is added to the group, or a repository
|
||||||
be rebalanced, with some moving to it. Which could be desirable in some
|
becomes full, all data will be rebalanced. Which could be desirable in some
|
||||||
situations, but the problem is that it's likely that adding repo3 will make
|
situations, but the problem is that it's likely that adding repo3 will make
|
||||||
repo1 and repo2 want to swap some files between them,
|
repo1 and repo2 want to swap some files between them,
|
||||||
|
|
||||||
|
|
|
@ -42,8 +42,8 @@ Planned schedule of work:
|
||||||
not occur. Users wanting 2 copies can have 2 groups which are each
|
not occur. Users wanting 2 copies can have 2 groups which are each
|
||||||
balanced, although that would mean more repositories on more drives.
|
balanced, although that would mean more repositories on more drives.
|
||||||
|
|
||||||
* document balancing algo well enough that someone else could implement it
|
Also note that "fullybalanced=foo:2" is not currently actually
|
||||||
from the design doc
|
implemented!
|
||||||
|
|
||||||
* Add `git-annex maxsize` command.
|
* Add `git-annex maxsize` command.
|
||||||
|
|
||||||
|
|