This all works fine. But it doesn't check repository sizes yet, and without repository size checking, once a repository gets full, there will be no other repository that will want its files. Use of sha2 seems unnecessary; probably adler32 or md5 or crc would have been enough. Possibly just summing up the bytes of the key mod the number of repositories would have sufficed. But sha2 is there, and probably hardware accelerated. I doubt very much there is any security benefit to using it though. If someone wants to construct a key that will be balanced onto a given repository, sha2 is certainly not going to stop them.

[[!toc ]]

## motivations

Say we have 2 drives and want to fill them both evenly with files,
different files in each drive. Currently, preferred content cannot express
that entirely:

* One way is to use a-m* and n-z*, but that's unlikely to split filenames evenly.
* Or, we can let both repos take whatever files, perhaps at random, that the
  other repo is not known to contain. But then the repos will race and both get
  the same file, or similarly if they are not communicating frequently.
  Existing preferred content expressions, such as the one for the archive group,
  have this problem.

So, let's add a new expression: `balanced=group`

## implementation

This would work by taking the list of uuids of all repositories in the
group that have enough free space to store a key, and sorting them,
which yields a list of repositories numbered 0..M-1.

(Knowing whether a repo has enough free space to store a key
will need [[todo/track_free_space_in_repos_via_git-annex_branch]]
to be implemented.)

To decide which repository wants key K, convert K to a number N in some
stable way and then `N mod M` yields the number of the repository that
wants it, while all the rest don't.

(Since git-annex keys can be pretty long and not all of them are random
hashes, let's md5sum the key and then use the md5 as a number.)

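To make that concrete, here is a minimal Haskell sketch of the selection
(hypothetical code for illustration, not git-annex's implementation; it
assumes the cryptonite and memory libraries for md5):

    import Data.List (sort)
    import qualified Data.ByteString.Char8 as B
    import qualified Data.ByteArray as BA
    import Crypto.Hash (hash, Digest, MD5)

    -- Convert a key to a number N in a stable way: md5sum it and treat
    -- the digest bytes as one big number.
    keyToNumber :: String -> Integer
    keyToNumber key = foldl (\acc w -> acc * 256 + fromIntegral w) 0
        (BA.unpack (hash (B.pack key) :: Digest MD5))

    -- Given the uuids of the repositories in the group that have enough
    -- free space, the repository that wants the key is number N mod M in
    -- the sorted list; all the rest don't want it.
    wantsKey :: [String] -> String -> Maybe String
    wantsKey uuids key
        | null uuids = Nothing
        | otherwise  = Just (sorted !! fromIntegral (keyToNumber key `mod` m))
      where
        sorted = sort uuids
        m = fromIntegral (length sorted) :: Integer
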
## stability
Note that this preferred content expression will not be stable. A change in
the members of the group will change which repository is selected. And
changes in how full repositories are will also change which repo is
selected.

Without stability, when another repo is added to the group, all data will
be rebalanced, with some moving to it. Which could be desirable in some
situations, but the problem is that it's likely that adding repo3 will make
repo1 and repo2 want to swap some files between them.

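For example (purely to illustrate the arithmetic): with 2 repos in the
group, a key whose N is 4 is wanted by repository 4 mod 2 = 0; after a
third repo is added, the same key is wanted by repository 4 mod 3 = 1, so
it wants to move between the two original repos even though neither of
them changed.
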
So, we'll want to add some precautions to avoid a lot of data moving around
in such a case:

    (balanced=backup and not (copies=backup:1)) or present

So once a file lands on a backup drive, it stays there, even if adding more
backup drives changes the balancing.

## use case: 3 of 5

What if we have 5 backup repos and want each key to be stored in 3 of them?
There's a simple change that can support that:
`balanced=group:3`

This works the same as before, but rather than just `N mod M`, take
`(N+I) mod M` where I is [0..2] to get the list of 3 repositories that want
a key.

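Continuing the earlier sketch (again hypothetical, not git-annex's code),
the selection generalizes naturally from one repository to a list:

    -- Pick the numcopies repositories that want the key: numbers
    -- (N+I) mod M for I in [0 .. numcopies-1] in the sorted list of
    -- candidates, reusing keyToNumber from the sketch above.
    wantsKeyCopies :: [String] -> Int -> String -> [String]
    wantsKeyCopies uuids numcopies key
        | null uuids = []
        | otherwise  =
            [ sorted !! fromIntegral ((n + i) `mod` m)
            | i <- map fromIntegral [0 .. min numcopies (length uuids) - 1]
            ]
      where
        sorted = sort uuids
        m = fromIntegral (length sorted) :: Integer
        n = keyToNumber key
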
However, once 3 of those 5 repos get full, new keys will only be able to be
stored on 2 of them. At that point one or more new repos will need to be
added to reach the goal of each key being stored in 3 of them. It would be
possible to rebalance the 3 full repos by moving some keys from them to the
other 2 repos, and eke out more storage before needing to add new
repositories. A separate rebalancing pass, that does not use preferred
content alone, could be implemented to handle this (see below).

## use case: geographically distinct datacenters

Of course this is not limited to backup drives. A more complicated example:
There are 4 geographically distributed datacenters, each of which has some
number of drives. Each file should have 1 copy stored in each datacenter,
on some drive there.

This can be implemented by making a group for each datacenter, containing
all of its drives, and using `balanced` to pick the drive that holds the
copy of the file. The preferred content expression would be eg:

    (balanced=datacenterA and not copies=datacenterA:1) or present

In such a situation, to avoid an `N^2` remote interconnect, there might be a
transfer repository in each datacenter that sits in front of its drives. The
transfer repository should want files that have not yet reached the
destination drive. How to write a preferred content expression for that?
It might be sufficient to use `copies=datacenterA:1`, so long as the file
reaching any drive in the datacenter is enough. But we may want to add
something analogous to `inallgroup=` that checks if a file is in
the place that `balanced` picks for a group. Eg,
`balancedgroup=datacenterA` for 1 copy and `balancedgroup=group:datacenterA:2`
for N copies.

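A rough sketch of what such a check could mean, reusing `wantsKeyCopies`
from the earlier sketch (hypothetical; the exact semantics are not pinned
down here):

    -- One possible reading of balancedgroup=group:N: the key is present
    -- in every repository that balanced picks for it in the group.
    inBalancedGroup :: [String] -> [String] -> Int -> String -> Bool
    inBalancedGroup groupUuids presentUuids n key =
        all (`elem` presentUuids) (wantsKeyCopies groupUuids n key)
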
The [[design/passthrough_proxy]] idea is an alternate way to put a
repository in front of such a cluster, that does not need additional
extensions to preferred content.

## split brain situations

Of course, in the time after the git-annex branch was updated and before
it reaches the local repo, a repo can be full without us knowing about
it. Stores to it would fail, and perhaps be retried, until the updated
git-annex branch was synced.

In the worst case, a split brain situation
can make the balanced preferred content expression
pick a different repository to hold two independent
stores of the same key. Eg, when one side thinks one repo is full,
and the other side thinks the other repo is full.

If `present` is used in the preferred content, both of them will then
want to contain it. (Is `present` really needed, as shown in the examples
above?)

If it's not, one of them will drop it and the other will
usually maintain its copy. It would perhaps be possible for both of
them to drop it, leading to a re-upload cycle. This needs some research
to see if it's a real problem.
See [[todo/proving_preferred_content_behavior]].

## rebalancing

In both the 3 of 5 use case and a split brain situation, it's possible for
content to end up not optimally balanced between repositories.

(There are also situations where a cluster node ends up without a copy
of a file that is preferred content, or where adding a copy to a node
would satisfy numcopies. This can happen eg, when a client sends a file
to a single node rather than to the cluster. Rebalancing also will deal
with those.)

git-annex can be made to operate in a mode where it does additional work
to rebalance repositories.

This can be an option like --rebalance, that changes how the preferred content
expression is evaluated. The user can choose where and when to run that.
Eg, it might be run on a node inside a cluster after adding more storage to
the cluster.

In several examples above, we have preferred content expressions in this
form:

    (balanced=group:N and not copies=group:N) or present

In order to rebalance, that needs to be changed to:

    balanced=group:N

What could be done is make `balanced()` usually expand to the former,
but when --rebalance is used, it only expands to the latter.

(Might make the fully balanced behavior available as `fullybalanced` for
users who want it. Then usually
`balanced=group:N == (fullybalanced=group:N and not copies=group:N) or present`,
and when --rebalance is used, `balanced=group:N == fullybalanced=group:N`.)

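A minimal sketch of that expansion (hypothetical names and a simplified
expression AST, not git-annex's real parser):

    -- Simplified preferred content AST, just enough for this example.
    data Expr
        = FullyBalanced String Int   -- fullybalanced=group:N
        | Copies String Int          -- copies=group:N
        | Present
        | And Expr Expr
        | Or Expr Expr
        | Not Expr

    -- How balanced=group:N could expand: normally it includes the
    -- stability precautions, but with --rebalance it is just the fully
    -- balanced check.
    expandBalanced :: Bool -> String -> Int -> Expr
    expandBalanced rebalancing g n
        | rebalancing = FullyBalanced g n
        | otherwise   =
            (FullyBalanced g n `And` Not (Copies g n)) `Or` Present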