design for simulating clusters w/o simulating cluster gateways

Joey Hess 2024-09-25 12:58:26 -04:00
parent b9214d4162
commit 61c95f4d29
GPG key ID: DB12DB0FF05F8F38
2 changed files with 73 additions and 1 deletions


@@ -368,6 +368,47 @@ as passed to "git annex sim" while a simulation is running.
step 100
rebalance off
* `clusternode name repo`
Simulate a repository being a node of a cluster, which can be referred to
using the specified name.
Rather than a cluster gateway being simulated as a separate entity, any
connection to a cluster node with that name is treated as accessing that
repository via the same cluster gateway.
Since a cluster gateway knows about all changes that are made to nodes
via it, every repository that has a connection to a cluster node will
immediately know about changes that are made via that node, without
needing a simulated git pull.
To simulate a repository being a node of more than one cluster, or behind
multiple gateways in the same cluster, use this command to give it
multiple names.
For example:
init foo
init bar
init node1
init node2
clusternode cluster-node1 node1
clusternode cluster-node2 node2
group node1 cluster
group node2 cluster
wanted node1 sizebalanced=cluster
wanted node2 sizebalanced=cluster
connect cluster-node2 <- foo -> cluster-node1
connect cluster-node2 <- bar -> cluster-node1
addmulti 10 foo 1gb 2gb foo
addmulti 10 bar 1gb 2gb bar
action foo sendwanted cluster-node1 while action foo sendwanted cluster-node2 while action bar sendwanted cluster-node1 while action bar sendwanted cluster-node2
In the above example, while foo and bar are both concurrently sending
wanted files to both nodes, each will know immediately which files have
been sent by the other, and so the files will be sizebalanced between
them optimally.
# OPTIONS
* The [[git-annex-common-options]](1) can be used.


@@ -92,7 +92,38 @@ Planned schedule of work:
clusternode mycluster-foo foo
clusternode othercluster-foo foo
Implementation plan for this:
* clusternode initializes a new cluster node UUID, and adds it to
simRepos.
* add `simClusterNodes :: M.Map UUID (UUID, RemoteName)`,
which maps from the cluster node UUID to the UUID of the underlying
repo, and its node name.
* clusternode also adds to simClusterNodes.
* setPresentKey checks if the UUID is in simClusterNodes.
* If it is, it makes the key present/missing in the underlying repo
UUID as well.
* And, it looks through simConnections to find any other repos that
also have a connection to the cluster node with that name.
Each of those repos also gets its simLocations updated.
But: The cluster node UUID would need to have the same preferred content
etc as the underlying repo. And, it would need to be in the same groups.
And it would be counted as another copy. Could use a cluster UUID to
avoid the numcopies count. But can adding a separate UUID be avoided?
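As a rough illustration of the plan above that gives each cluster node its own UUID, here is a minimal sketch. It is not the code in git-annex's simulator: the `SimState` record is cut down to three fields, the `setPresentKey` signature is a simplified assumption, and `knownHolders` is a hypothetical helper added only to inspect the result. It shows the two effects described in the list: the change is mirrored onto the underlying repo UUID, and every repo connected to the same node name has its `simLocations` updated.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S
import Data.Maybe (fromMaybe)

-- Hypothetical stand-ins for the simulator's real types.
newtype UUID = UUID String deriving (Eq, Ord, Show)
newtype RemoteName = RemoteName String deriving (Eq, Ord, Show)
newtype Key = Key String deriving (Eq, Ord, Show)

-- What one repo believes: which UUIDs hold each key.
type Locations = M.Map Key (S.Set UUID)

-- A cut-down simulation state (assumed shape, not the real record).
data SimState = SimState
    { simClusterNodes :: M.Map UUID (UUID, RemoteName)
    -- cluster node UUID -> (underlying repo UUID, node name)
    , simConnections :: M.Map UUID (S.Set RemoteName)
    -- repo UUID -> names of the remotes it is connected to
    , simLocations :: M.Map UUID Locations
    -- repo UUID -> that repo's view of key locations
    }

-- Record, from the observer's point of view, that the target gained
-- (True) or lost (False) a key. Simplified signature, an assumption.
setPresentKey :: Bool -> UUID -> UUID -> Key -> SimState -> SimState
setPresentKey present observer target key st =
    st { simLocations = foldr record (simLocations st) observers }
  where
    (holders, observers) = case M.lookup target (simClusterNodes st) of
        -- Not a cluster node: only the observer learns about the change.
        Nothing -> ([target], [observer])
        -- Cluster node: mirror the change onto the underlying repo, and
        -- tell every repo connected to a node with the same name.
        Just (underlying, name) ->
            ( [target, underlying]
            , observer :
                [ r
                | (r, names) <- M.toList (simConnections st)
                , S.member name names
                ]
            )
    record o = M.alter
        (Just . M.alter (Just . change . fromMaybe S.empty) key . fromMaybe M.empty)
        o
    change old
        | present = S.union old (S.fromList holders)
        | otherwise = S.difference old (S.fromList holders)

-- Hypothetical helper: the UUIDs a repo believes hold a key.
knownHolders :: UUID -> Key -> SimState -> S.Set UUID
knownHolders repo key st =
    M.findWithDefault S.empty key (M.findWithDefault M.empty repo (simLocations st))
```

In this sketch both the cluster node UUID and the underlying UUID end up recorded as holders, which is the extra-copy problem raised above.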
Implementation plan for this without separate UUID:
* add `simClusterNodes :: M.Map RepoName UUID`, which maps from the
cluster node name to the UUID of the underlying repo.
* clusternode adds to simClusterNodes.
* checkKnownRemote needs to check simClusterNodes as well as
simRepos so that cluster nodes can be used as remotes.
* Plumb repo name through to setPresentKey.
* setPresentKey checks if repo name is in simClusterNodes.
* If it is, it looks through simConnections to find any other
repos that also have a connection to the cluster node with
that name. Each of those repos also gets its simLocations updated
for the change being logged.
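The plan without a separate UUID could look roughly like the sketch below. Again this is an illustration under assumptions, not the simulator's actual code: the `SimState` fields are reduced to the four the plan mentions, and the signatures of `checkKnownRemote` and `setPresentKey` are simplified guesses. `knownHolders` is a hypothetical helper used only to inspect the result. The cluster node name maps straight to the underlying repo's UUID, so no extra copy is counted.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S
import Data.Maybe (fromMaybe)

-- Hypothetical stand-ins for the simulator's real types.
newtype UUID = UUID String deriving (Eq, Ord, Show)
newtype RepoName = RepoName String deriving (Eq, Ord, Show)
newtype Key = Key String deriving (Eq, Ord, Show)

-- A cut-down simulation state (assumed shape, not the real record).
data SimState = SimState
    { simRepos :: M.Map RepoName UUID
    -- plain repo name -> repo UUID
    , simClusterNodes :: M.Map RepoName UUID
    -- cluster node name -> UUID of the underlying repo
    , simConnections :: M.Map UUID (S.Set RepoName)
    -- repo UUID -> names of the remotes it is connected to
    , simLocations :: M.Map UUID (M.Map Key (S.Set UUID))
    -- repo UUID -> that repo's view of key locations
    }

-- Resolve a remote name, accepting cluster node names as well as
-- plain repo names. Simplified signature, an assumption.
checkKnownRemote :: RepoName -> SimState -> Either String UUID
checkKnownRemote name st =
    case M.lookup name (simRepos st) of
        Just u -> Right u
        Nothing -> maybe
            (Left ("unknown remote: " ++ show name))
            Right
            (M.lookup name (simClusterNodes st))

-- The repo name is plumbed through alongside the resolved UUID, so the
-- function can tell whether the target was reached as a cluster node.
setPresentKey :: Bool -> UUID -> (RepoName, UUID) -> Key -> SimState -> SimState
setPresentKey present observer (name, target) key st =
    st { simLocations = foldr update (simLocations st) observers }
  where
    observers
        | M.member name (simClusterNodes st) = observer : peers
        | otherwise = [observer]
    -- Every repo that has a connection to a cluster node with this name.
    peers =
        [ r
        | (r, names) <- M.toList (simConnections st)
        , S.member name names
        ]
    update o = M.alter
        (Just . M.alter (Just . change . fromMaybe S.empty) key . fromMaybe M.empty)
        o
    change
        | present = S.insert target
        | otherwise = S.delete target

-- Hypothetical helper: the UUIDs a repo believes hold a key.
knownHolders :: UUID -> Key -> SimState -> S.Set UUID
knownHolders repo key st =
    M.findWithDefault S.empty key (M.findWithDefault M.empty repo (simLocations st))
```

With the names from the man page example, a transfer from foo to `cluster-node1` would be recorded for foo and for bar at once, while a transfer to `node1` under its plain name would be recorded for foo only.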
* sim: Add support for metadata, so preferred content that matches on it
will work