From 61c95f4d29d0f4ebe6763be55185019251eda8d6 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Wed, 25 Sep 2024 12:58:26 -0400
Subject: [PATCH] design for simulating clusters w/o simulating cluster gateways

---
 doc/git-annex-sim.mdwn          | 41 +++++++++++++++++++++++++++++++++
 doc/todo/git-annex_proxies.mdwn | 33 +++++++++++++++++++++++++-
 2 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/doc/git-annex-sim.mdwn b/doc/git-annex-sim.mdwn
index 09505ee9d4..20e739aba0 100644
--- a/doc/git-annex-sim.mdwn
+++ b/doc/git-annex-sim.mdwn
@@ -368,6 +368,47 @@ as passed to "git annex sim" while a simulation is running.
 	step 100
 	rebalance off
 
+* `clusternode name repo`
+
+  Simulate a repository being a node of a cluster, which can be referred to
+  using the specified name.
+
+  Rather than a cluster gateway being simulated as a separate entity, any
+  connection to a cluster node with that name is treated as accessing that
+  repository via the same cluster gateway.
+
+  Since a cluster gateway knows about all changes that are made to nodes
+  via it, every repository that has a connection to a cluster node will
+  immediately know about changes that are made via that node, without
+  needing a simulated git pull.
+
+  To simulate a repository being a node of more than one cluster, or behind
+  multiple gateways in the same cluster, use this command to give it
+  multiple names.
+
+  For example:
+
+	init foo
+	init bar
+	init node1
+	init node2
+	clusternode cluster-node1 node1
+	clusternode cluster-node2 node2
+	group node1 cluster
+	group node2 cluster
+	wanted node1 sizebalanced=cluster
+	wanted node2 sizebalanced=cluster
+	connect cluster-node2 <- foo -> cluster-node1
+	connect cluster-node2 <- bar -> cluster-node1
+	addmulti 10 foo 1gb 2gb foo
+	addmulti 10 bar 1gb 2gb bar
+	action foo sendwanted cluster-node1 while action foo sendwanted cluster-node2 while action bar sendwanted cluster-node1 while action bar sendwanted cluster-node2
+
+  In the above example, while foo and bar are both concurrently sending
+  wanted files to both nodes, each will know immediately which files have
+  been sent by the other, and so the files will be sizebalanced between
+  them optimally.
+
 # OPTIONS
 
 * The [[git-annex-common-options]](1) can be used.
diff --git a/doc/todo/git-annex_proxies.mdwn b/doc/todo/git-annex_proxies.mdwn
index 79d669b6dc..fdae563424 100644
--- a/doc/todo/git-annex_proxies.mdwn
+++ b/doc/todo/git-annex_proxies.mdwn
@@ -92,7 +92,38 @@ Planned schedule of work:
 	clusternode mycluster-foo foo
 	clusternode othercluster-foo foo
 
-
+  Implementation plan for this:
+
+  * clusternode initializes a new cluster node UUID, and adds to
+    simRepos.
+  * add `simClusterNodes :: M.Map UUID (UUID, RemoteName)`,
+    which maps from the cluster node UUID to the UUID of the underlying
+    repo, and its node name.
+  * clusternode also adds to simClusterNodes.
+  * setPresentKey checks if the UUID is in simClusterNodes.
+  * If it is, it makes the key present/missing in the underlying repo
+    UUID as well.
+  * And, it looks through simConnections to find any other repos that
+    also have a connection to the cluster node with that name.
+    Each of those repos also gets its simLocations updated.
+
+  But: The cluster node UUID would need to have the same preferred content
+  etc as the underlying repo. And, it would need to be in the same groups.
+  And it would be counted as another copy. Could use a cluster UUID to
+  avoid the numcopies count. But can adding a separate UUID be avoided?
+
+  Implementation plan for this without separate UUID:
+
+  * add `simClusterNodes :: M.Map RepoName UUID`,
+  * clusternode adds to simClusterNodes.
+  * checkKnownRemote needs to check simClusterNodes as well as
+    simRepos so that cluster nodes can be used as remotes.
+  * Plumb repo name through to setPresentKey.
+  * setPresentKey checks if repo name is in simClusterNodes.
+  * If it is, it looks through simConnections to find any other
+    repos that also have a connection to the cluster node with
+    that name. Each of those repos also gets its simLocations updated
+    for the change being logged.
 
 * sim: Add support for metadata, so preferred content that matches on it will work
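The "without separate UUID" plan can be illustrated with a small, hypothetical Haskell model. This is not git-annex's code. It is a sketch under several assumptions:

* `RepoName`, `UUID` and `Key` are plain `String`s.
* `simLocations` is modeled as each repo's own map from a key to the set of UUIDs that repo believes contain it.
* `emptySim`, `initRepo`, `clusterNode`, `connect`, `knownLocations`, `exampleSim` and the fake `uuid-<name>` UUIDs are invented for illustration.
* Only the names `simRepos`, `simConnections`, `simLocations`, `simClusterNodes`, `checkKnownRemote` and `setPresentKey` come from the plan, and their signatures here are guesses.

The sketch covers only the propagation step. The repo making a change updates its own view. If the remote name it used is a cluster node, every other repo connected to a node of that name updates its view too.

```haskell
import Control.Applicative ((<|>))
import qualified Data.Map.Strict as M
import qualified Data.Set as S

-- Hypothetical simplified types; git-annex's real ones differ.
type RepoName = String
type UUID = String
type Key = String

data SimState = SimState
  { simRepos        :: M.Map RepoName UUID
  , simClusterNodes :: M.Map RepoName UUID              -- node name -> underlying repo's UUID
  , simConnections  :: M.Map RepoName (S.Set RepoName)  -- repo -> remote names it connects to
  , simLocations    :: M.Map RepoName (M.Map Key (S.Set UUID))
    -- each repo's own view of which UUIDs contain each key (assumed shape)
  } deriving Show

emptySim :: SimState
emptySim = SimState M.empty M.empty M.empty M.empty

-- "init name"; the fake UUID "uuid-<name>" is a convention of this sketch.
initRepo :: RepoName -> SimState -> SimState
initRepo r st = st { simRepos = M.insert r ("uuid-" ++ r) (simRepos st) }

-- "clusternode name repo": only records the name; no new UUID is created.
clusterNode :: RepoName -> RepoName -> SimState -> SimState
clusterNode name repo st = case M.lookup repo (simRepos st) of
  Just u  -> st { simClusterNodes = M.insert name u (simClusterNodes st) }
  Nothing -> error ("clusternode: unknown repo " ++ repo)

-- One-directional "connect from -> to" in this sketch.
connect :: RepoName -> RepoName -> SimState -> SimState
connect from to st =
  st { simConnections = M.insertWith S.union from (S.singleton to) (simConnections st) }

-- A remote name may refer to a plain repo or to a cluster node.
checkKnownRemote :: SimState -> RepoName -> Maybe UUID
checkKnownRemote st r = M.lookup r (simRepos st) <|> M.lookup r (simClusterNodes st)

-- Record that key k is present (True) or missing (False) in the repo that
-- observer reached via remote name rname.
setPresentKey :: Bool -> Key -> RepoName -> RepoName -> SimState -> SimState
setPresentKey present k observer rname st = case checkKnownRemote st rname of
  Nothing -> error ("setPresentKey: unknown remote " ++ rname)
  Just u  -> st { simLocations = foldr (update u) (simLocations st) observers }
  where
    -- If rname is a cluster node, every other repo connected to a node with
    -- that name learns of the change too, as if via the shared gateway.
    observers
      | M.member rname (simClusterNodes st) =
          observer : [ r | (r, remotes) <- M.toList (simConnections st)
                         , r /= observer, S.member rname remotes ]
      | otherwise = [observer]
    update u r locs =
      let view = M.findWithDefault M.empty r locs
          holders = M.findWithDefault S.empty k view
          holders' = if present then S.insert u holders else S.delete u holders
      in M.insert r (M.insert k holders' view) locs

-- What repo r currently believes about where key k is.
knownLocations :: SimState -> RepoName -> Key -> S.Set UUID
knownLocations st r k =
  M.findWithDefault S.empty k (M.findWithDefault M.empty r (simLocations st))

-- The setup from the clusternode example (files and preferred content omitted).
exampleSim :: SimState
exampleSim =
    connect "bar" "cluster-node1" . connect "bar" "cluster-node2"
  . connect "foo" "cluster-node1" . connect "foo" "cluster-node2"
  . clusterNode "cluster-node2" "node2" . clusterNode "cluster-node1" "node1"
  . initRepo "node2" . initRepo "node1" . initRepo "bar" . initRepo "foo"
  $ emptySim
```

In this model, `setPresentKey True "k1" "foo" "cluster-node1" exampleSim` also updates `bar`'s view of `k1`, with no simulated git pull. The same change made through a plain connection to `node1` is seen only by `foo`. Because no extra UUID is created, the node is not counted as an additional copy.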