design for simulating clusters w/o simulating cluster gateways

2024-09-25 12:58:26 -04:00
commit 61c95f4d29
parent b9214d4162
2 changed files with 73 additions and 1 deletions
--- a/doc/git-annex-sim.mdwn
+++ b/doc/git-annex-sim.mdwn
@@ -368,6 +368,47 @@ as passed to "git annex sim" while a simulation is running.
    step 100
    rebalance off

+* `clusternode name repo`
+
+  Simulate a repository being a node of a cluster, which can be referred to
+  using the specified name.
+
+  Rather than a cluster gateway being simulated as a separate entity, any
+  connection to a cluster node with that name is treated as accessing that
+  repository via the same cluster gateway.
+
+  Since a cluster gateway knows about all changes that are made to nodes
+  via it, every repository that has a connection to a cluster node will
+  immediately know about changes that are made via that node, without
+  needing a simulated git pull.
+
+  To simulate a repository being a node of more than one cluster, or behind
+  multiple gateways in the same cluster, use this command to give it
+  multiple names.
+
+  For example:
+
+    init foo
+    init bar
+    init node1
+    init node2
+    clusternode cluster-node1 node1
+    clusternode cluster-node2 node2
+    group node1 cluster
+    group node2 cluster
+    wanted node1 sizebalanced=cluster
+    wanted node2 sizebalanced=cluster
+    connect cluster-node2 <- foo -> cluster-node1
+    connect cluster-node2 <- bar -> cluster-node1
+    addmulti 10 foo 1gb 2gb foo
+    addmulti 10 bar 1gb 2gb bar
+    action foo sendwanted cluster-node1 while action foo sendwanted cluster-node2 while action bar sendwanted cluster-node1 while action bar sendwanted cluster-node2
+
+  In the above example, while foo and bar are both concurrently sending
+  wanted files to both nodes, each will know immediately which files have
+  been sent by the other, and so the files will be sizebalanced between
+  them optimally.
+
 # OPTIONS

 * The [[git-annex-common-options]](1) can be used.
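
Reviewer sketch (not part of the commit): the naming behaviour documented for `clusternode name repo` can be modelled as a table from cluster node names to the underlying repository. This is only an illustration under assumptions: the `ClusterNodes` type and the `clusterNode`, `resolve` and `namesOf` helpers are hypothetical names, not git-annex's actual code, and the second node name `othercluster-node1` is invented to show one repository being given multiple names.

```haskell
import qualified Data.Map.Strict as M

-- Names used by sim commands (repository names and cluster node names).
newtype RepoName = RepoName String
  deriving (Eq, Ord, Show)

-- Hypothetical table: cluster node name -> name of the underlying repository.
type ClusterNodes = M.Map RepoName RepoName

-- Models `clusternode name repo`: register one more name for a repository.
clusterNode :: RepoName -> RepoName -> ClusterNodes -> ClusterNodes
clusterNode name repo = M.insert name repo

-- Resolve a name used in `connect` or `action` to the repository it reaches.
-- Names that are not cluster node names resolve to themselves.
resolve :: ClusterNodes -> RepoName -> RepoName
resolve nodes name = M.findWithDefault name name nodes

-- All cluster node names that refer to the given repository.
namesOf :: ClusterNodes -> RepoName -> [RepoName]
namesOf nodes repo = [ n | (n, r) <- M.toList nodes, r == repo ]

-- node1 is a node of two clusters, so it gets two names.
example :: ClusterNodes
example =
  clusterNode (RepoName "othercluster-node1") (RepoName "node1") $
  clusterNode (RepoName "cluster-node1") (RepoName "node1") M.empty

main :: IO ()
main = do
  print (resolve example (RepoName "cluster-node1"))
  print (resolve example (RepoName "foo"))
  print (namesOf example (RepoName "node1"))
```

Keeping one entry per name, rather than one per repository, is what lets the same repository sit behind several simulated gateways: each name stands for access via a different gateway.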
--- a/doc/todo/git-annex_proxies.mdwn
+++ b/doc/todo/git-annex_proxies.mdwn
@@ -92,7 +92,38 @@ Planned schedule of work:
    clusternode mycluster-foo foo
    clusternode othercluster-foo foo

+  Implementation plan for this:

+  * clusternode initializes a new cluster node UUID, and adds to
+    simRepos.
+  * add `simClusterNodes :: M.Map UUID (UUID, RemoteName)`,
+    which maps from the cluster node UUID to the UUID of the underlying
+    repo, and its node name.
+  * clusternode also adds to simClusterNodes.
+  * setPresentKey checks if the UUID is in simClusterNodes.
+  * If it is, it makes the key present/missing in the underlying repo
+    UUID as well.
+  * And, it looks through simConnections to find any other repos that
+    also have a connection to the cluster node with that name.
+    Each of those repos also gets its simLocations updated.
+
+  But: The cluster node UUID would need to have the same preferred content
+  etc as the underlying repo. And, it would need to be in the same groups.
+  And it would be counted as another copy. Could use a cluster UUID to
+  avoid the numcopies count. But can adding a separate UUID be avoided?
+
+  Implementation plan for this without separate UUID:
+
+  * add `simClusterNodes :: M.Map RepoName UUID`
+  * clusternode adds to simClusterNodes.
+  * checkKnownRemote needs to check simClusterNodes as well as
+    simRepos so that cluster nodes can be used as remotes.
+  * Plumb repo name through to setPresentKey.
+  * setPresentKey checks if repo name is in simClusterNodes.
+  * If it is, it looks through simConnections to find any other
+    repos that also have a connection to the cluster node with
+    that name. Each of those repos also gets its simLocations updated
+    for the change being logged.

 * sim: Add support for metadata, so preferred content that matches on it
  will work
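
Reviewer sketch (not part of the commit): the "without separate UUID" implementation plan in this hunk can be illustrated roughly as follows. This is a minimal sketch under assumptions, not the simulator's real code. Only the type of `simClusterNodes` comes from the plan; the `SimState` record, the types of its other fields, the argument order of `setPresentKey` and `checkKnownRemote`, and the `knownLocations` helper are all hypothetical.

```haskell
import Control.Applicative ((<|>))
import Data.Maybe (fromMaybe)
import qualified Data.Map.Strict as M
import qualified Data.Set as S

newtype UUID = UUID String deriving (Eq, Ord, Show)
newtype RepoName = RepoName String deriving (Eq, Ord, Show)
newtype Key = Key String deriving (Eq, Ord, Show)

-- Simplified stand-in for the simulator state. Only simClusterNodes has the
-- type given in the plan; the other field types are assumptions.
data SimState = SimState
  { simRepos        :: M.Map RepoName UUID
  , simClusterNodes :: M.Map RepoName UUID             -- node name -> underlying repo
  , simConnections  :: M.Map UUID (S.Set RepoName)     -- repo -> remote names it uses
  , simLocations    :: M.Map UUID (M.Map Key (S.Set UUID)) -- each repo's own view
  }

-- A remote name is known if it is a cluster node name or a repository name.
checkKnownRemote :: SimState -> RepoName -> Maybe UUID
checkKnownRemote st name =
  M.lookup name (simClusterNodes st) <|> M.lookup name (simRepos st)

-- Record that `actor` made `key` present or missing in the remote `name`.
-- When the name is a cluster node name, every repo connected to that same
-- name has its view updated too, without a simulated git pull.
setPresentKey :: Bool -> UUID -> RepoName -> Key -> SimState -> SimState
setPresentKey present actor name key st =
  case checkKnownRemote st name of
    Nothing -> st
    Just target ->
      st { simLocations = S.foldl' (record target) (simLocations st) observers }
  where
    peers
      | name `M.member` simClusterNodes st = S.fromList
          [ u | (u, names) <- M.toList (simConnections st), name `S.member` names ]
      | otherwise = S.empty
    observers = S.insert actor peers
    change target = if present then S.insert target else S.delete target
    record target locs observer = M.alter
      (Just . M.alter (Just . change target . fromMaybe S.empty) key . fromMaybe M.empty)
      observer locs

-- Which repos does `observer` currently believe contain `key`?
knownLocations :: SimState -> UUID -> Key -> S.Set UUID
knownLocations st observer key =
  fromMaybe S.empty (M.lookup observer (simLocations st) >>= M.lookup key)

foo, bar, baz, node1 :: UUID
foo = UUID "foo-uuid"; bar = UUID "bar-uuid"; baz = UUID "baz-uuid"; node1 = UUID "node1-uuid"

initial :: SimState
initial = SimState
  { simRepos = M.fromList
      [ (RepoName "foo", foo), (RepoName "bar", bar)
      , (RepoName "baz", baz), (RepoName "node1", node1) ]
  , simClusterNodes = M.fromList [(RepoName "cluster-node1", node1)]
  , simConnections = M.fromList
      [ (foo, S.fromList [RepoName "cluster-node1"])
      , (bar, S.fromList [RepoName "cluster-node1"])
      , (baz, S.fromList [RepoName "node1"]) ]  -- direct, not via the gateway
  , simLocations = M.empty
  }

main :: IO ()
main = do
  let st = setPresentKey True foo (RepoName "cluster-node1") (Key "k1") initial
  print (knownLocations st bar (Key "k1"))  -- bar shares the gateway with foo
  print (knownLocations st baz (Key "k1"))  -- baz does not
```

In this model, bar learns of foo's upload as soon as it happens because both are connected to `cluster-node1`, while baz, which reaches node1 under its plain repository name, still sees nothing. Whether the underlying repo's own view should also be updated is left out here, since the plan does not say.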