document various multi-gateway cluster considerations
Perhaps this will avoid me needing to eg, implement spanning tree protocol. ;-)
parent 8e322f76bc
commit 87a7eeac33
2 changed files with 18 additions and 12 deletions
@@ -192,13 +192,27 @@ Notice that remotes for cluster nodes have names indicating the path through
 the cluster used to access them. For example, "AMS-NYC-node3" is accessed via
 the AMS gateway, which then relays to NYC where node3 is located.
 
-## cluster topologies
+## considerations for multi-gateway clusters
 
+When a cluster has multiple gateways, nothing keeps the git repositories on
+the gateways in sync. A branch pushed to one gateway will not be able to
+be pulled from another one. And gateways only learn about the locations of
+keys that are uploaded to the cluster via them. So in the example above,
+after an upload to AMS-mycluster, NYC-mycluster will only know that the
+key is stored in its nodes, but won't know that it's stored in nodes
+behind AMS. So, it's best to have a single git repository that is synced
+with, or perhaps run [[git-annex-remotedaemon]] on each gateway to keep
+its git repository in sync with the other gateways.
+
 Clusters can be constructed with any number of gateways, and any internal
-topology of connections between gateways.
-
-There must always be a path from any gateway to all nodes of the cluster.
+topology of connections between gateways. But there must always be a path
+from any gateway to all nodes of the cluster, otherwise a key won't
+be able to be stored from, or retrieved from some nodes.
 
 It's best to avoid there being multiple paths to a node that go via
 different gateways, since all paths will be tried in parallel when eg,
 uploading a key to the cluster.
+
+A breakdown in communication between gateways will temporarily split the
+cluster. When communication resumes, some keys may need to be copied to
+additional nodes.
@@ -38,14 +38,6 @@ For June's work on [[design/passthrough_proxy]], remaining todos:
 round-robin amoung remotes, and prefer to avoid using remotes that
 other git-annex processes are currently using.
 
-* When a cluster has multiple gateways, and a key is uploaded via one
-gateway, that gateway learns about every node where the key is stored.
-But other gateways do not, they only learn about nodes reached via them
-where the key is stored. This means that another user, syncing with
-the other gateway, won't know how many copies exist, or necessarily
-that the key is in the cluster at all. Should gateways broadcast
-location change messages to other gateways?
-
 * Optimise proxy speed. See design for ideas.
 
 * Use `sendfile()` to avoid data copying overhead when
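
The two topology rules in the documentation change above (every gateway needs a path to every node, and multiple paths to a node via different gateways are best avoided) can be checked mechanically before a cluster is set up. The following is only a sketch under stated assumptions: it is not part of git-annex, and `node_paths`, `check_cluster`, `gateway_links`, `gateway_nodes`, and the node names other than node3 are hypothetical names made up for illustration. It models gateways as a directed graph and enumerates loop-free routes from each gateway to each node.

```python
from collections import defaultdict


def node_paths(start, gateway_links, gateway_nodes):
    """Enumerate every loop-free route from gateway `start` to each node.

    Returns a dict mapping a node name to a list of routes; each route is
    the list of gateways traversed, beginning with `start`.
    """
    routes = defaultdict(list)

    def walk(gateway, route):
        # Nodes attached directly to this gateway are reachable via `route`.
        for node in gateway_nodes.get(gateway, ()):
            routes[node].append(route)
        # Relay onward through connected gateways, never revisiting one.
        for nxt in gateway_links.get(gateway, ()):
            if nxt not in route:
                walk(nxt, route + [nxt])

    walk(start, [start])
    return dict(routes)


def check_cluster(gateway_links, gateway_nodes):
    """Report gateways that cannot reach a node, or reach it several ways."""
    all_nodes = {n for nodes in gateway_nodes.values() for n in nodes}
    gateways = set(gateway_links) | set(gateway_nodes)
    for targets in gateway_links.values():
        gateways.update(targets)

    problems = []
    for gw in sorted(gateways):
        routes = node_paths(gw, gateway_links, gateway_nodes)
        for node in sorted(all_nodes):
            found = routes.get(node, [])
            if not found:
                problems.append(f"{gw}: no path to {node}")
            elif len(found) > 1:
                problems.append(f"{gw}: {len(found)} paths to {node}")
    return problems


# Hypothetical layout loosely following the AMS/NYC example: the two
# gateways relay to each other, and node3 sits behind NYC.
links = {"AMS": ["NYC"], "NYC": ["AMS"]}
nodes = {"AMS": ["node1", "node2"], "NYC": ["node3"]}
problems = check_cluster(links, nodes)  # empty list: one route per node
```

Adding a third gateway connected to both AMS and NYC would make `check_cluster` report two paths to node3 from AMS, which is the situation the documentation recommends avoiding. Dropping the AMS to NYC link would instead report that AMS has no path to node3.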