more thoughts on clusters

Joey Hess 2024-06-12 17:30:55 -04:00
parent 0ebb107974
commit 555d7e52d3
GPG key ID: DB12DB0FF05F8F38
2 changed files with 23 additions and 3 deletions

@@ -200,7 +200,7 @@ that they exist, or perhaps what keys are stored on which nodes.
In the cluster case, the user would like to not need to pick a specific
node to send content to. While they could use preferred content to pick a
node, or nodes, they would prefer to be able to say `git-annex copy --to cluster`
-and let it pick which proxied remote(s) to send to. And similarly,
+and let it pick which nodes to send to. And similarly,
`git-annex drop --from cluster` should drop the content from every node in
the cluster.
@@ -230,6 +230,26 @@ found one does have the content.
Lockcontent to a cluster would lock the content on one (or more?) nodes.
+Problem: The location log for a key that is stored in one node of a cluster
+will show 2 copies: The UUID of the node and the UUID of the cluster. This
+would cause wrong behavior when numcopies is checked. And if a cluster node
+has the cluster as a remote, and another node as a remote, this might
+extend to lockcontent of both succeeding and satisfying numcopies of 2,
+allowing the node to drop content, and resulting in violating numcopies.
+That could be solved by publishing a list of the UUIDs of nodes of a
+cluster. When loading a location log, we are either inside the cluster or
+outside the cluster. If outside the cluster, filter out the UUIDs of its
+nodes. If inside the cluster, filter out the cluster's UUID.
+Doing that would mean that a key that is stored in several nodes
+of a cluster will appear to have only 1 copy from outside the cluster.
+Now suppose that a node of the cluster has a remote, and numcopies = 2.
+The node would be able to drop a key from the remote when it and another
+node contain the key. But then from outside the cluster, it would appear as
+if numcopies was violated, with only the 1 copy in the cluster.
+(See also [[todo/repositories_that_count_as_more_than_one_copy]])
## speed
A passthrough proxy should be as fast as possible so as not to add overhead
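
Note on the hunk above: the proposed location log filtering, and the numcopies problem it leaves open, can be sketched roughly as follows. This is only an illustrative model and not git-annex code. The names `Cluster` and `visible_copies`, the string UUIDs, and the reduction of a location log to a plain set of UUIDs are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical model: a cluster UUID plus the published UUIDs of its nodes."""
    uuid: str
    nodes: frozenset


def visible_copies(locations, cluster, viewer):
    """Filter a key's location log as seen from the repository `viewer`.

    Inside the cluster (viewer is one of its nodes): hide the cluster's UUID.
    Outside the cluster: hide the UUIDs of the cluster's nodes.
    """
    if viewer in cluster.nodes:
        return set(locations) - {cluster.uuid}
    return set(locations) - cluster.nodes


cluster = Cluster(uuid="cluster", nodes=frozenset({"node1", "node2"}))

# A key stored on one node: the unfiltered log lists two UUIDs for one copy.
one_node = {"cluster", "node1"}
unfiltered_count = len(one_node)                                   # 2
outside_count = len(visible_copies(one_node, cluster, "laptop"))   # 1
inside_count = len(visible_copies(one_node, cluster, "node2"))     # 1

# Scenario from the text: the key is on node1, node2 and a remote of node1,
# with numcopies = 2. node1 sees two copies inside the cluster, so it may
# drop the key from the remote.
numcopies = 2
after_drop = {"cluster", "node1", "node2"}
inside = visible_copies(after_drop, cluster, "node1")
outside = visible_copies(after_drop, cluster, "laptop")

print(len(inside) >= numcopies)   # inside view: numcopies satisfied
print(len(outside) >= numcopies)  # outside view: numcopies appears violated
```

In this model the filter removes the double counting in both directions, but the last two lines show the remaining mismatch: the same state satisfies numcopies when viewed from a node and appears to violate it when viewed from outside the cluster.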

@@ -44,10 +44,10 @@ For June's work on [[design/passthrough_proxy]], implementation plan:
* Consider getting instantiated remotes into git remote list.
See design.
* Implement clusters.
* Implement single upload with fanout to proxied remotes.
* Implement clusters.
* Support proxies-of-proxies better, eg foo-bar-baz.
Currently, it does work, but have to run `git-annex updateproxy`
on foo in order for it to notice the bar-baz proxied remote exists,