From 555d7e52d3b0b64a56d7dca77914adb5c54ab01d Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Wed, 12 Jun 2024 17:30:55 -0400
Subject: [PATCH] more thoughts on clusters

---
 doc/design/passthrough_proxy.mdwn | 22 +++++++++++++++++++++-
 doc/todo/git-annex_proxies.mdwn   |  4 ++--
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/doc/design/passthrough_proxy.mdwn b/doc/design/passthrough_proxy.mdwn
index 2478bca9b5..aa87067920 100644
--- a/doc/design/passthrough_proxy.mdwn
+++ b/doc/design/passthrough_proxy.mdwn
@@ -200,7 +200,7 @@ that they exist, or perhaps what keys are stored on which nodes.
 In the cluster case, the user would like to not need to pick a specific
 node to send content to. While they could use preferred content to pick a
 node, or nodes, they would prefer to be able to say `git-annex copy --to cluster`
-and let it pick which proxied remote(s) to send to. And similarly,
+and let it pick which nodes to send to. And similarly,
 `git-annex drop --from cluster' should drop the content from every node in
 the cluster.
 
@@ -230,6 +230,26 @@ found one does have the content.
 
 Lockcontent to a cluster would lock the content on one (or more?) nodes.
 
+Problem: The location log for a key that is stored in one node of a cluster
+will show 2 copies: The UUID of the node and the UUID of the cluster. This
+would cause wrong behavior when numcopies is checked. And if a cluster node
+has the cluster as a remote, and another node as a remote, this might
+extend to lockcontent of both succeeding and satisfying numcopies of 2,
+allowing the node to drop content, and resulting in violating numcopies.
+
+That could be solved by publishing a list of the UUIDs of nodes of a
+cluster. When loading a location log, we are either inside the cluster or
+outside the cluster. If outside the cluster, filter out the UUIDs of its
+nodes. If inside the cluster, filter out the cluster's UUID. 
+
+Doing that would mean that a key that is stored in several nodes
+of a cluster will appear to have only 1 copy from outside the cluster.
+Now suppose that a node of the cluster has a remote, and numcopies = 2.
+The node would be able to drop a key from the remote when it and another
+node contain the key. But then from outside the cluster, it would appear as
+if numcopies was violated, with only the 1 copy in the cluster.
+(See also [[todo/repositories_that_count_as_more_than_one_copy]])
+
 ## speed
 
 A passthrough proxy should be as fast as possible so as not to add overhead
diff --git a/doc/todo/git-annex_proxies.mdwn b/doc/todo/git-annex_proxies.mdwn
index c79219a62a..b63fc865ae 100644
--- a/doc/todo/git-annex_proxies.mdwn
+++ b/doc/todo/git-annex_proxies.mdwn
@@ -44,10 +44,10 @@ For June's work on [[design/passthrough_proxy]], implementation plan:
 * Consider getting instantiated remotes into git remote list.
   See design.
 
-* Implement clusters.
-
 * Implement single upload with fanout to proxied remotes.
 
+* Implement clusters.
+
 * Support proxies-of-proxies better, eg foo-bar-baz.
   Currently, it does work, but have to run `git-annex updateproxy` on foo
   in order for it to notice the bar-baz proxied remote exists,
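
Note on the location log filtering proposed in the passthrough_proxy.mdwn hunk: the rule (outside the cluster, hide the node UUIDs; inside the cluster, hide the cluster's UUID) can be sketched roughly as below. This is only an illustration in Python, not git-annex code. The function name `filter_location_log`, the `clusters` mapping, and the example UUID strings are all hypothetical, and it assumes that the published node list is available as a plain mapping from cluster UUID to node UUIDs.

```python
from typing import Dict, Set


def filter_location_log(
    locations: Set[str],
    clusters: Dict[str, Set[str]],
    our_uuid: str,
) -> Set[str]:
    """Return the UUIDs that count as copies, as seen from our_uuid.

    locations: UUIDs recorded in the location log for one key.
    clusters:  hypothetical published mapping of cluster UUID -> node UUIDs.
    our_uuid:  UUID of the repository that is loading the location log.
    """
    visible = set(locations)
    for cluster_uuid, node_uuids in clusters.items():
        if our_uuid in node_uuids:
            # Inside the cluster: the cluster's UUID only duplicates a node.
            visible.discard(cluster_uuid)
        else:
            # Outside the cluster: hide its nodes, keep the cluster's UUID.
            visible -= node_uuids
    return visible


# A key stored on two nodes of one cluster.
log = {"node1", "node2", "cluster"}
clusters = {"cluster": {"node1", "node2"}}

outside_view = filter_location_log(log, clusters, "laptop")
inside_view = filter_location_log(log, clusters, "node1")
print(len(outside_view), len(inside_view))
```

Under these assumptions the same key counts as 1 copy from outside the cluster and 2 copies from inside it, which is the numcopies mismatch described in the last added paragraph of the hunk.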