don't sync with cluster nodes by default
Avoid `git-annex sync --content` etc. operating on cluster nodes by default, since syncing with a cluster implicitly syncs with its nodes.

This avoids a lot of unnecessary work when a cluster has a lot of nodes, just in checking whether each node's preferred content is satisfied. It also avoids content being sent to nodes individually; instead, syncing with a cluster always fans out uploads to its nodes.

The downside is that there are situations where a cluster's preferred content settings can be met, but those of its nodes are not. Or where a node does not contain a key, but the cluster does, and there are not enough copies of the key yet, so it would be desirable to send it there. I think that's an acceptable tradeoff. These kinds of situations are ones where the cluster itself should probably be responsible for copying content to the node, which it can do much less expensively than a client can. Part of the balanced preferred content design that I will be working on in a couple of months involves rebalancing clusters, so I expect to revisit this.

The use of the annex-sync config does allow running git-annex sync with a specific node, or nodes, and it will sync with them. It's also possible to set annex-sync git configs to make it sync with a node by default. (Although that will require setting up an explicit git remote for the node rather than relying on the proxied remote.)

Logs.Cluster.Basic is needed because Remote.Git cannot import Logs.Cluster due to an import cycle, and the Annex.Startup load of clusters happens too late for Remote.Git to use it. This does mean one redundant load of the cluster log, though only when there is a proxy.
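The selection rule described in the message can be illustrated with a small model. This is a Python sketch for illustration only, not git-annex's Haskell implementation: the `Remote` class, its `is_cluster_node` flag, and `remotes_to_sync` are hypothetical names, and `annex_sync` merely models the `annex-sync` git config, with `None` standing for "unset". It assumes that naming remotes explicitly overrides the default, and that an explicit `annex-sync` setting overrides the skip-nodes default.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Remote:
    """Hypothetical model of a remote; not a git-annex data type."""
    name: str
    is_cluster_node: bool = False      # assumed flag: remote is a node of a cluster
    annex_sync: Optional[bool] = None  # models the annex-sync config; None = unset


def remotes_to_sync(remotes: Iterable[Remote],
                    named: Iterable[str] = ()) -> List[Remote]:
    """Return the remotes a sync would operate on under the modeled rule."""
    wanted = set(named)
    if wanted:
        # Remotes named explicitly are used, cluster nodes included.
        return [r for r in remotes if r.name in wanted]
    selected = []
    for r in remotes:
        if r.annex_sync is None:
            # No explicit setting: skip cluster nodes, since syncing with
            # the cluster itself implicitly covers them.
            include = not r.is_cluster_node
        else:
            # An explicit setting wins in either direction.
            include = r.annex_sync
        if include:
            selected.append(r)
    return selected


# Example remote names are made up for the illustration.
example = [
    Remote("origin"),
    Remote("mycluster"),
    Remote("mycluster-node1", is_cluster_node=True),
    Remote("mycluster-node2", is_cluster_node=True, annex_sync=True),
]

default_names = [r.name for r in remotes_to_sync(example)]
explicit_names = [r.name for r in remotes_to_sync(example, named=["mycluster-node1"])]
```

With the example remotes, the default selection keeps `origin`, `mycluster`, and the node that has `annex_sync` set, while naming `mycluster-node1` selects that node alone. The sketch does not model the tradeoff noted in the message, where a node wants a key that the cluster already contains.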
parent b8016eeb65
commit 202ea3ff2a
8 changed files with 152 additions and 93 deletions
@@ -26,17 +26,8 @@ In development on the `proxy` branch.
 
 For June's work on [[design/passthrough_proxy]], remaining todos:
 
-* On upload to cluster, send to nodes where it's preferred content, and not
-  to other nodes.
-
-* `git-annex sync --content` etc, when operating on clusters, should first
-  operate on the cluster as a whole, to take advantages of fanout on upload
-  and mass drop. Only operate on individual cluster nodes afterwards,
-  to handle cases such as a cluster containing a key, but some node
-  wanting and lacking the key. Perhaps just setting cost for nodes slightly
-  higher than the cluster cost will be enough? Or should it even send a key
-  to a cluster node if the cluster contains the key? Perhaps that is
-  unnecessary work, the cluster should be able to rebalance itself.
+* On upload to cluster, send to nodes where its preferred content, and not
+  to other nodes. Unless no nodes prefer it, then what?
 
 * Getting a key from a cluster currently always selects the lowest cost
   remote, and always the same remote if cost is the same. Should

@@ -116,3 +107,6 @@ For June's work on [[design/passthrough_proxy]], remaining todos:
   which UUIDs it was dropped from. (done)
 
 * `git-annex testremote` works against proxied remote and cluster. (done)
+
+* Avoid `git-annex sync --content` etc from operating on cluster nodes by
+  default since syncing with a cluster implicitly syncs with its nodes. (done)