give proxied cluster nodes a higher cost than the cluster gateway

This makes, e.g., git-annex get default to using the cluster rather than an
arbitrary node, which is a better UI.

The actual cost of accessing a proxied node is basically the same as
using the cluster, but using the cluster lets the cluster do smarter
load-balancing.
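A minimal sketch of why a small cost bump is enough, assuming that remotes are tried cheapest first. The constants 100 and 200 are assumed to match git-annex's `cheapRemoteCost` and `expensiveRemoteCost`. The `Candidate` type, the `tryOrder` helper and the remote names are hypothetical. `defaultRepoCost` here takes a `Bool` standing in for `repoCheap r`. This is an illustration, not git-annex's actual remote selection code.

```haskell
import Data.List (sortOn)

-- Costs are Doubles, so a fractional bump such as 0.1 is representable.
type Cost = Double

-- Assumed values, modelled on git-annex's cost constants.
cheapRemoteCost, expensiveRemoteCost :: Cost
cheapRemoteCost = 100
expensiveRemoteCost = 200

-- Hypothetical stand-in for a remote: just a name and a cost.
data Candidate = Candidate
  { candidateName :: String
  , candidateCost :: Cost
  } deriving (Show)

-- Simplified defaultRepoCost: the Bool plays the role of `repoCheap r`.
defaultRepoCost :: Bool -> Cost
defaultRepoCost cheap
  | cheap = cheapRemoteCost
  | otherwise = expensiveRemoteCost

-- Order candidates cheapest first (sortOn is stable for equal costs).
tryOrder :: [Candidate] -> [String]
tryOrder = map candidateName . sortOn candidateCost

-- The cluster and its proxied nodes are reached through the same
-- gateway, so they share a default cost; the nodes get the +0.1 bump.
remotes :: [Candidate]
remotes =
  [ Candidate "origin-node1" (defaultRepoCost False + 0.1)
  , Candidate "origin-node2" (defaultRepoCost False + 0.1)
  , Candidate "origin-mycluster" (defaultRepoCost False)
  ]

main :: IO ()
main = mapM_ putStrLn (tryOrder remotes)
```

Without the bump, all three costs would be equal and the order would depend on tie-breaking alone, which is how an arbitrary node could end up being tried before the cluster.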
Joey Hess 2024-06-27 15:21:03 -04:00
parent cf59d7f92c
commit 20ef1262df
GPG key ID: DB12DB0FF05F8F38
2 changed files with 26 additions and 10 deletions


@@ -176,6 +176,7 @@ configRead autoinit r = do
Just r' -> return r'
_ -> return r
gen :: Git.Repo -> UUID -> RemoteConfig -> RemoteGitConfig -> RemoteStateHandle -> Annex (Maybe Remote)
gen r u rc gc rs
-- Remote.GitLFS may be used with a repo that is also encrypted
@@ -186,10 +187,9 @@ gen r u rc gc rs
Nothing -> do
st <- mkState r u gc
c <- parsedRemoteConfig remote rc
-go st c <$> remoteCost gc c defcst
+go st c <$> remoteCost gc c (defaultRepoCost r)
Just addr -> Remote.P2P.chainGen addr r u rc gc rs
where
-defcst = if repoCheap r then cheapRemoteCost else expensiveRemoteCost
go st c cst = Just new
where
new = Remote
@@ -229,6 +229,11 @@ gen r u rc gc rs
, remoteStateHandle = rs
}
+defaultRepoCost :: Git.Repo -> Cost
+defaultRepoCost r
+| repoCheap r = cheapRemoteCost
+| otherwise = expensiveRemoteCost
unavailable :: Git.Repo -> UUID -> RemoteConfig -> RemoteGitConfig -> RemoteStateHandle -> Annex (Maybe Remote)
unavailable r = gen r'
where
@@ -854,12 +859,17 @@ listProxied proxies rs = concat <$> mapM go rs
-- that cluster does not need to be synced with
-- by default, because syncing with the cluster will
-- effectively sync with all of its nodes.
+--
+-- Also, give it a slightly higher cost than the
+-- cluster by default, to encourage using the cluster.
adjustclusternode clusters =
case M.lookup (ClusterNodeUUID (proxyRemoteUUID p)) (clusterNodeUUIDs clusters) of
Just cs
| any (\c -> S.member (fromClusterUUID c) proxieduuids) (S.toList cs) ->
addremoteannexfield SyncField
[Git.ConfigValue $ Git.Config.boolConfig' False]
+. addremoteannexfield CostField
+[Git.ConfigValue $ encodeBS $ show $ defaultRepoCost r + 0.1]
_ -> id
proxieduuids = S.map proxyRemoteUUID proxied
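The new branch of `adjustclusternode` can be modelled outside git-annex as a pure function. This is a minimal sketch with simplifying assumptions: UUIDs are plain strings, a node's configuration is reduced to the two settings the diff touches (sync and cost), and the names `adjustClusterNode`, `NodeConfig` and `ClusterNodes` are hypothetical, not git-annex's API.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S

-- Hypothetical simplification: UUIDs are plain strings.
type UUID = String
type Cost = Double

-- Only the two per-remote settings that the diff adjusts.
data NodeConfig = NodeConfig
  { nodeSync :: Bool
  , nodeCost :: Cost
  } deriving (Show, Eq)

-- Same shape as clusterNodeUUIDs in the diff: each node UUID maps to
-- the set of clusters that node belongs to.
type ClusterNodes = M.Map UUID (S.Set UUID)

-- If the node belongs to any cluster that this gateway also proxies,
-- disable syncing with it and charge 0.1 more than the default cost.
-- Otherwise leave its configuration unchanged.
adjustClusterNode :: ClusterNodes -> S.Set UUID -> Cost -> UUID -> NodeConfig -> NodeConfig
adjustClusterNode clusters proxiedUUIDs defCost node cfg =
  case M.lookup node clusters of
    Just cs
      | any (`S.member` proxiedUUIDs) (S.toList cs) ->
          cfg { nodeSync = False, nodeCost = defCost + 0.1 }
    _ -> cfg

main :: IO ()
main = do
  let clusters = M.fromList [("node1", S.fromList ["cluster1"])]
      proxied = S.fromList ["cluster1", "node1", "other"]
      base = NodeConfig { nodeSync = True, nodeCost = 200 }
  -- node1 is a member of a proxied cluster, so it is adjusted.
  print (adjustClusterNode clusters proxied 200 "node1" base)
  -- "other" is not a cluster node, so it keeps its defaults.
  print (adjustClusterNode clusters proxied 200 "other" base)
```

The guard only fires when the cluster itself is also proxied by the same gateway; a node whose cluster is not reachable that way keeps its default cost, since there would be no cluster remote to prefer over it.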


@@ -41,14 +41,17 @@ For June's work on [[design/passthrough_proxy]], remaining todos:
eg prefer to avoid using remotes that are doing other transfers at the
same time.
-* The cost of a cluster and of its proxied nodes is currently all the same.
-It would make sense for proxied nodes that are accessed via an intermediate
-gateway to have a higher cost than proxied nodes that are accessed via
-the remote gateway. And proxied nodes should generally have a higher cost
-than the cluster, so that git-annex defaults to using the cluster.
-(The cost of accessing a proxied node vs using the cluster is the same,
-but using the cluster allows smarter load-balancing to be done on the
-cluster. It also makes the UI not mention individual nodes.)
+* The cost of a proxied node that is accessed via an intermediate gateway
+is currently the same as that of a node accessed via the cluster gateway.
+To fix this, there needs to be some way to tell how many hops through
+gateways it takes to get to a node. Currently the only way is to
+guess based on the number of dashes in the node name, which is not satisfying.
+Even counting hops is not very satisfying; one cluster gateway could
+be much more expensive to traverse than another one.
+If seriously tackling this, it might be worth making enough information
+available to use the spanning tree protocol for routing inside clusters.
* Optimise proxy speed. See design for ideas.
@@ -117,3 +120,6 @@ For June's work on [[design/passthrough_proxy]], remaining todos:
protocol messages on to any remotes that have the same UUID as
the cluster. Needs extension to P2P protocol to avoid cycles.
(done)
+* Proxied cluster nodes should have slightly higher cost than the cluster
+gateway. (done)
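The name-based hop guess that the todo calls unsatisfying (counting dashes in the node name) can be sketched as follows. This is a sketch under stated assumptions. It assumes proxied remotes are named by joining gateway and node names with dashes, so the dash count approximates the number of gateways traversed. The remote names, `guessHops`, `guessCost` and the 0.1-per-hop increment are all hypothetical.

```haskell
-- Hypothetical heuristic: each dash in a proxied remote's name is
-- taken to be one gateway hop.
guessHops :: String -> Int
guessHops = length . filter (== '-')

-- Hypothetical cost scheme: base cost plus 0.1 per guessed hop.
guessCost :: Double -> String -> Double
guessCost base name = base + 0.1 * fromIntegral (guessHops name)

main :: IO ()
main = mapM_ report ["origin-node1", "origin-gw2-node1", "origin-my-node"]
  where
    -- Print the guessed hop count and the resulting cost for each name.
    report n =
      putStrLn (n ++ ": " ++ show (guessHops n) ++ " hop(s), cost "
                ++ show (guessCost 200 n))
```

Here `origin-my-node` stands for a node named `my-node` reached through a single gateway, yet it gets the same guess as `origin-gw2-node1`, which sits behind an intermediate gateway. That failure is why the heuristic is unsatisfying. As the todo also notes, even an accurate hop count would still treat every gateway as equally expensive to traverse.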