give proxied cluster nodes a higher cost than the cluster gateway

This makes commands such as git-annex get default to using the cluster
rather than an arbitrary node, which is better UI.

Accessing a proxied node directly costs basically the same as using the
cluster. But going through the cluster lets the cluster do smarter
load-balancing.
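
A minimal sketch of why the small cost bump is enough, assuming costs are
plain Doubles and that the remote with the lowest cost is tried first. The
Remote record, baseCost value, remote names, and byCost helper are
hypothetical stand-ins for illustration, not git-annex's own types.

```haskell
import Data.List (sortOn)

-- Assumption: a cost is a plain Double, and lower is preferred.
type Cost = Double

-- Hypothetical stand-in for a remote: only a name and a cost.
data Remote = Remote
  { remoteName :: String
  , remoteCost :: Cost
  } deriving (Show, Eq)

-- Assumed default cost shared by the cluster gateway and its nodes.
baseCost :: Cost
baseCost = 200

-- A proxied cluster node is made slightly more expensive than the cluster.
nodeCost :: Cost -> Cost
nodeCost c = c + 0.1

-- Order remotes so that the cheapest one comes first.
byCost :: [Remote] -> [Remote]
byCost = sortOn remoteCost

-- Two proxied nodes and the cluster itself, listed nodes first.
exampleRemotes :: [Remote]
exampleRemotes =
  [ Remote "gateway-node1" (nodeCost baseCost)
  , Remote "gateway-node2" (nodeCost baseCost)
  , Remote "gateway-mycluster" baseCost
  ]
```

With these definitions, byCost exampleRemotes puts gateway-mycluster ahead
of both nodes, even though the nodes were listed first.
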
Joey Hess 2024-06-27 15:21:03 -04:00
parent cf59d7f92c
commit 20ef1262df
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
2 changed files with 26 additions and 10 deletions

@@ -176,6 +176,7 @@ configRead autoinit r = do
 		Just r' -> return r'
 		_ -> return r
 
+
 gen :: Git.Repo -> UUID -> RemoteConfig -> RemoteGitConfig -> RemoteStateHandle -> Annex (Maybe Remote)
 gen r u rc gc rs
 	-- Remote.GitLFS may be used with a repo that is also encrypted
@@ -186,10 +187,9 @@ gen r u rc gc rs
 		Nothing -> do
 			st <- mkState r u gc
 			c <- parsedRemoteConfig remote rc
-			go st c <$> remoteCost gc c defcst
+			go st c <$> remoteCost gc c (defaultRepoCost r)
 		Just addr -> Remote.P2P.chainGen addr r u rc gc rs
   where
-	defcst = if repoCheap r then cheapRemoteCost else expensiveRemoteCost
 	go st c cst = Just new
 	  where
 		new = Remote
@@ -229,6 +229,11 @@ gen r u rc gc rs
 			, remoteStateHandle = rs
 			}
 
+defaultRepoCost :: Git.Repo -> Cost
+defaultRepoCost r
+	| repoCheap r = cheapRemoteCost
+	| otherwise = expensiveRemoteCost
+
 unavailable :: Git.Repo -> UUID -> RemoteConfig -> RemoteGitConfig -> RemoteStateHandle -> Annex (Maybe Remote)
 unavailable r = gen r'
   where
@@ -854,12 +859,17 @@ listProxied proxies rs = concat <$> mapM go rs
 		-- that cluster does not need to be synced with
 		-- by default, because syncing with the cluster will
 		-- effectively sync with all of its nodes.
+		--
+		-- Also, give it a slightly higher cost than the
+		-- cluster by default, to encourage using the cluster.
 		adjustclusternode clusters =
 			case M.lookup (ClusterNodeUUID (proxyRemoteUUID p)) (clusterNodeUUIDs clusters) of
 				Just cs
 					| any (\c -> S.member (fromClusterUUID c) proxieduuids) (S.toList cs) ->
 						addremoteannexfield SyncField
 							[Git.ConfigValue $ Git.Config.boolConfig' False]
+						. addremoteannexfield CostField
+							[Git.ConfigValue $ encodeBS $ show $ defaultRepoCost r + 0.1]
 				_ -> id
 
 		proxieduuids = S.map proxyRemoteUUID proxied

@@ -41,14 +41,17 @@ For June's work on [[design/passthrough_proxy]], remaining todos:
   eg prefer to avoid using remotes that are doing other transfers at the
   same time.
 
-* The cost of a cluster and of its proxied nodes is currently all the same.
-  It would make sense for proxied nodes that are accessed via an intermedia
-  gateway to have a higher cost than proxied nodes that are accessed via
-  the remote gateway. And proxied nodes should generally have a higher cost
-  than the cluster, so that git-annex defaults to using the cluster.
-  (The cost of accessing a proxied node vs using the cluster is the same,
-  but using the cluster allows smarter load-balancing to be done on the
-  cluster. It also makes the UI not mention individual nodes.)
+* The cost of a proxied node that is accessed via an intermediate gateway
+  is currently the same as a node accessed via the cluster gateway.
+  To fix this, there needs to be some way to tell how many hops through
+  gateways it takes to get to a node. Currently the only way is to
+  guess based on number of dashes in the node name, which is not satisfying.
+
+  Even counting hops is not very satisfying, one cluster gateway could
+  be much more expensive to traverse than another one.
+  If seriously tackling this, it might be worth making enough information
+  available to use spanning tree protocol for routing inside clusters.
 
 * Optimise proxy speed. See design for ideas.
@@ -117,3 +120,6 @@ For June's work on [[design/passthrough_proxy]], remaining todos:
   protocol messages on to any remotes that have the same UUID as
   the cluster. Needs extension to P2P protocol to avoid cycles.
   (done)
+
+* Proxied cluster nodes should have slightly higher cost than the cluster
+  gateway. (done)
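
The remaining todo mentions guessing the number of gateway hops from the
dashes in a node's name. A rough sketch of that guess follows, and of why it
is unsatisfying. The names guessHops and hopCost, the 0.1 per-hop increment,
and the example remote names are all hypothetical. Nothing here is code from
this commit.

```haskell
-- Assumption: a cost is a plain Double, and lower is preferred.
type Cost = Double

-- Guess how many gateways sit in front of a proxied remote by counting
-- the dashes in its name. This over-counts whenever a gateway or node
-- name itself contains a dash.
guessHops :: String -> Int
guessHops = length . filter (== '-')

-- Hypothetical per-hop adjustment: add a small increment for each
-- guessed hop on top of a base cost.
hopCost :: Cost -> String -> Cost
hopCost base name = base + 0.1 * fromIntegral (guessHops name)
```

For example, guessHops "gw1-gw2-node" yields 2, but so does
guessHops "my-gateway-node", where the first dash is part of the gateway's
own name and there is really only one hop.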