git-annex/Logs/Cluster.hs
Joey Hess 202ea3ff2a
don't sync with cluster nodes by default
Avoid `git-annex sync --content` etc from operating on cluster nodes by default
since syncing with a cluster implicitly syncs with its nodes. This avoids a
lot of unncessary work when a cluster has a lot of nodes just in checking
if each node's preferred content is satisfied. And it avoids content
being sent to nodes individually, so instead syncing with clusters always
fanout uploads to nodes.

The downside is that there are situations where a cluster's preferred content
settings can be met, but those of its nodes are not. Or where a node does not
contain a key, but the cluster does, and there are not enough copies of the key
yet, so it would be desirable the send it there. I think that's an acceptable
tradeoff. These kind of situations are ones where the cluster itself should
probably be responsible for copying content to the node. Which it can do much
less expensively than a client can. Part of the balanced preferred content
design that I will be working on in a couple of months involves rebalancing
clusters, so I expect to revisit this.

The use of annex-sync config does allow running git-annex sync with a specific
node, or nodes, and it will sync with it. And it's also possible to set
annex-sync git configs to make it sync with a node by default. (Although that
will require setting up an explicit git remote for the node rather than relying
on the proxied remote.)

Logs.Cluster.Basic is needed because Remote.Git cannot import Logs.Cluster
due to a cycle. And the Annex.Startup load of clusters happens
too late for Remote.Git to use that. This does mean one redundant load
of the cluster log, though only when there is a proxy.
2024-06-25 10:24:38 -04:00

41 lines
1 KiB
Haskell

{- git-annex cluster log
-
- Copyright 2024 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
{-# LANGUAGE OverloadedStrings, TupleSections #-}
module Logs.Cluster (
module Types.Cluster,
getClusters,
loadClusters,
recordCluster,
) where
import qualified Annex
import Annex.Common
import Types.Cluster
import Logs.Cluster.Basic
import Logs.Trust
import qualified Data.Map as M
import qualified Data.Set as S
getClusters :: Annex Clusters
getClusters = maybe loadClusters return =<< Annex.getState Annex.clusters
{- Loads the clusters and caches it for later.
-
- This takes care of removing dead nodes from clusters,
- to avoid inserting the cluster uuid into the location
- log when only dead nodes contain the content of a key.
-}
loadClusters :: Annex Clusters
loadClusters = do
dead <- (S.fromList . map ClusterNodeUUID)
<$> trustGet DeadTrusted
clusters <- getClustersWith (M.map (`S.difference` dead))
Annex.changeState $ \s -> s { Annex.clusters = Just clusters }
return clusters