document using balanced preferred content in a cluster

Joey Hess 2024-08-30 11:08:32 -04:00
parent d0938d730b
commit 54b6151412
GPG key ID: DB12DB0FF05F8F38

@@ -63,31 +63,6 @@ clusters.
A cluster is not a git repository, and so `git pull bigserver-mycluster`
will not work.
## preferred content of clusters
The preferred content of the cluster can be configured. This tells
users what files the cluster as a whole should contain.
To configure the preferred content of a cluster, as well as other related
things like [[groups|git-annex-group]] and [[required_content]], it's easiest
to do the configuration in a repository that has the cluster as a remote.
For example:
$ git-annex wanted bigserver-mycluster standard
$ git-annex group bigserver-mycluster archive
By default, when a file is uploaded to a cluster, it is stored on every node of
the cluster. To control which nodes to store to, the [[preferred_content]] of
each individual node can be configured.
It's also a good idea to configure the preferred content of the cluster's
gateway. To avoid storing files redundantly on the gateway
(which, remember, is not a node of the cluster), you might make it not want
any files:
$ git-annex wanted bigserver nothing
## setting up a cluster
A new cluster first needs to be initialized. Run [[git-annex-initcluster]] in
@@ -131,6 +106,41 @@ on more than one at a time will likely be faster.
$ git config annex.jobs cpus
## preferred content of clusters
The preferred content of the cluster can be configured. This tells
users what files the cluster as a whole should contain.
To configure the preferred content of a cluster, as well as other related
things like [[groups|git-annex-group]] and [[required_content]], it's easiest
to do the configuration in a repository that has the cluster as a remote.
For example:
$ git-annex wanted bigserver-mycluster standard
$ git-annex group bigserver-mycluster archive
By default, when a file is uploaded to a cluster, it is stored on every node
of the cluster. To control which nodes to store to, the [[preferred_content]]
of each individual node can be configured.
For example, to balance content evenly across nodes, put them in a group
whose groupwanted expression is `balanced`, and make each node use it:
$ git-annex groupwanted bigserver-node balanced=bigserver-node
$ git-annex group bigserver-node1 bigserver-node
$ git-annex group bigserver-node2 bigserver-node
$ git-annex group bigserver-node3 bigserver-node
$ git-annex wanted bigserver-node1 groupwanted
$ git-annex wanted bigserver-node2 groupwanted
$ git-annex wanted bigserver-node3 groupwanted
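With many nodes, repeating the `group` and `wanted` commands for each one gets tedious. The sketch below wraps the same git-annex subcommands used above in a shell function. The function name `configure_balanced_nodes` and the `GIT_ANNEX` override variable (useful as `GIT_ANNEX=echo` for a dry run) are hypothetical, for illustration only; they are not part of git-annex.

```shell
# Hypothetical helper (not part of git-annex): give a group a "balanced"
# groupwanted expression, then add each listed node to that group and make
# the node's preferred content defer to its group's expression.
# GIT_ANNEX is an assumed override hook; it defaults to the real git-annex.
configure_balanced_nodes() {
    local group="$1"
    shift
    local annex="${GIT_ANNEX:-git-annex}"
    # One expression shared by every member of the group.
    "$annex" groupwanted "$group" "balanced=$group" || return 1
    for node in "$@"; do
        # Put the node in the group...
        "$annex" group "$node" "$group" || return 1
        # ...and make it want whatever the group's expression selects.
        "$annex" wanted "$node" groupwanted || return 1
    done
}
```

Running `configure_balanced_nodes bigserver-node bigserver-node1 bigserver-node2 bigserver-node3` issues the same commands as the example, interleaved per node rather than grouped by subcommand; the resulting configuration should be identical.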
It's also a good idea to configure the preferred content of the cluster's
gateway. To avoid storing files redundantly on the gateway
(which, remember, is not a node of the cluster), you might make it not want
any files:
$ git-annex wanted bigserver nothing
## special remotes as cluster nodes
Cluster nodes don't have to be regular git remotes. They can
@@ -138,7 +148,7 @@ also be special remotes.
Even special remotes with `exporttree=yes` can be
used as cluster nodes. Those also need to be configured with
`annexobjects=yes` though. And, you will also need to configure
`remote.name.annex-tracking-branch` to the branch that will
trigger an update of the exported tree when it is pushed to the
cluster gateway.
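The configuration just described might look roughly like this when run in the gateway repository. This is a sketch under several assumptions: a `directory` special remote stands in for whatever special remote you actually use, the function `setup_export_node` and the `GIT_ANNEX`/`GIT` override variables are hypothetical, and it assumes a special remote is made a node of a cluster through `remote.name.annex-cluster-node`, the same setting used for git remote nodes.

```shell
# Hypothetical helper (not part of git-annex), run in the cluster gateway
# repository. The "directory" special remote type is only an example.
# GIT_ANNEX and GIT are assumed override hooks (e.g. "echo" for a dry run).
setup_export_node() {
    local remote="$1" dir="$2" branch="$3" cluster="$4"
    local annex="${GIT_ANNEX:-git-annex}" git="${GIT:-git}"
    # exporttree=yes alone is not enough for a cluster node;
    # annexobjects=yes is needed as well.
    "$annex" initremote "$remote" type=directory "directory=$dir" \
        encryption=none exporttree=yes annexobjects=yes || return 1
    # Pushing this branch to the gateway triggers an update of the export.
    "$git" config "remote.$remote.annex-tracking-branch" "$branch" || return 1
    # Assumed: mark the special remote as a node of the named cluster.
    "$git" config "remote.$remote.annex-cluster-node" "$cluster"
}
```

For example, `setup_export_node mydir /mnt/export main mycluster`, where the remote name, path, branch, and cluster name are all placeholders.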