clusters documentation
parent 032d3902d8
commit ff5fe4e759
4 changed files with 106 additions and 28 deletions
98
doc/clusters.mdwn
Normal file
@@ -0,0 +1,98 @@
A git-annex repository can provide access to its remotes as nodes of a
cluster. This allows other repositories to access the cluster as a single
logical repository.

[[!toc ]]

## using a cluster

For example, a remote "bigserver" that is configured to serve a cluster
named "mycluster" will make available an additional remote
"bigserver-mycluster", as well as a remote for each node, e.g.
"bigserver-node1", "bigserver-node2", etc.

The user can get files from the cluster without caring which node they come
from:

    $ git-annex get foo --from bigserver-mycluster
    get foo (from bigserver-mycluster...) ok

And the user can send files to the cluster without caring which nodes
they are stored on:

    $ git-annex move bar --to bigserver-mycluster
    move bar (to bigserver-mycluster...) ok

In fact, a single upload can be sent to every node of the cluster at once.

    $ git-annex whereis bar
    whereis bar (3 copies)
        acae2ff6-6c1e-8bec-b8b9-397a3755f397 -- my cluster [bigserver-mycluster]
        9f514001-6dc0-4d83-9af3-c64c96626892 -- node 1 [bigserver-node1]
        d81e0b28-612e-4d73-a4e6-6dabbb03aba1 -- node 2 [bigserver-node2]
        5657baca-2f11-11ef-ae1a-5b68c6321dd9 -- node 3 [bigserver-node3]

Notice that the file is shown as present in the cluster, as well as on
individual nodes. But the cluster itself does not count as a copy of the file,
so the 3 copies are the copies on individual nodes.

Most other git-annex commands that operate on repositories can also operate on
clusters.

Clusters can only be accessed via ssh.

## configuring a cluster

A new cluster first needs to be initialized. Run [[git-annex-initcluster]] in
the repository that will serve the cluster to clients. In the example above,
this was the "bigserver" repository.

    $ git-annex initcluster mycluster

Once a cluster is initialized, the next step is to add nodes to it.
To make a remote a node of the cluster, use `git config` to set
`remote.name.annex-cluster-node` to the name of the cluster, where `name`
is the name of the remote.

In the example above, the three cluster nodes were configured like this:

    $ git remote add node1 /media/disk1/repo
    $ git remote add node2 /media/disk2/repo
    $ git remote add node3 /media/disk3/repo
    $ git config remote.node1.annex-cluster-node mycluster
    $ git config remote.node2.annex-cluster-node mycluster
    $ git config remote.node3.annex-cluster-node mycluster

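Before recording the cluster configuration, it can help to check which remotes are marked as nodes. The following is a minimal sketch that uses only plain `git config`, assuming, as described above, that the setting holds the name of the cluster. It runs in a throwaway repository created with `mktemp`, which is an assumption made so the sketch can be tried anywhere; the disk paths are the illustrative ones from the example.

```shell
# Sketch: list which remotes are marked as nodes of which cluster.
# The throwaway repository stands in for the "bigserver" repository.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Same illustrative node setup as in the example above.
for n in 1 2 3; do
    git remote add "node$n" "/media/disk$n/repo"
    git config "remote.node$n.annex-cluster-node" mycluster
done

# Print every remote.<name>.annex-cluster-node key with its value.
nodes=$(git config --get-regexp '^remote\..*\.annex-cluster-node$')
printf '%s\n' "$nodes"
```

Each output line pairs a `remote.<name>.annex-cluster-node` key with the cluster it belongs to; a remote that is missing from the list has not been configured as a node.
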
Finally, run `git-annex updatecluster` to record the cluster configuration
in the git-annex branch. That tells other repositories about the cluster.

    $ git-annex updatecluster
    Added node node1 to cluster: mycluster
    Added node node2 to cluster: mycluster
    Added node node3 to cluster: mycluster
    Started proxying for node1
    Started proxying for node2
    Started proxying for node3

## preferred content of clusters

The preferred content of the cluster can be configured. This tells
users what files the cluster as a whole should contain.

To configure the preferred content of a cluster, as well as other related
things like [[groups|git-annex-group]] and [[required_content]], it's easiest
to do the configuration in a repository that has the cluster as a remote.

For example:

    $ git-annex wanted bigserver-mycluster standard
    $ git-annex group bigserver-mycluster archive

By default, when a file is uploaded to a cluster, it is stored on every node of
the cluster. To control which nodes it is stored on, the [[preferred_content]]
of each node can be configured.

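As a hedged sketch of per-node configuration, the helper below applies one preferred content expression to several nodes by calling `git-annex wanted` for each of them. The function name `set_nodes_wanted` is hypothetical, as are the node names and the expression in the usage note; it is meant to be run in a repository that has the cluster's nodes as remotes.

```shell
# Hypothetical helper, not part of git-annex: set one preferred content
# expression on each of the named repositories.
# usage: set_nodes_wanted EXPRESSION NODE...
set_nodes_wanted() {
    wanted_expr=$1
    shift
    for node in "$@"; do
        # "git-annex wanted REPOSITORY EXPRESSION" sets the preferred
        # content of that repository; stop at the first failure.
        git-annex wanted "$node" "$wanted_expr" || return 1
    done
}
```

For instance, `set_nodes_wanted "include=*.mp3" bigserver-node1 bigserver-node2` would make those two nodes want only mp3 files, leaving the preferred content of any other node unchanged.
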
If the preferred content configuration of the nodes makes none of them
want a copy of a file, the upload to the cluster will fail. That is done to
avoid git-annex picking an arbitrary node. However, the user can bypass the
cluster and send content to any individual node, even if it's not preferred
content of that node.

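The bypass described above could be scripted along the following lines. This is a sketch under assumptions: `store_with_fallback` is a hypothetical helper, not a git-annex command, and it relies on `git-annex copy` exiting non-zero when the upload to the cluster fails.

```shell
# Hypothetical helper, not part of git-annex: try to store a file on the
# cluster, and if that fails, send it directly to one chosen node.
# usage: store_with_fallback FILE CLUSTER_REMOTE NODE_REMOTE
store_with_fallback() {
    file=$1
    cluster=$2
    node=$3
    if git-annex copy "$file" --to "$cluster"; then
        return 0
    fi
    echo "upload to $cluster failed, sending to $node instead" >&2
    git-annex copy "$file" --to "$node"
}
```

A call might look like `store_with_fallback bar bigserver-mycluster bigserver-node1`. Since this stores content that the node does not prefer, it is probably best reserved for deliberate exceptions.
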
@@ -8,37 +8,11 @@ git-annex initcluster name [description]
|
||||||
|
|
||||||
# DESCRIPTION
|
# DESCRIPTION
|
||||||
|
|
||||||
A git-annex repository can provide access to its remotes as a unified
|
|
||||||
cluster. This allows other repositories to access the cluster as a remote,
|
|
||||||
with uploads and downloads distributed among the nodes of the cluster,
|
|
||||||
according to their preferred content settings.
|
|
||||||
|
|
||||||
This command initializes a new cluster with the specified name. If no
|
This command initializes a new cluster with the specified name. If no
|
||||||
description is provided, one will be set automatically.
|
description is provided, one will be set automatically.
|
||||||
|
|
||||||
Once a cluster is initialized, the next step is to add nodes to it.
|
The next step after running this command is to configure
|
||||||
To make a remote be a node of the cluster, configure
|
the cluster, then run [[git-annex-updatecluster]].
|
||||||
`git config remote.name.annex-cluster-node`, setting it to the
|
|
||||||
name of the cluster.
|
|
||||||
|
|
||||||
Finally, run `git-annex updatecluster` to record the cluster configuration
|
|
||||||
in the git-annex branch. That tells other repositories about the cluster.
|
|
||||||
|
|
||||||
Example:
|
|
||||||
|
|
||||||
git-annex initcluster mycluster
|
|
||||||
git config remote.foo.annex-cluster-node mycluster
|
|
||||||
git config remote.bar.annex-cluster-node mycluster
|
|
||||||
git config remote.baz.annex-cluster-node mycluster
|
|
||||||
git-annex updatecluster
|
|
||||||
|
|
||||||
Suppose, for example, that remote "bigserver" has had those commands run in
|
|
||||||
it. Then after pulling from "bigserver", git-annex will know about an
|
|
||||||
additional remote, "bigserver-mycluster", which can be used like any other
|
|
||||||
remote but is an interface to the cluster as a whole. The individual cluster
|
|
||||||
nodes will also be proxied as remotes, eg "bigserver-foo".
|
|
||||||
|
|
||||||
Clusters can only be accessed via ssh.
|
|
||||||
|
|
||||||
# OPTIONS
|
# OPTIONS
|
||||||
|
|
||||||
|
@@ -51,6 +25,8 @@ Clusters can only be accessed via ssh.
|
||||||
[[git-annex-preferred-content]](1)
|
[[git-annex-preferred-content]](1)
|
||||||
[[git-annex-updateproxy]](1)
|
[[git-annex-updateproxy]](1)
|
||||||
|
|
||||||
|
<https://git-annex.branchable.com/clusters/>
|
||||||
|
|
||||||
# AUTHOR
|
# AUTHOR
|
||||||
|
|
||||||
Joey Hess <id@joeyh.name>
|
Joey Hess <id@joeyh.name>
|
||||||
|
|
|
@@ -29,6 +29,8 @@ and run this command.
|
||||||
[[git-annex-initcluster]](1)
|
[[git-annex-initcluster]](1)
|
||||||
[[git-annex-updateproxy]](1)
|
[[git-annex-updateproxy]](1)
|
||||||
|
|
||||||
|
<https://git-annex.branchable.com/clusters/>
|
||||||
|
|
||||||
# AUTHOR
|
# AUTHOR
|
||||||
|
|
||||||
Joey Hess <id@joeyh.name>
|
Joey Hess <id@joeyh.name>
|
||||||
|
|
|
@@ -5,3 +5,5 @@
|
||||||
* [[special_remotes]]
|
* [[special_remotes]]
|
||||||
* [[workflows|workflow]]
|
* [[workflows|workflow]]
|
||||||
* [[sync]]
|
* [[sync]]
|
||||||
|
* [[preferred_content]]
|
||||||
|
* [[clusters]]
|
||||||
|
|