diff --git a/doc/clusters.mdwn b/doc/clusters.mdwn
new file mode 100644
index 0000000000..adcbea9cb1
--- /dev/null
+++ b/doc/clusters.mdwn
@@ -0,0 +1,98 @@
+A git-annex repository can provide access to its remotes as nodes of a
+cluster. This allows other repositories to access the cluster as a single
+logical repository.
+
+[[!toc ]]
+
+## using a cluster
+
+For example, if a remote "bigserver" serves a cluster named "mycluster",
+git-annex will make available an additional remote "bigserver-mycluster", as
+well as a remote for each node, eg "bigserver-node1", "bigserver-node2", etc.
+
+The user can get files from the cluster without caring which node they come
+from:
+
+	$ git-annex get foo --from bigserver-mycluster
+	get foo (from bigserver-mycluster...) ok
+
+And the user can send files to the cluster, without caring which nodes
+they are stored on:
+
+	$ git-annex move bar --to bigserver-mycluster
+	move bar (to bigserver-mycluster...) ok
+
+In fact, a single upload can send a file to every node of the cluster at once:
+
+	$ git-annex whereis bar
+	whereis bar (3 copies)
+	  	acae2ff6-6c1e-8bec-b8b9-397a3755f397 -- my cluster [bigserver-mycluster]
+	  	9f514001-6dc0-4d83-9af3-c64c96626892 -- node 1 [bigserver-node1]
+	  	d81e0b28-612e-4d73-a4e6-6dabbb03aba1 -- node 2 [bigserver-node2]
+	  	5657baca-2f11-11ef-ae1a-5b68c6321dd9 -- node 3 [bigserver-node3]
+
+Notice that the file is shown as present in the cluster, as well as on
+individual nodes. But the cluster itself does not count as a copy of the file,
+so the 3 copies are the copies on individual nodes.
+
+Most other git-annex commands that operate on repositories can also operate on
+clusters.
+
+Clusters can only be accessed via ssh.
+
+## configuring a cluster
+
+A new cluster first needs to be initialized. Run [[git-annex-initcluster]] in
+the repository that will serve the cluster to clients. In the example above,
+this was the "bigserver" repository.
+
+	$ git-annex initcluster mycluster
+
+Once a cluster is initialized, the next step is to add nodes to it.
+To make a remote a node of the cluster, set its
+`remote.<name>.annex-cluster-node` git config to the
+name of the cluster.
+
+In the example above, the three cluster nodes were configured like this:
+
+	$ git remote add node1 /media/disk1/repo
+	$ git remote add node2 /media/disk2/repo
+	$ git remote add node3 /media/disk3/repo
+	$ git config remote.node1.annex-cluster-node mycluster
+	$ git config remote.node2.annex-cluster-node mycluster
+	$ git config remote.node3.annex-cluster-node mycluster
+
+Finally, run `git-annex updatecluster` to record the cluster configuration
+in the git-annex branch. That tells other repositories about the cluster.
+
+	$ git-annex updatecluster
+	Added node node1 to cluster: mycluster
+	Added node node2 to cluster: mycluster
+	Added node node3 to cluster: mycluster
+	Started proxying for node1
+	Started proxying for node2
+	Started proxying for node3
+
+## preferred content of clusters
+
+The preferred content of the cluster can be configured. This tells
+users what files the cluster as a whole should contain.
+
+To configure the preferred content of a cluster, as well as other related
+things like [[groups|git-annex-group]] and [[required_content]], it's easiest
+to do the configuration in a repository that has the cluster as a remote.
+
+For example:
+
+	$ git-annex wanted bigserver-mycluster standard
+	$ git-annex group bigserver-mycluster archive
+
+By default, when a file is uploaded to a cluster, it is stored on every node of
+the cluster. To control which nodes a file is stored on, configure the
+[[preferred_content]] of each node.
+
+If the preferred content settings of the nodes make none of them
+want a copy of a file, the upload to the cluster will fail. This avoids
+git-annex picking an arbitrary node. However, the user can bypass the
+cluster and send content directly to any individual node, even if it is not
+in that node's preferred content.
diff --git a/doc/git-annex-initcluster.mdwn b/doc/git-annex-initcluster.mdwn
index 7b9c9cb7f4..c8916564c1 100644
--- a/doc/git-annex-initcluster.mdwn
+++ b/doc/git-annex-initcluster.mdwn
@@ -8,37 +8,11 @@ git-annex initcluster name [description]
 
 # DESCRIPTION
 
-A git-annex repository can provide access to its remotes as a unified
-cluster. This allows other repositories to access the cluster as a remote,
-with uploads and downloads distributed amoung the nodes of the cluster,
-according to their preferred content settings.
-
 This command initializes a new cluster with the specified name. If no
 description is provided, one will be set automatically.
 
-Once a cluster is initialized, the next step is to add nodes to it.
-To make a remote be a node of the cluster, configure
-`git config remote.name.annex-cluster-node`, setting it to the
-name of the cluster.
-
-Finally, run `git-annex updatecluster` to record the cluster configuration
-in the git-annex branch. That tells other repositories about the cluster.
-
-Example:
-
-	git-annex initcluster mycluster
-	git config remote.foo.annex-cluster-node mycluster
-	git config remote.bar.annex-cluster-node mycluster
-	git config remote.baz.annex-cluster-node mycluster
-	git-annex updatecluster
-
-Suppose, for example, that remote "bigserver" has had those commands run in
-it. Then after pulling from "bigserver", git-annex will know about an
-additional remote, "bigserver-mycluster", which can be used like any other
-remote but is an interface to the cluster as a whole. The individual cluster
-nodes will also be proxied as remotes, eg "bigserver-foo".
-
-Clusters can only be accessed via ssh.
+The next step after running this command is to configure
+the cluster, then run [[git-annex-updatecluster]].
 
 # OPTIONS
 
@@ -51,6 +25,8 @@ Clusters can only be accessed via ssh.
 [[git-annex-preferred-content]](1)
 [[git-annex-updateproxy]](1)
 
+
+
 # AUTHOR
 
 Joey Hess
diff --git a/doc/git-annex-updatecluster.mdwn b/doc/git-annex-updatecluster.mdwn
index 75bc6f41cf..ddbc968586 100644
--- a/doc/git-annex-updatecluster.mdwn
+++ b/doc/git-annex-updatecluster.mdwn
@@ -29,6 +29,8 @@ and run this command.
 [[git-annex-initcluster]](1)
 [[git-annex-updateproxy]](1)
 
+
+
 # AUTHOR
 
 Joey Hess
diff --git a/doc/links/key_concepts.mdwn b/doc/links/key_concepts.mdwn
index b1b037789c..0c2e1ddf74 100644
--- a/doc/links/key_concepts.mdwn
+++ b/doc/links/key_concepts.mdwn
@@ -5,3 +5,5 @@
 * [[special_remotes]]
 * [[workflows|workflow]]
 * [[sync]]
+* [[preferred_content]]
+* [[clusters]]
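
The setup steps on the new clusters page can be collected into a single shell function. This is a minimal sketch, assuming a POSIX shell run inside the repository that will serve the cluster. The function name `setup_cluster`, the `NODE=PATH` argument form, and the `DRY_RUN` variable are hypothetical and are not part of git-annex. As the page's prose describes, the function sets `annex-cluster-node` to the cluster's name. It runs `git-annex updatecluster` without arguments, as the example removed from git-annex-initcluster did.

```shell
#!/bin/sh
# Hypothetical helper, not part of git-annex:
#   setup_cluster CLUSTER NODE=PATH...
# Run it inside the repository that will serve the cluster (for example
# "bigserver"). With DRY_RUN=1 it only prints the commands it would run.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"    # dry run: show the command instead of running it
    else
        "$@"
    fi
}

setup_cluster() {
    if [ $# -lt 1 ]; then
        echo 'usage: setup_cluster CLUSTER NODE=PATH...' >&2
        return 1
    fi
    cluster=$1
    shift

    # Check every NODE=PATH argument before changing anything, so that a
    # typo does not leave a half-configured cluster behind.
    for spec in "$@"; do
        case $spec in
            ?*=?*) ;;
            *) printf 'bad node spec: %s\n' "$spec" >&2; return 1 ;;
        esac
    done

    # 1. Initialize the cluster.
    run git-annex initcluster "$cluster" || return 1

    # 2. Add each node as a remote, then set its annex-cluster-node
    #    config to the cluster's name.
    for spec in "$@"; do
        node=${spec%%=*}    # text before the first "="
        repo=${spec#*=}     # text after the first "="
        run git remote add "$node" "$repo" || return 1
        run git config "remote.$node.annex-cluster-node" "$cluster" || return 1
    done

    # 3. Record the cluster configuration in the git-annex branch.
    run git-annex updatecluster
}
```

For example, `DRY_RUN=1 setup_cluster mycluster node1=/media/disk1/repo node2=/media/disk2/repo` prints the following commands: `git-annex initcluster mycluster`, then one `git remote add` line and one `git config` line for each node, and finally `git-annex updatecluster`. Dropping `DRY_RUN=1` runs those commands. All node arguments are checked before anything runs, so a malformed argument fails without initializing the cluster.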