improve docs
parent 0a1001dbfb
commit e3dd29409b
2 changed files with 31 additions and 17 deletions
|
@@ -1,20 +1,22 @@
|
||||||
A git-annex repository can provide access to its remotes as nodes of a
|
A cluster is a collection of git-annex repositories which are combined to
|
||||||
cluster. This allows other repositories to access the cluster as a single
|
form a single logical repository.
|
||||||
logical repository.
|
|
||||||
|
A cluster is accessed via a gateway repository. The gateway is not itself
|
||||||
|
a node of the cluster.
|
||||||
|
|
||||||
[[!toc ]]
|
[[!toc ]]
|
||||||
|
|
||||||
## using a cluster
|
## using a cluster
|
||||||
|
|
||||||
To use a cluster, your repository needs to have a remote that serves the
|
To use a cluster, your repository needs to have its gateway configured as a
|
||||||
cluster. Clusters can currently only be accessed via ssh. This remote
|
remote. Clusters can currently only be accessed via ssh. This gateway
|
||||||
is added the same as any other remote:
|
remote is added the same as any other remote:
|
||||||
|
|
||||||
git remote add bigserver me@bigserver:annex
|
git remote add bigserver me@bigserver:annex
|
||||||
|
|
||||||
The remote publishes information about the cluster that it serves
|
The gateway publishes information about the cluster to the git-annex
|
||||||
to the git-annex branch. (See below for how that is configured.) So you may
|
branch. (See below for how that is configured.) So you may need to fetch
|
||||||
need to fetch from it to learn about the cluster that it serves:
|
from it to learn about the cluster:
|
||||||
|
|
||||||
git fetch bigserver
|
git fetch bigserver
|
||||||
|
|
||||||
|
@@ -34,7 +36,8 @@ they are stored to:
|
||||||
$ git-annex move bar --to bigserver-mycluster
|
$ git-annex move bar --to bigserver-mycluster
|
||||||
move bar (to bigserver-mycluster...) ok
|
move bar (to bigserver-mycluster...) ok
|
||||||
|
|
||||||
In fact, a single upload can be sent to every node of the cluster at once.
|
In fact, a single upload like that can be sent to every node of the cluster
|
||||||
|
at once, very efficiently.
|
||||||
|
|
||||||
$ git-annex whereis bar
|
$ git-annex whereis bar
|
||||||
whereis bar (3 copies)
|
whereis bar (3 copies)
|
||||||
|
@@ -50,10 +53,13 @@ so the 3 copies are the copies on individual nodes.
|
||||||
Most other git-annex commands that operate on repositories can also operate on
|
Most other git-annex commands that operate on repositories can also operate on
|
||||||
clusters.
|
clusters.
|
||||||
|
|
||||||
|
A cluster is not a git repository, and so `git pull bigserver-mycluster`
|
||||||
|
will not work.
|
||||||
|
|
||||||
## configuring a cluster
|
## configuring a cluster
|
||||||
|
|
||||||
A new cluster first needs to be initialized. Run [[git-annex-initcluster]] in
|
A new cluster first needs to be initialized. Run [[git-annex-initcluster]] in
|
||||||
the repository that will serve the cluster to clients. In the example above,
|
the repository that will serve as the cluster's gateway. In the example above,
|
||||||
this was the "bigserver" repository.
|
this was the "bigserver" repository.
|
||||||
|
|
||||||
$ git-annex initcluster mycluster
|
$ git-annex initcluster mycluster
|
||||||
|
@@ -107,3 +113,10 @@ For example:
|
||||||
By default, when a file is uploaded to a cluster, it is stored on every node of
|
By default, when a file is uploaded to a cluster, it is stored on every node of
|
||||||
the cluster. To control which nodes to store to, the [[preferred_content]] of
|
the cluster. To control which nodes to store to, the [[preferred_content]] of
|
||||||
each node can be configured.
|
each node can be configured.
|
||||||
|
|
||||||
|
It's also a good idea to configure the preferred content of the cluster's
|
||||||
|
gateway. To avoid files redundantly being stored on the gateway
|
||||||
|
(which remember, is not a node of the cluster), you might make it not want
|
||||||
|
any files:
|
||||||
|
|
||||||
|
$ git-annex wanted bigserver nothing
|
||||||
|
|
|
@@ -26,17 +26,18 @@ In development on the `proxy` branch.
|
||||||
|
|
||||||
For June's work on [[design/passthrough_proxy]], remaining todos:
|
For June's work on [[design/passthrough_proxy]], remaining todos:
|
||||||
|
|
||||||
* Getting a key from a cluster currently always selects the lowest cost
|
* Since proxying to special remotes is not supported yet, and won't be for
|
||||||
remote, and always the same remote if cost is the same. Should
|
the first release, make it fail in a reasonable way.
|
||||||
round-robin amoung remotes, and prefer to avoid using remotes that
|
|
||||||
other git-annex processes are currently using.
|
|
||||||
|
|
||||||
* Basic proxying to special remote support (non-streaming).
|
|
||||||
|
|
||||||
* Support distributed clusters: Make a proxy for a cluster repeat
|
* Support distributed clusters: Make a proxy for a cluster repeat
|
||||||
protocol messages on to any remotes that have the same UUID as
|
protocol messages on to any remotes that have the same UUID as
|
||||||
the cluster. Needs VIA extension to P2P protocol to avoid cycles.
|
the cluster. Needs VIA extension to P2P protocol to avoid cycles.
|
||||||
|
|
||||||
|
* Getting a key from a cluster currently always selects the lowest cost
|
||||||
|
remote, and always the same remote if cost is the same. Should
|
||||||
|
round-robin amoung remotes, and prefer to avoid using remotes that
|
||||||
|
other git-annex processes are currently using.
|
||||||
|
|
||||||
* Optimise proxy speed. See design for ideas.
|
* Optimise proxy speed. See design for ideas.
|
||||||
|
|
||||||
* Use `sendfile()` to avoid data copying overhead when
|
* Use `sendfile()` to avoid data copying overhead when
|
||||||
|
|