improve docs
parent 0a1001dbfb
commit e3dd29409b

2 changed files with 31 additions and 17 deletions
@@ -1,20 +1,22 @@
-A git-annex repository can provide access to its remotes as nodes of a
-cluster. This allows other repositories to access the cluster as a single
-logical repository.
+A cluster is a collection of git-annex repositories which are combined to
+form a single logical repository.
+
+A cluster is accessed via a gateway repository. The gateway is not itself
+a node of the cluster.
 
 [[!toc ]]
 
 ## using a cluster
 
-To use a cluster, your repository needs to have a remote that serves the
-cluster. Clusters can currently only be accessed via ssh. This remote
-is added the same as any other remote:
+To use a cluster, your repository needs to have its gateway configured as a
+remote. Clusters can currently only be accessed via ssh. This gateway
+remote is added the same as any other remote:
 
 git remote add bigserver me@bigserver:annex
 
-The remote publishes information about the cluster that it serves
-to the git-annex branch. (See below for how that is configured.) So you may
-need to fetch from it to learn about the cluster that it serves:
+The gateway publishes information about the cluster to the git-annex
+branch. (See below for how that is configured.) So you may need to fetch
+from it to learn about the cluster:
 
 git fetch bigserver
 
@@ -34,7 +36,8 @@ they are stored to:
 $ git-annex move bar --to bigserver-mycluster
 move bar (to bigserver-mycluster...) ok
 
-In fact, a single upload can be sent to every node of the cluster at once.
+In fact, a single upload like that can be sent to every node of the cluster
+at once, very efficiently.
 
 $ git-annex whereis bar
 whereis bar (3 copies)
@@ -50,10 +53,13 @@ so the 3 copies are the copies on individual nodes.
 Most other git-annex commands that operate on repositories can also operate on
 clusters.
 
+A cluster is not a git repository, and so `git pull bigserver-mycluster`
+will not work.
+
 ## configuring a cluster
 
 A new cluster first needs to be initialized. Run [[git-annex-initcluster]] in
-the repository that will serve the cluster to clients. In the example above,
+the repository that will serve as the cluster's gateway. In the example above,
 this was the "bigserver" repository.
 
 $ git-annex initcluster mycluster
@@ -107,3 +113,10 @@ For example:
 By default, when a file is uploaded to a cluster, it is stored on every node of
 the cluster. To control which nodes to store to, the [[preferred_content]] of
 each node can be configured.
+
+It's also a good idea to configure the preferred content of the cluster's
+gateway. To avoid files redundantly being stored on the gateway
+(which remember, is not a node of the cluster), you might make it not want
+any files:
+
+$ git-annex wanted bigserver nothing

@@ -26,17 +26,18 @@ In development on the `proxy` branch.
 
 For June's work on [[design/passthrough_proxy]], remaining todos:
 
-* Getting a key from a cluster currently always selects the lowest cost
-remote, and always the same remote if cost is the same. Should
-round-robin amoung remotes, and prefer to avoid using remotes that
-other git-annex processes are currently using.
-
-* Basic proxying to special remote support (non-streaming).
+* Since proxying to special remotes is not supported yet, and won't be for
+the first release, make it fail in a reasonable way.
 
 * Support distributed clusters: Make a proxy for a cluster repeat
 protocol messages on to any remotes that have the same UUID as
 the cluster. Needs VIA extension to P2P protocol to avoid cycles.
 
+* Getting a key from a cluster currently always selects the lowest cost
+remote, and always the same remote if cost is the same. Should
+round-robin amoung remotes, and prefer to avoid using remotes that
+other git-annex processes are currently using.
+
 * Optimise proxy speed. See design for ideas.
 
 * Use `sendfile()` to avoid data copying overhead when
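
The todo item about getting a key from a cluster (round-robin among equal-cost remotes, preferring remotes that no other process is using) could be sketched roughly as below. This is only an illustration in Python, not git-annex's Haskell code. The names `Remote`, `RoundRobinPicker`, and the `busy` set are hypothetical, and the policy of preferring idle remotes before comparing cost is an assumption about how the todo might be resolved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Remote:
    """Hypothetical stand-in for a cluster node with a configured cost."""
    name: str
    cost: float


class RoundRobinPicker:
    """Rotate among the lowest-cost remotes instead of always picking the same one."""

    def __init__(self):
        self._turn = 0  # number of picks made so far

    def pick(self, remotes, busy=frozenset()):
        if not remotes:
            raise ValueError("no remotes to choose from")
        # Assumption: prefer remotes that no other process is using;
        # fall back to all remotes when every one of them is busy.
        idle = [r for r in remotes if r.name not in busy]
        candidates = idle or list(remotes)
        # Keep only the cheapest candidates, in a stable order.
        lowest = min(r.cost for r in candidates)
        cheapest = sorted(
            (r for r in candidates if r.cost == lowest),
            key=lambda r: r.name,
        )
        # Rotate through the cheapest candidates on successive calls.
        choice = cheapest[self._turn % len(cheapest)]
        self._turn += 1
        return choice


if __name__ == "__main__":
    nodes = [Remote("node1", 100), Remote("node2", 100), Remote("node3", 200)]
    picker = RoundRobinPicker()
    for _ in range(4):
        print(picker.pick(nodes).name)
```

A real implementation would need the rotation state and the set of busy remotes to be shared between git-annex processes; here both are plain in-memory values.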