improve docs

2024-06-25 17:50:22 -04:00
commit e3dd29409b
parent 0a1001dbfb
2 changed files with 31 additions and 17 deletions
--- a/doc/clusters.mdwn
+++ b/doc/clusters.mdwn
@@ -1,20 +1,22 @@
-A git-annex repository can provide access to its remotes as nodes of a
+A cluster is a collection of git-annex repositories which are combined to
-cluster. This allows other repositories to access the cluster as a single
+form a single logical repository.
-logical repository.
+
 A cluster is accessed via a gateway repository. The gateway is not itself
 a node of the cluster.
 [[!toc ]]
 ## using a cluster
-To use a cluster, your repository needs to have a remote that serves the
+To use a cluster, your repository needs to have its gateway configured as a
-cluster. Clusters can currently only be accessed via ssh. This remote
+remote. Clusters can currently only be accessed via ssh. This gateway
-is added the same as any other remote:
+remote is added the same as any other remote:
    git remote add bigserver me@bigserver:annex
-The remote publishes information about the cluster that it serves
+The gateway publishes information about the cluster to the git-annex
-to the git-annex branch. (See below for how that is configured.) So you may
+branch. (See below for how that is configured.) So you may need to fetch
-need to fetch from it to learn about the cluster that it serves:
+from it to learn about the cluster:
    git fetch bigserver
@@ -34,7 +36,8 @@ they are stored to:
    $ git-annex move bar --to bigserver-mycluster
    move bar (to bigserver-mycluster...) ok
-In fact, a single upload can be sent to every node of the cluster at once. 
+In fact, a single upload like that can be sent to every node of the cluster
 at once, very efficiently.
    $ git-annex whereis bar
 	whereis bar (3 copies)
@@ -50,10 +53,13 @@ so the 3 copies are the copies on individual nodes.
 Most other git-annex commands that operate on repositories can also operate on
 clusters.
 A cluster is not a git repository, and so `git pull bigserver-mycluster`
 will not work.
 ## configuring a cluster
 A new cluster first needs to be initialized. Run [[git-annex-initcluster]] in
-the repository that will serve the cluster to clients. In the example above,
+the repository that will serve as the cluster's gateway. In the example above,
 this was the "bigserver" repository.
 	$ git-annex initcluster mycluster
@@ -107,3 +113,10 @@ For example:
 By default, when a file is uploaded to a cluster, it is stored on every node of
 the cluster. To control which nodes to store to, the [[preferred_content]] of
 each node can be configured.
 It's also a good idea to configure the preferred content of the cluster's
 gateway. To avoid files redundantly being stored on the gateway
 (which remember, is not a node of the cluster), you might make it not want
 any files:
    $ git-annex wanted bigserver nothing
--- a/doc/todo/git-annex_proxies.mdwn
+++ b/doc/todo/git-annex_proxies.mdwn
@@ -26,17 +26,18 @@ In development on the `proxy` branch.
 For June's work on [[design/passthrough_proxy]], remaining todos:
-* Getting a key from a cluster currently always selects the lowest cost
+* Since proxying to special remotes is not supported yet, and won't be for
-  remote, and always the same remote if cost is the same. Should
+  the first release, make it fail in a reasonable way.
  round-robin among remotes, and prefer to avoid using remotes that
  other git-annex processes are currently using.
 * Basic proxying to special remote support (non-streaming).
 * Support distributed clusters: Make a proxy for a cluster repeat
  protocol messages on to any remotes that have the same UUID as
  the cluster. Needs VIA extension to P2P protocol to avoid cycles.
 * Getting a key from a cluster currently always selects the lowest cost
  remote, and always the same remote if cost is the same. Should
  round-robin among remotes, and prefer to avoid using remotes that
  other git-annex processes are currently using.
 * Optimise proxy speed. See design for ideas.
 * Use `sendfile()` to avoid data copying overhead when