Merge branch 'proxy'

2024-06-27 15:43:45 -04:00 · 2024-06-27 15:43:45 -04:00 · c3f88923c0
commit c3f88923c0
parent bd2507de17 591f79a9c3
78 changed files with 3145 additions and 448 deletions
--- a/doc/design/balanced_preferred_content.mdwn
+++ b/doc/design/balanced_preferred_content.mdwn
@ -124,9 +124,16 @@ See [[todo/proving_preferred_content_behavior]].
 ## rebalancing

 In both the 3 of 5 use case and a split brain situation, it's possible for
-content to end up not optimally balanced between repositories. git-annex
-can be made to operate in a mode where it does additional work to rebalance
-repositories. 
+content to end up not optimally balanced between repositories. 
+
+(There are also situations where a cluster node ends up without a copy
+of a file that is preferred content, or where adding a copy to a node
+would satisfy numcopies. This can happen eg, when a client sends a file
+to a single node rather than to the cluster. Rebalancing also will deal
+with those.)
+
+git-annex can be made to operate in a mode where it does additional work
+to rebalance repositories. 

 This can be an option like --rebalance, that changes how the preferred content
 expression is evaluated. The user can choose where and when to run that.
--- a/doc/design/p2p_protocol.mdwn
+++ b/doc/design/p2p_protocol.mdwn
@ -40,8 +40,8 @@ The server responds with either its own UUID when authentication
 is successful. Or, it can fail the authentication, and close the
 connection.

-	AUTH_SUCCESS UUID
-	AUTH_FAILURE
+	AUTH-SUCCESS UUID
+	AUTH-FAILURE

 Note that authentication does not guarantee that the client is talking to
 who they expect to be talking to. This, and encryption of the connection,
@ -64,6 +64,19 @@ that is less than or equal to the version the client sent:

 Now both client and server should use version 1.

+## Cluster cycle prevention
+
+In protocol version 2, immediately after VERSION, the
+client can send an additional message that is used to
+prevent cycles when accessing clusters.
+
+    BYPASS UUID1 UUID2 ...
+
+The UUIDs are cluster gateways to avoid connecting to when
+serving a cluster.
+
+The server makes no response to this message.
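
As a rough illustration of the BYPASS message described above, here is a minimal Python sketch that formats and parses such a line. The function names `format_bypass` and `parse_bypass` are made up for this example, and this is not how git-annex (which is written in Haskell) implements it.

```python
def format_bypass(gateway_uuids):
    """Build a BYPASS line from an iterable of cluster gateway UUIDs."""
    return " ".join(["BYPASS", *gateway_uuids])


def parse_bypass(line):
    """Return the set of UUIDs in a BYPASS line, or None for other messages."""
    words = line.split()
    if not words or words[0] != "BYPASS":
        return None
    return set(words[1:])
```

A server handling a cluster request could then skip any gateway whose UUID is in the parsed set, and would send nothing back in response.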
+
 ## Binary data

 The protocol allows raw binary data to be sent. This is done
@ -117,6 +130,10 @@ To remove a key's content from the server, the client sends:

 The server responds with either SUCCESS or FAILURE.

+In protocol version 2, the server can optionally reply with SUCCESS-PLUS
+or FAILURE-PLUS. Each has a subsequent list of UUIDs of repositories
+that the content was removed from.
+
 ## Storing content on the server

 To store content on the server, the client sends:
@ -132,7 +149,14 @@ spaces, since it's not the last token in the line. Use '%' to indicate
 whitespace.)

 The server may respond with ALREADY-HAVE if it already
-had the conent of that key. Otherwise, it responds with:
+had the content of that key. 
+
+In protocol version 2, the server can optionally reply with
+ALREADY-HAVE-PLUS. The subsequent list of UUIDs are additional
+UUIDs where the content is stored, in addition to the UUID where
+the client was going to send it.
+
+Otherwise, it responds with:

 	PUT-FROM Offset

@ -152,6 +176,9 @@ was being sent.
 If the server successfully receives the data and stores the content,
 it replies with SUCCESS. Otherwise, FAILURE.

+In protocol version 2, the server can optionally reply with SUCCESS-PLUS
+and a list of UUIDs where the content was stored.
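
A client has to accept both the plain replies and the version 2 `-PLUS` variants. The sketch below shows one way to normalize them in Python. It assumes the UUID list follows the keyword on the same line, separated by spaces, in the same style as the BYPASS message; the text above does not spell that out. `parse_reply` is a hypothetical helper, not part of git-annex.

```python
PLAIN_VERBS = {"SUCCESS", "FAILURE", "ALREADY-HAVE"}
PLUS_VERBS = {verb + "-PLUS" for verb in PLAIN_VERBS}


def parse_reply(line):
    """Return (verb, uuids), with any -PLUS suffix stripped from the verb.

    Assumes the UUIDs are space separated and on the same line as the verb.
    """
    words = line.split()
    if not words:
        raise ValueError("empty reply")
    verb, rest = words[0], words[1:]
    if verb in PLUS_VERBS:
        return verb[: -len("-PLUS")], rest
    if verb in PLAIN_VERBS and not rest:
        return verb, []
    raise ValueError(f"unexpected reply: {line!r}")
```

With this, a caller can treat `SUCCESS` and `SUCCESS-PLUS` the same way, and record the extra UUIDs as additional locations when any are listed.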
+
 ## Getting content from the server

 To get content from the server, the client sends:
@ -192,6 +219,8 @@ its exit code.

 	CONNECTDONE ExitCode

+After that, the server closes the connection.
+
 ## Change notification

 The client can request to be notified when a ref in 
--- a/doc/design/p2p_protocol_over_http.mdwn
+++ b/doc/design/p2p_protocol_over_http.mdwn
@ -35,7 +35,7 @@ For example (eliding the full HTTP responses, only showing the data):
    > Content-Length: ...
    > 
    > AUTH 79a5a1f4-07e8-11ef-873d-97f93ca91925 
-    < AUTH_SUCCESS ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6
+    < AUTH-SUCCESS ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6

    > POST /git-annex HTTP/1.0
    > Content-Type: x-git-annex-p2p
@ -80,7 +80,7 @@ correspond to each action in the P2P protocol.
 Something like this:

    > GET /git-annex/v1/AUTH?clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925 HTTP/1.0
-    < AUTH_SUCCESS ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6
+    < AUTH-SUCCESS ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6

    > GET /git-annex/v1/CHECKPRESENT?key=SHA1--foo&clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.0
    > SUCCESS
--- a/doc/design/passthrough_proxy.mdwn
+++ b/doc/design/passthrough_proxy.mdwn
@ -219,11 +219,6 @@ And, if the proxy repository itself contains the requested key, it can send
 it directly. This allows the proxy repository to be primed with frequently
 accessed files when it has the space.

-(Should uploads check preferred content of the proxy repository and also
-store a copy there when allowed? I think this would be ok, so long as when
-preferred content is not set, it does not default to storing content
-there.)
-
 When a drop is requested from the cluster's UUID, git-annex-shell drops
 from all nodes, as well as from the proxy itself. Only indicating success
 if it is able to delete all copies from the cluster. This needs 
@ -238,6 +233,14 @@ always fail. Also, when constructing a drop proof for a cluster's UUID,
 the nodes of that cluster should be omitted, otherwise a drop from the
 cluster can lock content on individual nodes, causing the drop to fail.

+Moving from a cluster is a special case because it may reduce the number
+of copies. So move's `willDropMakeItWorse` check needs to special case
+clusters. Since dropping from the cluster may remove content from any of
+its nodes, which may include copies on nodes that the local location log does
+not know about yet, the special case probably needs to always assume
+that dropping from a cluster in a move risks reducing numcopies,
+and so only allow it when a drop proof can be constructed.
+
 Some commands like `git-annex whereis` will list content as being stored in
 the cluster, as well as on whichever of its nodes, and whereis currently
 says "n copies", but since the cluster doesn't count as a copy, that
@ -279,9 +282,9 @@ configuration of the cluster. But the cluster is configured via the
 git-annex branch, particularly preferred content, and the proxy log, and
 the cluster log.

-A user could, for example, make the cluster's frontend want all
-content, and so fill up its small disk. They could make a particular node
-not want any content. They could remove nodes from the cluster.
+A user could, for example, make a small cluster node want all content, and
+so fill up its small disk. They could make a particular node not want any
+content. They could remove nodes from the cluster.

 One way to deal with this is for the cluster to reject git-annex branch
 pushes that make such changes. Or only allow them if they are signed with a
@ -296,24 +299,43 @@ A remote will only be treated as a node of a cluster when the git
 configuration remote.name.annex-cluster-node is set, which will prevent
 creating clusters in places where they are not intended to be.

+## distributed clusters
+
+A cluster's nodes may be geographically distributed among several
+locations, which are effectively subclusters. To support this, an upload
+or removal sent to one frontend proxy of the cluster will be repeated to
+other frontend proxies that are remotes of that one and have the cluster's
+UUID.
+
+This is better than supporting a cluster that is a node of another cluster,
+because rather than a hierarchical structure, this allows for organic
+structures of any shape. For example, there could be two frontends to a
+cluster, in different locations. An upload to either frontend fans out to
+its local nodes as well as over to the other frontend, and to its local
+nodes.
+
+This does mean that cycles need to be prevented. See section below.
+
 ## speed

-A passthrough proxy should be as fast as possible so as not to add overhead
+A proxy should be as fast as possible so as not to add overhead
 to a file retrieve, store, or checkpresent. This probably means that
-it keeps TCP connections open to each host in the cluster. It might use a
+it keeps TCP connections open to each host. It might use a
 protocol with less overhead than ssh.

-In the case of checkpresent, it would be possible for the proxy to not
-communicate with the cluster to check that the data is still present on it.
-As long as all access is intermediated via the proxy, its git-annex branch
-could be relied on to always be correct, in theory. Proving that theory,
-making sure to account for all possible race conditions and other scenarios,
-would be necessary for such an optimisation.
+In the case of checkpresent, it would be possible for the gateway to not
+communicate with cluster nodes to check that the data is still present
+in the cluster. As long as all access is intermediated via a single gateway, 
+its git-annex branch could be relied on to always be correct, in theory.
+Proving that theory, making sure to account for all possible race conditions
+and other scenarios, would be necessary for such an optimisation. This
+would not work for multi-gateway clusters unless the gateways were kept in
+sync about locations, which they currently are not.

-Another way the proxy could speed things up is to cache some subset of
-content. Eg, analize what files are typically requested, and store another
-copy of those on the proxy. Perhaps prioritize storing smaller files, where
-latency tends to swamp transfer speed.
+Another way the cluster gateway could speed things up is to cache some
+subset of content. Eg, analyze what files are typically requested, and
+store another copy of those on the proxy. Perhaps prioritize storing
+smaller files, where latency tends to swamp transfer speed.

 ## proxying to special remotes

@ -446,7 +468,7 @@ So overall, it seems better to do proxy-side encryption. But it may be
 worth adding a special remote that does its own client-side encryption
 in front of the proxy.

-## cycles
+## cycles of proxies

 A repo can advertise that it proxies for a repo which has the same uuid as
 itself. Or there can be a larger cycle involving a proxy that proxies to a
@ -454,36 +476,43 @@ proxy, etc.

 Since the proxied repo uuid is communicated to git-annex-shell via 
 --uuid, a repo that advertises proxying for itself will be connected to
-with its own uuid. No proxying is done in this case. Same happens with a
-larger cycle.
-
-Instantiating remotes needs to identity cycles and break them. Otherwise
-it would construct an infinite number of proxied remotes with names
-like "foo-foo-foo-foo-..." or "foo-bar-foo-bar-..."
-
-Once `git-annex copy --to proxy` is implemented, and the proxy decides
-where to send content that is being sent directly to it, cycles will
-become an issue with that as well.
+with its own uuid. No proxying is done in that case.

 What if repo A is a proxy and has repo B as a remote. Meanwhile, repo B is
-a proxy and has repo A as a remote?
+a proxy and has repo A as a remote? git-annex-shell on repo A will get
+A's uuid, and so will operate on it directly without proxying. So larger
+cycles are also not a problem on the proxy side.

-An upload to repo A will start by checking if repo B wants the content and if so,
-start an upload to repo B. Then the same happens on repo B, leading it to
-start an upload to repo A. 
+On the client side, instantiating remotes needs to identify cycles and
+break them. Otherwise it would construct an infinite number of proxied
+remotes with names like "foo-foo-foo-foo-..." or "foo-bar-foo-bar-..."

-At this point, it might be possible for git-annex to detect the cycle,
-if the proxy uses a transfer lock file. If repo B or repo A had some other
-remote that is not part of a cycle, they could deposit the upload there and
-the upload still succeed. Otherwise the upload would fail, which is
-probably the best that can be done with such a broken configuration.
+## cycles of cluster proxies

-So, it seems like proxies would need to take transfer locks for uploads,
-even though the content is being proxied to elsewhere.
+If a PUT or REMOVE message is sent to a proxy for a cluster, and that
+repository has a remote that is also a proxy for the same cluster,
+the message gets repeated on to it. This can lead to cycles, which have to
+be broken.

-Dropping could have similar cycles with content presence locking, which
-needs to be thought through as well. A cycle of the actual dropContent
-operation might also be possible.
+To break the cycle, extend the P2P protocol with an additional message,
+like:
+
+    VIA uuid1 uuid2
+
+This indicates to a proxy that the message has been received via the other
+listed proxies. It can then avoid repeating the message out via any of
+those proxies. When repeating a message out to another proxy, just add
+the UUID of the local repository to the list.
+
+This will be an extension to the protocol, but so long as it's added in
+the same git-annex version that adds support for proxies, every cluster
+proxy will support it.
+
+This avoids cycles, but it does not avoid situations where there are
+multiple paths through a proxy network that reach the same node. In such a
+situation, a REMOVE might happen twice (no problem) or a PUT be received
+twice from different paths (one of them would fail due to the other one
+taking the transfer lock).
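
The VIA idea can be simulated in a few lines of Python. This is only a model of the rule described above (skip proxies already in the list, add the local UUID when repeating), under the assumption that each proxy knows which of its remotes are proxies for the same cluster. The names `relay`, `remotes` and `log` are invented for the sketch.

```python
def relay(proxy, via, remotes, log):
    """Deliver a message to `proxy`, then repeat it to its remote proxies.

    via:     UUIDs of the proxies the message has already passed through.
    remotes: dict mapping each proxy to the proxies it has as remotes.
    log:     list collecting (sender, receiver) pairs as they happen.
    """
    outgoing_via = via + [proxy]  # add the local UUID before repeating
    for peer in remotes.get(proxy, []):
        if peer in outgoing_via:
            continue  # the message already went via this proxy
        log.append((proxy, peer))
        relay(peer, outgoing_via, remotes, log)


# Two proxies that have each other as remotes: the cycle is broken.
cycle_log = []
relay("A", [], {"A": ["B"], "B": ["A"]}, cycle_log)

# Two paths from A to D: no cycle, but D receives the message twice.
diamond_log = []
relay("A", [], {"A": ["B", "C"], "B": ["D"], "C": ["D"]}, diamond_log)
```

In the first run the message goes from A to B once and stops. In the second, D is reached via both B and C, which is the multiple path situation mentioned above.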

 ## exporttree=yes

--- a/doc/git-annex-extendcluster.mdwn
+++ b/doc/git-annex-extendcluster.mdwn
@ -0,0 +1,44 @@
+# NAME
+
+git-annex extendcluster - add an additional gateway to a cluster
+
+# SYNOPSIS
+
+git-annex extendcluster gateway clustername
+
+# DESCRIPTION
+
+This command is used to configure a repository to serve as an additional
+gateway to a cluster. It is run in that repository.
+
+The repository this command is run in should have a remote that is a
+gateway to the cluster. The `gateway` parameter is the name of that remote.
+The `clustername` parameter is the name of the cluster.
+
+The next step after running this command is to configure
+any additional cluster nodes that this gateway serves to the cluster,
+then run [[git-annex-updatecluster]]. See the documentation of that
+command for details about configuring nodes.
+
+After running this command in the new gateway repository, it typically
+needs to be run in the other gateway repositories as well, 
+after adding the new gateway repository as a remote.
+
+# OPTIONS
+
+* The [[git-annex-common-options]](1) can be used.
+
+# SEE ALSO
+
+[[git-annex]](1)
+[[git-annex-initcluster]](1)
+[[git-annex-updatecluster]](1)
+[[git-annex-updateproxy]](1)
+
+<https://git-annex.branchable.com/tips/clusters/>
+
+# AUTHOR
+
+Joey Hess <id@joeyh.name>
+
+Warning: Automatically converted into a man page by mdwn2man. Edit with care.
--- a/doc/git-annex-initcluster.mdwn
+++ b/doc/git-annex-initcluster.mdwn
@ -0,0 +1,39 @@
+# NAME
+
+git-annex initcluster - initialize a new cluster
+
+# SYNOPSIS
+
+git-annex initcluster name [description]
+
+# DESCRIPTION
+
+This command initializes a new cluster with the specified name. If no
+description is provided, one will be set automatically.
+
+This command should be run in the repository that will serve as the gateway
+to the cluster.
+
+The next step after running this command is to configure
+the cluster nodes, then run [[git-annex-updatecluster]]. See the
+documentation of that command for details about configuring nodes.
+
+# OPTIONS
+
+* The [[git-annex-common-options]](1) can be used.
+
+# SEE ALSO
+
+[[git-annex]](1)
+[[git-annex-updatecluster]](1)
+[[git-annex-extendcluster]](1)
+[[git-annex-preferred-content]](1)
+[[git-annex-updateproxy]](1)
+
+<https://git-annex.branchable.com/tips/clusters/>
+
+# AUTHOR
+
+Joey Hess <id@joeyh.name>
+
+Warning: Automatically converted into a man page by mdwn2man. Edit with care.
--- a/doc/git-annex-preferred-content.mdwn
+++ b/doc/git-annex-preferred-content.mdwn
@ -8,7 +8,7 @@ Each repository has a preferred content setting, which specifies content
 that the repository wants to have present. These settings can be configured
 using `git annex vicfg` or `git annex wanted`.
 They are used by the `--auto` option, by `git annex sync --content`,
-and by the git-annex assistant.
+by clusters, and by the git-annex assistant.

 While preferred content expresses a preference, it can be overridden
 by simply using `git annex drop`. On the other hand, required content
--- a/doc/git-annex-required.mdwn
+++ b/doc/git-annex-required.mdwn
@ -9,7 +9,7 @@ git annex required `repository [expression]`
 # DESCRIPTION

 When run with an expression, configures the content that is required
-to be held in the archive.
+to be held in the repository.

 For example:

--- a/doc/git-annex-shell.mdwn
+++ b/doc/git-annex-shell.mdwn
@ -86,7 +86,9 @@ first "/~/" or "/~user/" is expanded to the specified home directory.
 * --uuid=UUID

  git-annex uses this to specify the UUID of the repository it was expecting
-  git-annex-shell to access, as a sanity check.
+  git-annex-shell to access. This is both a sanity check, and allows
+  git-annex-shell to proxy access to remotes, when configured
+  by [[git-annex-updateproxy]].

 * Also the [[git-annex-common-options]](1) can be used.

--- a/doc/git-annex-updatecluster.mdwn
+++ b/doc/git-annex-updatecluster.mdwn
@ -0,0 +1,43 @@
+# NAME
+
+git-annex updatecluster - update records of cluster nodes
+
+# SYNOPSIS
+
+git-annex updatecluster
+
+# DESCRIPTION
+
+This command is used to record the nodes of a cluster in the git-annex
+branch, and set up proxying to the nodes. It should be run in the
+repository that will serve as a gateway to the cluster.
+
+It looks at the git config `remote.name.annex-cluster-node` of
+each remote. When that is set to the name of a cluster that has been
+initialized with `git-annex initcluster`, the node will be recorded in the
+git-annex branch.
+
+To remove a node from a cluster, unset `remote.name.annex-cluster-node`
+and run this command.
+
+To add additional gateways to a cluster, after running this command,
+use [[git-annex-extendcluster]].
+
+# OPTIONS
+
+* The [[git-annex-common-options]](1) can be used.
+
+# SEE ALSO
+
+[[git-annex]](1)
+[[git-annex-initcluster]](1)
+[[git-annex-extendcluster]](1)
+[[git-annex-updateproxy]](1)
+
+<https://git-annex.branchable.com/tips/clusters/>
+
+# AUTHOR
+
+Joey Hess <id@joeyh.name>
+
+Warning: Automatically converted into a man page by mdwn2man. Edit with care.
--- a/doc/git-annex-updateproxy.mdwn
+++ b/doc/git-annex-updateproxy.mdwn
@ -0,0 +1,44 @@
+# NAME
+
+git-annex updateproxy - update records with proxy configuration
+
+# SYNOPSIS
+
+git annex updateproxy
+
+# DESCRIPTION
+
+A git-annex repository can act as a proxy for its remotes. That allows
+annexed content to be stored and removed from the proxy's remotes, by
+repositories that do not have a direct connection to the remotes.
+
+By default, no proxying is done. To configure the local repository to act
+as a proxy for its remote named "foo", run
+`git config remote.foo.annex-proxy true`.
+
+After setting or unsetting `remote.<name>.annex-proxy` git configurations,
+run `git-annex updateproxy` to record the proxy configuration in the
+git-annex branch. That tells other repositories about the proxy
+configuration.
+
+Suppose, for example, that remote "work" has had this command run in
+it. Then after pulling from "work", git-annex will know about an
+additional remote, "work-foo". That remote will be accessed using "work" as
+a proxy.
+
+Proxies can only be accessed via ssh.
+
+# OPTIONS
+
+* The [[git-annex-common-options]](1) can be used.
+
+# SEE ALSO
+
+[[git-annex]](1)
+[[git-annex-updatecluster]](1)
+
+# AUTHOR
+
+Joey Hess <id@joeyh.name>
+
+Warning: Automatically converted into a man page by mdwn2man. Edit with care.
--- a/doc/git-annex-wanted.mdwn
+++ b/doc/git-annex-wanted.mdwn
@ -9,7 +9,7 @@ git annex wanted `repository [expression]`
 # DESCRIPTION

 When run with an expression, configures the content that is preferred
-to be held in the archive. See [[git-annex-preferred-content]](1)
+to be held in the repository. See [[git-annex-preferred-content]](1)

 For example:

--- a/doc/git-annex.mdwn
+++ b/doc/git-annex.mdwn
@ -252,7 +252,6 @@ content from the key-value store.
  
  See [[git-annex-configremote]](1) for details.

-
 * `renameremote`

  Renames a special remote.
@ -327,6 +326,31 @@ content from the key-value store.
  
  See [[git-annex-required]](1) for details.

+* `initcluster`
+
+  Initializes a new cluster.
+  
+  See [[git-annex-initcluster]](1) for details.
+
+* `updatecluster`
+
+  Update records of cluster nodes.
+  
+  See [[git-annex-updatecluster]](1) for details.
+
+* `extendcluster`
+
+  Adds an additional gateway to a cluster.
+  
+  See [[git-annex-extendcluster]](1) for details.
+
+
+* `updateproxy`
+
+  Update records with proxy configuration.
+  
+  See [[git-annex-updateproxy]](1) for details.
+
 * `schedule repository [expression]`

  Get or set scheduled jobs.
@ -1372,6 +1396,15 @@ repository, using [[git-annex-config]]. See its man page for a list.)
  set in global git configuration.
  For details, see <https://git-annex.branchable.com/tuning/>.

+* `annex.cluster.<name>`
+
+  This is set to make the repository be a gateway to a cluster.
+  The value is the cluster UUID. Note that cluster UUIDs are not
+  the same as repository UUIDs, and a repository UUID cannot be used here.
+
+  Usually this is set up by running [[git-annex-initcluster]] or
+  [[git-annex-extendcluster]].
+
 # CONFIGURATION OF REMOTES

 Remotes are configured using these settings in `.git/config`.
@ -1640,6 +1673,38 @@ Remotes are configured using these settings in `.git/config`.
  content of any file, even though its normal location tracking does not
  indicate that it does. This will cause git-annex to try to get all file
  contents from the remote. Can be useful in setting up a caching remote.
+	
+* `remote.<name>.annex-proxy`
+
+  Set to "true" to make the local repository able to act as a proxy to this
+  remote. 
+
+  After configuring this, run [[git-annex-updateproxy]](1) to store
+  the new configuration in the git-annex branch.
+
+* `remote.<name>.annex-proxied-by`
+
+  Usually this is used internally, when git-annex sets up proxied remotes,
+  and will not need to be configured. The value is the UUID of the
+  git-annex repository that proxies access to this remote.
+
+* `remote.<name>.annex-cluster-node`
+
+  Set to the name of a cluster to make this remote be part of
+  the cluster. Names of multiple clusters can be separated by
+  whitespace to make a remote be part of more than one cluster.
+  
+  After configuring this, run [[git-annex-updatecluster]](1) to store
+  the new configuration in the git-annex branch.
+
+* `remote.<name>.annex-cluster-gateway`
+
+  Set to the UUID of a cluster that this remote serves as a gateway for.
+  Multiple UUIDs can be listed, separated by whitespace. When the local
+  repository is also a gateway for that cluster, it will proxy for the
+  nodes of the remote gateway.
+  
+  Usually this is set up by running [[git-annex-extendcluster]].

 * `remote.<name>.annex-private`

--- a/doc/internals.mdwn
+++ b/doc/internals.mdwn
@ -288,7 +288,7 @@ For example:
 These log files store per-remote content identifiers for keys.
 A given key may have any number of content identifiers.

-The format is a timestamp, followed by the uuid of the remote,
+The format is a timestamp, followed by the UUID of the remote,
 followed by the content identifiers which are separated by colons.
 If a content identifier contains a colon or \r or \n, it will be base64
 encoded. Base64 encoded values are indicated by prefixing them with "!".
@ -308,6 +308,33 @@ For example, this logs that a remote has an object stored using both

 (When those chunks are removed from the remote, the 9 is changed to 0.)

+## `proxy.log`
+
+Used to record what repositories are accessible via a proxy.
+
+Each line starts with a timestamp, then the UUID of the repository
+that can serve as a proxy, and then a list of the remotes that it can
+proxy to, separated by spaces.
+
+Each remote in the list consists of a repository's UUID, 
+followed by a colon (`:`) and then a remote name.
+
+For example:
+
+    1317929100.012345s e605dca6-446a-11e0-8b2a-002170d25c55 26339d22-446b-11e0-9101-002170d25c55:foo c076460c-2290-11ef-be53-b7f0d194c863:bar
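
To make the `proxy.log` line format concrete, here is a small Python sketch that splits one line into its parts. It assumes fields are separated by single spaces and that remote names contain neither spaces nor colons beyond the first one in each entry. `parse_proxy_log_line` is a hypothetical name, not a git-annex function.

```python
def parse_proxy_log_line(line):
    """Split a proxy.log line into (timestamp, proxy_uuid, remotes).

    remotes is a list of (uuid, remote_name) pairs, in the order listed.
    The timestamp is returned as the original string.
    """
    timestamp, proxy_uuid, *entries = line.split()
    remotes = []
    for entry in entries:
        # Each entry is UUID:name, split on the first colon only.
        uuid, name = entry.split(":", 1)
        remotes.append((uuid, name))
    return timestamp, proxy_uuid, remotes
```

Applied to the example line above, this yields the proxy's UUID and two remotes, named "foo" and "bar".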
+
+## `cluster.log`
+
+Used to record the UUIDs of clusters, and the UUIDs of the nodes
+comprising each cluster. 
+
+Each line starts with a timestamp, then the UUID of the cluster,
+followed by a list of the UUIDs of its nodes, separated by spaces.
+
+For example:
+
+    1317929100.012345s 5b070cc8-29b8-11ef-80e1-0fd524be241b 5c0c97d2-29b8-11ef-b1d2-5f3d1c80940d 5c40375e-29b8-11ef-814d-872959d2c013
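
A reader of `cluster.log` might build a map from cluster UUID to node UUIDs along these lines. The Python sketch below assumes that when a cluster appears on several lines, the line with the newest timestamp wins; that rule is an assumption of this example and is not stated above. `parse_cluster_log` is a hypothetical name.

```python
def parse_cluster_log(text):
    """Map each cluster UUID to the set of its node UUIDs.

    Assumption: for a cluster listed more than once, the line with the
    newest timestamp replaces older ones.
    """
    newest = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        timestamp = float(fields[0].rstrip("s"))
        cluster, nodes = fields[1], set(fields[2:])
        if cluster not in newest or timestamp >= newest[cluster][0]:
            newest[cluster] = (timestamp, nodes)
    return {cluster: nodes for cluster, (_, nodes) in newest.items()}
```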
+
 ## `schedule.log`

 Used to record scheduled events, such as periodic fscks.
--- a/doc/links/key_concepts.mdwn
+++ b/doc/links/key_concepts.mdwn
@ -4,4 +4,10 @@
 * [[how_it_works]]
 * [[special_remotes]]
 * [[workflows|workflow]]
+* [[preferred_content]]
 * [[sync]]
+
+### new features
+
+* [[tips/clusters]]
+* [[git-remote-annex|tips/storing_a_git_repository_on_any_special_remote]]
--- a/doc/tips/clusters.mdwn
+++ b/doc/tips/clusters.mdwn
@ -0,0 +1,217 @@
+A cluster is a collection of git-annex repositories which are combined to
+form a single logical repository.
+
+A cluster is accessed via a gateway repository. The gateway is not itself
+a node of the cluster.
+
+[[!toc ]]
+
+## using a cluster
+
+To use a cluster, your repository needs to have its gateway configured as a
+remote. Clusters can currently only be accessed via ssh. This gateway
+remote is added the same as any other remote:
+
+    git remote add bigserver me@bigserver:annex
+
+The gateway publishes information about the cluster to the git-annex
+branch. So you may need to fetch from it to learn about the cluster:
+
+    git fetch bigserver
+
+That will make available an additional remote for the cluster, eg
+"bigserver-mycluster", as well as some remotes for each node eg
+"bigserver-node1", "bigserver-node2", etc.
+
+You can get files from the cluster without caring which node it comes
+from:
+
+    $ git-annex get foo --from bigserver-mycluster
+    get foo (from bigserver-mycluster...) ok
+
+And you can send files to the cluster, without caring what nodes
+they are stored to:
+
+    $ git-annex move bar --to bigserver-mycluster
+    move bar (to bigserver-mycluster...) ok
+
+In fact, a single upload like that can be sent to every node of the cluster
+at once, very efficiently.
+    
+    $ git-annex whereis bar
+    whereis bar (3 copies)
+        acae2ff6-6c1e-8bec-b8b9-397a3755f397 -- [bigserver-mycluster]
+        9f514001-6dc0-4d83-9af3-c64c96626892 -- node 1 [bigserver-node1]
+        d81e0b28-612e-4d73-a4e6-6dabbb03aba1 -- node 2 [bigserver-node2]
+        5657baca-2f11-11ef-ae1a-5b68c6321dd9 -- node 3 [bigserver-node3]
+
+Notice that the file is shown as present in the cluster, as well as on
+individual nodes. But the cluster itself does not count as a copy of the file,
+so the 3 copies are the copies on individual nodes.
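
The counting rule can be shown with a tiny Python sketch: a key's copies are its listed locations minus any cluster UUIDs. This only models the rule stated above; `count_copies` is an invented helper, and the UUIDs are the ones from the example output.

```python
def count_copies(location_uuids, cluster_uuids):
    """Count locations that are real repositories, ignoring cluster UUIDs."""
    return len(set(location_uuids) - set(cluster_uuids))


locations = [
    "acae2ff6-6c1e-8bec-b8b9-397a3755f397",  # the cluster itself
    "9f514001-6dc0-4d83-9af3-c64c96626892",  # node 1
    "d81e0b28-612e-4d73-a4e6-6dabbb03aba1",  # node 2
    "5657baca-2f11-11ef-ae1a-5b68c6321dd9",  # node 3
]
copies = count_copies(locations, ["acae2ff6-6c1e-8bec-b8b9-397a3755f397"])
```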
+
+Most other git-annex commands that operate on repositories can also operate on
+clusters.
+
+A cluster is not a git repository, and so `git pull bigserver-mycluster`
+will not work.
+
+## preferred content of clusters
+
+The preferred content of the cluster can be configured. This tells
+users what files the cluster as a whole should contain.
+
+To configure the preferred content of a cluster, as well as other related
+things like [[groups|git-annex-group]] and [[required_content]], it's easiest
+to do the configuration in a repository that has the cluster as a remote.
+
+For example:
+
+	$ git-annex wanted bigserver-mycluster standard
+	$ git-annex group bigserver-mycluster archive
+
+By default, when a file is uploaded to a cluster, it is stored on every node of
+the cluster. To control which nodes to store to, the [[preferred_content]] of
+each node can be configured.
+
+It's also a good idea to configure the preferred content of the cluster's
+gateway. To avoid files redundantly being stored on the gateway
+(which remember, is not a node of the cluster), you might make it not want
+any files:
+
+    $ git-annex wanted bigserver nothing
+
+## setting up a cluster
+
+A new cluster first needs to be initialized. Run [[git-annex-initcluster]] in
+the repository that will serve as the cluster's gateway. In the example above,
+this was the "bigserver" repository.
+
+	$ git-annex initcluster mycluster
+
+Once a cluster is initialized, the next step is to add nodes to it.
+To make a remote be a node of the cluster, configure 
+`git config remote.name.annex-cluster-node`, setting it to the
+name of the cluster.
+
+In the example above, the three cluster nodes were configured like this:
+
+	$ git remote add node1 /media/disk1/repo
+	$ git remote add node2 /media/disk2/repo
+	$ git remote add node3 /media/disk3/repo
+	$ git config remote.node1.annex-cluster-node mycluster
+	$ git config remote.node2.annex-cluster-node mycluster
+	$ git config remote.node3.annex-cluster-node mycluster
+
+Finally, run `git-annex updatecluster` to record the cluster configuration
+in the git-annex branch. That tells other repositories about the cluster.
+	
+	$ git-annex updatecluster
+	Added node node1 to cluster: mycluster
+	Added node node2 to cluster: mycluster
+	Added node node3 to cluster: mycluster
+	Started proxying for node1
+	Started proxying for node2
+	Started proxying for node3
+
+Operations that affect multiple nodes of a cluster can often be sped up by
+configuring annex.jobs in the repository that will serve the cluster to
+clients. In the example above, the nodes are all disk bound, so operating
+on more than one at a time will likely be faster.
+
+    $ git config annex.jobs cpus
+
+## adding additional gateways to a cluster
+
+A cluster can have more than one gateway. One way to use this is to
+make a cluster that is distributed across several locations.
+
+Suppose you have a datacenter in AMS, and one in NYC. There
+will be a gateway in each datacenter which provides access to the nodes
+there. And the gateways will relay data between each other as well.
+
+Start by setting up the cluster in Amsterdam. The process is the same
+as in the previous section.
+
+	AMS$ git-annex initcluster mycluster
+	AMS$ git remote add node1 /media/disk1/repo
+	AMS$ git remote add node2 /media/disk2/repo
+	AMS$ git config remote.node1.annex-cluster-node mycluster
+	AMS$ git config remote.node2.annex-cluster-node mycluster
+	AMS$ git-annex updatecluster
+    AMS$ git config annex.jobs cpus
+
+Now in a clone of the same repository in NYC, add AMS as a git remote
+accessed with ssh:
+
+    NYC$ git remote add AMS me@amsterdam.example.com:annex
+    NYC$ git fetch AMS
+
+Setting up the cluster in NYC is different: rather than using
+`git-annex initcluster` again (which would make a new, different
+cluster), we ask git-annex to extend the cluster from AMS:
+
+    NYC$ git-annex extendcluster AMS mycluster
+
+The rest of the setup process for NYC is the same, except that different
+nodes are added.
+	
+	NYC$ git remote add node3 /media/disk3/repo
+	NYC$ git remote add node4 /media/disk4/repo
+	NYC$ git config remote.node3.annex-cluster-node mycluster
+	NYC$ git config remote.node4.annex-cluster-node mycluster
+	NYC$ git-annex updatecluster
+    NYC$ git config annex.jobs cpus
+
+Finally, the AMS side of the cluster has to be updated, adding a git remote
+for NYC, and extending the cluster to there as well:
+
+    AMS$ git remote add NYC me@nyc.example.com:annex
+    AMS$ git-annex sync NYC
+    AMS$ git-annex extendcluster NYC mycluster
+
+A user can now add either AMS or NYC as a remote, and will have access
+to the entire cluster as either `AMS-mycluster` or `NYC-mycluster`.
+
+    user$ git-annex move foo --to AMS-mycluster
+    move foo (to AMS-mycluster...) ok
+
+Looking at where files end up, all the nodes are visible, not only those
+served by the current gateway.
+
+    user$ git-annex whereis foo
+	whereis foo (4 copies)
+	  	acfc1cb2-b8d5-8393-b8dc-4a419ea38183 -- cluster mycluster [AMS-mycluster]
+	   	11ab09a9-7448-45bd-ab81-3997780d00b3 -- node4 [AMS-NYC-node4]
+	   	36197d0e-6d49-4213-8440-71cbb121e670 -- node2 [AMS-node2]
+	   	43652651-1efa-442a-8333-eb346db31553 -- node3 [AMS-NYC-node3]
+	   	7fb5a77b-77a3-4032-b3e5-536698e308b3 -- node1 [AMS-node1]
+	ok
+
+Notice that remotes for cluster nodes have names indicating the path through
+the cluster used to access them. For example, "AMS-NYC-node3" is accessed via
+the AMS gateway, which then relays to NYC where node3 is located.
+
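As a rough illustration of the naming scheme described above, the following shell sketch splits such a remote name into the node and the gateways it is reached through, and counts the gateway hops. The function names `describe_remote` and `hop_count` are made up for this example and are not part of git-annex. The sketch also assumes that no gateway or node name contains a dash of its own, so its result is only a guess.

```shell
# Hypothetical helpers, not part of git-annex.
# Assumption: gateway and node names do not themselves contain dashes.

# describe_remote NAME: print the node and the gateway path used to reach it.
describe_remote() {
    name=$1
    node=${name##*-}        # text after the last dash is the node
    via=${name%"$node"}     # everything before the node, eg "AMS-NYC-"
    via=${via%-}            # drop the trailing dash, eg "AMS-NYC"
    echo "$node via ${via:-direct}"
}

# hop_count NAME: print how many gateways the name passes through,
# which is the number of dashes in the name.
hop_count() {
    rest=$1
    hops=0
    # Strip one "prefix-" per iteration until no dash is left.
    while [ "${rest#*-}" != "$rest" ]; do
        rest=${rest#*-}
        hops=$((hops + 1))
    done
    echo "$hops"
}
```

For example, `describe_remote AMS-NYC-node3` prints `node3 via AMS-NYC`, and `hop_count AMS-NYC-node3` prints `2`.
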
+## considerations for multi-gateway clusters
+
+When a cluster has multiple gateways, nothing keeps the git repositories on
+the gateways in sync. A branch pushed to one gateway cannot be pulled
+from another one. And gateways only learn about the locations of
+keys that are uploaded to the cluster via them. So in the example above,
+after an upload to AMS-mycluster, NYC-mycluster will only know that the
+key is stored in its nodes, but won't know that it's stored in nodes
+behind AMS. So, it's best to have a single git repository that is synced
+with, or perhaps run [[git-annex-remotedaemon]] on each gateway to keep
+its git repository in sync with the other gateways.
+
+Clusters can be constructed with any number of gateways, and any internal
+topology of connections between gateways. But there must always be a path
+from any gateway to all nodes of the cluster, otherwise a key won't
+be able to be stored to, or retrieved from, some nodes.
+
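The requirement that every gateway has a path to every node can be checked on paper before building a cluster. The sketch below is one way to do that in plain shell. It assumes a hand-written description of the planned topology, one `source destination` link per line, which is a format invented for this example. git-annex does not read or produce it, and `reachable` and `missing_nodes` are hypothetical helper names.

```shell
# Hypothetical planning aid, not part of git-annex.
# LINKS is a newline-separated list of "source destination" pairs.

# reachable START LINKS: print a space-delimited list of everything that
# can be reached from START by following links (breadth-first search).
reachable() {
    start=$1
    links=$2
    seen=" $start "
    queue=$start
    while [ -n "$queue" ]; do
        current=${queue%% *}                 # take the first queued name
        case $queue in
            *" "*) queue=${queue#* } ;;      # drop it from the queue
            *) queue="" ;;
        esac
        next=$(printf '%s\n' "$links" |
            awk -v from="$current" '$1 == from { print $2 }')
        for n in $next; do
            case $seen in
                *" $n "*) ;;                 # already visited
                *) seen="$seen$n "; queue="${queue:+$queue }$n" ;;
            esac
        done
    done
    echo "$seen"
}

# missing_nodes GATEWAY LINKS NODES: print each node in the space-separated
# NODES list that GATEWAY has no path to. No output means full coverage.
missing_nodes() {
    reach=$(reachable "$1" "$2")
    for node in $3; do
        case $reach in
            *" $node "*) ;;
            *) printf '%s\n' "$node" ;;
        esac
    done
}
```

With the AMS and NYC example, where each gateway links to its two nodes and to the other gateway, `missing_nodes AMS "$links" "node1 node2 node3 node4"` prints nothing. If the links between the two gateways are left out, it prints `node3` and `node4`.
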
+It's best to avoid there being multiple paths to a node that go via
+different gateways, since all paths will be tried in parallel when eg,
+uploading a key to the cluster.
+
+A breakdown in communication between gateways will temporarily split the
+cluster. When communication resumes, some keys may need to be copied to
+additional nodes.
--- a/doc/todo/git-annex_proxies.mdwn
+++ b/doc/todo/git-annex_proxies.mdwn
@ -11,7 +11,7 @@ repositories.
 Joey has received funding to work on this.
 Planned schedule of work:

-* June: git-annex proxy
+* June: git-annex proxies and clusters
 * July, part 1: git-annex proxy support for exporttree
 * July, part 2: p2p protocol over http
 * August: balanced preferred content
@ -24,7 +24,49 @@ Planned schedule of work:

 In development on the `proxy` branch.

-For June's work on [[design/passthrough_proxy]], implementation plan:
+For June's work on [[design/passthrough_proxy]], remaining todos:
+
+* Since proxying to special remotes is not supported yet, and won't be for
+  the first release, make it fail in a reasonable way.
+
+- or -
+
+* Proxying for special remotes.
+  Including encryption and chunking. See design for issues.
+
+# items deferred until later for [[design/passthrough_proxy]]
+
+* Indirect uploads when proxying for special remote
+  (to be considered). See design.
+
+* Getting a key from a cluster currently picks from among
+  the lowest cost remotes at random. This could be smarter,
+  eg prefer to avoid using remotes that are doing other transfers at the
+  same time.
+
+* The cost of a proxied node that is accessed via an intermediate gateway
+  is currently the same as a node accessed via the cluster gateway.
+  To fix this, there needs to be some way to tell how many hops through
+  gateways it takes to get to a node. Currently the only way is to
+  guess based on number of dashes in the node name, which is not satisfying.
+
+  Even counting hops is not very satisfying, since one cluster gateway could
+  be much more expensive to traverse than another one.
+
+  If seriously tackling this, it might be worth making enough information
+  available to use spanning tree protocol for routing inside clusters.
+
+* Optimise proxy speed. See design for ideas.
+
+* Use `sendfile()` to avoid data copying overhead when
+  `receiveBytes` is being fed right into `sendBytes`.
+  Library to use:
+  <https://hackage.haskell.org/package/hsyscall-0.4/docs/System-Syscall.html>
+
+* Support using a proxy when its url is a P2P address.
+  (Eg tor-annex remotes.)
+
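To make the current selection behavior from the list above concrete (picking at random from among the lowest cost remotes), here is a minimal shell sketch. It is not git-annex's implementation, which is written in Haskell. The `pick_remote` name and the `name cost` input format are assumptions made for this example.

```shell
# Hypothetical illustration, not git-annex code.
# pick_remote: read "name cost" lines on stdin and print one of the names
# that share the lowest cost, chosen at random. Fails on empty input.
pick_remote() {
    awk '
        NF < 2 { next }                      # skip blank or malformed lines
        {
            cost = $2 + 0                    # force a numeric comparison
            if (!have || cost < min) {       # new lowest cost: restart list
                min = cost; n = 0; have = 1
            }
            if (cost == min) names[++n] = $1
        }
        END {
            if (n == 0) exit 1
            srand()                          # seeded from the time of day
            print names[int(rand() * n) + 1]
        }
    '
}
```

Given `printf 'node1 100\nnode2 100\nnode3 200\n' | pick_remote`, the result is either `node1` or `node2`, never `node3`. Since `srand()` is typically seeded with one second resolution, calls within the same second may pick the same remote. A smarter selection, as suggested in the todo item, would also need to know which remotes are busy with other transfers.
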
+# completed items for June's work on [[design/passthrough_proxy]]:

 * UUID discovery via git-annex branch. Add a log file listing UUIDs
  accessible via proxy UUIDs. It also will contain the names
@ -40,7 +82,7 @@ For June's work on [[design/passthrough_proxy]], implementation plan:
 * Proxy should update location tracking information for proxied remotes,
  so it is available to other users who sync with it. (done)

-* Implement `git-annex updatecluster` command (done)
+* Implement `git-annex initcluster` and `git-annex updatecluster` commands (done)

 * Implement cluster UUID insertion on location log load, and removal
  on location log store. (done)
@ -48,66 +90,39 @@ For June's work on [[design/passthrough_proxy]], implementation plan:
 * Omit cluster UUIDs when constructing drop proofs, since lockcontent will
  always fail on a cluster. (done)

-* Don't count cluster UUID as a copy. (done)
+* Don't count cluster UUID as a copy in numcopies checking etc. (done)

 * Tab complete proxied remotes and clusters in eg --from option. (done)

 * Getting a key from a cluster should proxy from one of the nodes that has
  it. (done)

-* Getting a key from a cluster currently always selects the lowest cost
-  remote, and always the same remote if cost is the same. Should
-  round-robin amoung remotes, and prefer to avoid using remotes that
-  other git-annex processes are currently using.
-
-* Implement upload with fanout and reporting back additional UUIDs over P2P
-  protocol. (done, but need to check for fencepost errors on resume of
-  incomplete upload with remotes at different points)
-
-* On upload to cluster, send to nodes where it's preferred content, and not
-  to other nodes.
+* Implement upload with fanout to multiple cluster nodes and reporting back
+  additional UUIDs over P2P protocol. (done)

 * Implement cluster drops, trying to remove from all nodes, and returning
-  which UUIDs it was dropped from. 
+  which UUIDs it was dropped from. (done)

-  Problem: May lock content on cluster
-  nodes to satisfy numcopies (rather than locking elsewhere) and so not be
-  able to drop from nodes. Avoid using cluster nodes when constructing drop
-  proof for cluster.
+* `git-annex testremote` works against proxied remote and cluster. (done)

-  Problem: When nodes are special remotes, may
-  treat nodes as copies while dropping from cluster, and so violate
-  numcopies. (But not mincopies.)
+* Avoid `git-annex sync --content` etc from operating on cluster nodes by
+  default since syncing with a cluster implicitly syncs with its nodes. (done)

-  Problem: `move --from cluster` in "does this make it worse"
-  check may fail to realize that dropping from multiple nodes does in fact
-  make it worse.
+* On upload to cluster, send to nodes where it's preferred content, and not
+  to other nodes. (done)

-* On upload to a cluster, as well as fanout to nodes, if the key is
-  preferred content of the proxy repository, store it there.
-  (But not when preferred content is not configured.)
-  And on download from a cluster, if the proxy repository has the content,
-  get it from there to avoid the overhead of proxying to a node.
+* Support annex.jobs for clusters. (done)

-* Basic proxying to special remote support (non-streaming).
+* Add `git-annex extendcluster` command and extend `git-annex updatecluster` 
+  to support clusters with multiple gateways. (done)

-* Support proxies-of-proxies better, eg foo-bar-baz.
-  Currently, it does work, but have to run `git-annex updateproxy`
-  on foo in order for it to notice the bar-baz proxied remote exists,
-  and record it as foo-bar-baz. Make it skip recording proxies of
-  proxies like that, and instead automatically generate those from the log.
-  (With cycle prevention there of course.)
+* Support proxying for a remote that is proxied by another gateway of
+  a cluster. (done)

-* Cycle prevention including cluster-in-cluster cycles. See design.
+* Support distributed clusters: Make a proxy for a cluster repeat
+  protocol messages on to any remotes that have the same UUID as
+  the cluster. Needs extension to P2P protocol to avoid cycles.
+  (done)

-* Optimise proxy speed. See design for ideas.
-
-* Use `sendfile()` to avoid data copying overhead when
-  `receiveBytes` is being fed right into `sendBytes`.
-
-* Encryption and chunking. See design for issues.
-
-* Indirect uploads (to be considered). See design.
-
-* Support using a proxy when its url is a P2P address.
-  (Eg tor-annex remotes.)
+* Proxied cluster nodes should have slightly higher cost than the cluster
+  gateway. (done)
--- a/doc/todo/transitive_transfers.mdwn
+++ b/doc/todo/transitive_transfers.mdwn
@ -6,7 +6,7 @@ remotes.

 So this todo remains open, but is now only concerned with
 streaming an object that is being received from one remote out to another
-remote without first needing to buffer the whole object on disk.
+repository without first needing to buffer the whole object on disk.

 git-annex's remote interface does not currently support that.
 `retrieveKeyFile` stores the object into a file. And `storeKey`
@ -27,3 +27,7 @@ Receiving to a file, and sending from the same file as it grows is one
 possibility, since that would handle buffering, and it might avoid needing
 to change interfaces as much. It would still need a new interface since the
 current one does not guarantee the file is written in-order.
+
+A fifo is a possibility, but would certainly not work with remotes
+that don't write to the file in-order. Also, resuming a download would not
+work with a fifo, since the sending remote wouldn't know where to resume from.