P2P protocol version 2, adding SUCCESS-PLUS and ALREADY-HAVE-PLUS

Client side support for SUCCESS-PLUS and ALREADY-HAVE-PLUS
is complete: when a PUT stores to repositories in addition to
the expected one, the location log is updated with the
additional UUIDs that contain the content.
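A minimal sketch of that client-side bookkeeping, assuming the location log can be modelled as an in-memory map from key to the set of UUIDs holding its content. `LocationLog` and `record_put` are hypothetical names for illustration, not git-annex's own (Haskell) code.

```python
from typing import Dict, Iterable, Set

# Hypothetical in-memory stand-in for the location log:
# maps a key to the set of repository UUIDs believed to hold its content.
LocationLog = Dict[str, Set[str]]


def record_put(log: LocationLog, key: str, target_uuid: str,
               extra_uuids: Iterable[str] = ()) -> Set[str]:
    """Record that `key` was stored to `target_uuid`.

    `extra_uuids` are the additional UUIDs a server reported via
    SUCCESS-PLUS or ALREADY-HAVE-PLUS; they are recorded alongside
    the UUID the client expected to store to.
    """
    holders = log.setdefault(key, set())
    holders.add(target_uuid)
    holders.update(extra_uuids)
    return holders


log: LocationLog = {}
record_put(log, "KEY1", "uuid-remote", ["uuid-a", "uuid-b"])
print(sorted(log["KEY1"]))  # ['uuid-a', 'uuid-b', 'uuid-remote']
```

Using a set means a UUID reported both as the target and as an extra is only recorded once.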

Started implementing PUT fanout to multiple remotes for clusters.
It is untested, and I fear fencepost errors in the relative
offset calculations. And it is missing proxying for the protocol
after DATA.
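One way to picture the relative offset calculation, sketched under assumptions rather than taken from the actual implementation: each node behind the proxy answers PUT with its own PUT-FROM offset, the proxy asks the client for data starting at the smallest of those offsets, and each node is then fed the client's stream minus the bytes it already has. `plan_fanout` and `split_stream` are hypothetical names.

```python
from typing import Dict, Tuple

# Per-node plan: (bytes of the client's stream to skip, bytes to forward).
Plan = Dict[str, Tuple[int, int]]


def plan_fanout(node_offsets: Dict[str, int], key_size: int) -> Tuple[int, Plan]:
    """Given each node's PUT-FROM offset, choose the offset to request from
    the client and work out what each node needs from that stream."""
    if not node_offsets:
        raise ValueError("no nodes to fan out to")
    for node, off in node_offsets.items():
        if not 0 <= off <= key_size:
            raise ValueError(f"bad offset {off} from {node}")
    client_offset = min(node_offsets.values())
    plan: Plan = {}
    for node, off in node_offsets.items():
        # The client's stream starts at client_offset, so a node that already
        # has `off` bytes skips exactly off - client_offset bytes of it.
        # Adding or subtracting 1 here would be the fencepost error.
        skip = off - client_offset
        length = key_size - off  # bytes this node still needs
        plan[node] = (skip, length)
    return client_offset, plan


def split_stream(stream: bytes, plan: Plan) -> Dict[str, bytes]:
    """Slice the bytes the client sent after PUT-FROM into per-node data."""
    return {node: stream[skip:skip + length]
            for node, (skip, length) in plan.items()}


content = b"0123456789"
offsets = {"node-a": 2, "node-b": 4, "node-c": 10}
offset, plan = plan_fanout(offsets, len(content))
parts = split_stream(content[offset:], plan)
print(offset, parts)
```

A useful invariant to check is that for every node `skip + length` equals the length of the client's stream, so each node ends up with exactly `content[off:]`.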
Joey Hess 2024-06-18 12:07:01 -04:00
parent ca08f3fcc2
commit f18740699e
GPG key ID: DB12DB0FF05F8F38
12 changed files with 206 additions and 61 deletions

@@ -55,7 +55,7 @@ any authentication.
The client sends the highest protocol version it supports:
VERSION 2
VERSION 3
The server responds with the highest protocol version it supports
that is less than or equal to the version the client sent:
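A small sketch of that negotiation rule from the server's side. The set of versions the server supports is a hypothetical example, and `negotiate` is an illustrative name.

```python
# Hypothetical set of protocol versions this server speaks.
SERVER_SUPPORTED = {0, 1, 2}


def negotiate(client_line: str, supported=SERVER_SUPPORTED) -> str:
    """Given the client's "VERSION n" line, return the server's reply:
    the highest supported version that is <= the client's version."""
    keyword, _, num = client_line.partition(" ")
    if keyword != "VERSION" or not num.isdigit():
        raise ValueError(f"not a VERSION message: {client_line!r}")
    client_max = int(num)
    usable = [v for v in supported if v <= client_max]
    if not usable:
        raise ValueError("no mutually supported protocol version")
    return f"VERSION {max(usable)}"


print(negotiate("VERSION 2"))  # VERSION 2
print(negotiate("VERSION 5"))  # VERSION 2: a newer client is capped
print(negotiate("VERSION 1"))  # VERSION 1: an older client is not upgraded
```

Because the server never answers with more than the client offered, a version 1 client will never see SUCCESS-PLUS or ALREADY-HAVE-PLUS.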
@@ -132,7 +132,14 @@ spaces, since it's not the last token in the line. Use '%' to indicate
whitespace.)
The server may respond with ALREADY-HAVE if it already
had the content of that key. Otherwise, it responds with:
had the content of that key.
In protocol version 2, the server can optionally reply with
ALREADY-HAVE-PLUS. The UUIDs that follow it are additional
UUIDs where the content is stored, besides the UUID the
client was going to send it to.
Otherwise, it responds with:
PUT-FROM Offset
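A sketch of how a client might interpret the server's reply to PUT. It assumes the UUIDs in ALREADY-HAVE-PLUS follow the keyword separated by spaces, and that the reply is only accepted once version 2 has been negotiated. `PutReply` and `parse_put_reply` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PutReply:
    already_have: bool
    offset: Optional[int] = None  # set when the server sent PUT-FROM
    extra_uuids: List[str] = field(default_factory=list)


def parse_put_reply(line: str, version: int) -> PutReply:
    """Interpret the server's answer to a PUT request."""
    parts = line.split()
    if not parts:
        raise ValueError("empty reply")
    keyword, args = parts[0], parts[1:]
    if keyword == "ALREADY-HAVE" and not args:
        return PutReply(already_have=True)
    if keyword == "ALREADY-HAVE-PLUS" and version >= 2:
        # UUIDs besides the one the client was going to send to.
        return PutReply(already_have=True, extra_uuids=args)
    if keyword == "PUT-FROM" and len(args) == 1 and args[0].isdigit():
        return PutReply(already_have=False, offset=int(args[0]))
    raise ValueError(f"unexpected reply to PUT: {line!r}")


print(parse_put_reply("ALREADY-HAVE-PLUS uuid-1 uuid-2", version=2))
print(parse_put_reply("PUT-FROM 0", version=2))
```

Rejecting ALREADY-HAVE-PLUS below version 2 makes a misbehaving server fail loudly instead of being silently trusted.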
@@ -152,6 +159,10 @@ was being sent.
If the server successfully receives the data and stores the content,
it replies with SUCCESS. Otherwise, FAILURE.
In protocol version 2, the server can optionally reply with SUCCESS-PLUS.
The UUIDs that follow it are additional UUIDs where the content was
stored, besides the UUID the client was sending it to.
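A sketch of handling the reply that follows DATA, assuming the same space-separated UUID list as above and that SUCCESS-PLUS is only valid after negotiating version 2. `parse_data_reply` is an illustrative name.

```python
from typing import List, Tuple


def parse_data_reply(line: str, version: int) -> Tuple[bool, List[str]]:
    """Return (stored, extra_uuids) for the server's reply after DATA."""
    parts = line.split()
    keyword, args = (parts[0], parts[1:]) if parts else ("", [])
    if keyword == "SUCCESS" and not args:
        return True, []
    if keyword == "SUCCESS-PLUS" and version >= 2:
        # Additional UUIDs where the content was stored.
        return True, args
    if keyword == "FAILURE":
        return False, []
    raise ValueError(f"unexpected reply after DATA: {line!r}")


print(parse_data_reply("SUCCESS-PLUS uuid-1 uuid-2", version=2))
print(parse_data_reply("FAILURE", version=2))
```

The extra UUIDs returned here are what the client would add to the location log next to the UUID it was sending to.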
## Getting content from the server
To get content from the server, the client sends:

@@ -251,31 +251,19 @@ No other protocol extensions or special cases should be needed.
If we want to send a file to multiple repositories that are behind the same
proxy, it would be wasteful to upload it through the proxy repeatedly.
Perhaps a good user interface to this is `git-annex copy --to proxy`.
The proxy could fan out the upload and store it in one or more nodes behind
it, using preferred content to select which nodes to use.
This would need `storeKey` to be changed to allow returning a UUID (or UUIDs)
where the content was actually stored.
This is certainly needed when doing `git-annex copy --to remote-cluster`:
the cluster picks the nodes to store the content in, and it needs to report
back some UUID other than the cluster UUID in order for the
location log to get updated. (Cluster UUIDs are not written to the location
log.) So this will need a change to the P2P protocol to support reporting
back additional UUIDs where the content was stored.
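A sketch of both ends of that exchange, under assumptions: the cluster reports the nodes that actually stored the content in a SUCCESS-PLUS reply, and the client drops any cluster UUID before writing to the location log. `cluster_store_reply` and `loggable_uuids` are hypothetical names.

```python
from typing import Iterable, List, Set


def cluster_store_reply(stored_on: List[str]) -> str:
    """Reply a cluster could send after fanning DATA out to its nodes."""
    if not stored_on:
        return "FAILURE"
    # Report the node UUIDs, since the cluster UUID itself is never logged.
    return "SUCCESS-PLUS " + " ".join(stored_on)


def loggable_uuids(reported: Iterable[str], cluster_uuids: Set[str]) -> List[str]:
    """Drop cluster UUIDs, which are not written to the location log."""
    return [u for u in reported if u not in cluster_uuids]


print(cluster_store_reply(["node-1", "node-2"]))  # SUCCESS-PLUS node-1 node-2
print(loggable_uuids(["cluster-uuid", "node-1"], {"cluster-uuid"}))  # ['node-1']
```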
Alternatively, `git-annex copy --to proxy-foo` could notice that proxy-bar
also wants the content, and fan out a copy to there. Then it could
record in its git-annex branch that the content is present in proxy-bar.
If the user later does `git-annex copy --to proxy-bar`, it would avoid
another upload (and the user would learn at that point that it was in
proxy-bar). This avoids needing to change the `storeKey` interface.
Should a proxy always fan out? If `git-annex copy --to proxy` is what does
fanout, and `git-annex copy --to proxy-foo` doesn't, then the user has
content. But if the latter does fanout, that might be annoying to users who
want to use proxies, but want full control over what lands where, and don't
want to use preferred content to do it. So probably fanout should be
configurable. But it can't be configured client side, because the fanout
happens on the proxy. Seems like remote.name.annex-fanout could be set to
false to prevent fanout to a specific remote. (This is analogous to a
remote that has `git-annex assistant` running on it: it might fan out uploads
it receives to other repos, and only the owner of that repo can control it.)
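A sketch of how a proxy might choose fanout targets if the `remote.name.annex-fanout` setting proposed above were adopted. The git config is modelled as a plain dict and preferred content as a caller-supplied predicate; both are stand-ins, and `fanout_targets` is a hypothetical name.

```python
from typing import Callable, Dict, List


def fanout_targets(nodes: List[str], wants: Callable[[str], bool],
                   config: Dict[str, str]) -> List[str]:
    """Pick the nodes an upload to the proxy should be fanned out to."""
    targets = []
    for node in nodes:
        # Proposed opt-out: remote.<name>.annex-fanout=false disables fanout.
        setting = config.get(f"remote.{node}.annex-fanout", "true")
        if setting.lower() == "false":
            continue
        # Otherwise preferred content decides whether the node gets a copy.
        if wants(node):
            targets.append(node)
    return targets


config = {"remote.bar.annex-fanout": "false"}
print(fanout_targets(["foo", "bar", "baz"], lambda n: n != "baz", config))  # ['foo']
```

The setting lives on the proxy side, matching the point that the client cannot configure fanout.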
Alternatively, fanout could be limited to clusters.
This might also be useful for proxies. `git-annex copy --to proxy-foo`
could notice that proxy-bar also wants the content, and fan out a copy to
there. But that might be annoying to users who want full control over what
goes where when using a proxy. Seems it would need a config setting. But
since clusters will support fanout, it seems unnecessary to make proxies
also support it.
A command like `git-annex push` would see all the instantiated remotes and
would pick ones to send content to. If fanout is done, this would