started on a design for P2P protocol over HTTP
Added to git-annex_proxies todo because this is something OpenNeuro would need in order to use the git-annex proxy.

Sponsored-by: Dartmouth College's OpenNeuro project
parent
d28adebd6b
commit
cbaf2172ab
4 changed files with 139 additions and 5 deletions
|
@ -72,7 +72,7 @@ on its own line, followed by a newline and the binary data.
|
||||||
The Len value tells how many bytes of data to read.
|
The Len value tells how many bytes of data to read.
|
||||||
|
|
||||||
DATA 3
|
DATA 3
|
||||||
foo1
|
foo
|
||||||
|
|
||||||
Note that there is no newline after the binary data; the next protocol
|
Note that there is no newline after the binary data; the next protocol
|
||||||
message will come immediately after it.
|
message will come immediately after it.
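The framing described in this hunk can be sketched in a few lines of Python. This is only an illustration of the rule that `Len` bytes follow the `DATA` line with no trailing newline; the function name `read_data_message` is hypothetical and not part of git-annex.

```python
import io


def read_data_message(stream):
    """Read one `DATA <len>` message from a binary stream and return its payload."""
    header = stream.readline()  # e.g. b"DATA 3\n"
    keyword, _, length = header.rstrip(b"\n").partition(b" ")
    if keyword != b"DATA":
        raise ValueError(f"expected DATA, got {header!r}")
    size = int(length)
    # Read exactly `size` bytes; there is no newline after the binary data.
    payload = stream.read(size)
    if len(payload) != size:
        raise EOFError("stream ended before the full payload was read")
    return payload


# The next protocol message starts immediately after the payload.
stream = io.BytesIO(b"DATA 3\nfooVALID\n")
payload = read_data_message(stream)
next_line = stream.readline()
```

Here `payload` holds the three bytes of content and `next_line` holds the following protocol message, which shows why the old example text `foo1` did not match `DATA 3`.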
|
||||||
|
|
125
doc/design/p2p_protocol_over_http.mdwn
Normal file
|
@ -0,0 +1,125 @@
|
||||||
|
[[!toc ]]
|
||||||
|
|
||||||
|
## motivation
|
||||||
|
|
||||||
|
The [[P2P protocol]] is a custom protocol that git-annex speaks over a ssh
|
||||||
|
connection (mostly). This is a design for supporting the P2P
|
||||||
|
protocol over HTTP.
|
||||||
|
|
||||||
|
Upload of annex objects to git remotes that use http is currently not
|
||||||
|
supported by git-annex, and this would be a generally very useful addition.
|
||||||
|
|
||||||
|
For use cases such as OpenNeuro's javascript client, ssh is too difficult
|
||||||
|
to support, so they currently use a special remote that talks to a http
|
||||||
|
endpoint in order to upload objects. Implementing this would let them
|
||||||
|
talk to git-annex over http.
|
||||||
|
|
||||||
|
With the [[passthrough_proxy]], this would let clients configure a single
|
||||||
|
http remote that accesses a more complicated network of git-annex
|
||||||
|
repositories.
|
||||||
|
|
||||||
|
## approach 1: encapsulation
|
||||||
|
|
||||||
|
One approach is to encapsulate the P2P protocol inside HTTP. This has the
|
||||||
|
benefit of being simple to think about. It is not very web-native though.
|
||||||
|
|
||||||
|
There would be a single API endpoint. The client connects and sends a
|
||||||
|
request that encapsulates one or more lines in the P2P protocol. The server
|
||||||
|
sends a response that encapsulates one or more lines in the P2P
|
||||||
|
protocol.
|
||||||
|
|
||||||
|
For example (eliding the full HTTP responses, only showing the data):
|
||||||
|
|
||||||
|
> POST /git-annex HTTP/1.0
|
||||||
|
> Content-Type: x-git-annex-p2p
|
||||||
|
> Content-Length: ...
|
||||||
|
>
|
||||||
|
> AUTH 79a5a1f4-07e8-11ef-873d-97f93ca91925
|
||||||
|
< AUTH_SUCCESS ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6
|
||||||
|
|
||||||
|
> POST /git-annex HTTP/1.0
|
||||||
|
> Content-Type: x-git-annex-p2p
|
||||||
|
> Content-Length: ...
|
||||||
|
>
|
||||||
|
> VERSION 1
|
||||||
|
< VERSION 1
|
||||||
|
|
||||||
|
> POST /git-annex HTTP/1.0
|
||||||
|
> Content-Type: x-git-annex-p2p
|
||||||
|
> Content-Length: ...
|
||||||
|
>
|
||||||
|
> CHECKPRESENT SHA1--foo
|
||||||
|
< SUCCESS
|
||||||
|
|
||||||
|
> POST /git-annex HTTP/1.0
|
||||||
|
> Content-Type: x-git-annex-p2p
|
||||||
|
> Content-Length: ...
|
||||||
|
>
|
||||||
|
> PUT bar SHA1--bar
|
||||||
|
< PUT-FROM 0
|
||||||
|
|
||||||
|
> POST /git-annex HTTP/1.0
|
||||||
|
> Content-Type: x-git-annex-p2p
|
||||||
|
> Content-Length: ...
|
||||||
|
>
|
||||||
|
> DATA 3
|
||||||
|
> foo
|
||||||
|
> VALID
|
||||||
|
< SUCCESS
|
||||||
|
|
||||||
|
Note that, since VERSION is negotiated in one request, the HTTP server
|
||||||
|
needs to know that a series of requests are part of the same P2P protocol
|
||||||
|
session. In the example above, it would not have a good way to do that.
|
||||||
|
One solution would be to add a session identifier UUID to each request.
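As a rough sketch of the session identifier idea, a client could generate one UUID per P2P session and attach it to every encapsulated request. The header name `X-Git-Annex-Session` and the helper `encapsulate` are assumptions made up for this illustration; the design does not say how the identifier would be carried. The sketch also assumes each P2P line is terminated by a newline inside the request body.

```python
import uuid


def encapsulate(lines, session_id):
    """Build (headers, body) for one POST that carries some P2P protocol lines."""
    body = "".join(line + "\n" for line in lines).encode("utf-8")
    headers = {
        "Content-Type": "x-git-annex-p2p",
        "Content-Length": str(len(body)),
        # Hypothetical header; lets the server tie requests to one session.
        "X-Git-Annex-Session": session_id,
    }
    return headers, body


# One identifier is reused for every request in the same P2P session.
session_id = str(uuid.uuid4())
version_headers, version_body = encapsulate(["VERSION 1"], session_id)
check_headers, check_body = encapsulate(["CHECKPRESENT SHA1--foo"], session_id)
```

With something like this, the server would have to keep per-session state (such as the negotiated version) keyed by the identifier, which is the cost of the encapsulation approach.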
|
||||||
|
|
||||||
|
## approach 2: HTTP API
|
||||||
|
|
||||||
|
Another approach is to define a web-native API with endpoints that
|
||||||
|
correspond to each action in the P2P protocol.
|
||||||
|
|
||||||
|
Something like this:
|
||||||
|
|
||||||
|
> GET /git-annex/v1/AUTH?clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925 HTTP/1.0
|
||||||
|
< AUTH_SUCCESS ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6
|
||||||
|
|
||||||
|
> GET /git-annex/v1/CHECKPRESENT?key=SHA1--foo&clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.0
|
||||||
|
< SUCCESS
|
||||||
|
|
||||||
|
> GET /git-annex/v1/PUT-FROM?key=SHA1--foo&clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.0
|
||||||
|
< PUT-FROM 0
|
||||||
|
|
||||||
|
> POST /git-annex/v1/PUT?key=SHA1--foo&associatedfile=bar&put-from=0&clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.0
|
||||||
|
> Content-Type: application/octet-stream
|
||||||
|
> Content-Length: 4
|
||||||
|
> foo1
|
||||||
|
< SUCCESS
|
||||||
|
|
||||||
|
(In the last example above "foo" is the content, and there is an additional byte at the end that
|
||||||
|
is 1 for VALID and 0 for INVALID. This seems better than needing an entire
|
||||||
|
other request to indicate validity.)
|
||||||
|
|
||||||
|
This needs a more complex spec. But it's easier for others to implement,
|
||||||
|
especially since it does not need a session identifier, so the HTTP server can
|
||||||
|
be stateless.
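The PUT request from the example could be assembled roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and the validity byte is taken to be the ASCII character `1` or `0`, as the `foo1` body in the example suggests.

```python
from urllib.parse import urlencode


def put_url(key, associated_file, client_uuid, server_uuid, put_from=0):
    """Build the request path for the PUT endpoint shown in the example."""
    query = urlencode({
        "key": key,
        "associatedfile": associated_file,
        "put-from": put_from,
        "clientuuid": client_uuid,
        "serveruuid": server_uuid,
    })
    return f"/git-annex/v1/PUT?{query}"


def encode_put_body(content, valid):
    """Append the trailing validity byte: b"1" for VALID, b"0" for INVALID."""
    return content + (b"1" if valid else b"0")


def decode_put_body(body):
    """Split a request body into (content, valid)."""
    if body[-1:] not in (b"0", b"1"):
        raise ValueError("missing validity byte")
    return body[:-1], body[-1:] == b"1"


body = encode_put_body(b"foo", valid=True)
content, valid = decode_put_body(body)
```

Because every request carries the client and server UUIDs in its query string, the server can handle each one without remembering earlier requests.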
|
||||||
|
|
||||||
|
## HTTP GET
|
||||||
|
|
||||||
|
It should be possible to support a regular HTTP get of a key, with
|
||||||
|
no additional parameters, so that annex objects can be served to other clients
|
||||||
|
from this web server.
|
||||||
|
|
||||||
|
> GET /git-annex/key/SHA1--foo HTTP/1.0
|
||||||
|
< foo
|
||||||
|
|
||||||
|
This would be a special case, though, not used by git-annex, because the P2P
|
||||||
|
protocol's GET has the complication of offsets, and of the server sending
|
||||||
|
VALID/INVALID after the content, and of needing to know the client's UUID in
|
||||||
|
order to update the location log.
|
||||||
|
|
||||||
|
## Problem: CONNECT
|
||||||
|
|
||||||
|
The CONNECT message allows both sides of the P2P protocol to send DATA
|
||||||
|
messages in any order. This seems difficult to encapsulate in HTTP.
|
||||||
|
|
||||||
|
This can probably be left unimplemented; it's probably not needed for a HTTP
|
||||||
|
remote?
|
|
@ -36,7 +36,15 @@ cluster.
|
||||||
|
|
||||||
A proxy would not hold the content of files itself. It would be a clone of
|
A proxy would not hold the content of files itself. It would be a clone of
|
||||||
the git repository though, probably. Uploads and downloads would stream
|
the git repository though, probably. Uploads and downloads would stream
|
||||||
through the proxy. The git-annex [[P2P_protocol]] could be relayed in this way.
|
through the proxy.
|
||||||
|
|
||||||
|
## protocol
|
||||||
|
|
||||||
|
The git-annex [[P2P_protocol]] would be relayed via the proxy,
|
||||||
|
which would be a regular git ssh remote.
|
||||||
|
|
||||||
|
There is also the possibility of relaying the P2P protocol over another
|
||||||
|
protocol such as HTTP, see [[P2P_protocol_over_http]].
|
||||||
|
|
||||||
## UUID discovery
|
## UUID discovery
|
||||||
|
|
||||||
|
|
|
@ -3,8 +3,9 @@ git-annex to be able to use proxies which sit in front of a cluster of
|
||||||
repositories.
|
repositories.
|
||||||
|
|
||||||
1. [[design/passthrough_proxy]]
|
1. [[design/passthrough_proxy]]
|
||||||
2. [[design/balanced_preferred_content]]
|
2. [[design/p2p_protocol_over_http]]
|
||||||
3. [[todo/track_free_space_in_repos_via_git-annex_branch]]
|
3. [[design/balanced_preferred_content]]
|
||||||
4. [[todo/proving_preferred_content_behavior]]
|
4. [[todo/track_free_space_in_repos_via_git-annex_branch]]
|
||||||
|
5. [[todo/proving_preferred_content_behavior]]
|
||||||
|
|
||||||
[[!tag projects/openneuro]]
|
[[!tag projects/openneuro]]
|
||||||
|
|