git-annex/doc/design/p2p_protocol_over_http.mdwn
2024-07-02 16:14:45 -04:00

145 lines
4.9 KiB
Markdown

[[!toc ]]
## motivation
The [[P2P protocol]] is a custom protocol that git-annex speaks over a ssh
connection (mostly). This is a design working on supporting the P2P
protocol over HTTP.
Upload of annex objects to git remotes that use http is currently not
supported by git-annex, and this would be a generally very useful addition.
For use cases such as OpenNeuro's javascript client, ssh is too difficult
to support, so they currently use a special remote that talks to a http
endpoint in order to upload objects. Implementing this would let them
talk to git-annex over http.
With the [[passthrough_proxy]], this would let clients configure a single
http remote that accesses a more complicated network of git-annex
repositories.
## approach 1: encapsulation
One approach is to encapsulate the P2P protocol inside HTTP. This has the
benefit of being simple to think about. It is not very web-native though.
There would be a single API endpoint. The client connects and sends a
request that encapsulates one or more lines in the P2P protocol. The server
sends a response that encapsulates one or more lines in the P2P
protocol.
For example (eliding the full HTTP responses, only showing the data):
> POST /git-annex HTTP/1.0
> Content-Type: x-git-annex-p2p
> Content-Length: ...
>
> AUTH 79a5a1f4-07e8-11ef-873d-97f93ca91925
< AUTH-SUCCESS ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6
> POST /git-annex HTTP/1.0
> Content-Type: x-git-annex-p2p
> Content-Length: ...
>
> VERSION 1
< VERSION 1
> POST /git-annex HTTP/1.0
> Content-Type: x-git-annex-p2p
> Content-Length: ...
>
> CHECKPRESENT SHA1--foo
< SUCCESS
> POST /git-annex HTTP/1.0
> Content-Type: x-git-annex-p2p
> Content-Length: ...
>
> PUT bar SHA1--bar
< PUT-FROM 0
> POST /git-annex HTTP/1.0
> Content-Type: x-git-annex-p2p
> Content-Length: ...
>
> DATA 3
> foo
> VALID
< SUCCESS
Note that, since VERSION is negotiated in one request, the HTTP server
needs to know that a series of requests are part of the same P2P protocol
session. In the example above, it would not have a good way to do that.
One solution would be to add a session identifier UUID to each request.
## approach 2: websockets
The client connects to the server over a websocket. From there on,
the protocol is encapsulated in websockets.
This seems nice and simple, but again not very web native.
Some requests like `LOCKCONTENT` seem likely to need full duplex
communication like websockets provide. But, it might be more web native to
only use websockets for that request, and not for everything.
## approach 3: HTTP API
Another approach is to define a web-native API with endpoints that
correspond to each action in the P2P protocol.
Something like this:
> POST /git-annex/v1/AUTH?clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925 HTTP/1.0
< AUTH-SUCCESS ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6
> POST /git-annex/v1/CHECKPRESENT?key=SHA1--foo&clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.0
> SUCCESS
> POST /git-annex/v1/PUT-FROM?key=SHA1--foo&clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.0
< PUT-FROM 0
> POST /git-annex/v1/PUT?key=SHA1--foo&associatedfile=bar&put-from=0&clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.0
> Content-Type: application/octet-stream
> Content-Length: 4
> foo1
< SUCCESS
(In the last example above "foo" is the content, there is an additional byte at the end that
is 1 for VALID and 0 for INVALID. This seems better than needing an entire
other request to indicate validitity.)
This needs a more complex spec. But it's easier for others to implement,
especially since it does not need a session identifier, so the HTTP server can
be stateless.
A full draft protocol for this is being developed at [[p2p_protocol_over_http/draft1]].
## HTTP GET
It should be possible to support a regular HTTP get of a key, with
no additional parameters, so that annex objects can be served to other clients
from this web server.
> GET /git-annex/key/SHA1--foo HTTP/1.0
< foo
Although this would be a special case, not used by git-annex, because the P2P
protocol's GET has the complication of offsets, and of the server sending
VALID/INVALID after the content, and of needing to know the client's UUID in
order to update the location log.
## Problem: CONNECT
The CONNECT message allows both sides of the P2P protocol to send DATA
messages in any order. This seems difficult to encapsulate in HTTP.
Probably this can be not implemented, it's probably not needed for a HTTP
remote? This is used to tunnel git protocol over the P2P protocol, but for
a HTTP remote the git repository can be accessed over HTTP as well.
## security
Should support HTTPS and/or be limited to only HTTPS.
Authentication via http basic auth?