thoughts on CGI, and use json

Joey Hess 2024-07-05 10:08:43 -04:00
parent 3f9569e27f
commit 95ba4d4480
GPG key ID: DB12DB0FF05F8F38
3 changed files with 121 additions and 112 deletions


@ -133,6 +133,8 @@ To remove a key's content from the server, the client sends:
The server responds with either SUCCESS or FAILURE. The server responds with either SUCCESS or FAILURE.
Note that if the content was not present, SUCCESS will be returned.
In protocol version 2, the server can optionally reply with SUCCESS-PLUS In protocol version 2, the server can optionally reply with SUCCESS-PLUS
or FAILURE-PLUS. Each has a subsequent list of UUIDs of repositories or FAILURE-PLUS. Each has a subsequent list of UUIDs of repositories
that the content was removed from. that the content was removed from.


@ -18,72 +18,80 @@ With the [[passthrough_proxy]], this would let clients configure a single
http remote that accesses a more complicated network of git-annex http remote that accesses a more complicated network of git-annex
repositories. repositories.
## approach 1: encapsulation ## integration with git
One approach is to encapsulate the P2P protocol inside HTTP. This has the A webserver that is configured to serve a git repository either serves the
benefit of being simple to think about. It is not very web-native though. files in the repository with dumb http, or uses the git-http-backend CGI
program for url paths under eg `/git/`.
There would be a single API endpoint. The client connects and sends a To integrate with that, git-annex would need a git-annex-http-backend CGI
request that encapsulates one or more lines in the P2P protocol. The server program, that the webserver is configured to run for url paths under
sends a response that encapsulates one or more lines in the P2P `/git/.*/annex/`.
protocol.
For example (eliding the full HTTP responses, only showing the data): So, for a remote with an url `http://example.com/git/foo`, git-annex would
use paths under `http://example.com/git/foo/annex/` to run its CGI.
> POST /git-annex HTTP/1.0 But, the CGI interface is a poor match for the P2P protocol.
> Content-Type: x-git-annex-p2p
> Content-Length: ...
>
> AUTH 79a5a1f4-07e8-11ef-873d-97f93ca91925
< AUTH-SUCCESS ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6
> POST /git-annex HTTP/1.0 A particular problem is that `LOCKCONTENT` would need to be in one CGI
> Content-Type: x-git-annex-p2p request, followed by another request to `UNLOCKCONTENT`. Unless
> Content-Length: ... git-annex-http-backend forked a daemon to keep the content locked, it would
> not be able to retain a file lock across the 2 requests. While the 10
> VERSION 1 minute retention lock would paper over that, UNLOCKCONTENT would not be
< VERSION 1 able to delete the retention lock, because there is no way to know if
another LOCKCONTENT was received later. So LOCKCONTENT would always lock
content for 10 minutes. Which would result in some undesirable behaviors.
> POST /git-annex HTTP/1.0 Another problem is with proxies and clusters. The CGI would need to open
> Content-Type: x-git-annex-p2p ssh (or http) connections to the proxied repositories and cluster nodes
> Content-Length: ... each time it is run. That would add a lot of latency to every request.
>
> CHECKPRESENT SHA1--foo
< SUCCESS
> POST /git-annex HTTP/1.0 And running a git-annex process once per CGI request also makes git-annex's
> Content-Type: x-git-annex-p2p own startup speed, which is ok but not great, add latency. And each time
> Content-Length: ... the CGI needed to change the git-annex branch, it would have to commit on
> shutdown. Lots of time and space optimisations would be prevented by using
> PUT bar SHA1--bar the CGI interface.
< PUT-FROM 0
> POST /git-annex HTTP/1.0 So, rather than having the CGI program do anything in the repository
> Content-Type: x-git-annex-p2p itself, have it pass each request through to a long-running server.
> Content-Length: ... (This does have the downside that files would get double-copied
> through the CGI, which adds some overhead.)
> DATA 3 A reasonable way to do that would be to have a webserver speaking a
> foo HTTP version of the git-annex P2P protocol and the CGI just talks to that.
> VALID
< SUCCESS
Note that, since VERSION is negotiated in one request, the HTTP server The CGI program then becomes tiny, and just needs to know the url to
needs to know that a series of requests are part of the same P2P protocol connect to the git-annex HTTP server.
session. In the example above, it would not have a good way to do that.
One solution would be to add a session identifier UUID to each request.
## approach 2: websockets Alternatively, a remote's configuration could include that url, and
then we don't need the complication and overhead of the CGI program at all.
Eg:
git config remote.origin.annex-url http://example.com:8080/
So, the rest of this design will focus on implementing that. The CGI
program can be added later if desired, to avoid users needing to configure
an additional thing.
Note that, one nice benefit of having a separate annex-url is it allows
having remote.origin.url on eg github, but with an annex-url configured
that remote can also be used as a git-annex repository.
## approach 1: websockets
The client connects to the server over a websocket. From there on, The client connects to the server over a websocket. From there on,
the protocol is encapsulated in websockets. the protocol is encapsulated in websockets.
This seems nice and simple, but again not very web native. This seems nice and simple to implement, but not very web native. Anyone
wanting to talk to this web server would need to understand the P2P
protocol. Just to upload a file would need to deal with AUTH,
AUTH-SUCCESS, AUTH-FAILURE, VERSION, PUT, ALREADY-HAVE, PUT-FROM, DATA,
INVALID, VALID, SUCCESS, and FAILURE messages. Seems like a lot.
Some requests like `LOCKCONTENT` seem likely to need full duplex Some requests like `LOCKCONTENT` do need full duplex communication like
communication like websockets provide. But, it might be more web native to websockets provide. But, it might be more web native to only use websockets
only use websockets for that request, and not for everything. for that request, and not for everything.
## approach 3: HTTP API ## approach 2: web-native API
Another approach is to define a web-native API with endpoints that Another approach is to define a web-native API with endpoints that
correspond to each action in the P2P protocol. correspond to each action in the P2P protocol.
@ -101,13 +109,13 @@ Something like this:
> POST /git-annex/v1/PUT?key=SHA1--foo&associatedfile=bar&put-from=0&clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.0 > POST /git-annex/v1/PUT?key=SHA1--foo&associatedfile=bar&put-from=0&clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.0
> Content-Type: application/octet-stream > Content-Type: application/octet-stream
> Content-Length: 4 > Content-Length: 20
> foo1 > foo
< SUCCESS > {"valid": true}
< {"stored": true}
(In the last example above "foo" is the content, there is an additional byte at the end that (In the last example above "foo" is the content, it is followed by a line of json.
is 1 for VALID and 0 for INVALID. This seems better than needing an entire This seems better than needing an entire other request to indicate validity.)
other request to indicate validitity.)
This needs a more complex spec. But it's easier for others to implement, This needs a more complex spec. But it's easier for others to implement,
especially since it does not need a session identifier, so the HTTP server can especially since it does not need a session identifier, so the HTTP server can


@ -73,14 +73,14 @@ Checks if a key is currently present on the server.
Example: Example:
> POST /git-annex/v3/checkpresent?key=SHA1--foo&clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.1 > POST /git-annex/v3/checkpresent?key=SHA1--foo&clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.1
< SUCCESS < {"present": true}
There is one required additional parameter, `key`. There is one required additional parameter, `key`.
The body of the request is empty. The body of the request is empty.
The server responds with "SUCCESS" if the key is present The server responds with a JSON object with a "present" field that is true
or "FAILURE" if it is not present. if the key is present, or false if it is not present.
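A client-side reading of the new `checkpresent` reply could look like the following. This is only a minimal sketch: the function name `key_is_present` is hypothetical, and it assumes the response body holds nothing but the JSON object shown in the example.

```python
import json


def key_is_present(response_body: bytes) -> bool:
    """Interpret the body of a checkpresent response.

    Assumes the body is a single JSON object with a boolean
    "present" field, as in the example above.
    """
    reply = json.loads(response_body.decode("utf-8"))
    present = reply.get("present")
    if not isinstance(present, bool):
        raise ValueError("response has no boolean 'present' field")
    return present
```

Rejecting a reply that lacks the field, rather than treating it as false, avoids mistaking a malformed response for "not present".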
### lockcontent ### lockcontent
@ -106,24 +106,22 @@ Remove a key's content from the server.
Example: Example:
> POST /git-annex/v3/remove?key=SHA1--foo&clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.1 > POST /git-annex/v3/remove?key=SHA1--foo&clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.1
< SUCCESS < {"removed": true}
There is one required additional parameter, `key`. There is one required additional parameter, `key`.
The body of the request is empty. The body of the request is empty.
The server responds with "SUCCESS" if the key was removed, The server responds with a JSON object with a "removed" field that is true
or "FAILURE" if the key was not able to be removed. if the key was removed (or was not present on the server),
or false if the key was not able to be removed.
The server can also respond with "SUCCESS-PLUS" or "FAILURE-PLUS". The JSON object can have an additional field "plusuuids" that is a list of
Each has a subsequent list of UUIDs of repositories UUIDs of other repositories that the content was removed from.
that the content was removed from. For example:
SUCCESS-PLUS 702ce472-38a1-11ef-864f-23851a2edf71 707dea20-38a1-11ef-96a4-fb7e8c8369f0 If the server does not allow removing the key due to a policy
(eg due to being read-only or append-only), it will respond with a JSON
If the server was prevented from trying to remove the key due to a policy object with an "error" field that has an error message as its value.
(eg due to being read-only or append-only, it will respond with "ERROR",
followed by a space and an error message.
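The three shapes of `remove` reply described above ("removed", optional "plusuuids", or "error") can be folded into one result value. A rough sketch, where `RemoveResult` and `parse_remove_response` are made-up names and not part of git-annex:

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RemoveResult:
    removed: bool
    plus_uuids: List[str] = field(default_factory=list)
    error: Optional[str] = None


def parse_remove_response(body: bytes) -> RemoveResult:
    """Interpret the JSON body of a remove (or remove-before) response."""
    reply = json.loads(body)
    if "error" in reply:
        # Policy refusal, eg a read-only or append-only server.
        return RemoveResult(removed=False, error=str(reply["error"]))
    return RemoveResult(
        removed=bool(reply.get("removed", False)),
        plus_uuids=list(reply.get("plusuuids", [])),
    )
```

Keeping "error" separate from `removed: false` lets a caller tell a policy refusal apart from an attempted removal that failed.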
## remove-before ## remove-before
@ -132,13 +130,15 @@ Remove a key's content from the server, but only before a specified time.
Example: Example:
> POST /git-annex/v3/remove-before?timestamp=4949292929&key=SHA1--foo&clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.1 > POST /git-annex/v3/remove-before?timestamp=4949292929&key=SHA1--foo&clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.1
< SUCCESS < {"removed": true}
This is the same as the `remove` request, but with an additional parameter, This is the same as the `remove` request, but with an additional parameter,
`timestamp`. `timestamp`.
If the server's monotonic clock is past the specified timestamp, the If the server's monotonic clock is past the specified timestamp, the
removal will fail. This is used to avoid removing content after a point in removal will fail and the server will respond with: `{"removed": false}`
This is used to avoid removing content after a point in
time where it is no longer locked in other repositories. time where it is no longer locked in other repositories.
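The server-side rule for `remove-before` is small enough to sketch. Here the clock value and the removal action are passed in as parameters, which is purely an assumption made to keep the example self-contained; `remove_before` is a hypothetical name.

```python
import json
from typing import Callable


def remove_before(deadline: int, now: int, remove: Callable[[], bool]) -> str:
    """Return the JSON reply for a remove-before request.

    deadline: the `timestamp` parameter from the request
    now:      the server's monotonic clock, in seconds
    remove:   callback that tries the removal and reports success
    """
    if now > deadline:
        # Past the deadline: fail without attempting the removal.
        return json.dumps({"removed": False})
    return json.dumps({"removed": remove()})
```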
## gettimestamp ## gettimestamp
@ -148,12 +148,12 @@ Gets the current timestamp from the server.
Example: Example:
> POST /git-annex/v3/gettimestamp?clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.1 > POST /git-annex/v3/gettimestamp?clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.1
< TIMESTAMP 59459392 < {"timestamp": 59459392}
The body of the request is empty. The body of the request is empty.
The server responds with "TIMESTAMP" followed by a space and the current The server responds with a JSON object with a "timestamp" field that has the
value of its monotonic clock, as a number of seconds. current value of its monotonic clock, as a number of seconds.
Important: If multiple servers are serving this protocol for the same Important: If multiple servers are serving this protocol for the same
repository, they MUST all use the same monotonic clock. repository, they MUST all use the same monotonic clock.
@ -166,13 +166,14 @@ Example:
> POST /git-annex/v3/put?key=SHA1--foo&associatedfile=bar&clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.1 > POST /git-annex/v3/put?key=SHA1--foo&associatedfile=bar&clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.1
> Content-Type: application/octet-stream > Content-Type: application/octet-stream
> Content-Length: 4 > Content-Length: 20
> foo1 > foo
< SUCCESS > {"valid": true}
< {"stored": true}
There is one required additional parameter, `key`. There is one required additional parameter, `key`.
There is are also these optional parameters: There are also these optional parameters:
* `associatedfile` * `associatedfile`
@ -186,28 +187,27 @@ There is are also these optional parameters:
The body of the request is the content of the key, starting from the The body of the request is the content of the key, starting from the
specified offset or from the beginning. After the content of the key, specified offset or from the beginning. After the content of the key,
there is one more byte. there is a newline, followed by a JSON object.
The additional byte is "1" to indicate that the content was not changed The JSON object has a field "valid" that is true when the content
while it was being sent, or "0" to indicate that modified content was sent was not changed while it was being sent, or false when modified
and should be disregarded by the server. (This corresponds content was sent and should be disregarded by the server. (This corresponds
to the `VALID` and `INVALID` messages in the P2P protocol.) to the `VALID` and `INVALID` messages in the P2P protocol.)
The `Content-Type` header should be `application/octet-stream`. The `Content-Type` header should be `application/octet-stream`.
The `Content-Length` header should be set to the length of the body. The `Content-Length` header should be set to the length of the body.
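To make the new body layout concrete, here is a small sketch that builds a `put` request body and its headers. The helper name `build_put_body` is hypothetical. The newline after the JSON object is an assumption: the text only mentions a newline before it, but the `Content-Length: 20` in the example only adds up if one follows it as well.

```python
import json


def build_put_body(content: bytes, valid: bool = True) -> bytes:
    """Key content, a newline, then a one-line JSON trailer.

    The final newline after the JSON object is an assumption inferred
    from the example's Content-Length, not something the text states.
    """
    trailer = json.dumps({"valid": valid}).encode("ascii")
    return content + b"\n" + trailer + b"\n"


body = build_put_body(b"foo")
headers = {
    "Content-Type": "application/octet-stream",
    "Content-Length": str(len(body)),  # 20 for the "foo" example
}
```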
The server responds with `SUCCESS` if it received the data and stored the The server responds with a JSON object with a field "stored"
content. If it was unable to do so, it responds with `FAILURE`. that is true if it received the data and stored the
content.
The server can also reply with `SUCCESS-PLUS`, which has a subsequent list of The JSON object can have an additional field "plusuuids" that is a list of
UUIDs of repositories that the content was stored to. For example: UUIDs of other repositories that the content was stored to.
SUCCESS-PLUS 702ce472-38a1-11ef-864f-23851a2edf71 707dea20-38a1-11ef-96a4-fb7e8c8369f0 If the server does not allow storing the key due to a policy
(eg due to being read-only or append-only), it will respond with a JSON
If the server was prevented from storing the key due to a policy object with an "error" field that has an error message as its value.
(eg due to being read-only), it will respond with "ERROR", followed
by a space and an error message.
### putoffset ### putoffset
@ -220,17 +220,18 @@ the `put` request failing.
Example: Example:
> POST /git-annex/v3/putoffset?key=SHA1--foo&clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.1 > POST /git-annex/v3/putoffset?key=SHA1--foo&clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.1
< 10 < {"offset": 10}
There is one required additional parameter, `key`. There is one required additional parameter, `key`.
The body of the request is empty. The body of the request is empty.
The server responds with the largest allowable offset. The server responds with a JSON object with an "offset" field that
is the largest allowable offset.
If the server was prevented from storing the key due to a policy If the server does not allow storing the key due to a policy
(eg due to being read-only), it will respond with "ERROR", followed (eg due to being read-only or append-only), it will respond with a JSON
by a space and an error message. object with an "error" field that has an error message as its value.
[Implementation note: This will be implemented by sending `PUT` and [Implementation note: This will be implemented by sending `PUT` and
returning the `PUT-FROM` offset. To avoid leaving the P2P protocol stuck returning the `PUT-FROM` offset. To avoid leaving the P2P protocol stuck
@ -246,8 +247,9 @@ Example:
> POST /git-annex/v3/get?key=SHA1--foo&associatedfile=bar&clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.1 > POST /git-annex/v3/get?key=SHA1--foo&associatedfile=bar&clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.1
< Content-Type: application/octet-stream < Content-Type: application/octet-stream
< Content-Length: 4 < Content-Length: 20
< foo1 < foo
< {"valid": true}
There is one required additional parameter, `key`. There is one required additional parameter, `key`.
@ -271,17 +273,14 @@ The server's response will have a `Content-Length` header
set to the length of the body. set to the length of the body.
The server's response body is the content of the key, from the specified The server's response body is the content of the key, from the specified
offset. After the content of the key, there is one more byte. offset. After the content of the key, there is a newline, followed by a
JSON object.
The additional byte is "1" to indicate that the content was not changed The JSON object has a field "valid" that is true when the content
while it was being sent, or "0" to indicate that modified content was sent was not changed while it was being sent, or false when whatever
and should be discarded by the client. (This corresponds content was sent is not the actual content of the key and should be
to the `VALID` and `INVALID` messages in the P2P protocol.) disregarded. (This corresponds to the `VALID` and `INVALID` messages
in the P2P protocol.)
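On the receiving side, the content and the JSON trailer have to be split apart again. A minimal sketch, assuming the JSON object sits on a single line at the very end of the body and may or may not be followed by one final newline; `split_get_body` is a made-up name.

```python
import json
from typing import Tuple


def split_get_body(body: bytes) -> Tuple[bytes, bool]:
    """Split a get response body into (content, valid)."""
    # Tolerate one newline after the JSON object (an assumption).
    trimmed = body[:-1] if body.endswith(b"\n") else body
    # The last newline separates the key's content from the trailer,
    # so content that itself contains newlines is left intact.
    sep = trimmed.rfind(b"\n")
    if sep < 0:
        raise ValueError("no newline before the JSON trailer")
    content, trailer = trimmed[:sep], trimmed[sep + 1:]
    valid = json.loads(trailer).get("valid") is True
    return content, valid
```

A caller would discard `content` whenever `valid` is false. An empty key that is valid comes back as `(b"", True)`.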
Note that, if the server is not able to send the content of the requested
key, its response body will consist of "0", eg 0 bytes of content which is
not valid. On the other hand, a response body of "1" is used for an empty
key which is valid.
## simple HTTP GET ## simple HTTP GET
@ -301,6 +300,6 @@ the content of a key.
this HTTP protocol to support it. this HTTP protocol to support it.
`CONNECT` is not supported, and due to the bi-directional message passing `CONNECT` is not supported, and due to the bi-directional message passing
nature of it, it cannot easily be done over HTTP. It should not be nature of it, it cannot easily be done over HTTP (would need websockets).
necessary anyway, because the git repository itself can be accessed over It should not be necessary anyway, because the git repository itself can be
HTTP. accessed over HTTP.