thoughts on CGI, and use json
parent 3f9569e27f
commit 95ba4d4480
3 changed files with 121 additions and 112 deletions
|
@ -133,6 +133,8 @@ To remove a key's content from the server, the client sends:
|
|||
|
||||
The server responds with either SUCCESS or FAILURE.
|
||||
|
||||
Note that if the content was not present, SUCCESS will be returned.
|
||||
|
||||
In protocol version 2, the server can optionally reply with SUCCESS-PLUS
|
||||
or FAILURE-PLUS. Each has a subsequent list of UUIDs of repositories
|
||||
that the content was removed from.
|
||||
|
|
|
@ -18,72 +18,80 @@ With the [[passthrough_proxy]], this would let clients configure a single
|
|||
http remote that accesses a more complicated network of git-annex
|
||||
repositories.
|
||||
|
||||
## approach 1: encapsulation
|
||||
## integration with git
|
||||
|
||||
One approach is to encapsulate the P2P protocol inside HTTP. This has the
|
||||
benefit of being simple to think about. It is not very web-native though.
|
||||
A webserver that is configured to serve a git repository either serves the
|
||||
files in the repository with dumb http, or uses the git-http-backend CGI
|
||||
program for url paths under eg `/git/`.
|
||||
|
||||
There would be a single API endpoint. The client connects and sends a
|
||||
request that encapsulates one or more lines in the P2P protocol. The server
|
||||
sends a response that encapsulates one or more lines in the P2P
|
||||
protocol.
|
||||
To integrate with that, git-annex would need a git-annex-http-backend CGI
|
||||
program, that the webserver is configured to run for url paths under
|
||||
`/git/.*/annex/`.
|
||||
|
||||
For example (eliding the full HTTP responses, only showing the data):
|
||||
So, for a remote with an url `http://example.com/git/foo`, git-annex would
|
||||
use paths under `http://example.com/git/foo/annex/` to run its CGI.
|
||||
|
||||
> POST /git-annex HTTP/1.0
|
||||
> Content-Type: x-git-annex-p2p
|
||||
> Content-Length: ...
|
||||
>
|
||||
> AUTH 79a5a1f4-07e8-11ef-873d-97f93ca91925
|
||||
< AUTH-SUCCESS ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6
|
||||
But, the CGI interface is a poor match for the P2P protocol.
|
||||
|
||||
> POST /git-annex HTTP/1.0
|
||||
> Content-Type: x-git-annex-p2p
|
||||
> Content-Length: ...
|
||||
>
|
||||
> VERSION 1
|
||||
< VERSION 1
|
||||
A particular problem is that `LOCKCONTENT` would need to be in one CGI
|
||||
request, followed by another request to `UNLOCKCONTENT`. Unless
|
||||
git-annex-http-backend forked a daemon to keep the content locked, it would
|
||||
not be able to retain a file lock across the 2 requests. While the 10
|
||||
minute retention lock would paper over that, UNLOCKCONTENT would not be
|
||||
able to delete the retention lock, because there is no way to know if
|
||||
another LOCKCONTENT was received later. So LOCKCONTENT would always lock
|
||||
content for 10 minutes. Which would result in some undesirable behaviors.
|
||||
|
||||
> POST /git-annex HTTP/1.0
|
||||
> Content-Type: x-git-annex-p2p
|
||||
> Content-Length: ...
|
||||
>
|
||||
> CHECKPRESENT SHA1--foo
|
||||
< SUCCESS
|
||||
Another problem is with proxies and clusters. The CGI would need to open
|
||||
ssh (or http) connections to the proxied repositories and cluster nodes
|
||||
each time it is run. That would add a lot of latency to every request.
|
||||
|
||||
> POST /git-annex HTTP/1.0
|
||||
> Content-Type: x-git-annex-p2p
|
||||
> Content-Length: ...
|
||||
>
|
||||
> PUT bar SHA1--bar
|
||||
< PUT-FROM 0
|
||||
And running a git-annex process once per CGI request also makes git-annex's
|
||||
own startup speed, which is ok but not great, add latency. And each time
|
||||
the CGI needed to change the git-annex branch, it would have to commit on
|
||||
shutdown. Lots of time and space optimisations would be prevented by using
|
||||
the CGI interface.
|
||||
|
||||
> POST /git-annex HTTP/1.0
|
||||
> Content-Type: x-git-annex-p2p
|
||||
> Content-Length: ...
|
||||
>
|
||||
> DATA 3
|
||||
> foo
|
||||
> VALID
|
||||
< SUCCESS
|
||||
So, rather than having the CGI program do anything in the repository
|
||||
itself, have it pass each request through to a long-running server.
|
||||
(This does have the downside that files would get double-copied
|
||||
through the CGI, which adds some overhead.)
|
||||
A reasonable way to do that would be to have a webserver speaking a
|
||||
HTTP version of the git-annex P2P protocol and the CGI just talks to that.
|
||||
|
||||
Note that, since VERSION is negotiated in one request, the HTTP server
|
||||
needs to know that a series of requests are part of the same P2P protocol
|
||||
session. In the example above, it would not have a good way to do that.
|
||||
One solution would be to add a session identifier UUID to each request.
|
||||
The CGI program then becomes tiny, and just needs to know the url to
|
||||
connect to the git-annex HTTP server.
|
||||
|
||||
## approach 2: websockets
|
||||
Alternatively, a remote's configuration could include that url, and
|
||||
then we don't need the complication and overhead of the CGI program at all.
|
||||
Eg:
|
||||
|
||||
git config remote.origin.annex-url http://example.com:8080/
|
||||
|
||||
So, the rest of this design will focus on implementing that. The CGI
|
||||
program can be added later if desired, to avoid users needing to configure
|
||||
an additional thing.
|
||||
|
||||
Note that, one nice benefit of having a separate annex-url is it allows
|
||||
having remote.origin.url on eg github, but with an annex-url configured
|
||||
that remote can also be used as a git-annex repository.
|
||||
|
||||
## approach 1: websockets
|
||||
|
||||
The client connects to the server over a websocket. From there on,
|
||||
the protocol is encapsulated in websockets.
|
||||
|
||||
This seems nice and simple, but again not very web native.
|
||||
This seems nice and simple to implement, but not very web native. Anyone
|
||||
wanting to talk to this web server would need to understand the P2P
|
||||
protocol. Just to upload a file would need to deal with AUTH,
|
||||
AUTH-SUCCESS, AUTH-FAILURE, VERSION, PUT, ALREADY-HAVE, PUT-FROM, DATA,
|
||||
INVALID, VALID, SUCCESS, and FAILURE messages. Seems like a lot.
|
||||
|
||||
Some requests like `LOCKCONTENT` seem likely to need full duplex
|
||||
communication like websockets provide. But, it might be more web native to
|
||||
only use websockets for that request, and not for everything.
|
||||
Some requests like `LOCKCONTENT` do need full duplex communication like
|
||||
websockets provide. But, it might be more web native to only use websockets
|
||||
for that request, and not for everything.
|
||||
|
||||
## approach 3: HTTP API
|
||||
## approach 2: web-native API
|
||||
|
||||
Another approach is to define a web-native API with endpoints that
|
||||
correspond to each action in the P2P protocol.
|
||||
|
@ -101,13 +109,13 @@ Something like this:
|
|||
|
||||
> POST /git-annex/v1/PUT?key=SHA1--foo&associatedfile=bar&put-from=0&clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.0
|
||||
> Content-Type: application/octet-stream
|
||||
> Content-Length: 4
|
||||
> foo1
|
||||
< SUCCESS
|
||||
> Content-Length: 20
|
||||
> foo
|
||||
> {"valid": true}
|
||||
< {"stored": true}
|
||||
|
||||
(In the last example above "foo" is the content, there is an additional byte at the end that
|
||||
is 1 for VALID and 0 for INVALID. This seems better than needing an entire
|
||||
other request to indicate validity.)
|
||||
(In the last example above "foo" is the content, it is followed by a line of json.
|
||||
This seems better than needing an entire other request to indicate validity.)
|
||||
|
||||
This needs a more complex spec. But it's easier for others to implement,
|
||||
especially since it does not need a session identifier, so the HTTP server can
|
||||
|
|
|
@ -73,14 +73,14 @@ Checks if a key is currently present on the server.
|
|||
Example:
|
||||
|
||||
> POST /git-annex/v3/checkpresent?key=SHA1--foo&clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.1
|
||||
< SUCCESS
|
||||
< {"present": true}
|
||||
|
||||
There is one required additional parameter, `key`.
|
||||
|
||||
The body of the request is empty.
|
||||
|
||||
The server responds with "SUCCESS" if the key is present
|
||||
or "FAILURE" if it is not present.
|
||||
The server responds with a JSON object with a "present" field that is true
|
||||
if the key is present, or false if it is not present.
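A client could decode this reply roughly as follows. This is only a sketch: the function name `parse_checkpresent` is hypothetical, and it assumes the response body contains nothing but the JSON object described above.

```python
import json


def parse_checkpresent(body: bytes) -> bool:
    """Return True if the reply reports the key as present.

    Assumes the whole response body is a single JSON object with a
    boolean "present" field, as described in the text above.
    """
    reply = json.loads(body.decode("utf-8"))
    if not isinstance(reply, dict) or not isinstance(reply.get("present"), bool):
        raise ValueError(f"unexpected checkpresent reply: {reply!r}")
    return reply["present"]
```

Rejecting replies without a boolean "present" field keeps a malformed response from being mistaken for "not present".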
|
||||
|
||||
### lockcontent
|
||||
|
||||
|
@ -106,24 +106,22 @@ Remove a key's content from the server.
|
|||
Example:
|
||||
|
||||
> POST /git-annex/v3/remove?key=SHA1--foo&clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.1
|
||||
< SUCCESS
|
||||
< {"removed": true}
|
||||
|
||||
There is one required additional parameter, `key`.
|
||||
|
||||
The body of the request is empty.
|
||||
|
||||
The server responds with "SUCCESS" if the key was removed,
|
||||
or "FAILURE" if the key was not able to be removed.
|
||||
The server responds with a JSON object with a "removed" field that is true
|
||||
if the key was removed (or was not present on the server),
|
||||
or false if the key was not able to be removed.
|
||||
|
||||
The server can also respond with "SUCCESS-PLUS" or "FAILURE-PLUS".
|
||||
Each has a subsequent list of UUIDs of repositories
|
||||
that the content was removed from. For example:
|
||||
The JSON object can have an additional field "plusuuids" that is a list of
|
||||
UUIDs of other repositories that the content was removed from.
|
||||
|
||||
SUCCESS-PLUS 702ce472-38a1-11ef-864f-23851a2edf71 707dea20-38a1-11ef-96a4-fb7e8c8369f0
|
||||
|
||||
If the server was prevented from trying to remove the key due to a policy
|
||||
(eg due to being read-only or append-only), it will respond with "ERROR",
|
||||
followed by a space and an error message.
|
||||
If the server does not allow removing the key due to a policy
|
||||
(eg due to being read-only or append-only), it will respond with a JSON
|
||||
object with an "error" field that has an error message as its value.
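The three shapes of reply described above ("removed", optional "plusuuids", or "error") might be handled like this. It is a sketch under assumptions: `RemoveResult`, `PolicyError` and `parse_remove` are hypothetical names, and the reply is assumed to be a single JSON object in the response body.

```python
import json
from dataclasses import dataclass, field


class PolicyError(Exception):
    """Raised when the reply carries an "error" field (hypothetical name)."""


@dataclass
class RemoveResult:
    removed: bool
    # UUIDs of other repositories the content was also removed from.
    plusuuids: list = field(default_factory=list)


def parse_remove(body: bytes) -> RemoveResult:
    reply = json.loads(body)
    if "error" in reply:
        # The server refused to try, eg because it is read-only.
        raise PolicyError(reply["error"])
    return RemoveResult(
        removed=bool(reply["removed"]),
        plusuuids=list(reply.get("plusuuids", [])),
    )
```

Raising on "error" keeps a policy refusal distinct from `removed == False`, which means the server tried and could not remove the content.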
|
||||
|
||||
## remove-before
|
||||
|
||||
|
@ -132,13 +130,15 @@ Remove a key's content from the server, but only before a specified time.
|
|||
Example:
|
||||
|
||||
> POST /git-annex/v3/remove-before?timestamp=4949292929&key=SHA1--foo&clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.1
|
||||
< SUCCESS
|
||||
< {"removed": true}
|
||||
|
||||
This is the same as the `remove` request, but with an additional parameter,
|
||||
`timestamp`.
|
||||
|
||||
If the server's monotonic clock is past the specified timestamp, the
|
||||
removal will fail. This is used to avoid removing content after a point in
|
||||
removal will fail and the server will respond with: `{"removed": false}`
|
||||
|
||||
This is used to avoid removing content after a point in
|
||||
time where it is no longer locked in other repositories.
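As an illustration of how a client might assemble such a request, the sketch below builds the URL from the parameters shown in the example. The function name and the `base` argument are assumptions; the path and parameter names are taken from the example above.

```python
from urllib.parse import urlencode


def remove_before_url(base: str, timestamp: int, key: str,
                      clientuuid: str, serveruuid: str) -> str:
    """Build a remove-before request URL (hypothetical helper)."""
    query = urlencode({
        "timestamp": timestamp,
        "key": key,
        "clientuuid": clientuuid,
        "serveruuid": serveruuid,
    })
    return f"{base.rstrip('/')}/git-annex/v3/remove-before?{query}"
```

The timestamp would normally be derived from a value the server itself reported, since it is compared against the server's monotonic clock and not the client's.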
|
||||
|
||||
## gettimestamp
|
||||
|
@ -148,12 +148,12 @@ Gets the current timestamp from the server.
|
|||
Example:
|
||||
|
||||
> POST /git-annex/v3/gettimestamp?clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.1
|
||||
< TIMESTAMP 59459392
|
||||
< {"timestamp": 59459392}
|
||||
|
||||
The body of the request is empty.
|
||||
|
||||
The server responds with "TIMESTAMP" followed by a space and the current
|
||||
value of its monotonic clock, as a number of seconds.
|
||||
The server responds with a JSON object with a "timestamp" field that has the
|
||||
current value of its monotonic clock, as a number of seconds.
|
||||
|
||||
Important: If multiple servers are serving this protocol for the same
|
||||
repository, they MUST all use the same monotonic clock.
|
||||
|
@ -166,13 +166,14 @@ Example:
|
|||
|
||||
> POST /git-annex/v3/put?key=SHA1--foo&associatedfile=bar&clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.1
|
||||
> Content-Type: application/octet-stream
|
||||
> Content-Length: 4
|
||||
> foo1
|
||||
< SUCCESS
|
||||
> Content-Length: 20
|
||||
> foo
|
||||
> {"valid": true}
|
||||
< {"stored": true}
|
||||
|
||||
There is one required additional parameter, `key`.
|
||||
|
||||
There is are also these optional parameters:
|
||||
There are also these optional parameters:
|
||||
|
||||
* `associatedfile`
|
||||
|
||||
|
@ -186,28 +187,27 @@ There is are also these optional parameters:
|
|||
|
||||
The body of the request is the content of the key, starting from the
|
||||
specified offset or from the beginning. After the content of the key,
|
||||
there is one more byte.
|
||||
there is a newline, followed by a JSON object.
|
||||
|
||||
The additional byte is "1" to indicate that the content was not changed
|
||||
while it was being sent, or "0" to indicate that modified content was sent
|
||||
and should be disregarded by the server. (This corresponds
|
||||
The JSON object has a field "valid" that is true when the content
|
||||
was not changed while it was being sent, or false when modified
|
||||
content was sent and should be disregarded by the server. (This corresponds
|
||||
to the `VALID` and `INVALID` messages in the P2P protocol.)
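A sender could frame the request body along these lines. This is a minimal sketch; `build_put_body` is a hypothetical name, and whether anything (such as a final newline) follows the JSON object is not stated here, so none is added.

```python
import json


def build_put_body(content: bytes, valid: bool) -> bytes:
    """Frame key content for a put request: content, newline, JSON object."""
    trailer = json.dumps({"valid": valid}).encode("ascii")
    return content + b"\n" + trailer


body = build_put_body(b"foo", True)
# Content-Length is the length of the whole framed body, trailer included.
headers = {
    "Content-Type": "application/octet-stream",
    "Content-Length": str(len(body)),
}
```

Because the validity marker comes last, the sender can stream the content first and decide afterwards whether it changed during the transfer.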
|
||||
|
||||
The `Content-Type` header should be `application/octet-stream`.
|
||||
|
||||
The `Content-Length` header should be set to the length of the body.
|
||||
|
||||
The server responds with `SUCCESS` if it received the data and stored the
|
||||
content. If it was unable to do so, it responds with `FAILURE`.
|
||||
The server responds with a JSON object with a field "stored"
|
||||
that is true if it received the data and stored the
|
||||
content.
|
||||
|
||||
The server can also reply with `SUCCESS-PLUS`, which has a subsequent list of
|
||||
UUIDs of repositories that the content was stored to. For example:
|
||||
The JSON object can have an additional field "plusuuids" that is a list of
|
||||
UUIDs of other repositories that the content was stored to.
|
||||
|
||||
SUCCESS-PLUS 702ce472-38a1-11ef-864f-23851a2edf71 707dea20-38a1-11ef-96a4-fb7e8c8369f0
|
||||
|
||||
If the server was prevented from storing the key due to a policy
|
||||
(eg due to being read-only), it will respond with "ERROR", followed
|
||||
by a space and an error message.
|
||||
If the server does not allow storing the key due to a policy
|
||||
(eg due to being read-only or append-only), it will respond with a JSON
|
||||
object with an "error" field that has an error message as its value.
|
||||
|
||||
### putoffset
|
||||
|
||||
|
@ -220,17 +220,18 @@ the `put` request failing.
|
|||
Example:
|
||||
|
||||
> POST /git-annex/v3/putoffset?key=SHA1--foo&clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.1
|
||||
< 10
|
||||
< {"offset": 10}
|
||||
|
||||
There is one required additional parameter, `key`.
|
||||
|
||||
The body of the request is empty.
|
||||
|
||||
The server responds with the largest allowable offset.
|
||||
The server responds with a JSON object with an "offset" field that
|
||||
is the largest allowable offset.
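One way a client might use this reply to resume an upload is sketched below. The name `resume_from` is hypothetical, and always picking the largest allowable offset is a choice made for the example; a client could also start from a smaller offset or from the beginning.

```python
import json


def resume_from(putoffset_body: bytes, content: bytes):
    """Return (offset, remaining_bytes) for a resumed put.

    Assumes the reply is a JSON object with either an "offset" field
    or an "error" field, as described in this section.
    """
    reply = json.loads(putoffset_body)
    if "error" in reply:
        raise PermissionError(reply["error"])
    offset = int(reply["offset"])
    if not 0 <= offset <= len(content):
        raise ValueError(f"offset {offset} is outside the content")
    # Only the bytes after the offset need to be sent in the put body.
    return offset, content[offset:]
```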
|
||||
|
||||
If the server was prevented from storing the key due to a policy
|
||||
(eg due to being read-only), it will respond with "ERROR", followed
|
||||
by a space and an error message.
|
||||
If the server does not allow storing the key due to a policy
|
||||
(eg due to being read-only or append-only), it will respond with a JSON
|
||||
object with an "error" field that has an error message as its value.
|
||||
|
||||
[Implementation note: This will be implemented by sending `PUT` and
|
||||
returning the `PUT-FROM` offset. To avoid leaving the P2P protocol stuck
|
||||
|
@ -246,8 +247,9 @@ Example:
|
|||
|
||||
> POST /git-annex/v3/get?key=SHA1--foo&associatedfile=bar&clientuuid=79a5a1f4-07e8-11ef-873d-97f93ca91925&serveruuid=ecf6d4ca-07e8-11ef-8990-9b8c1f696bf6 HTTP/1.1
|
||||
< Content-Type: application/octet-stream
|
||||
< Content-Length: 4
|
||||
< foo1
|
||||
> Content-Length: 20
|
||||
> foo
|
||||
> {"valid": true}
|
||||
|
||||
There is one required additional parameter, `key`.
|
||||
|
||||
|
@ -271,17 +273,14 @@ The server's response will have a `Content-Length` header
|
|||
set to the length of the body.
|
||||
|
||||
The server's response body is the content of the key, from the specified
|
||||
offset. After the content of the key, there is one more byte.
|
||||
offset. After the content of the key, there is a newline, followed by a
|
||||
JSON object.
|
||||
|
||||
The additional byte is "1" to indicate that the content was not changed
|
||||
while it was being sent, or "0" to indicate that modified content was sent
|
||||
and should be discarded by the client. (This corresponds
|
||||
to the `VALID` and `INVALID` messages in the P2P protocol.)
|
||||
|
||||
Note that, if the server is not able to send the content of the requested
|
||||
key, its response body will consist of "0", eg 0 bytes of content which is
|
||||
not valid. On the other hand, a response body of "1" is used for an empty
|
||||
key which is valid.
|
||||
The JSON object has a field "valid" that is true when the content
|
||||
was not changed while it was being sent, or false when whatever
|
||||
content was sent is not the actual content of the key and should be
|
||||
disregarded. (This corresponds to the `VALID` and `INVALID` messages
|
||||
in the P2P protocol.)
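On the receiving side, the content and the trailing JSON object have to be separated again. The sketch below assumes the JSON object is on a single line with no embedded newline, and tolerates one optional newline after it; `split_get_body` is a hypothetical name.

```python
import json


def split_get_body(body: bytes):
    """Split a get response body into (content, valid)."""
    # Assumption: at most one newline may follow the JSON object.
    if body.endswith(b"\n"):
        body = body[:-1]
    # The content itself may contain newlines, so split at the last one.
    content, sep, trailer = body.rpartition(b"\n")
    if not sep:
        raise ValueError("response body has no JSON trailer")
    valid = json.loads(trailer).get("valid") is True
    return content, valid
```

A caller would discard `content` whenever `valid` is false, including the case where the server could not send the key at all.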
|
||||
|
||||
## simple HTTP GET
|
||||
|
||||
|
@ -301,6 +300,6 @@ the content of a key.
|
|||
this HTTP protocol to support it.
|
||||
|
||||
`CONNECT` is not supported, and due to the bi-directional message passing
|
||||
nature of it, it cannot easily be done over HTTP. It should not be
|
||||
necessary anyway, because the git repository itself can be accessed over
|
||||
HTTP.
|
||||
nature of it, it cannot easily be done over HTTP (it would need websockets).
|
||||
It should not be necessary anyway, because the git repository itself can be
|
||||
accessed over HTTP.
|
||||
|
|