draft async extension

Joey Hess 2020-08-11 16:42:09 -04:00
parent db1c6da84b
commit ddf69bf5b8
3 changed files with 157 additions and 13 deletions
doc
design
todo/idea__58___external_special_remote___34__async__34___protocol_for_transfers


@ -43,12 +43,12 @@ the version of the protocol it is using.
Recent versions of git-annex respond with a message indicating
protocol extensions that it supports. Older versions of
git-annex do not send this message.
EXTENSIONS INFO
EXTENSIONS INFO ASYNC
The special remote can respond to that with its own EXTENSIONS message, which
could have its own protocol extension details, but none are currently used.
The special remote can respond to that with its own EXTENSIONS message, listing
any extensions it wants to use.
(It's also fine to reply with UNSUPPORTED-REQUEST.)
EXTENSIONS
@ -132,8 +132,9 @@ The following requests *must* all be supported by the special remote.
It's important that, while a Key is being stored, `CHECKPRESENT`
not indicate it's present until all the data has been transferred.
While the transfer is running, the remote can send any number of
`PROGRESS` messages. Once the transfer is done, it finishes by sending
one of these replies:
`PROGRESS` messages to indicate its progress. It can also send any of the
other special remote messages. Once the transfer is done, it finishes by
sending one of these replies:
* `TRANSFER-SUCCESS STORE|RETRIEVE Key`
Indicates the transfer completed successfully.
* `TRANSFER-FAILURE STORE|RETRIEVE Key ErrorMsg`
@ -170,10 +171,10 @@ the special remote can reply with `UNSUPPORTED-REQUEST`.
Sent to indicate protocol extensions which git-annex is capable
of using. The list is a space-delimited list of protocol extension
keywords. The remote can reply to this with its own EXTENSIONS list.
See the section on extensions below for details.
* `EXTENSIONS List`
Sent in response to an EXTENSIONS request, the List could be used to indicate
protocol extensions that the special remote uses, but there are currently
no such extensions.
Sent in response to an EXTENSIONS request, to indicate the protocol
extensions that the special remote is using.
* `LISTCONFIGS`
Requests the remote to return a list of settings it uses (with
`GETCONFIG` and `SETCONFIG`). Providing a list makes `git annex initremote`
@ -381,14 +382,12 @@ handling a request.
* `DEBUG message`
Tells git-annex to display the message if --debug is enabled.
(git-annex does not send a reply to this message.)
These messages are protocol extensions; it's only safe to send them to
git-annex after it sent an EXTENSIONS that included the name of the message.
* `INFO message`
Tells git-annex to display the message to the user.
When git-annex is in --json mode, the message will be emitted immediately
in its own json object, with an "info" field.
This message is a protocol extension; it's only safe to send it to
git-annex after it sent an EXTENSIONS that included INFO.
(git-annex does not send a reply to this message.)
## general messages
@ -403,6 +402,17 @@ remote.
git-annex will not talk to it any further. If the program receives
an ERROR from git-annex, it can exit with its own ERROR.
## extensions
These protocol extensions are currently supported.
* `INFO`
This makes the `INFO` message available to use.
* `ASYNC`
This lets multiple actions be performed at the same time by
a single external special remote program, rather than starting multiple
programs. See the [[async_appendix]] for details.
## signals
The external special remote program should not block SIGINT or SIGTERM.


@ -0,0 +1,117 @@
(This is a draft and not implemented yet.)
This is an appendix to the [[external_special_remote_protocol]].
[[!toc]]
## introduction
Normally, an external special remote can only be used to do one thing at a
time. When git-annex has concurrency enabled, it will start up multiple
processes for the same external special remote.
This extension lets a single external special remote process handle
multiple concurrent requests, which can be useful if multiple processes
would use too many resources, or if it can be better coordinated using a
single process.
## protocol overview
This extension is negotiated by git-annex sending an `EXTENSIONS` message
that includes `ASYNC`, and the external special remote responding in kind.
The rest of the protocol startup is as usual.
VERSION 1
EXTENSIONS INFO ASYNC
EXTENSIONS ASYNC
PREPARE
PREPARE-SUCCESS
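As a rough illustration of the negotiation above, here is a minimal Python sketch of how a special remote might build its reply to git-annex's `EXTENSIONS` message. The function name `negotiate_extensions` and its `wanted` parameter are hypothetical and not part of the protocol; the sketch only assumes that the list is space-delimited, as described in the main protocol document.

```python
def negotiate_extensions(offered_line, wanted=("ASYNC",)):
    """Build the remote's reply to git-annex's EXTENSIONS message.

    offered_line is the raw line from git-annex, e.g. "EXTENSIONS INFO ASYNC".
    wanted lists the extensions this (hypothetical) remote would like to use.
    """
    parts = offered_line.split()
    if not parts or parts[0] != "EXTENSIONS":
        # Not an EXTENSIONS request; this helper does not handle it.
        return "UNSUPPORTED-REQUEST"
    offered = set(parts[1:])
    # Only claim extensions that git-annex said it is capable of using.
    accepted = [ext for ext in wanted if ext in offered]
    return " ".join(["EXTENSIONS"] + accepted)
```

With the exchange shown above, `negotiate_extensions("EXTENSIONS INFO ASYNC")` would produce `EXTENSIONS ASYNC`. If git-annex did not offer `ASYNC`, the reply falls back to a bare `EXTENSIONS`, and the remote would then use the usual non-async protocol.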
Suppose git-annex wants to make some transfers. So it sends:
TRANSFER RETRIEVE Key1 file1
The special remote can at this point send any of the
[special remote messages](https://git-annex.branchable.com/design/external_special_remote_protocol/#index5h2)
it needs as usual, like `GETCONFIG` and `DIRHASH`, getting responses back from
git-annex. git-annex will not send any other requests yet.
(This is the only time the special remote can send those messages, because
git-annex is waiting on its reply here.)
Once it's ready to start the async transfer, the special remote sends
`START-ASYNC`, with an identifier for this async job. (The identifier can
be anything you want to use, but the key is generally a good choice.)
START-ASYNC Key1
Once that's sent, git-annex can send its next request immediately,
while that transfer is still running. For example, it might request a
second transfer, and the special remote can reply when it's started that
transfer too:
TRANSFER RETRIEVE Key2 file2
START-ASYNC Key2
To indicate progress of transfers, the special remote can send
`UPDATE-ASYNC` messages, each followed by the usual `PROGRESS` message:
UPDATE-ASYNC Key1
PROGRESS 10
UPDATE-ASYNC Key2
PROGRESS 500
UPDATE-ASYNC Key1
PROGRESS 20
Once a transfer is done, the special remote indicates this with an
`END-ASYNC` message, followed by the usual `TRANSFER-SUCCESS` or
`TRANSFER-FAILURE`:
END-ASYNC Key2
TRANSFER-SUCCESS RETRIEVE Key2
UPDATE-ASYNC Key1
PROGRESS 100
END-ASYNC Key1
TRANSFER-SUCCESS RETRIEVE Key1
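To make the pairing of messages concrete, here is a small Python sketch of how the receiving side could route lines from the special remote back to the job they belong to. This is not git-annex's implementation (the extension is a draft); the class name `AsyncDemux` and the event tuples it returns are assumptions made for illustration.

```python
class AsyncDemux:
    """Route lines from the special remote to async jobs (illustrative only)."""

    def __init__(self):
        self.active = set()   # job ids that were started and have not ended
        self.pending = None   # (keyword, job_id) waiting for its following line

    def feed(self, line):
        """Consume one line; return an event tuple, or None if more is needed."""
        if self.pending is not None:
            # This line is the message that follows UPDATE-ASYNC or END-ASYNC.
            keyword, job = self.pending
            self.pending = None
            if keyword == "END-ASYNC":
                self.active.discard(job)
                return ("result", job, line)
            return ("update", job, line)
        keyword, _, rest = line.partition(" ")
        if keyword == "START-ASYNC":
            self.active.add(rest)
            return ("started", rest, None)
        if keyword in ("UPDATE-ASYNC", "END-ASYNC"):
            if rest not in self.active:
                raise ValueError(f"unknown async job: {rest}")
            self.pending = (keyword, rest)
            return None
        # Anything else is an ordinary, non-async protocol message.
        return ("sync", None, line)
```

Feeding it the transcript above line by line yields a `started` event per `START-ASYNC`, an `update` event for each `PROGRESS` line tagged with the right job, and a `result` event carrying the final `TRANSFER-SUCCESS` or `TRANSFER-FAILURE`.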
This is not limited to transfers. Any and all requests that git-annex
makes can be handled async if the special remote wants to. For example:
CHECKPRESENT Key3
START-ASYNC Key3
CHECKPRESENT Key4
START-ASYNC Key4
REMOVE Key5
START-ASYNC Key5
END-ASYNC Key3
CHECKPRESENT-SUCCESS Key3
END-ASYNC Key4
CHECKPRESENT-FAILURE Key4
END-ASYNC Key5
REMOVE-SUCCESS Key5
## non-async replies
It's also fine to not use `START-ASYNC` for a request, and instead
use the usual protocol for the reply. This will prevent git-annex from
sending any other requests until it sees the reply.
Since git-annex only runs one external special remote process for
async-capable remotes, anything not processed async may result in
suboptimal performance, when the user has requested concurrency.
## added messages
Here are the details of the additions to the protocol.
* `START-ASYNC JobId`
Can be sent in response to any request git-annex sends. Indicates that
the request will be performed async. This lets git-annex immediately
send its next request, without waiting for this one to finish.
The JobId is an arbitrary string, typically a number or a key.
* `END-ASYNC JobId`
Indicates that an async job is complete. Must be followed by
a protocol reply, indicating the result of the job.
* `UPDATE-ASYNC JobId`
Used to send additional information about an async job. Must be followed
by a protocol message giving the information. git-annex does not send any
reply. Used only for PROGRESS so far.
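Because `UPDATE-ASYNC` and `END-ASYNC` must each be followed by another message, a special remote that runs several jobs in threads presumably needs to keep each pair together on its output. The following Python sketch shows one way to do that with a lock. The class `AsyncWriter` and its method names are hypothetical, and the use of threads is an assumption about how a remote might be written, not something the protocol requires.

```python
import io
import threading


class AsyncWriter:
    """Write async protocol messages so that paired lines stay adjacent."""

    def __init__(self, out):
        self.out = out                # any text stream, e.g. sys.stdout
        self.lock = threading.Lock()  # serializes writers from different jobs

    def _send(self, *lines):
        # Hold the lock for the whole group so no other job's output
        # can land between a marker line and the message that follows it.
        with self.lock:
            for line in lines:
                self.out.write(line + "\n")
            self.out.flush()

    def start(self, job_id):
        self._send(f"START-ASYNC {job_id}")

    def update(self, job_id, message):
        self._send(f"UPDATE-ASYNC {job_id}", message)

    def end(self, job_id, reply):
        self._send(f"END-ASYNC {job_id}", reply)


if __name__ == "__main__":
    # Demo against an in-memory stream instead of real stdout.
    buf = io.StringIO()
    writer = AsyncWriter(buf)
    writer.start("Key1")
    writer.update("Key1", "PROGRESS 10")
    writer.end("Key1", "TRANSFER-SUCCESS RETRIEVE Key1")
    print(buf.getvalue(), end="")
```

Each worker would call `update` and `end` with its own job id; the lock is what keeps one job's `PROGRESS` line from being attributed to another job.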


@ -0,0 +1,17 @@
[[!comment format=mdwn
username="joey"
subject="""comment 6"""
date="2020-08-11T18:34:03Z"
content="""
[[design/external_special_remote_protocol/async_appendix]] has a draft
protocol extension.
I improved on the design, so any and all requests can be handled async,
or sequentially, as the external special remote prefers. Had to add async
job ids, but the protocol simplicity was worth it.
(Implementation will be something like, a thread relaying to and from the
special remote, with requests sent to it when it's not blocked, and with
its async replies sent back to the corresponding requester based on the
JobId.)
"""]]