501 lines
22 KiB
Markdown
501 lines
22 KiB
Markdown
Communication between git-annex and a program implementing an external
|
|
special remote uses this protocol.
|
|
|
|
[[!toc]]
|
|
|
|
## starting the program
|
|
|
|
The external special remote program has a name like
|
|
`git-annex-remote-$bar`. When
|
|
`git annex initremote foo type=external externaltype=$bar` is run,
|
|
git-annex finds the appropriate program in PATH.
|
|
|
|
The program is started by git-annex when it needs to access the special
|
|
remote, and may be left running for a long period of time. This allows
|
|
it to perform expensive setup tasks, etc. Note that git-annex may choose to
|
|
start multiple instances of the program (eg, when multiple git-annex
|
|
commands are run concurrently in a repository).
|
|
|
|
## protocol overview
|
|
|
|
Communication is via stdin and stdout. Therefore, the external special
|
|
remote must avoid doing any prompting, or outputting anything like eg,
|
|
progress to stdout. (Such stuff can be sent to stderr instead.)
|
|
|
|
The protocol is line based. Messages are sent in either direction, from
|
|
git-annex to the special remote, and from the special remote to git-annex.
|
|
|
|
In order to avoid confusing interactions, one or the other has control
|
|
at any given time, and is responsible for sending requests, while the other
|
|
only sends replies to the requests.
|
|
|
|
Each protocol line starts with a command, which is followed by the
|
|
command's parameters (a fixed number per command), each separated by a
|
|
single space. The last parameter may contain spaces. Parameters may be
|
|
empty, but the separating spaces are still required in that case.
|
|
|
|
## example session
|
|
|
|
The special remote is responsible for sending the first message, indicating
|
|
the version of the protocol it is using.
|
|
|
|
VERSION 1
|
|
|
|
Recent versions of git-annex respond with a message indicating
|
|
protocol extensions that it supports. Older versions of
|
|
git-annex do not send this message.
|
|
|
|
EXTENSIONS INFO ASYNC
|
|
|
|
The special remote can respond to that with its own EXTENSIONS message, listing
|
|
any extensions it wants to use.
|
|
(It's also fine to reply with UNSUPPORTED-REQUEST.)
|
|
|
|
EXTENSIONS
|
|
|
|
Next, git-annex will generally send a message telling the special
|
|
remote to start up. (Or it might send an INITREMOTE or EXPORTSUPPORTED or
|
|
LISTCONFIGS, or perhaps other things in the future, so don't hardcode this
|
|
order.)
|
|
|
|
PREPARE
|
|
|
|
The special remote can now ask git-annex for its configuration, as needed,
|
|
and check that it's valid. git-annex responds with the configuration values
|
|
|
|
GETCONFIG directory
|
|
VALUE /media/usbdrive/repo
|
|
GETCONFIG automount
|
|
VALUE true
|
|
|
|
Once the special remote is satisfied with its configuration and is
|
|
ready to go, it tells git-annex that it's done with the PREPARE step:
|
|
|
|
PREPARE-SUCCESS
|
|
|
|
Now git-annex will make a request. Let's suppose it wants to store a key.
|
|
|
|
TRANSFER STORE somekey tmpfile
|
|
|
|
The special remote can then start reading the tmpfile and storing it.
|
|
While it's doing that, the special remote can send messages back to
|
|
git-annex to indicate what it's doing, or ask for other information.
|
|
It will typically send progress messages, indicating how many
|
|
bytes have been sent:
|
|
|
|
PROGRESS 10240
|
|
PROGRESS 20480
|
|
|
|
Once the key has been stored, the special remote tells git-annex the result:
|
|
|
|
TRANSFER-SUCCESS STORE somekey
|
|
|
|
Now git-annex will send its next request.
|
|
|
|
Once git-annex is done with the special remote, it will close its stdin.
|
|
The special remote program can then exit.
|
|
|
|
## git-annex request messages and replies
|
|
|
|
These are messages git-annex sends to the special remote program.
|
|
|
|
Once the special remote has finished performing the request, it should
|
|
send one of the listed replies.
|
|
|
|
The following requests *must* all be supported by the special remote.
|
|
|
|
* `INITREMOTE`
|
|
Requests the remote to initialize itself. This is where any one-time
|
|
setup tasks can be done, for example creating an Amazon S3 bucket.
|
|
Note: This may be run repeatedly over time, as a remote is initialized in
|
|
different repositories, or as the configuration of a remote is changed.
|
|
(Both `git annex initremote` and `git-annex enableremote` run this.)
|
|
So any one-time setup tasks should be done idempotently.
|
|
* `INITREMOTE-SUCCESS`
|
|
Indicates the INITREMOTE succeeded and the remote is ready to use.
|
|
* `INITREMOTE-FAILURE ErrorMsg`
|
|
Indicates that INITREMOTE failed.
|
|
* `PREPARE`
|
|
Tells the remote that it's time to prepare itself to be used.
|
|
Only a few requests for details about the remote can come before this
|
|
(EXTENSIONS, INITREMOTE, EXPORTSUPPORTED, and LISTCONFIGS,
|
|
but others may be added later).
|
|
* `PREPARE-SUCCESS`
|
|
Sent as a response to PREPARE once the special remote is ready for use.
|
|
* `PREPARE-FAILURE ErrorMsg`
|
|
Sent as a response to PREPARE if the special remote cannot be used.
|
|
* `TRANSFER STORE|RETRIEVE Key File`
|
|
Requests the transfer of a key. For STORE, the File is the file to upload;
|
|
for RETRIEVE the File is where to store the download.
|
|
Note that the File should not influence the filename used on the remote.
|
|
Note that in some cases, the File may contain whitespace.
|
|
It's important that, while a Key is being stored, `CHECKPRESENT`
|
|
not indicate it's present until all the data has been transferred.
|
|
While the transfer is running, the remote can send any number of
|
|
`PROGRESS` messages to indicate its progress. It can also send any of the
|
|
other special remote messages. Once the transfer is done, it finishes by
|
|
sending one of these replies:
|
|
* `TRANSFER-SUCCESS STORE|RETRIEVE Key`
|
|
Indicates the transfer completed successfully.
|
|
* `TRANSFER-FAILURE STORE|RETRIEVE Key ErrorMsg`
|
|
Indicates the transfer failed.
|
|
* `CHECKPRESENT Key`
|
|
Requests the remote to check if a key is present in it.
|
|
* `CHECKPRESENT-SUCCESS Key`
|
|
Indicates that a key has been positively verified to be present in the
|
|
remote.
|
|
* `CHECKPRESENT-FAILURE Key`
|
|
Indicates that a key has been positively verified to not be present in the
|
|
remote.
|
|
* `CHECKPRESENT-UNKNOWN Key ErrorMsg`
|
|
Indicates that it is not currently possible to verify if the key is
|
|
present in the remote. (Perhaps the remote cannot be contacted.)
|
|
* `REMOVE Key`
|
|
Requests the remote to remove a key's contents.
|
|
* `REMOVE-SUCCESS Key`
|
|
Indicates the key has been removed from the remote. May be returned if
|
|
the remote didn't have the key at the point removal was requested.
|
|
* `REMOVE-FAILURE Key ErrorMsg`
|
|
Indicates that the key was unable to be removed from the remote.
|
|
|
|
Special remotes can optionally support tree exports and imports,
|
|
which makes the [[git-annex-export]] and [[git-annex-import]] commands
|
|
work with them. See the [[export_and_import_appendix]] for
|
|
additional requests that git-annex will make when using special remotes in
|
|
this way.
|
|
|
|
The following requests can optionally be supported. If not supported,
|
|
the special remote can reply with `UNSUPPORTED-REQUEST`.
|
|
|
|
* `EXTENSIONS List`
|
|
Sent to indicate protocol extensions which git-annex is capable
|
|
of using. The list is a space-delimited list of protocol extension
|
|
keywords. The remote can reply to this with its own EXTENSIONS list.
|
|
See the section on extensions below for details.
|
|
* `EXTENSIONS List`
|
|
Sent in response to a EXTENSIONS request, to indicate the protocol
|
|
extensions that the special remote is using.
|
|
* `LISTCONFIGS`
|
|
Requests the remote to return a list of settings it uses (with
|
|
`GETCONFIG` and `SETCONFIG`). Providing a list makes `git annex initremote`
|
|
work better, because it can check the user's input, and can also display
|
|
a list of settings with descriptions. Note that the user is not required
|
|
to provided all the settings listed here. A block of responses
|
|
can be made to this, which must always end with `CONFIGEND`.
|
|
(Do not include settings like "name" that are common to all external
|
|
special remotes.)
|
|
* `CONFIG Name Description`
|
|
Indicates the name and description of a config setting. The description
|
|
should be reasonably short. Example:
|
|
"CONFIG directory store data here"
|
|
* `CONFIGEND`
|
|
Indicates the end of the response block.
|
|
* `GETCOST`
|
|
Requests the remote to return a use cost. Higher costs are more expensive.
|
|
(See Config/Cost.hs for some standard costs.)
|
|
* `COST Int`
|
|
Indicates the cost of the remote.
|
|
* `GETAVAILABILITY`
|
|
Asks the remote if it is locally or globally available.
|
|
(Ie stored in the cloud vs on a local disk.)
|
|
If the remote replies with `UNSUPPORTED-REQUEST`, its availability
|
|
is assumed to be global. So, only remotes that are only reachable
|
|
locally need to worry about implementing this.
|
|
* `AVAILABILITY GLOBAL|LOCAL`
|
|
Indicates if the remote is globally or only locally available.
|
|
* `CLAIMURL Url`
|
|
Asks the remote if it wishes to claim responsibility for downloading
|
|
an url.
|
|
* `CLAIMURL-SUCCESS`
|
|
Indicates that the CLAIMURL url will be handled by this remote.
|
|
* `CLAIMURL-FAILURE`
|
|
Indicates that the CLAIMURL url wil not be handled by this remote.
|
|
* `CHECKURL Url`
|
|
Asks the remote to check if the url's content can currently be downloaded
|
|
(without downloading it).
|
|
* `CHECKURL-CONTENTS Size|UNKNOWN Filename`
|
|
Indicates that the requested url has been verified to exist.
|
|
The Size is the size in bytes, or use "UNKNOWN" if the size could not be
|
|
determined.
|
|
The Filename can be empty (in which case a default is used),
|
|
or can specify a filename that is suggested to be used for this url.
|
|
* `CHECKURL-MULTI Url1 Size1|UNKNOWN Filename1 Url2 Size2|UNKNOWN Filename2 ...`
|
|
Indicates that the requested url has been verified to exist,
|
|
and contains multiple files, which can each be accessed using
|
|
their own url. Each triplet of url, size, and filename should be listed,
|
|
one after the other.
|
|
Note that since a list is returned, neither the Url nor the Filename
|
|
can contain spaces.
|
|
* `CHECKURL-FAILURE ErrorMsg`
|
|
Indicates that the requested url could not be accessed.
|
|
* `WHEREIS Key`
|
|
Asks the remote to provide additional information about ways to access
|
|
the content of a key stored in it, such as eg, public urls.
|
|
This will be displayed to the user by eg, `git annex whereis`.
|
|
Note that users expect `git annex whereis` to run fast, without eg,
|
|
network access.
|
|
* `WHEREIS-SUCCESS String`
|
|
Indicates a location of a key. Typically an url, the string can
|
|
be anything that it makes sense to display to the user about content
|
|
stored in the special remote.
|
|
* `WHEREIS-FAILURE`
|
|
Indicates that no location is known for a key.
|
|
This is not needed when `SETURIPRESENT` is used, since such uris are
|
|
automatically displayed by `git annex whereis`.
|
|
* `GETINFO`
|
|
Requests the remote to send some information describing its
|
|
configuration, for display by `git annex info`. A block of responses
|
|
can be made to this, which must always end with `INFOEND`.
|
|
* `INFOFIELD Name`
|
|
Gives the name of an info field. The name can be anything you want to
|
|
be displayed to the user. Must be immediately followed by `INFOVALUE`.
|
|
* `INFOVALUE Value`
|
|
Gives the value of an info field.
|
|
* `INFOEND`
|
|
Indicates the end of the response block.
|
|
|
|
More optional requests may be added, without changing the protocol version,
|
|
so if an unknown request is seen, don't crash, just reply with
|
|
`UNSUPPORTED-REQUEST`.
|
|
|
|
## special remote messages
|
|
|
|
These messages may be sent by the special remote at any time that it's
|
|
handling a request.
|
|
|
|
* `VERSION Int`
|
|
Supported protocol version. Current version is 1. Must be sent first
|
|
thing at startup, as until it sees this git-annex does not know how to
|
|
talk with the special remote program!
|
|
(git-annex does not send a reply to this message, but may give up if it
|
|
doesn't support the necessary protocol version.)
|
|
* `PROGRESS Int`
|
|
Indicates the current progress of the transfer (in bytes). May be repeated
|
|
any number of times during the transfer process, but it's wasteful to
|
|
update the progress until at least another 1% of the file has been sent.
|
|
This is highly recommended for STORE. (It is optional but good for RETRIEVE.)
|
|
(git-annex does not send a reply to this message.)
|
|
* `DIRHASH Key`
|
|
Gets a two level hash associated with a Key. Something like "aB/Cd".
|
|
This is always the same for any given Key, so can be used for eg,
|
|
creating hash directory structures to store Keys in. This is the same
|
|
directory hash that git-annex uses inside `.git/annex/objects/`
|
|
(git-annex replies with VALUE followed by the value.)
|
|
* `DIRHASH-LOWER Key`
|
|
Gets a two level hash associated with a Key, using only lower-case.
|
|
Something like "abc/def".
|
|
This is always the same for any given Key, so can be used for eg,
|
|
creating hash directory structures to store Keys in. This is the same
|
|
directory hash that is used by eg, the directory special remote.
|
|
(git-annex replies with VALUE followed by the value.)
|
|
(First supported by git-annex version 6.20160511.)
|
|
* `SETCONFIG Setting Value`
|
|
Sets one of the special remote's configuration settings.
|
|
Normally this is sent during INITREMOTE, which allows these settings
|
|
to be stored in the git-annex branch, so will be available if the same
|
|
special remote is used elsewhere. (If sent after INITREMOTE, the changed
|
|
configuration will only be available while the remote is running.)
|
|
(git-annex does not send a reply to this message.)
|
|
* `GETCONFIG Setting`
|
|
Gets one of the special remote's configuration settings, which can have
|
|
been passed by the user when running `git annex initremote`, or
|
|
can have been set by a previous SETCONFIG. Can be run at any time.
|
|
It's recommended that special remotes that use this implement
|
|
LISTCONFIGS.
|
|
(git-annex replies with VALUE followed by the value. If the setting is
|
|
not set, the value will be empty.)
|
|
* `SETCREDS Setting User Password`
|
|
When some form of user and password is needed to access a special remote,
|
|
this can be used to securely store them for later use.
|
|
(Like SETCONFIG, this is normally sent only during INITREMOTE.)
|
|
The Setting indicates which value in a remote's configuration can be
|
|
used to store the creds.
|
|
Note that creds are normally only stored in the remote's configuration
|
|
when it's surely safe to do so; when gpg encryption is used, in which
|
|
case the creds will be encrypted using it. If creds are not stored in
|
|
the configuration, they'll only be stored in a local file.
|
|
(embedcreds can be set to yes by the user or by SETCONFIG to force
|
|
the creds to be stored in the remote's configuration).
|
|
(git-annex does not send a reply to this message.)
|
|
* `GETCREDS Setting`
|
|
Gets any creds that were previously stored in the remote's configuration
|
|
or a file.
|
|
(git-annex replies with "CREDS User Password". If no creds are found,
|
|
User and Password are both empty.)
|
|
* `GETUUID`
|
|
Queries for the UUID of the special remote being used.
|
|
(git-annex replies with VALUE followed by the UUID.)
|
|
* `GETGITDIR`
|
|
Queries for the path to the git directory of the repository that
|
|
is using the external special remote.
|
|
(git-annex replies with VALUE followed by the path.)
|
|
* `SETWANTED PreferredContentExpression`
|
|
Can be used to set the preferred content of a repository. Normally
|
|
this is not configured by a special remote, but it may make sense
|
|
in some situations to hint at the kind of content that should be stored
|
|
in the special remote. Note that if a unparsable expression is set,
|
|
git-annex will ignore it.
|
|
(git-annex does not send a reply to this message.)
|
|
* `GETWANTED`
|
|
Gets the current preferred content setting of the repository.
|
|
(git-annex replies with VALUE followed by the preferred content
|
|
expression.)
|
|
* `SETSTATE Key Value`
|
|
Can be used to store some form of state for a Key. The state stored
|
|
can be anything this remote needs to store, in any format.
|
|
It is stored in the git-annex branch. Note that this means that if
|
|
multiple repositories are using the same special remote, and store
|
|
different state, whichever one stored the state last will win. Also,
|
|
it's best to avoid storing much state, since this will bloat the
|
|
git-annex branch. Most remotes will not need to store any state.
|
|
(git-annex does not send a reply to this message.)
|
|
* `GETSTATE Key`
|
|
Gets any state that has been stored for the key.
|
|
(git-annex replies with VALUE followed by the state.)
|
|
* `SETURLPRESENT Key Url`
|
|
Records an URL where the Key can be downloaded from.
|
|
Note that this does not make git-annex think that the url is present on
|
|
the web special remote.
|
|
Keep in mind that this stores the url in the git-annex branch. This can
|
|
result in bloat to the branch if the url is large and/or does not delta
|
|
pack well with other information (such as the names of keys) already
|
|
stored in the branch.
|
|
(git-annex does not send a reply to this message.)
|
|
* `SETURLMISSING Key Url`
|
|
Records that the key can no longer be downloaded from the specified
|
|
URL.
|
|
(git-annex does not send a reply to this message.)
|
|
* `SETURIPRESENT Key Uri`
|
|
Records an URI where the Key can be downloaded from. Use with uris
|
|
that cannot be downloaded with http.
|
|
(git-annex does not send a reply to this message.)
|
|
* `SETURIMISSING Key Uri`
|
|
Records that the key can no longer be downloaded from the specified
|
|
URI.
|
|
(git-annex does not send a reply to this message.)
|
|
* `GETURLS Key Prefix`
|
|
Gets the recorded urls where a Key can be downloaded from.
|
|
Only urls that start with the Prefix will be returned. The Prefix
|
|
may be empty to get all urls.
|
|
(git-annex replies one or more times with VALUE for each url.
|
|
The final VALUE has an empty value, indicating the end of the url list.)
|
|
* `DEBUG message`
|
|
Tells git-annex to display the message if --debug is enabled.
|
|
(git-annex does not send a reply to this message.)
|
|
* `INFO message`
|
|
Tells git-annex to display the message to the user.
|
|
When git-annex is in --json mode, the message will be emitted immediately
|
|
in its own json object, with an "info" field.
|
|
This message is a protocol extension; it's only safe to send it to
|
|
git-annex after it sent an EXTENSIONS that included INFO.
|
|
(git-annex does not send a reply to this message.)
|
|
|
|
## general messages
|
|
|
|
These messages can be sent at any time by either git-annex or the special
|
|
remote.
|
|
|
|
* `ERROR ErrorMsg`
|
|
Generic error. Can be sent at any time if things get too messed up
|
|
to continue. When possible, use a more specific reply from the list above.
|
|
The special remote program should exit after sending this, as
|
|
git-annex will not talk to it any further. If the program receives
|
|
an ERROR from git-annex, it can exit with its own ERROR.
|
|
|
|
## extensions
|
|
|
|
These protocol extensions are currently supported.
|
|
|
|
* `INFO`
|
|
This makes the `INFO` message available to use.
|
|
* `ASYNC`
|
|
This lets multiple actions be performed at the same time by
|
|
a single external special remote program, rather than starting multiple
|
|
programs. See the [[async_appendix]] for details.
|
|
|
|
## signals
|
|
|
|
The external special remote program should not block SIGINT, or SIGTERM.
|
|
Doing so may cause git-annex to hang waiting on it to exit.
|
|
Of course it's ok to catch those signals and do some necessary cleanup
|
|
before exiting.
|
|
|
|
## long running network connections
|
|
|
|
Since an external special remote is started only when git-annex needs to
|
|
access the remote, and then left running, it's ok to open a network
|
|
connection in the PREPARE stage, and continue to use that network
|
|
connection as requests are made.
|
|
|
|
If you're unable to open a network connection, or the connection closes,
|
|
perhaps because the network is down, it's ok to fail to perform any
|
|
requests. Or you can try to reconnect when a new request is made.
|
|
|
|
Note that the external special remote program may be left running for
|
|
quite a long time, especially when the git-annex assistant is using it.
|
|
The assistant will detect when the system connects to a network, and will
|
|
start a new process the next time it needs to use a remote.
|
|
|
|
## claiming custom uri schemes for use with git-annex addurl
|
|
|
|
If a special remote has its own uri scheme, or some other way to identify a
|
|
particular url as being content that is stored in the special remote,
|
|
and can be downloaded by it, it can implement CLAIMURL and CHECKURL.
|
|
This lets git-annex addurl be used with such urls.
|
|
|
|
For example, the ipfs special remote implements CLAIMURL and CHECKURL
|
|
for "ipfs:ADDRESS" uris. And the bittorrent special remote implements them
|
|
for http urls ending in ".torrent".
|
|
|
|
When a special remote has claimed an url, commands like git-annex addurl
|
|
will use TRANSFER RETRIEVE to request it download the content of a key.
|
|
To find out what url to download, the special remote can use GETURLS
|
|
to find out what urls are recorded for the key.
|
|
|
|
For example, the ipfs special remote sends "GETURLS $KEY ipfs:",
|
|
in order to get only the "ipfs:" uris.
|
|
|
|
The special remote can also use SETURIPRESENT or SETURLPRESENT,
|
|
eg after transferring content to the remote it might know the uri or url
|
|
that can be used to download it. And SETURIMISSING or SETURLMISSING
|
|
can be used after removing content from the remote. This information can
|
|
then be looked up using GETURLS. But it's not necessary to do this in order
|
|
to simply claim an url, because git-annex addurl takes care of it.
|
|
|
|
For example, the ipfs special remote sends "SETURIPRESENT $KEY ipfs:ADDRESS"
|
|
after storing each key in ipfs. It can later look up that uri when
|
|
downloading the key, and the ipfs uri is also displayed by git-annex
|
|
whereis.
|
|
|
|
## readonly mode for http downloads
|
|
|
|
Some storage services allow downloading the content of a file using a
|
|
regular http connection, with no authentication. An external special remote
|
|
for such a storage service can support a readonly mode of operation.
|
|
|
|
It works like this:
|
|
|
|
* When a key's content is stored on the remote, use SETURLPRESENT to
|
|
tell git-annex the public url from which it can be downloaded.
|
|
* When a key's content is removed from the remote, use SETURLMISSING.
|
|
* Document that this external special remote can be used in readonly mode.
|
|
|
|
The user doesn't even need to install your external special remote
|
|
program to use such a remote! All they need to do is run:
|
|
`git annex enableremote $remotename readonly=true`
|
|
|
|
* The readonly=true parameter makes git-annex download content from the
|
|
urls recorded earlier by SETURLPRESENT.
|
|
|
|
## TODO
|
|
|
|
* When storing encrypted files stream the file up/down the pipe, rather
|
|
than using a temp file. Will probably involve less space and disk IO,
|
|
and makes the progress display better, since the encryption can happen
|
|
concurrently with the transfer. Also, no need to use PROGRESS in this
|
|
scenario, since git-annex can see how much data it has sent/received from
|
|
the remote. However, \n and probably \0 need to be escaped somehow in the
|
|
file data, which adds complication.
|
|
* uuid discovery during INITREMOTE.
|
|
* Hook into webapp. Needs a way to provide some kind of prompt to the user
|
|
in the webapp, etc.
|