diff --git a/doc/design/external_backend_protocol.mdwn b/doc/design/external_backend_protocol.mdwn
new file mode 100644
index 0000000000..2fbdb176ce
--- /dev/null
+++ b/doc/design/external_backend_protocol.mdwn
@@ -0,0 +1,178 @@
+**Draft**
+
+Communication between git-annex and a program implementing an external
+[[backend|backends]] uses this protocol.
+
+[[!toc ]]
+
+## starting the program
+
+The external backend program has a name like `git-annex-backend-XFOO`.
+When git-annex is configured to use a backend starting with "X",
+or encounters a key starting with "X" in a repository, it
+looks for the corresponding external backend program in PATH.
+
+The program is started by git-annex when it needs to use it, and may be
+left running for a long period of time. Note that git-annex may choose to
+run multiple instances of the program.
+
+## protocol overview
+
+Communication is via stdin and stdout. While stderr is connected to the
+console and so visible to the user, the program should avoid using it
+except in the most exceptional circumstances.
+
+The protocol is line based. git-annex sends a request, and the program
+responds with a reply.
+
+Each protocol line starts with a command, which is followed by the
+command's parameters (a fixed number per command), each separated by a
+single space. The last parameter may contain spaces. Parameters may be
+empty, but the separating spaces are still required in that case.
+
+## example session
+
+git-annex always starts by sending a message asking the program what protocol
+version it uses.
+
+    GETVERSION
+
+The program responds.
+
+    VERSION 1
+
+git-annex will next query the program about the properties of the keys it
+uses (CANVERIFY, ISSTABLE, ISCRYPTOGRAPHICALLYSECURE), and the program will
+respond to each query.
+
+Then git-annex may ask the program to generate a key.
+
+    GENKEY somefile
+
+The program will respond with the key it generated, but if it needs to do
+an expensive operation, such as hashing the file, it can first send
+progress messages, indicating the position in the file it has processed.
+
+    PROGRESS 1024
+    PROGRESS 2048
+    GENKEY-SUCCESS XFOO-s2048--dbd009
+
+git-annex can also ask the program to verify whether the content of a file
+matches a key.
+
+    VERIFYKEYCONTENT XFOO-s2048--dbd009 somefile
+
+Again the program can send progress messages as it works, finishing
+with the result of the verification.
+
+    PROGRESS 1024
+    PROGRESS 2048
+    VERIFYKEYCONTENT-SUCCESS
+
+## startup messages and replies
+
+These messages are sent to the program soon after starting it, and it should
+reply with one of the listed replies.
+
+* `GETVERSION`
+  Always the first message sent.
+  Currently the only version of this protocol is version 1.
+  * `VERSION 1`
+* `CANVERIFY`
+  Asks if the program can verify that the content of a file matches a key it generated.
+  The verification does not need to be cryptographically secure, but should
+  catch data corruption.
+  * `CANVERIFY-YES`
+  * `CANVERIFY-NO`
+* `ISSTABLE`
+  Asks the program if a key it has generated will always have the same
+  content. The answer to this is almost always yes; URL keys are an example
+  of a type of key that may have different content at different times.
+  * `ISSTABLE-YES`
+  * `ISSTABLE-NO`
+* `ISCRYPTOGRAPHICALLYSECURE`
+  Asks the program if keys it generates are verified using a cryptographically
+  secure hash. Note that sha1 is *not* a cryptographically secure hash any
+  longer. A program can change its answer to this question as the state of the
+  art advances, and should aim to stay ahead of the state of the art by a
+  reasonable amount of time.
+  * `ISCRYPTOGRAPHICALLYSECURE-YES`
+  * `ISCRYPTOGRAPHICALLYSECURE-NO`
+
+## main messages and replies
+
+This is where work happens.
+
+* `GENKEY ContentFile`
+  The program should examine the ContentFile and from it generate a
+  key. While it is doing this, it can send any number of `PROGRESS`
+  messages indicating the position in the file that it's gotten to.
+  * `GENKEY-SUCCESS Key`
+  * `GENKEY-FAILURE ErrorMsg`
+* `VERIFYKEYCONTENT Key ContentFile`
+  The program should examine the ContentFile and verify that it has the
+  content it would expect for the Key. While it is doing this, it can
+  send any number of `PROGRESS` messages indicating the position in the
+  file that it's gotten to. (If the program earlier sent CANVERIFY-NO,
+  it will not be asked to do this.)
+  * `VERIFYKEYCONTENT-SUCCESS`
+  * `VERIFYKEYCONTENT-FAILURE`
+
+## general messages
+
+These messages can be sent at any time by either git-annex or the program.
+
+* `ERROR ErrorMsg`
+  Generic error. Can be sent at any time if things get too messed up to
+  continue. When possible, use a more specific reply.
+  The program should exit after sending this, as git-annex will not talk to
+  it any further. If the program receives an ERROR from git-annex, it can
+  exit with its own ERROR.
+
+## considerations for generating keys
+
+See [[doc/internals/key_format]] for how to format a key.
+
+The backend name should match the name of the program, eg if the program
+is git-annex-backend-XFOO, it should generate a key starting with "XFOO-".
+
+The backend name (and program name) has to be all uppercase, and should be
+reasonably short (max 10 bytes or so), and should be entirely ascii
+alphanumerics. Eg, use similar names to other [[backends]].
+
+git-annex will automatically also support an "E" variant of the backend,
+which adds a filename extension to the end of the key. It does this
+entirely transparently to the program, so while the repository may be using
+XFOOE keys, the program will always generate and verify XFOO keys.
+
+The key name is typically some kind of hash, but is not limited to a hash.
+The length of it needs to be similar to the lengths of other git-annex
+keys. Too long a key name will make it annoying to work with repositories
+using it, or even cause problems due to filename length limits. 128 bytes
+maximum, but shorter is better.
+
+It's important that, if the program responds with
+ISCRYPTOGRAPHICALLYSECURE-YES, the key name contains only a hash, and not
+other data from some other source. That other data could be used to try to
+mount a sha1 collision attack against git, by embedding colliding material
+in the key name, where users are unlikely to notice it. While git has
+several things that make sha1 collision attacks difficult, we don't want
+this chink in the armor.
+
+## program names must be unique
+
+It's important that two different programs don't use the same name, because
+that would result in bad behavior if the wrong program were used with a
+repository with keys generated by the other program.
+
+Here is a list of programs, to avoid picking the same name. Edit this page
+to add yours to the list.
+
+* [[git-annex-backend-XFOO]] is a demo program implementing this protocol
+  with a shell script.
+
+## signals
+
+The program should not block SIGINT or SIGTERM. Doing so may cause
+git-annex to hang waiting on it to exit. Of course it's ok to catch those
+signals and do some necessary cleanup before exiting.
diff --git a/doc/design/external_backend_protocol/git-annex-backend-XFOO b/doc/design/external_backend_protocol/git-annex-backend-XFOO
new file mode 100755
index 0000000000..0282fab4ae
--- /dev/null
+++ b/doc/design/external_backend_protocol/git-annex-backend-XFOO
@@ -0,0 +1,57 @@
+#!/bin/sh
+# Demo git-annex external backend program.
+#
+# Install in PATH as git-annex-backend-XFOO
+#
+# Copyright 2020 Joey Hess; licenced under the GNU GPL version 3 or higher.
+
+set -e
+
+hashfile () {
+	local contentfile="$1"
+	# could send PROGRESS while doing this, but it's
+	# hard to implement that in shell
+	md5sum "$contentfile" | cut -d ' ' -f 1
+}
+
+while read -r line; do
+	set -- $line
+	case "$1" in
+		GETVERSION)
+			echo VERSION 1
+		;;
+		CANVERIFY)
+			echo CANVERIFY-YES
+		;;
+		ISSTABLE)
+			echo ISSTABLE-YES
+		;;
+		ISCRYPTOGRAPHICALLYSECURE)
+			# md5 is not cryptographically secure
+			echo ISCRYPTOGRAPHICALLYSECURE-NO
+		;;
+		GENKEY)
+			contentfile="$2"
+			hash=$(hashfile "$contentfile")
+			if [ -n "$hash" ]; then
+				echo "GENKEY-SUCCESS" "XFOO--$hash"
+			else
+				echo "GENKEY-FAILURE" "md5sum failed"
+			fi
+		;;
+		VERIFYKEYCONTENT)
+			key="$2"
+			contentfile="$3"
+			hash=$(hashfile "$contentfile")
+			khash=$(echo "$key" | sed 's/.*--//')
+			if [ "$hash" = "$khash" ]; then
+				echo "VERIFYKEYCONTENT-SUCCESS"
+			else
+				echo "VERIFYKEYCONTENT-FAILURE"
+			fi
+		;;
+		*)
+			echo ERROR protocol error
+		;;
+	esac
+done
diff --git a/doc/design/external_special_remote_protocol.mdwn b/doc/design/external_special_remote_protocol.mdwn
index aee77cf8eb..e58264a5d8 100644
--- a/doc/design/external_special_remote_protocol.mdwn
+++ b/doc/design/external_special_remote_protocol.mdwn
@@ -194,7 +194,7 @@ the special remote can reply with `UNSUPPORTED-REQUEST`.
   (See Config/Cost.hs for some standard costs.)
   * `COST Int`
     Indicates the cost of the remote.
-* `GETAVAILABILITY`
+* `GETAVAILABILITY`
   Asks the remote if it is locally or globally available.
   (Ie stored in the cloud vs on a local disk.)
   If the remote replies with `UNSUPPORTED-REQUEST`, its availability
@@ -227,7 +227,7 @@ the special remote can reply with `UNSUPPORTED-REQUEST`.
     can contain spaces.
   * `CHECKURL-FAILURE`
     Indicates that the requested url could not be accessed.
-* `WHEREIS Key`
+* `WHEREIS Key`
   Asks the remote to provide additional information about ways to access
   the content of a key stored in it, such as eg, public urls.
   This will be displayed to the user by eg, `git annex whereis`.
diff --git a/doc/todo/external_backends/comment_11_56224638fde7b46ee6f52211474cd047._comment b/doc/todo/external_backends/comment_11_56224638fde7b46ee6f52211474cd047._comment
new file mode 100644
index 0000000000..7a78ef77b5
--- /dev/null
+++ b/doc/todo/external_backends/comment_11_56224638fde7b46ee6f52211474cd047._comment
@@ -0,0 +1,15 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 11"""
+ date="2020-07-20T18:01:27Z"
+ content="""
+Wrote a draft [[design/external_backend_protocol]].
+
+I wonder if it makes sense to require the programs to format and parse
+their own keys; git-annex could break up the key and send the pieces in.
+The advantage though is that this lets a program decide whether or not to
+include information like the size and mtime fields in the key.
+And if more fields ever got added it would not need changes to the
+protocol. I guess it's simple enough to format and parse, as shown by the
+example shell program that does it.
+"""]]
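To get a feel for how much work "format and parse" really is for a program, here is a rough sketch in POSIX shell. It is not part of the patch. The helper names (`split_verify_params`, `key_backend`, `key_name`, `format_key`) are made up for illustration, and it assumes that a key contains no spaces and that the first `--` in a key separates the fields from the key name, as in the `XFOO-s2048--dbd009` example from the protocol page. Unlike the demo script's `set -- $line`, it keeps spaces in the last parameter intact.

```shell
#!/bin/sh
# Sketch only: these helpers are hypothetical, not part of git-annex.

# Split the parameters of VERIFYKEYCONTENT ("Key ContentFile").
# Assumes the key has no spaces; the content file, being the last
# parameter, may contain them. Sets the variables key and contentfile.
split_verify_params () {
	key=${1%% *}          # everything before the first space
	contentfile=${1#* }   # everything after the first space
}

# Backend name: everything before the first "-" of the key.
key_backend () {
	printf '%s\n' "${1%%-*}"
}

# Key name: everything after the first "--" of the key
# (assumes fields such as s2048 never contain "--").
key_name () {
	printf '%s\n' "${1#*--}"
}

# Format a key from a backend name, a size in bytes, and a key name.
format_key () {
	printf '%s-s%s--%s\n' "$1" "$2" "$3"
}
```

In a program's main loop this could be combined with `read -r cmd params`, which leaves everything after the command in `params`, and then `split_verify_params "$params"` when `cmd` is `VERIFYKEYCONTENT`. A program that does not want a size field can simply print `XFOO--$hash`, as the demo script does.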