git-annex/doc/design/compute_special_remote_interface.mdwn
2025-02-19 14:16:36 -04:00

108 lines
3.9 KiB
Markdown

**draft**
The [[special_remotes/compute]] special remote uses this interface to run
compute programs.
When an compute special remote is initremoted, a program is specified:
git-annex initremote myremote type=compute program=git-annex-compute-foo
The user adds an annexed file that is computed by the program by running
a command like this:
git-annex addcomputed --to myremote \
--input raw=file.raw --value passes=10 \
--output photo=file.jpeg
That command and later `git-annex get` of a computed file both
run the program the same way.
The program is passed inputs to the computation via environment variables,
which are all prefixed with `"ANNEX_COMPUTE_"`.
In the example above, the program will be passed this environment:
ANNEX_COMPUTE_INPUT_raw=/path/.git/annex/objects/..
ANNEX_COMPUTE_VALUE_passes=10
Default values that are provided to `git-annex initremote` will also be set
in the environment. Eg `git-annex initremote myremote type=compute
program=foo passes=9` will set `ANNEX_COMPUTE_VALUE_passes=9` by default.
For security, the program should avoid exposing values from `ANNEX_COMPUTE_*`
variables to the shell unprotected, or otherwise executing them.
The program will also inherit other environment variables
that were set when git-annex was run, like PATH. (`ANNEX_COMPUTE_*`
environment variables are not inherited.)
The program is run in a temporary directory, which will be cleaned up after
it exits. It writes the files that it computes to that directory.
Before starting the main computation, the program must output a list of the
files that it will compute, in the form "COMPUTING Id filename".
Here "Id" is a short identifier for a particular file, which the
user specifies when running `git-annex addcomputed`.
In the example above, the program is expected to output something like:
COMPUTING photo out.jpeg
COMPUTING sidecar otherfile
If possible, the program should write the content of the file it is
computing directly to the file listed in COMPUTING, rather than writing to
somewhere else and renaming it at the end. If git-annex sees that the file
corresponding to the key it requested be computed is growing, it will use
its file size when displaying progress to the user.
The program can also output lines to stdout to indicate its current
progress.
PROGRESS 50%
Anything that the program outputs to stderr will be displayed to the user.
This stderr should be used for error messages, and possibly computation
output, but not for progress displays.
If the program exits nonzero, nothing it computed will be stored in the
git-annex repository.
The program must also support listing the inputs and outputs that it
supports. This allows `git-annex addcomputed` and `git-annex initremote` to
list inputs and outputs, and also lets them reject invalid inputs and
outputs.
In this mode, the program is run with a "list" parameter.
It should output lines, in the form:
INPUT[?] Name Description
VALUE[?] Name Description
OUTPUT Id Description
Use "INPUT" when a file is an input to the computation,
and "VALUE" for all other input values. Use "INPUT?" and "VALUE?"
for optional inputs and values.
The program can also optionally output a "REPRODUCIBLE" line.
That indicates that the results of its computations are
expected to be bit-for-bit reproducible.
That makes `git-annex addcomputed` behave as if the `--reproducible`
option is set.
An example `git-annex-compute-foo` shell script follows:
#!/bin/sh
set -e
if [ "$1" = list ]; then
echo "INPUT raw A photo in RAW format"
echo "VALUE? passes Number of passes"
echo "OUTPUT photo Computed JPEG"
echo "REPRODUCIBLE"
exit 0
fi
if [ -z "$ANNEX_COMPUTE_VALUE_passes" ]; then
ANNEX_COMPUTE_VALUE_passes=1
fi
echo "COMPUTING photo out.jpeg"
frobnicate --passes="$ANNEX_COMPUTE_VALUE_passes" \
<"$ANNEX_COMPUTE_INPUT_raw" >out.jpeg