Joey Hess 2025-02-13 16:12:07 -04:00
parent bf6446528d
commit e6e69f8f93
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
2 changed files with 88 additions and 0 deletions


@ -0,0 +1,80 @@
**draft**

The [[special_remotes/compute]] special remote uses this interface to run
compute programs.

When a compute special remote is set up with `git-annex initremote`, a
program is specified:

    git-annex initremote myremote type=compute program=foo

That causes `git-annex-compute-foo` to be run to get files from that
compute special remote.

The environment variable `ANNEX_COMPUTE_KEY` contains the key that the
program is being asked to compute.

The program is run in a temporary directory, which will be cleaned up after it
exits. When it generates the content of a key, it should write it to a file
with the same name as the key, in that directory. Then it should
output the key on a line to stdout.

While usually this will be the requested key, the program can output any
number of other keys as well, all of which will be stored in the git-annex
repository when getting files from the compute special remote. When a
computation generates several files, this allows running it a single time
to get them all.

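As a sketch of this output side, assuming the program already knows the key of every file it produces (this draft does not say how a program learns keys other than `ANNEX_COMPUTE_KEY`), a small helper can store content under a key and report it. The name `emit_key` and the `OTHER_KEY` placeholder are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: read content on stdin, store it in a file named
# after the key given as $1, then report that key on stdout.
emit_key() {
    # git-annex runs the program in its temporary directory, so this
    # relative path puts the file in that directory.
    cat >"./$1"
    # One key per line on stdout.
    printf '%s\n' "$1"
}

# Example use, where OTHER_KEY is a placeholder for the key of a
# second output of the same run:
#   printf 'main result\n' | emit_key "$ANNEX_COMPUTE_KEY"
#   printf 'side result\n' | emit_key "$OTHER_KEY"
```

Each call both writes a file and prints a line, so one run of the computation can hand several keys to git-annex.
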
The program is passed environment variables to provide inputs to the
computation. These are all prefixed with `ANNEX_COMPUTE_`.
The names are taken from the `git-annex addcomputed` command that was used to
add a computed file to the repository.

For example, this command:

    git-annex addcomputed file.gen --to myremote \
        --input raw=file.raw --value passes=10

will result in this environment:

    ANNEX_COMPUTE_KEY=SHA256--...
    ANNEX_COMPUTE_raw=file.raw
    ANNEX_COMPUTE_INPUT_raw=/path/.git/annex/objects/..
    ANNEX_COMPUTE_passes=10

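Since every input arrives as an `ANNEX_COMPUTE_INPUT_<name>` variable, a program can discover which inputs it was given by scanning its environment. This is a minimal sketch; the function name `list_inputs` is hypothetical, and it assumes no value contains a newline, because `env` prints one `NAME=value` per line.

```sh
#!/bin/sh
# Hypothetical helper: print the names of the inputs this computation was
# given, one per line, by stripping the ANNEX_COMPUTE_INPUT_ prefix from
# the exported variables. Assumes no value contains a newline.
list_inputs() {
    env | sed -n 's/^ANNEX_COMPUTE_INPUT_\([^=]*\)=.*/\1/p' | sort
}
```

In the environment shown above, this would print `raw`.
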
For security, the program should avoid passing values from `ANNEX_COMPUTE_*`
variables to the shell unquoted, or otherwise executing them.

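One way to follow this advice, sketched here with a hypothetical `require_uint` helper, is to check each value against the form the computation expects before using it, and then to pass it only as a double-quoted argument.

```sh
#!/bin/sh
# Hypothetical helper: succeed only if $2 is a non-negative integer.
# $1 names the value for the error message.
require_uint() {
    case "$2" in
        ''|*[!0-9]*)
            echo "$1 must be a non-negative integer, got: $2" >&2
            return 1
            ;;
    esac
}

# Safe:   frobnicate --passes="$ANNEX_COMPUTE_passes"
#         (double quotes keep the value a single argument)
# Unsafe: eval "frobnicate --passes=$ANNEX_COMPUTE_passes"
#         sh -c "frobnicate --passes=$ANNEX_COMPUTE_passes"
#         (the shell re-parses the value, so ';' or '$(...)' would run)
```

A call such as `require_uint passes "$ANNEX_COMPUTE_passes" || exit 1` would then reject a value like `10; echo pwned` before it reaches any command line.
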
The program will also inherit other environment variables
that were set when git-annex was run, like PATH.

Anything that the program outputs to stderr will be displayed to the user.
Stderr should be used for error messages, and possibly computation
output, but not for progress displays, since git-annex has its own progress
displays.

If possible, the program should write the content of the key it is
generating directly to the file, rather than writing to somewhere else and
renaming it at the end. If git-annex sees that the file for the requested
key is growing, it will use the file size when displaying progress to the
user.

Alternatively, if the program outputs a number on a line to stdout, this is
taken to be the number of bytes of the requested key that have been computed
so far. Or, the program can output a percentage, e.g. "50%", on a line to
stdout to indicate what percent of the computation has been performed so far.

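The following sketch combines both progress conventions: it writes directly into the file named after the requested key and prints a percentage line after each step. The `compute_with_progress` function and its per-pass "work" are placeholders, not a real computation.

```sh
#!/bin/sh
# Hypothetical sketch: $1 is the number of passes to run. Each pass
# appends one placeholder line to the output file.
compute_with_progress() {
    passes=$1
    i=0
    # Write straight into the file named after the requested key, so
    # git-annex can also follow its growing size.
    : >"$ANNEX_COMPUTE_KEY"
    while [ "$i" -lt "$passes" ]; do
        i=$((i + 1))
        printf 'pass %d\n' "$i" >>"$ANNEX_COMPUTE_KEY"
        # Percentage done, on its own line on stdout.
        echo "$((i * 100 / passes))%"
    done
    # Finally report the key whose content is now complete.
    echo "$ANNEX_COMPUTE_KEY"
}
```

Run with 4 passes, it prints `25%`, `50%`, `75%` and `100%`, each on its own line, before the key.
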
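If the program exits nonzero, nothing it computed will be stored in the
git-annex repository.

An example `git-annex-compute-foo` shell script follows:

    #!/bin/sh
    set -e
    # Refuse to run without the inputs this computation needs.
    if [ -z "$ANNEX_COMPUTE_passes" ] || [ -z "$ANNEX_COMPUTE_INPUT_raw" ]; then
        echo "Missing expected inputs" >&2
        exit 1
    fi
    # Write the result to a file named after the requested key, in the
    # temporary directory that git-annex runs the program in.
    frobnicate --passes="$ANNEX_COMPUTE_passes" \
        <"$ANNEX_COMPUTE_INPUT_raw" >"$ANNEX_COMPUTE_KEY"
    # Tell git-annex which key was computed.
    echo "$ANNEX_COMPUTE_KEY"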
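Before wiring a program into a remote, it can help to exercise it by hand. The following is a hypothetical harness, not part of git-annex, that imitates only the parts of the interface described above: it runs a command in a fresh temporary directory with `ANNEX_COMPUTE_KEY` set, then checks that the program exited zero, reported the key on stdout, and wrote a file of that name. It assumes `mktemp -d` is available.

```sh
#!/bin/sh
# Hypothetical harness for trying a compute program by hand; it is not
# part of git-annex. Usage: try_compute KEY COMMAND [ARGS...]
try_compute() {
    key=$1
    shift
    dir=$(mktemp -d) || return 1
    status=0
    # Run the command inside the scratch directory with the key exported.
    # The command substitution is a subshell, so the cd and export do
    # not leak into the caller.
    if ! out=$(cd "$dir" && ANNEX_COMPUTE_KEY=$key && export ANNEX_COMPUTE_KEY && "$@"); then
        echo "program exited nonzero" >&2
        status=1
    elif ! printf '%s\n' "$out" | grep -qxF -- "$key"; then
        echo "requested key was not reported on stdout" >&2
        status=1
    elif [ ! -f "$dir/$key" ]; then
        echo "no file named after the requested key" >&2
        status=1
    fi
    rm -rf "$dir"
    return "$status"
}
```

With `frobnicate` installed, something like `try_compute SHA256--test env ANNEX_COMPUTE_passes=10 ANNEX_COMPUTE_INPUT_raw="$PWD/file.raw" git-annex-compute-foo` would exercise the script above. Since the command runs inside the temporary directory, it must be given by absolute path or found on PATH.
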


@ -0,0 +1,8 @@
[[!comment format=mdwn
username="joey"
subject="""comment 17"""
date="2025-02-13T20:10:52Z"
content="""
I've written up a draft interface for programs used by a compute special
remote: [[design/compute_special_remote_interface]]
"""]]