
Added annex.security.autoenable-compute-programs and only allow autoenabling special remotes that use compute programs on that list. The reason this is needed is a user might have some compute programs that are less safe to use than others. They might want to use an unsafe one only with one repository, where they are the only committer or other committers are trusted. They might be ok with others being used by any repository, and if so they can add them to the list. Another reason would be a user who has installed a compute program by accident. Eg, it might be included with git-annex at some point, or pulled in by some dependency. That user doesn't necessarily want that compute program to be used in an autoenabled special remote.
107 lines
4.1 KiB
Markdown
107 lines
4.1 KiB
Markdown
**draft**
|
|
|
|
The [[special_remotes/compute]] special remote uses this interface to run
|
|
compute programs.
|
|
|
|
When an compute special remote is initremoted, a program is specified:
|
|
|
|
git-annex initremote myremote type=compute program=git-annex-compute-foo
|
|
|
|
The user adds an annexed file that is computed by the program by running
|
|
a command like one of these:
|
|
|
|
git-annex addcomputed --to=myremote -- convert file.raw file.jpeg passes=10
|
|
git-annex addcomputed --to=myremote -- compress in out --level=9
|
|
git-annex addcomputed --to=myremote -- clip foo 2:01-3:00 combine with bar to baz
|
|
|
|
Whatever values the user passes to `git-annex addcomputed` are passed to
|
|
the program in `ARGV`, followed by any values that the user provided to
|
|
`git-annex initremote`.
|
|
|
|
To simplify the program's option parsing, any value that the user provides
|
|
that is in the form "foo=bar" will also result in an environment variable
|
|
being set, eg `ANNEX_COMPUTE_passes=10` or `ANNEX_COMPUTE_--level=9`.
|
|
|
|
For security, the program should avoid exposing user input to the shell
|
|
unprotected, or otherwise executing it.
|
|
|
|
The program is run in a temporary directory, which will be cleaned up after
|
|
it exits. Note that it may be run in a subdirectory of a temporary
|
|
directory. This is done when `git-annex addcomputed` was run in a subdirectory
|
|
of the git repository.
|
|
|
|
The content of any file in the repository can be an input to the
|
|
computation. The program requests an input by writing a line to stdout:
|
|
|
|
INPUT file.raw
|
|
|
|
Then it can read a line from stdin, which will be the path to the content
|
|
(eg a `.git/annex/objects/` path).
|
|
|
|
If the program needs multiple input files, it should output multiple
|
|
`INPUT` lines first, and then read multiple paths from stdin. This
|
|
allows retrieval of the inputs to potentially run in parallel.
|
|
|
|
If an input file is not available, the program's stdin will be closed
|
|
without a path being written to it. So when reading from stdin fails,
|
|
the program should exit.
|
|
|
|
When `git-annex addcomputed --fast` is being used to add a computation
|
|
to the git-annex repository without actually performing it, the
|
|
response to each "INPUT" will be an empty line rather than the path to
|
|
an input file. In that case, the program should proceed with the rest of
|
|
its output to stdout (eg "OUTPUT" and "REPRODUCIBLE"), but should not
|
|
perform any computation.
|
|
|
|
For each output file that it will compute, the program should write a
|
|
line to stdout:
|
|
|
|
OUTPUT file.jpeg
|
|
|
|
The filename of the output file is both the filename in the program's
|
|
temporary directory, and also the filename that will be added to the
|
|
git-annex repository by `git-annex compute`.
|
|
|
|
If git-annex sees that an output file is growing, it will use its file size
|
|
when displaying progress to the user. So if possible, the program should
|
|
write the content to the file it is computing directly, rather than writing
|
|
to somewhere else and renaming it at the end. But, if the program seeks
|
|
around and writes out of order, it should write to a file somewhere else
|
|
and rename it at the end.
|
|
|
|
The program can also output lines to stdout to indicate its current
|
|
progress:
|
|
|
|
PROGRESS 50%
|
|
|
|
The program can optionally also output a "REPRODUCIBLE" line. That
|
|
indicates that the results of its computations are expected to be
|
|
bit-for-bit reproducible. That makes `git-annex addcomputed` behave as if
|
|
the `--reproducible` option is set.
|
|
|
|
Anything that the program outputs to stderr will be displayed to the user.
|
|
This stderr should be used for error messages, and possibly computation
|
|
output, but not for progress displays.
|
|
|
|
If the program exits nonzero, nothing it computed will be stored in the
|
|
git-annex repository.
|
|
|
|
An example `git-annex-compute-foo` shell script follows:
|
|
|
|
#!/bin/sh
|
|
set -e
|
|
if [ "$1" != "convert" ]; then
|
|
echo "Usage: convert input output [passes=n]" >&2
|
|
exit 1
|
|
fi
|
|
if [ -z "$ANNEX_COMPUTE_passes" ]; then
|
|
ANNEX_COMPUTE_passes=1
|
|
fi
|
|
echo "INPUT $2"
|
|
read input
|
|
echo "OUTPUT $3"
|
|
echo REPRODUCIBLE
|
|
if [ -n "$input" ]; then
|
|
mkdir -p "$(dirname "$3")"
|
|
frobnicate --passes="$ANNEX_COMPUTE_passes" <"$input" >"$3"
|
|
fi
|