Do you ever check in original versions of files to `git-annex`, but then
convert them in some way? Maybe you check in original photos from a camera,
but then change them to a more useful file format, or smaller resolution.
Or you clip a video file. Or you crunch some data to a result.

If you check the computed file into `git-annex` too, and store it on
your remotes along with the original, that's a waste of disk space.
But it is so convenient to be able to `git-annex get` the computed file.

The [[compute special remote|special_remotes/compute]] is the solution to
this. It "stores" the computed file by remembering how to compute it from
input files. When you `git-annex get` the computed file from it, it re-runs
the computation on the original input file to produce the computed file.

[[!toc ]]

## using the compute special remote

There are many compute programs that each handle some type of computation,
and it's pretty easy to write your own compute program too. In this tip,
we'll use [[special_remotes/compute/git-annex-compute-imageconvert]],
which uses imagemagick to convert between image formats.

To follow along, install that program in PATH (and remember to make it
executable!) and make sure you have
[imagemagick](https://www.imagemagick.org/) installed.

First, initialize a compute remote:

    # git-annex initremote imageconvert type=compute program=git-annex-compute-imageconvert

Now suppose you have a file `foo.jpeg`, and you want to add a computed
`foo.gif` to the git-annex repository.

    # git-annex addcomputed --to=imageconvert foo.jpeg foo.gif

(The syntax of the `git-annex addcomputed` command will vary depending on the
program that a compute remote uses. Some may have multiple input files, or
multiple output files, or other options to control the computation. See
the documentation of each compute program for details.)

Now you have `foo.gif` and can use it as usual, including copying it to
other remotes. But it's already "stored" in the imageconvert remote,
as a computation. So to free up space, you can drop it:

    # git-annex drop foo.gif
    drop foo.gif ok

By the way, you can also add a computed file to the repository
without bothering to compute it yet! Just use `--fast`:

    # git-annex addcomputed --fast --to=imageconvert bar.jpeg bar.gif

Now suppose you're in another clone of this same repository, and you want
these gifs.

    # git-annex get foo.gif
    get foo.gif (not available)
    Maybe enable some of these special remotes (git annex enableremote ...):
    8332f7ad-d54e-435e-803b-138c1cfa7b71 -- imageconvert
    failed

With [[special_remotes/compute/git-annex-compute-imageconvert]] and
imagemagick installed, all you need to do is enable the special remote to
get the computed files from it:

    # git-annex enableremote imageconvert
    # git-annex get foo.gif
    get foo.gif (from imageconvert...)
    (getting input foo.jpeg from origin...)
    ok

Notice that, when the input file is not present in the repository, getting
a file from a compute remote will first get the input file.

That's the basics of using the compute special remote.

## recomputation

What happens if the input file `foo.jpeg` is changed to a new version?
Will getting `foo.gif` from the compute remote base it on the new version
too? No. `foo.gif` is stuck on the original version of the input file that
was used to compute it.

But, it's easy to recompute the file with a new version of the input file.
Just `git-annex add` the new version of the input file, and then:

    # git-annex recompute foo.gif
    recompute foo.gif (foo.jpeg changed)
    ok

You can use commands like `git diff` and `git status` to see the
change that was made to `foo.gif`.

    # git status --short foo.gif
    M foo.gif

Now both the new and old versions of `foo.gif` are stored in the
imageconvert remote, and it can compute either as needed.

## reproducibility

You might be wondering, what happens if a computed file, such as `foo.gif`,
isn't exactly the same file each time it's computed? For example,
what if there's a timestamp in there?

The answer is that, by default, files computed by a compute special remote
are not required or guaranteed to be bit-for-bit reproducible. One gif
converted from a jpeg is much like any other converted from the same jpeg.

So git-annex usually treats all files computed in the same way from the
same input as interchangeable. (Unless the compute program indicates
that it produces reproducible files.)

Sometimes though, it's important that a file be bit-for-bit reproducible. And
you can ask git-annex to enforce this for computed files.
There is a `--reproducible` option for this, which you can pass to
`git-annex addcomputed` or to `git-annex recompute`.

Let's switch the computed `foo.gif` to a reproducible file:

    # git-annex recompute --original --reproducible foo.gif
    recompute foo.gif
    ok

You can `git commit foo.gif` to store this change.

But first, let's check if that computation actually *is* reproducible.
This is easy: just drop it and get it from the compute remote again:

    # git-annex drop foo.gif
    drop foo.gif ok
    # git-annex get foo.gif --from imageconvert
    get foo.gif (from imageconvert...)
    ok

If it turned out that the computation was not reproducible, getting the
file would fail, like this:

    # git-annex get foo.gif --from imageconvert
    get foo.gif (from imageconvert...)
    Verification of content failed

This is because a reproducible file uses a regular [[backend]], which
by default uses a hash to verify the content of the file.

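The same kind of check can be tried by hand, outside git-annex, before
deciding whether `--reproducible` is worth asking for. Here is a minimal
sketch, assuming a helper named `is_reproducible` (a made-up name, not part
of git-annex) and a command that writes its result to stdout. It compares
bytes with `cmp` rather than hashing, which answers the same question.

```shell
# Hypothetical helper, not part of git-annex: run a command twice and
# compare the two outputs byte for byte.
# Exit status: 0 = identical, 1 = outputs differ, 2 = the command failed.
# The body runs in a subshell, so its variables do not leak to the caller.
is_reproducible() (
    tmpdir=$(mktemp -d) || exit 2
    "$@" > "$tmpdir/first" && "$@" > "$tmpdir/second" || {
        rm -rf "$tmpdir"
        exit 2
    }
    cmp -s "$tmpdir/first" "$tmpdir/second"
    status=$?
    rm -rf "$tmpdir"
    exit "$status"
)
```

For the conversion used in this tip, something like
`is_reproducible convert foo.jpeg gif:-` should work, since imagemagick
writes to stdout when the output file is given as `gif:-`.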

If it does turn out that a file that was expected to be reproducible isn't,
you can always convert it to an unreproducible file:

    # git-annex recompute --original --unreproducible foo.gif
    recompute foo.gif
    ok

## writing your own compute programs

There is a whole little protocol that compute programs use to
communicate with git-annex. It's all documented at
[[design/compute_special_remote_interface]].

But it's really easy to write simple ones, and you don't need to
dive into all the details to do it. Let's walk through the code
to [[special_remotes/compute/git-annex-compute-imageconvert]],
which, at 14 lines, is about as simple as one can be.

    #!/bin/sh

It's a shell script.

    set -e

If it fails to read input from standard input, or if a command fails, it
will exit nonzero.

    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "Specify the input image file, followed by the output image file." >&2
        echo "Example: foo.jpg foo.gif" >&2
        exit 1
    fi

It expects to be passed two parameters, which were "foo.jpeg" and "foo.gif" in
the examples above. And it outputs some usage to stderr otherwise. That is
displayed if the user runs `git-annex addcomputed` without the necessary
filenames.

    echo "INPUT $1"
    read input

It tells git-annex that the first filename is the input file. And git-annex
replies by telling it *where* the content of the input file is. This is the
path to a git-annex object file.

    echo "OUTPUT $2"
    read output

It tells git-annex that the second filename is the output file. And git-annex
replies by telling it the path it should write the output file to.

    if [ -n "$input" ]; then

When `git-annex addcomputed --fast` is used, the program shouldn't actually
read the input file or compute the output file. git-annex indicates this by
not giving it a path to the input file. That's checked here.

        convert "$input" "$output" >&2

This uses `convert` from imagemagick, and just converts the input file to
the output file.

Notice that stdout from `convert` is redirected to stderr. This is done
because the compute program is speaking this protocol with git-annex over
stdin and stdout, and we don't want random program output to mess that up.

    fi

Closing the `if` above.

And that's all!

Now you know almost enough to write your own compute program. Editing this one
will be a good start.

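As a sketch of what such an edit might look like, here is a hypothetical
variant that upper-cases a text file with `tr` instead of converting an
image. It uses only the parts of the protocol shown in the walkthrough
above. The name `compute_upcase` is made up for illustration, and it is
written as a shell function (with `return` in place of `set -e` and `exit`)
so it can be tried by hand without git-annex.

```shell
# Hypothetical variant of the imageconvert script: upper-case a text file.
# Written as a function so it can be exercised without git-annex.
compute_upcase() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "Specify the input text file, followed by the output text file." >&2
        return 1
    fi

    # Name the input file. The reply is the path to its content,
    # or an empty line when --fast is used.
    echo "INPUT $1"
    read -r input || return 1

    # Name the output file. The reply is the path to write it to.
    echo "OUTPUT $2"
    read -r output || return 1

    if [ -n "$input" ]; then
        # Only protocol lines may reach stdout, so the result goes
        # straight into the output file.
        tr '[:lower:]' '[:upper:]' < "$input" > "$output" || return 1
    fi
}
```

To try it, play the part of git-annex on stdin, for example
`printf '%s\n' /tmp/in.txt /tmp/out.txt | compute_upcase in.txt out.txt`.
It should print the `INPUT` and `OUTPUT` lines and write the upper-cased
text to `/tmp/out.txt`. To use it for real, it would presumably be saved as
an executable script in PATH, named along the lines of the imageconvert
program.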

**But first, a word about security.**

A user who enables a compute special remote and runs `git pull` followed by
`git-annex get` is running the compute program with inputs under the control
of anyone who has commit access to the repository.

So, it's important that your compute program be secure. Please see
the section on security in [[design/compute_special_remote_interface]]
for security considerations.

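One thing to watch for follows directly from the walkthrough above: the
parameters are echoed into a line-based conversation on stdout, so a
parameter containing a newline could smuggle in an extra protocol line.
Here is a minimal sketch of a guard, assuming a hypothetical helper named
`check_param`. The policy it enforces is only an illustration, not a summary
of the advice on the design page. The leading-dash check matters only if a
program hands a parameter on to another command.

```shell
# Hypothetical guard for a compute program's parameters.
# Returns 0 if the parameter looks safe to echo into the protocol.
check_param() {
    # A variable holding a single newline, for use in the pattern below.
    nl='
'
    case "$1" in
        "")
            echo "empty parameter" >&2
            return 1 ;;
        -*)
            echo "parameter must not start with a dash" >&2
            return 1 ;;
        *"$nl"*)
            echo "parameter must not contain a newline" >&2
            return 1 ;;
    esac
    return 0
}
```

In a script like the one above, it could be called right after the usage
check, before anything is written to stdout:
`check_param "$1" && check_param "$2" || exit 1`.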

If you write a nice secure compute program, you can add it to the list
in [[special_remotes/compute]] so other people can use it.