add compute tip
parent a673fc7cfd
commit e505ade963
2 changed files with 235 additions and 0 deletions
@@ -26,6 +26,8 @@ program takes a dashed option, it can be provided after "--":

    # git-annex initremote myremote type=compute program=git-annex-compute-foo -- --level=9

See [[tips/computing_annexed_files]] for more documentation.

## compute programs

To write programs used by the compute special remote, see the

233
doc/tips/computing_annexed_files.mdwn
Normal file
@@ -0,0 +1,233 @@
Do you ever check in original versions of files to `git-annex`, but then
convert them in some way? Maybe you check in original photos from a camera,
but then change them to a more useful file format, or smaller resolution.
Or you clip a video file. Or you crunch some data to a result.

If you check the computed file into `git-annex` too, and store it on
your remotes along with the original, that's a waste of disk space.
But it is so convenient to be able to `git-annex get` the computed file.

The [[compute special remote|special_remotes/compute]] is the solution to
this. It "stores" the computed file by remembering how to compute it from
input files. When you `git-annex get` the computed file from it, it re-runs
the computation on the original input file to produce the computed file.

[[!toc ]]

## using the compute special remote

There are many compute programs that each handle some type of computation,
and it's pretty easy to write your own compute program too. In this tip,
we'll use [[special_remotes/compute/git-annex-compute-imageconvert]],
which uses imagemagick to convert between image formats.

To follow along, install that program in PATH (and remember to make it
executable!) and make sure you have
[imagemagick](https://www.imagemagick.org/) installed.

First, initialize a compute remote:

    # git-annex initremote imageconvert type=compute program=git-annex-compute-imageconvert

Now suppose you have a file `foo.jpeg`, and you want to add a computed
`foo.gif` to the git-annex repository.

    # git-annex addcomputed --to=imageconvert foo.jpeg foo.gif

(The syntax of the `git-annex addcomputed` command will vary depending on the
program that a compute remote uses. Some may have multiple input files, or
multiple output files, or other options to control the computation. See
the documentation of each compute program for details.)

Now you have `foo.gif` and can use it as usual, including copying it to
other remotes. But it's already "stored" in the imageconvert remote,
as a computation. So to free up space, you can drop it:

    # git-annex drop foo.gif
    drop foo.gif ok

By the way, you can also add a computed file to the repository
without bothering to compute it yet! Just use `--fast`:

    # git-annex addcomputed --fast --to=imageconvert bar.jpeg bar.gif

Now suppose you're in another clone of this same repository, and you want
these gifs.

    # git-annex get foo.gif
    get foo.gif (not available)
    Maybe enable some of these special remotes (git annex enableremote ...):
    8332f7ad-d54e-435e-803b-138c1cfa7b71 -- imageconvert
    failed

With [[special_remotes/compute/git-annex-compute-imageconvert]] and
imagemagick installed, all you need to do is enable the special remote to
get the computed files from it:

    # git-annex enableremote imageconvert
    # git-annex get foo.gif
    get foo.gif (from imageconvert...)
    (getting input foo.jpeg from origin...)
    ok

Notice that, when the input file is not present in the repository, getting
a file from a compute remote will first get the input file.

That's the basics of using the compute special remote.

## recomputation

What happens if the input file `foo.jpeg` is changed to a new version?
Will getting `foo.gif` from the compute remote base it on the new version
too? No. `foo.gif` is stuck on the original version of the input file that
was used to compute it.

But, it's easy to recompute the file with a new version of the input file.
Just `git-annex add` the new version of the input file, and then:

    # git-annex recompute foo.gif
    recompute foo.gif (foo.jpeg changed)
    ok

You can use commands like `git diff` and `git status` to see the
change that was made to `foo.gif`.

    # git status --short foo.gif
    M foo.gif

Now both the new and old versions of `foo.gif` are stored in the
imageconvert remote, and it can compute either as needed.

## reproducibility

You might be wondering, what happens if a computed file, such as `foo.gif`,
isn't exactly the same file each time it's computed? For example,
what if there's a timestamp in there?

The answer is that, by default, files computed by a compute special remote
are not required or guaranteed to be bit-for-bit reproducible. One gif
converted from a jpeg is much like any other converted from the same jpeg.

So git-annex usually treats all files computed in the same way from the
same input as interchangeable. (Unless the compute program indicates
that it produces reproducible files.)

Sometimes though, it's important that a file be bit-for-bit reproducible. And
you can ask git-annex to enforce this for computed files.
There is a `--reproducible` option for this, which you can pass to
`git-annex addcomputed` or to `git-annex recompute`.

Let's switch the computed `foo.gif` to a reproducible file:

    # git-annex recompute --original --reproducible foo.gif
    recompute foo.gif
    ok

You can `git commit foo.gif` to store this change.

But first, let's check if that computation actually *is* reproducible.
This is easy, just drop it and get it from the compute remote again:

    # git-annex drop foo.gif
    drop foo.gif ok
    # git-annex get foo.gif --from imageconvert
    get foo.gif (from imageconvert...)
    ok

If it turned out that the computation was not reproducible, getting the
file would fail, like this:

    # git-annex get foo.gif --from imageconvert
    get foo.gif (from imageconvert...)
    Verification of content failed

This is because a reproducible file uses a regular [[backend]], which
by default uses a hash to verify the content of the file.

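To see why something like an embedded timestamp breaks hash verification, here is a minimal sketch that does not involve git-annex at all. It hashes two stand-in "computations" with `sha256sum`, which is assumed to be installed; the hash git-annex actually uses depends on the backend configured for the repository. The `digest` helper and the sample strings are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print only the hex digest of whatever is on stdin.
digest() { sha256sum | cut -d' ' -f1; }

# A computation that always produces the same bytes from the same input.
stable_a=$(printf 'pixels of foo\n' | digest)
stable_b=$(printf 'pixels of foo\n' | digest)

# A computation that embeds the time it ran in its output.
stamped_a=$(printf 'pixels of foo, made at 10:00\n' | digest)
stamped_b=$(printf 'pixels of foo, made at 10:05\n' | digest)

if [ "$stable_a" = "$stable_b" ]; then
    echo "stable: same hash both times, verification would pass"
fi
if [ "$stamped_a" != "$stamped_b" ]; then
    echo "stamped: hashes differ, verification would fail"
fi
```

Both messages are printed: the second run of the stamped computation no longer matches the hash recorded from the first run.
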
If it does turn out that a file that was expected to be reproducible isn't,
you can always convert it to an unreproducible file:

    # git-annex recompute --original --unreproducible foo.gif
    recompute foo.gif
    ok

## writing your own compute programs

There is a whole little protocol that compute programs use to
communicate with git-annex. It's all documented at
[[design/compute_special_remote_interface]].

But it's really easy to write simple ones, and you don't need to
dive into all the details to do it. Let's walk through the code
to [[special_remotes/compute/git-annex-compute-imageconvert]],
which, at 14 lines, is about as simple as one can be.

    #!/bin/sh

It's a shell script.

    set -e

If it fails to read input from standard input, or if a command fails, it
will exit nonzero.

    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "Specify the input image file, followed by the output image file." >&2
        echo "Example: foo.jpg foo.gif" >&2
        exit 1
    fi

It expects to be passed two parameters, which were "foo.jpeg" and "foo.gif" in
the examples above. And it outputs some usage to stderr otherwise. That is
displayed if the user runs `git-annex addcomputed` without the necessary
filenames.

    echo "INPUT $1"
    read input

It tells git-annex that the first filename is the input file. And git-annex
replies by telling it *where* the content of the input file is. This is the
path to a git-annex object file.

    echo "OUTPUT $2"
    read output

It tells git-annex that the second filename is the output file. And git-annex
replies by telling it the path it should write the output file to.

    if [ -n "$input" ]; then

When `git-annex addcomputed --fast` is used, the program shouldn't actually
read the input file or compute the output file. git-annex indicates this by
not giving it a path to the input file. That's checked here.

        convert "$input" "$output" >&2

This uses `convert` from imagemagick, and just converts the input file to
the output file.

Notice that stdout from `convert` is redirected to stderr. This is done
because the compute program is speaking this protocol with git-annex over
stdin and stdout, and we don't want random program output to mess that up.

    fi

Closing the `if` above.

And that's all!

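The stdout redirect used for `convert` in the walkthrough can be seen in isolation with a small sketch. Here `noisy_tool` is a hypothetical stand-in for any command that chatters on stdout, and the command substitutions play the role of git-annex reading the program's stdout.

```shell
#!/bin/sh
# Hypothetical stand-in for a command that prints progress to stdout.
noisy_tool() {
    echo "processing... done"
}

# Without a redirect, the chatter ends up mixed into the protocol stream.
mixed=$( { echo "OUTPUT foo.gif"; noisy_tool; } 2>/dev/null )

# With >&2, only the protocol line reaches stdout.
clean=$( { echo "OUTPUT foo.gif"; noisy_tool >&2; } 2>/dev/null )

echo "$clean"
```

Only the `OUTPUT foo.gif` line is left in `clean`, while `mixed` also contains the tool's chatter.
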
Now you know almost enough to write your own compute program. Editing this one
will be a good start.

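While editing, it can help to run a compute program by hand and play git-annex's side of the conversation yourself. The following is only a sketch under assumptions: it takes the replies to be one path per line, as the walkthrough above describes, and an empty line in place of the input path for the `--fast` case. The program name `git-annex-compute-copy` and all file names are hypothetical, and `cp` stands in for `convert` so imagemagick is not needed.

```shell
#!/bin/sh
set -e
workdir=$(mktemp -d)
prog="$workdir/git-annex-compute-copy"

# A stand-in compute program with the same shape as the one walked
# through above, using cp instead of convert.
cat > "$prog" <<'EOF'
#!/bin/sh
set -e
if [ -z "$1" ] || [ -z "$2" ]; then
    echo "Specify the input file, followed by the output file." >&2
    exit 1
fi
echo "INPUT $1"
read input
echo "OUTPUT $2"
read output
if [ -n "$input" ]; then
    cp "$input" "$output" >&2
fi
EOF
chmod +x "$prog"

echo "hello" > "$workdir/in"

# Normal run: answer INPUT with the path of the input content, and
# OUTPUT with the path to write to (assumed reply format).
normal=$(printf '%s\n%s\n' "$workdir/in" "$workdir/out" | "$prog" foo.txt foo.copy)
echo "$normal"

# --fast style run: answer INPUT with an empty line, so nothing is computed.
fast=$(printf '\n%s\n' "$workdir/out-fast" | "$prog" foo.txt foo.copy)
```

In the normal run the program's stdout holds just the `INPUT` and `OUTPUT` requests and the output file gets written. In the `--fast` style run the requests are the same, but no output file is created.
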
**But first, a word about security.**

A user who enables a compute special remote and runs `git pull` followed by
`git-annex get` is running the compute program with inputs under the control
of anyone who has commit access to the repository.

So, it's important that your compute program be secure. Please see
the section on security in [[design/compute_special_remote_interface]]
for security considerations.

If you write a nice secure compute program, you can add it to the list
in [[special_remotes/compute]] so other people can use it.