tip documenting mostly --batch --json including with -J
This commit is contained in:
parent
b6642dde8a
commit
38e9fc0161
1 changed files with 100 additions and 0 deletions
100
doc/tips/using_git-annex_from_your_program.mdwn
Normal file
100
doc/tips/using_git-annex_from_your_program.mdwn
Normal file
|
@ -0,0 +1,100 @@
|
|||
Want to write a program that uses git-annex? Lots of people have, see
|
||||
[[related_software]] for a list. This tip is an overview of ways git-annex
|
||||
facilitates being used by other programs.
|
||||
|
||||
## batch mode communication
|
||||
|
||||
Many git-annex commands have a --batch option. This is handy if your
|
||||
program is operating on a lot of files; rather than running git-annex once
|
||||
per file, you can construct batch pipelines and probably it will run a lot
|
||||
faster.
|
||||
|
||||
The way the --batch option typically works is, it makes the command read
|
||||
lines from standard input. Each line is a filename or whatever other thing
|
||||
the command operates on. It will reply with whatever output it usually
|
||||
outputs.
|
||||
|
||||
For example:
|
||||
|
||||
git-annex get --batch
|
||||
foo
|
||||
get foo (from origin...)
|
||||
(checksum...) ok
|
||||
bar
|
||||
|
||||
baz
|
||||
get baz (from origin...)
|
||||
(checksum...) ok
|
||||
|
||||
Notice the blank line it replies in response to "bar"? That's because,
|
||||
in this example, the file "bar" is already present, and it does not need to
|
||||
do anything to get it. Normally, git-annex silently skips files it does not
|
||||
need to operate on, but in batch mode, it will reply with a blank line when
|
||||
there's nothing to do for a given input line.
|
||||
|
||||
## JSON output
|
||||
|
||||
Notice that git-annex happened to output 2 lines per file in the example
|
||||
above. But it could output any number of lines. How can your program know
|
||||
when the output for one batch item is complete? Let alone parse it to
|
||||
determine if it succeeded or failed? Some git-annex commands don't have
|
||||
this problem, and are documented to output exactly one line, in a specific
|
||||
format, in batch mode.
|
||||
|
||||
For the rest, the answer is `--json`. Use it with `--batch` and now each
|
||||
batch mode request results in a json object being output:
|
||||
|
||||
git-annex get --batch --json
|
||||
foo
|
||||
{"command":"get","note":"from origin...\nchecksum...","success":true,"input":["foo"],"key":"SHA256E-s30--f888da7dd6c0d6f37e3847f390d848c9a8e1e2d876865a91aca7e5a6a83715e0","file":"foo"}
|
||||
bar
|
||||
|
||||
baz
|
||||
{"command":"get","note":"from origin...\nchecksum...","success":true,"input":["baz"],"key":"SHA256E-s1048576000--da87281c9f9ab6cef8f9362935f4fc864db94606d52212614894f1253461a762","file":"baz"}
|
||||
|
||||
Notice that it still outputs a blank line when there is nothing to do
|
||||
for a request, so be prepared for that in your JSON parser.
|
||||
|
||||
There are also `--json-progress`, which adds more JSON messages giving
|
||||
progress of transfers, and `--json-error-messages` which makes some error
|
||||
messages be included in JSON objects instead of going to stderr.
|
||||
|
||||
The format of git-annex's JSON output is not documented in full,
|
||||
because it varies from command to command. The shape is typically the same,
|
||||
but a few commands have a more custom JSON. Try a command and see
|
||||
what JSON it outputs and go from there. Fields won't be removed or renamed,
|
||||
but new ones might be added from time to time.
|
||||
|
||||
## concurrency and batch mode
|
||||
|
||||
If you use `-J` with `--batch`, some git-annex commands do support that,
|
||||
and will handle multiple batch requests concurrently.
|
||||
|
||||
Suppose you want to get files concurrently in batch mode as the user clicks
|
||||
on them, and display to the user in your GUI when each file transfer is
|
||||
complete. But a problem: How to know which file a JSON reply corresponds
|
||||
to, now that they are not always in the same order as you sent the
|
||||
requests?
|
||||
|
||||
git-annex get --batch -J3 --json
|
||||
./foo
|
||||
bar
|
||||
baz
|
||||
|
||||
{"command":"get","note":"from origin...\nchecksum...","success":true,"input":["baz"],"key":"SHA256E-s1048576000--da87281c9f9ab6cef8f9362935f4fc864db94606d52212614894f1253461a762","file":"baz"}
|
||||
{"command":"get","note":"from origin...\nchecksum...","success":true,"input":["./foo"],"key":"SHA256E-s30--f888da7dd6c0d6f37e3847f390d848c9a8e1e2d876865a91aca7e5a6a83715e0","file":"foo"}
|
||||
|
||||
You might think the "file" field is the thing to look at. And it can work.
|
||||
But, the example above shows a way it can fail to work. `./foo` was
|
||||
requested, but that got normalized internally, and the response has
|
||||
`"file":"foo"`
|
||||
|
||||
And looking at the "file" field won't help with other git-annex commands,
|
||||
such as addurl, where you don't request filenames.
|
||||
|
||||
What will always work is to look at the "input" field. This is
|
||||
always the exact input that git-annex was operating on when it output a
|
||||
JSON object. (Older versions of git-annex don't include that field, but
|
||||
those old versions don't run `--batch` concurrently either,
|
||||
so if it's omitted, you can assume the JSON objects are in the same order
|
||||
as you made requests.)
|
Loading…
Reference in a new issue