75 lines
3.4 KiB
Markdown
75 lines
3.4 KiB
Markdown
Make --batch -J run multiple jobs at the same time, so big batch operations
|
|
can potentially run faster.
|
|
|
|
Currently, it does not, but it still accepts the combination of options.
|
|
|
|
Hmm, probably most users of this will use --json too, and then the output
|
|
for at least some commands already contains enough information so they can
|
|
keep different async jobs straight:
|
|
|
|
(echo foo; echo bar) | git-annex add --batch --json
|
|
{"command":"add","success":true,"key":"SHA256E-s30--ce0c29e173009d77fa8803fae163b7c85920248add208bc378d465cae3087962","file":"foo"}
|
|
{"command":"add","success":true,"key":"SHA256E-s30--ce0c29e173009d77fa8803fae163b7c85920248add208bc378d465cae3087962","file":"bar"}
|
|
|
|
Other commands not though; addurl does not include the url in its --json
|
|
output (though it could be made to without breaking back-compat).
|
|
|
|
Also the filename in --json output may get normalized to something
|
|
other than the input. Eg, git-annex add, for "./foo" outputs "foo".
|
|
(Due to relPathCwdToFile in batchStart, which is there to handle
|
|
absolute filepaths. So even if that case were changed, it would need to
|
|
normalize absolute filepaths.)
|
|
|
|
Maybe the json output should include a field that mirrors the input that
|
|
resulted in this line, be it a filename or an url, or a directory, or
|
|
whatever.
|
|
|
|
{"command":"add","input":["./foo"],"success":true,"key":"SHA256E-s30--ce0c29e173009d77fa8803fae163b7c85920248add208bc378d465cae3087962","file":"foo"}
|
|
|
|
> This is now implemented on the batchasync branch, for all commands
|
|
> that support json output.
|
|
|
|
(Note that, --batch currently does not operate on directories.
|
|
Because of the one line or reply per line of input rule.
|
|
Adding such a field would go a long way toward allowing that,
|
|
if it were wanted later.)
|
|
|
|
And some --batch commands like lookupkey do not support --json
|
|
(though could be made to).
|
|
|
|
Some subcommands in --batch mode ouput a blank line instead of
|
|
json when they don't need to do anything to handle the thing they were told
|
|
to act on (eg, getting a file that is already present). (copy, drop, get,
|
|
move, add, find, whereis). That's a bit of a problem since when it's async
|
|
there would be no indication what thing they skipped,
|
|
except for their other output about the things they didn't skip.
|
|
|
|
> I looked at datalad's handling of this as an example, and it currently
|
|
> ignores the blank line, not caring when nothing needs to be done.
|
|
> Which seems about what you'd expect.. The only real reason the blank
|
|
> line is there currently is so a program can write one batch line
|
|
> and read one batch line in reply and keep things straight.
|
|
>
|
|
> Would any program really care if git-annex didn't need to get a file
|
|
> that was already present or skipped over a file that other options
|
|
> told it not to operate on? Probably not. So, can probably get away with
|
|
> leaving the blank line as-is.
|
|
|
|
---
|
|
|
|
Making the changes to --json output to address the above stuff, if
|
|
possible, feels cleaner than wrapping input and output with a job id like
|
|
the ASYNC extension does. Those changes would also generally be an
|
|
improvement to json output in non-async mode.
|
|
|
|
> Ok, the input field is implemented, and it seems we can probably
|
|
> ignore the complication of the blank line for skipped input.
|
|
> So, it should be possible to actually implement the concurrent
|
|
> batch mode now. --[[Joey]]
|
|
|
|
[[!tag projects/datalad]]
|
|
|
|
> [[implemented|done]].
|
|
|
|
> See <https://git-annex/bramchable.com/tips/using_git-annex_from_your_program/>
|
|
> for some docs. --[[Joey]]
|