Joey Hess 2025-01-27 12:19:16 -04:00
parent 02c792b724
commit 754c0a001b

@ -0,0 +1,74 @@
[[!comment format=mdwn
username="joey"
subject="""comment 9"""
date="2025-01-27T14:46:43Z"
content="""
Circling back to this, I think the fork in the road is whether this is
about git-annex providing this and that feature to support external special
remotes that compute, or whether git-annex gets a compute special
remote of its own with some simpler/better extension interface
than the external special remote protocol.

Of course, git-annex having its own compute special remote would not
preclude other external special remotes that compute. And for that matter,
a single external special remote could implement an extension interface.

---

Thinking about how a generic compute special remote in git-annex could
work, multiple instances of it could be initremoted:

    git-annex initremote convertfiles type=compute program=csv-to-xslx
    git-annex initremote cutvideo type=compute program=ffmpeg-cut

Here the "program" parameter would cause a program like
`git-annex-compute-ffmpeg-cut` to be run to get files from that instance
of the compute special remote. The interface could be as simple as it
being run with the key that it is requested to compute, and outputting
the paths to all the keys it was able to compute. (So allowing for
"request one key, receive many".) Perhaps also with some way to indicate
progress of the computation.

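As a rough illustration of that interface, here is a minimal sketch of what such a program could look like. The calling convention is an assumption, since none has been decided: the requested key comes in as an argument, progress lines go to stderr, and one output path per computed key is printed to stdout. The `compute_key` function and its placeholder output are hypothetical.

```shell
#!/bin/sh
# Hypothetical skeleton for a program like git-annex-compute-example.
# Assumed convention (nothing here is settled): progress goes to stderr,
# and the path of every key that got computed is printed to stdout,
# one per line.

# usage: compute_key KEY OUTDIR
compute_key() {
    key=$1
    outdir=$2
    out="$outdir/$key"

    printf 'progress 0%%\n' >&2
    # Placeholder for the real computation.
    printf 'content computed for %s\n' "$key" > "$out" || return 1
    printf 'progress 100%%\n' >&2

    # Report what was computed. A program that produced several keys
    # would print one line for each of them.
    printf '%s\n' "$out"
}
```

A real program would end with something like `compute_key "$1" "$workdir"`, where `$workdir` is whatever scratch directory it is given or creates.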
It would make sense to store the details of computations in git-annex
metadata. And a compute program can use git-annex commands to get files
it depends on. Eg, `git-annex-compute-ffmpeg-cut` could run:

    # $requested holds the key this program was asked to compute

    # look up the configured metadata
    starttime=$(git-annex metadata --get compute-ffmpeg-starttime --key="$requested")
    endtime=$(git-annex metadata --get compute-ffmpeg-endtime --key="$requested")
    source=$(git-annex metadata --get compute-ffmpeg-source --key="$requested")

    # get the source video file
    git-annex get --key="$source"
    git-annex examinekey --format='${objectpath}' "$source"

It might be worth formalizing that a given computed key can depend on other
keys, and have git-annex always get/compute those keys first.

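One way to picture that ordering: if the dependencies were available as pairs of keys, a topological sort gives an order in which every key comes after the keys it depends on. The sketch below uses the standard `tsort` utility. The `dependency_order` name, the "DEPENDENCY DEPENDENT" input format, and the file names standing in for keys are all assumptions for illustration, not anything git-annex provides.

```shell
#!/bin/sh
# Reads lines of the form "DEPENDENCY DEPENDENT" on stdin and prints the
# keys in an order where each one comes after everything it depends on.
# The input format is made up for this sketch.
dependency_order() {
    tsort
}

# Example: thumb.png is computed from cut.mov, which is computed from
# source.mov, so source.mov has to be fetched first.
printf '%s\n' \
    'cut.mov thumb.png' \
    'source.mov cut.mov' |
    dependency_order
```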
When asked to store a key in the compute special remote, it would verify
that the key can be generated by it, using the same interface as used to
get a key.

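A minimal sketch of that check, assuming the compute program follows a hypothetical convention of taking a key and printing the path of the file it computed. `verify_store` and `fake_compute` are made-up names, and comparing bytes with `cmp -s` is just one way the verification could be done.

```shell
#!/bin/sh
# usage: verify_store COMPUTE_CMD KEY FILE
# Re-runs the computation for KEY and accepts FILE only if the result is
# byte-for-byte identical. Comparing bytes is an assumption of this sketch;
# comparing against the key's hash would work as well.
verify_store() {
    compute_cmd=$1
    key=$2
    file=$3

    # Assumed convention: the command prints the path of the computed file.
    computed=$("$compute_cmd" "$key") || return 1
    cmp -s "$computed" "$file"
}

# Stand-in for a real compute program, to make the sketch runnable.
fake_compute() {
    out="${TMPDIR:-/tmp}/verify-store-$1"
    printf 'content for %s\n' "$1" > "$out"
    printf '%s\n' "$out"
}
```

With this, `verify_store fake_compute K1 somefile` succeeds only when `somefile` matches what the computation produces for `K1`.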
This all leaves a chicken and egg problem: how does the user add a computed
file if they don't know the key yet?

The user could manually run the commands that generate the computed file,
then `git-annex add` it, and set the metadata. Then `git-annex copy --to`
the compute remote would verify if the file can be generated, and add it if
so. This seems awkward, but also nice to be able to do manually.

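Spelled out, the manual route might look something like the sketch below. It reuses the `cutvideo` remote and the `compute-ffmpeg-*` metadata fields from the examples above, which are themselves only proposals. The `run` and `manual_add_computed` helpers and the dry-run placeholder key are hypothetical, and the final `copy --to` only verifies anything if the compute remote is implemented as described.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# usage: manual_add_computed INPUT OUTPUT START END REMOTE
manual_add_computed() {
    input=$1 output=$2 start=$3 end=$4 remote=$5

    # Generate the file by hand, the same way the compute program would.
    run ffmpeg -i "$input" -ss "$start" -to "$end" -c copy "$output" || return 1
    run git-annex add "$output" || return 1

    # Record how the file was made. The source is stored as a key, to
    # match the metadata lookup shown earlier.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        sourcekey="KEY-OF-$input"   # placeholder, not a real key
    else
        sourcekey=$(git-annex lookupkey "$input") || return 1
    fi
    run git-annex metadata "$output" \
        --set "compute-ffmpeg-source=$sourcekey" \
        --set "compute-ffmpeg-starttime=$start" \
        --set "compute-ffmpeg-endtime=$end" || return 1

    # Under the proposal, this is where the remote would verify the file.
    run git-annex copy "$output" --to "$remote"
}
```

Setting `DRY_RUN=1` before calling `manual_add_computed input.mov foo 15:00 30:00 cutvideo` prints the four commands without running them.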
Or, something like VURL keys could be used, with an interface something
like this:

    git-annex addcomputed foo --to ffmpeg-cut \
        --input compute-ffmpeg-source=input.mov \
        --set compute-ffmpeg-starttime=15:00 \
        --set compute-ffmpeg-endtime=30:00

All that would do is generate some arbitrary VURL key or similar,
provisionally set the provided metadata (how?), and try to store the key
in the compute special remote. If it succeeds, stage an annex pointer
and commit the metadata. Since it's a VURL key, storing the key in the
compute special remote would also record the hash of the generated file
at that point.
"""]]