preparing to merge compute
parent 4979df54d5
commit e952753846
4 changed files with 111 additions and 2 deletions

@@ -1,5 +1,9 @@
git-annex (10.20250116) UNRELEASED; urgency=medium

  * Added the compute special remote.
  * addcomputed: New command, adds a file that is generated by a compute
    special remote.
  * recompute: New command, recomputes computed files.
  * Support help.autocorrect settings "prompt", "never", and "immediate".
  * Allow setting remote.foo.annex-tracking-branch to a branch name
    that contains "/", as long as it's not a remote tracking branch.

@@ -0,0 +1,22 @@
[[!comment format=mdwn
username="joey"
subject="""Re: DataLad exploration of the compute on demand space"""
date="2025-03-06T17:39:04Z"
content="""
Thanks for explaining the design points of datalad-remake. Some
different design choices than I have made, but mostly they strike me as
implementing what is easier/possible from outside git-annex.

Eg, storing the compute inputs under `.datalad` in the branch is fine --
and might even be useful if you want to make a branch that changes
something in there -- but of course in the git-annex implementation it
stores the equivalent thing in the git-annex branch.

I do hope I'm not closing off the design space from such differences
by dropping a compute special remote right into git-annex. But I also
expect that having a standard and easy way for at least simple
computations will lead to a lot of contributions as others use it.

Your fMRI case seems like one that my compute remote could handle well
and easily.
"""]]

@@ -0,0 +1,69 @@
[[!comment format=mdwn
username="joey"
subject="""comment 22"""
date="2025-03-06T17:54:50Z"
content="""
I've merged the compute special remote now.
See [[special_remotes/compute]], [[git-annex-addcomputed]]
and [[git-annex-recompute]].

I have opened [[todo/compute_special_remote_remaining_todos]] with
various ways that I want to improve it further, notably
computing on inputs from submodules, which is not currently supported at
all.

----

Here I'll go down mih's original and quite useful design criteria and see
how the compute special remote applies to them:

### Generate annex keys (that have never existed)

`git-annex addcomputed --fast`

### Re-generate annex keys

`git-annex addcomputed`, optionally with the `--reproducible` option,
followed by a later `git-annex get`

Another thing that fits under this heading is when one of the original
input files has gotten modified, and you want to compute a new version of
the output file from it, using the same method as was used to compute it
before. That's `git-annex recompute $output_file`

### Worktree provisioning?

This is the main thing I didn't implement. Given that git-annex is working
with large files and needs to support various filesystems and OSes that
lack hardlinks and softlinks, it's hard to do this inexpensively.

Also, it turned out to make sense for the compute program to request
the input files it needs, since this lets git-annex learn what the input
files are, so it can make them available when regenerating a computed file
later. And so the protocol just has git-annex respond with the path to
the content of the file.
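
A toy model of that request/response exchange, as a sketch only. It does not use the real wire protocol or any git-annex code: `ToyAnnex`, `request_input`, `toy_compute_program`, and all file names and paths are hypothetical. It shows why having the program ask for its inputs lets the other side learn what the inputs are.

```python
from typing import Dict, List


class ToyAnnex:
    """Hypothetical stand-in for the git-annex side of the exchange."""

    def __init__(self, content_paths: Dict[str, str]) -> None:
        # Maps a worktree file name to the path holding its content.
        self.content_paths = content_paths
        # Inputs learned from the program's requests, in request order.
        self.learned_inputs: List[str] = []

    def request_input(self, name: str) -> str:
        # Record the request so the same inputs can be made available
        # when the output is regenerated later, then answer with the
        # path to the file's content.
        self.learned_inputs.append(name)
        return self.content_paths[name]


def toy_compute_program(annex: ToyAnnex) -> List[str]:
    """Hypothetical compute program: asks for each input it needs."""
    needed = ["raw/scan.dat", "params.cfg"]  # made-up input names
    return [annex.request_input(name) for name in needed]


annex = ToyAnnex({
    "raw/scan.dat": "/tmp/objects/scan",    # made-up content paths
    "params.cfg": "/tmp/objects/params",
})
paths = toy_compute_program(annex)
```

After the run, `annex.learned_inputs` holds exactly the files the program asked for, which is the information needed to provide them again for a later regeneration.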

### Request one key, receive many

This is supported. (So is using multiple inputs to produce one (or more)
outputs.)

### Instruction deposition

`git-annex addcomputed`

### Storage redundancy tests

It did make sense to have it automatically `git-annex get` the inputs.
Well, I think it makes sense in most cases; this may become a tunable
setting of the compute special remote.

### Trust

Handled by requiring the user to install a `git-annex-compute-foo` command
in PATH, and provide the name of the command to `initremote`.

And for later `enableremote` or `autoenable=true`, it will only
allow programs that are listed in the `annex.security.allowed-compute-programs`
git config.
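
The allowlist rule can be sketched as a small check. This is an illustration under stated assumptions, not git-annex's implementation: `may_enable` is a hypothetical helper, the allowed programs are passed in already parsed (how the git config value is formatted is not covered here), and the prefix test assumes compute programs are always named `git-annex-compute-*`.

```python
from typing import Iterable

# Assumed naming convention, taken from the `git-annex-compute-foo` example.
PREFIX = "git-annex-compute-"


def may_enable(program: str, allowed_programs: Iterable[str]) -> bool:
    """Hypothetical check: may this program run on enableremote/autoenable?"""
    # Assumption: anything outside the naming convention is refused outright.
    if not program.startswith(PREFIX):
        return False
    # Only programs the user has explicitly listed are allowed.
    return program in set(allowed_programs)


allowed = ["git-annex-compute-foo"]
ok = may_enable("git-annex-compute-foo", allowed)
refused = may_enable("git-annex-compute-bar", allowed)
```

The point of the design is that a remote's stored configuration alone is never enough to make a program run; the local user has to opt in by name.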
"""]]

@@ -1,3 +1,19 @@
This is the remainder of my todo list while I was building the
compute special remote. --[[Joey]]

* write a tip showing how to use this

* Write some simple compute programs so we have something to start with.

  - convert between images eg jpeg to png
  - run a command in a singularity container (that is one of the inputs)
  - run a wasm binary (that is one of the inputs)

* compute on input files in submodules

* annex.diskreserve can be violated if getting a file computes it but also
  some other output files, which get added to the annex.

* would be nice to have a way to see what computations are used by a
  compute remote for a file. Put it in `whereis` output? But it's not an
  url. Maybe a separate command? That would also allow querying for eg,

@@ -27,8 +43,6 @@
So it seems that, for this to be done, recompute would need to stage the
pointer file.

* compute on files in submodules

* recompute could ingest keys for other files than the one being
  recomputed, and remember them. Then recomputing those files could just
  use those keys, without re-running a computation. (Better than --others