preparing to merge compute

2025-03-06 14:22:45 -04:00 · e952753846
commit e952753846
parent 4979df54d5
4 changed files with 111 additions and 2 deletions
--- a/4
+++ b/4
@@ -1,5 +1,9 @@
 git-annex (10.20250116) UNRELEASED; urgency=medium

+  * Added the compute special remote.
+  * addcomputed: New command, adds a file that is generated by a compute
+    special remote.
+  * recompute: New command, recomputes computed files.
  * Support help.autocorrect settings "prompt", "never", and "immediate".
  * Allow setting remote.foo.annex-tracking-branch to a branch name
    that contains "/", as long as it's not a remote tracking branch.
--- a/doc/todo/compute_special_remote/comment_21_2546562f7a00e082cd0500debc904cf3._comment
+++ b/doc/todo/compute_special_remote/comment_21_2546562f7a00e082cd0500debc904cf3._comment
@@ -0,0 +1,22 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""Re: DataLad exploration of the compute on demand space"""
+ date="2025-03-06T17:39:04Z"
+ content="""
+Thanks for explaining the design points of datalad-remake. Some
+different design choices than I have made, but mostly they strike me as
+implementing what is easier/possible from outside git-annex. 
+
+Eg, storing the compute inputs under `.datalad` in the branch is fine --
+and might even be useful if you want to make a branch that changes
+something in there -- but of course in the git-annex implementation it
+stores the equivalent thing in the git-annex branch.
+
+I do hope I'm not closing off the design space from such differences
+by dropping a compute special remote right into git-annex. But I also
+expect that having a standard and easy way for at least simple
+computations will lead to a lot of contributions as others use it.
+
+Your fMRI case seems like one that my compute remote could handle well
+and easily.
+"""]]
--- a/doc/todo/compute_special_remote/comment_22_d1561153a3916411ed8caa92fa53893c._comment
+++ b/doc/todo/compute_special_remote/comment_22_d1561153a3916411ed8caa92fa53893c._comment
@@ -0,0 +1,69 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 22"""
+ date="2025-03-06T17:54:50Z"
+ content="""
+I've merged the compute special remote now.
+See [[special_remotes/compute]], [[git-annex-addcomputed]]
+and [[git-annex-recompute]].
+
+I have opened [[todo/compute_special_remote_remaining_todos]] with
+various ways that I want to improve it further, notably including
+computing on inputs from submodules, which is not currently supported at
+all.
+
+----
+
+Here I'll go down mih's original and quite useful design criteria and see
+how the compute special remote applies to them:
+
+### Generate annex keys (that have never existed)
+
+`git-annex addcomputed --fast`
+
+### Re-generate annex keys
+
+`git-annex addcomputed`, optionally with the `--reproducible` option,
+followed by a later `git-annex get`
+
+Another thing that fits under this heading is when one of the original
+input files has gotten modified, and you want to compute a new version of
+the output file from it, using the same method as was used to compute it
+before. That's `git-annex recompute $output_file`
+
+### Worktree provisioning?
+
+This is the main thing I didn't implement. Given that git-annex is working
+with large files and needs to support various filesystems and OSes that
+lack hardlinks and softlinks, it's hard to do this inexpensively.
+
+Also, it turned out to make sense for the compute program to request
+the input files it needs, since this lets git-annex learn what the input
+files are, so it can make them available when regenerating a computed file
+later. And so the protocol just has git-annex respond with the path to
+the content of the file.
+
+### Request one key, receive many
+
+This is supported. (So is using multiple inputs to produce one (or more)
+outputs.)
+
+### Instruction deposition
+
+`git-annex addcomputed`
+
+### Storage redundancy tests
+
+It did make sense to have it automatically `git-annex get` the inputs.
+Well, I think it makes sense in most cases; this may become a tunable
+setting of the compute special remote.
+
+### Trust
+
+Handled by requiring the user to install a `git-annex-compute-foo` command
+in PATH, and provide the name of the command to `initremote`.
+
+And for later `enableremote` or `autoenable=true`, it will only
+allow programs that are listed in the `annex.security.allowed-compute-programs`
+git config.
+"""]]
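The input-request exchange described under "Worktree provisioning?" in comment 22 can be sketched from the compute program's side roughly as follows. This is only an illustration, not the protocol as implemented: the `INPUT <name>` line format, the helper names `request_input` and `demo`, and the example path are all assumptions. The only part taken from the comment is that the program asks for an input file and git-annex responds with the path to its content.

```python
import io
from typing import TextIO


def request_input(name: str, to_annex: TextIO, from_annex: TextIO) -> str:
    """Ask git-annex for an input file and return the path it answers with."""
    # Assumed wire format: one "INPUT <name>" line per request.
    to_annex.write(f"INPUT {name}\n")
    to_annex.flush()
    # git-annex responds with the path to the content of the file.
    reply = from_annex.readline()
    if not reply:
        raise EOFError(f"no reply for input {name!r}")
    return reply.rstrip("\n")


def demo() -> tuple[str, str]:
    # Simulate one exchange with in-memory streams; the path is made up.
    to_annex = io.StringIO()
    from_annex = io.StringIO("/tmp/annex-content/foo.raw\n")
    path = request_input("foo.raw", to_annex, from_annex)
    return to_annex.getvalue(), path
```

In a real compute program the two streams would presumably be stdout and stdin. Because every input is requested by name, git-annex can record which inputs a computation used and make them available again when the file is regenerated.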
--- a/doc/todo/compute_special_remote_remaining_todos.mdwn
+++ b/doc/todo/compute_special_remote_remaining_todos.mdwn
@@ -1,3 +1,19 @@
+This is the remainder of my todo list while I was building the
+compute special remote. --[[Joey]]
+
+* Write a tip showing how to use this.
+
+* Write some simple compute programs so we have something to start with.
+
+  - convert between image formats, eg jpeg to png
+  - run a command in a singularity container (that is one of the inputs)
+  - run a wasm binary (that is one of the inputs)
+
+* compute on input files in submodules
+
+* annex.diskreserve can be violated if getting a file computes it but also
+  some other output files, which get added to the annex.
+
 * would be nice to have a way to see what computations are used by a
  compute remote for a file. Put it in `whereis` output? But it's not an
  url. Maybe a separate command? That would also allow querying for eg,
@@ -27,8 +43,6 @@
  So it seems that, for this to be done, recompute would need to stage the
  pointer file.

-* compute on files in submodules
-
 * recompute could ingest keys for other files than the one being
  recomputed, and remember them. Then recomputing those files could just
  use those keys, without re-running a computation. (Better than --others