From 2f11c65491c7680c2061b1e46815b9a5adb68a10 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Wed, 19 Feb 2025 15:14:52 -0400
Subject: [PATCH 1/3] comments

---
 ..._021858e8032eca84488ec2324ec25a6f._comment | 13 ++++
 ..._fcba8049e659d3238b9f83286777f71f._comment | 65 +++++++++++++++++++
 2 files changed, 78 insertions(+)
 create mode 100644 doc/todo/compute_special_remote/comment_18_021858e8032eca84488ec2324ec25a6f._comment
 create mode 100644 doc/todo/compute_special_remote/comment_19_fcba8049e659d3238b9f83286777f71f._comment

diff --git a/doc/todo/compute_special_remote/comment_18_021858e8032eca84488ec2324ec25a6f._comment b/doc/todo/compute_special_remote/comment_18_021858e8032eca84488ec2324ec25a6f._comment
new file mode 100644
index 0000000000..4740ce806f
--- /dev/null
+++ b/doc/todo/compute_special_remote/comment_18_021858e8032eca84488ec2324ec25a6f._comment
@@ -0,0 +1,13 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 18"""
+ date="2025-02-19T18:29:58Z"
+ content="""
+I've started a `compute` branch which so far has documentation for
+the [compute special remote](http://source.git-annex.branchable.com/?p=source.git;a=blob;f=doc/special_remotes/compute.mdwn;hb=refs/heads/compute),
+[git-annex addcomputed](http://source.git-annex.branchable.com/?p=source.git;a=blob;f=doc/git-annex-addcomputed.mdwn;hb=refs/heads/compute),
+and
+[git-annex recompute](http://source.git-annex.branchable.com/?p=source.git;a=blob;f=doc/git-annex-recompute.mdwn;hb=refs/heads/compute)
+
+I am pretty happy with how this design is shaping up.
+"""]]
diff --git a/doc/todo/compute_special_remote/comment_19_fcba8049e659d3238b9f83286777f71f._comment b/doc/todo/compute_special_remote/comment_19_fcba8049e659d3238b9f83286777f71f._comment
new file mode 100644
index 0000000000..f2e04df5a7
--- /dev/null
+++ b/doc/todo/compute_special_remote/comment_19_fcba8049e659d3238b9f83286777f71f._comment
@@ -0,0 +1,65 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""open questions"""
+ date="2025-02-19T18:39:41Z"
+ content="""
+One thing that I am unsure about is what should happen if `git-annex get foo`
+needs the content of file `bar`, which is not present. Should it get `bar` from
+a remote? Or should it fail to get `foo`?
+
+Consider that, in the case of `git-annex get foo --from computeremote`, the
+user has asked it to get a file from that particular remote, not from
+whatever remote contains `bar`.
+
+If the same compute remote can also compute `bar`, it seems quite reasonable
+for `git-annex get foo --from computeremote` to also compute bar. (This is
+similar to a single computation that generates two output files, in which
+case getting one of them will get both of them.)
+
+And it seems reasonable for `git-annex get foo` with no specified remote
+to also get or compute bar, from wherever.
+
+But, there is no way at the level of a special remote to tell the
+difference between those two commands.
+
+Maybe the right answer is to define getting a file from a compute
+special remote as including getting its inputs from other remotes.
+Preferring getting them from the same compute special remote when possible,
+and when not, using the lowest cost remote that works, same as `git-annex
+get` does.
+
+----
+
+A related problem is that, `foo` might be fairly small, but `bar` very
+large. So getting a small object can require getting or generating other
+large objects. Getting `bar` might fail because there is not enough space
+to meet annex.diskreserve. Or the user might just be surprised that so much
+disk space was eaten up. But dropping `bar` after computing `foo` also
+doesn't seem like a good idea; the user might want to hang onto their copy
+now that they have it, or perhaps move it to some faster remote.
+
+Maybe preferred content is the solution? After computing `foo` with `bar`,
+keep the copy of `bar` if the local repository wants it, drop it otherwise.
+
+----
+
+Progress display is also going to be complicated for this. There is no
+way in the special remote interface to display the progress for `bar`
+while getting `foo`.
+
+Probably the thing to do would be to add together the sizes of both files,
+and display a combined progress meter.
+It would be ok to not say when it's getting the input file.
+This will need a way to set the size for a progress display to larger
+than the size of the key.
+
+----
+
+.... All 3 problems above go away if it doesn't automatically get input files
+before computations and the computations instead just fail with an error
+saying the input file is not present.
+
+But then consider the case where you just want every file in the repository.
+`git-annex get .` failing to compute some files because their input files
+happen to come after them in the directory listing is not good.
+"""]]
From a2fa2a8c5f526a5a2b4389daeb7dcb1c13b216e7 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Wed, 19 Feb 2025 16:03:34 -0400
Subject: [PATCH 2/3] update

---
 .../comment_19_fcba8049e659d3238b9f83286777f71f._comment | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/todo/compute_special_remote/comment_19_fcba8049e659d3238b9f83286777f71f._comment b/doc/todo/compute_special_remote/comment_19_fcba8049e659d3238b9f83286777f71f._comment
index f2e04df5a7..e4631c5a39 100644
--- a/doc/todo/compute_special_remote/comment_19_fcba8049e659d3238b9f83286777f71f._comment
+++ b/doc/todo/compute_special_remote/comment_19_fcba8049e659d3238b9f83286777f71f._comment
@@ -28,6 +28,10 @@ Preferring getting them from the same compute special remote when possible,
 and when not, using the lowest cost remote that works, same as `git-annex
 get` does.
 
+Or this could be a configuration of the compute special remote. Maybe some
+would want to always get source files, and others would want to never get
+source files?
+
 ----
 
 A related problem is that, `foo` might be fairly small, but `bar` very
From 4f3d9f8115b38569efac5331d333c13f0ebaa37f Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Thu, 20 Feb 2025 13:27:59 -0400
Subject: [PATCH 3/3] update

---
 .../compute_special_remote_interface.mdwn | 26 +++++++++++--------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/doc/design/compute_special_remote_interface.mdwn b/doc/design/compute_special_remote_interface.mdwn
index f82fdc22c5..8b1a732e7a 100644
--- a/doc/design/compute_special_remote_interface.mdwn
+++ b/doc/design/compute_special_remote_interface.mdwn
@@ -51,12 +51,16 @@ In the example above, the program is expected to output something like:
 
 If possible, the program should write the content of the file it is computing
 directly to the file listed in COMPUTING, rather than writing to
-somewhere else and renaming it at the end. If git-annex sees that the file
-corresponding to the key it requested be computed is growing, it will use
-its file size when displaying progress to the user.
+somewhere else and renaming it at the end. Except, when the program writes
+the file it computes out of order, it should write to a file somewhere else
+and rename it at the end.
+
+If git-annex sees that the file corresponding to the key it requested be
+computed is growing, it will use its file size when displaying progress to
+the user.
 
 The program can also output lines to stdout to indicate its current
-progress.
+progress:
 
     PROGRESS 50%
 
@@ -67,23 +71,23 @@ output, but not for progress displays.
 If the program exits nonzero, nothing it computed will be stored in the
 git-annex repository.
 
-The program must also support listing the inputs and outputs that it
+When run with the "interface" parameter, the program must describe its
+interface. This is a list of the inputs and outputs that it
 supports. This allows `git-annex addcomputed` and `git-annex initremote` to
 list inputs and outputs, and also lets them reject invalid inputs and
 outputs.
 
-In this mode, the program is run with a "list" parameter.
-It should output lines, in the form:
+The output is lines, in the form:
 
-    INPUT[?] Name Description
-    VALUE[?] Name Description
+    INPUT[?] Id Description
+    VALUE[?] Id Description
     OUTPUT Id Description
 
 Use "INPUT" when a file is an input to the computation, and "VALUE"
 for all other input values. Use "INPUT?" and "VALUE?" for optional
 inputs and values.
 
-The program can also optionally output a "REPRODUCIBLE" line.
+The interface can also optionally include a "REPRODUCIBLE" line.
 That indicates that the results of its computations are expected to be
 bit-for-bit reproducible.
 That makes `git-annex addcomputed` behave as if the `--reproducible`
@@ -93,7 +97,7 @@ An example `git-annex-compute-foo` shell script follows:
 
     #!/bin/sh
     set -e
-    if [ "$1" = list ]; then
+    if [ "$1" = interface ]; then
         echo "INPUT raw A photo in RAW format"
         echo "VALUE? passes Number of passes"
         echo "OUTPUT photo Computed JPEG"
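Patch 3/3 renames the listing mode to an "interface" parameter and uses `Id` for every entry. To illustrate what a caller such as `git-annex addcomputed` has to check, here is a minimal POSIX sh sketch that reads interface output and rejects lines that do not match the forms listed in the design document. The function name `check_interface`, its normalized output, skipping blank lines, and requiring a non-empty Id are all assumptions made for illustration. This is not how git-annex itself parses the interface.

```shell
#!/bin/sh
# Hypothetical helper, not part of git-annex: validate lines produced by
# "git-annex-compute-foo interface" against the documented forms
#   INPUT[?] Id Description
#   VALUE[?] Id Description
#   OUTPUT Id Description
#   REPRODUCIBLE
# Prints one normalized "KIND Id" line per entry; returns 1 on the first
# malformed line.
check_interface() {
    # desc is read separately so that id holds only the first word after
    # the keyword; the description itself is not validated here.
    while read -r kind id desc; do
        case "$kind" in
            "")
                # Assumption: blank lines are ignored.
                continue ;;
            REPRODUCIBLE)
                echo "reproducible"
                continue ;;
            INPUT|"INPUT?"|VALUE|"VALUE?"|OUTPUT)
                ;;
            *)
                # Includes "OUTPUT?", which the documented format does not allow.
                echo "unknown keyword: $kind" >&2
                return 1 ;;
        esac
        if [ -z "$id" ]; then
            echo "missing Id after $kind" >&2
            return 1
        fi
        echo "$kind $id"
    done
}
```

Assuming the example script prints only the three lines shown, `git-annex-compute-foo interface | check_interface` would print `INPUT raw`, `VALUE? passes` and `OUTPUT photo`. An `OUTPUT?` line or an unknown keyword makes it exit nonzero.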
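The open-questions comment in patch 1/3 suggests adding together the sizes of `foo` and its input `bar` and showing a single combined progress meter. Here is a minimal POSIX sh sketch of that arithmetic. The function `combined_percent` and its argument order are hypothetical, and it does not reflect how git-annex's progress display is implemented.

```shell
#!/bin/sh
# Hypothetical sketch: one combined progress percentage for getting `foo`
# when its input `bar` must first be fetched or computed.
# Arguments are byte counts: bar_done bar_size foo_done foo_size.
# POSIX sh has no "local", so these variables are global.
combined_percent() {
    bar_done=$1 bar_size=$2 foo_done=$3 foo_size=$4
    total=$((bar_size + foo_size))
    if [ "$total" -le 0 ]; then
        # Nothing to transfer; avoid dividing by zero.
        echo 0
        return
    fi
    # Integer arithmetic, so the percentage is rounded down.
    echo $(( (bar_done + foo_done) * 100 / total ))
}
```

For example, `combined_percent 500 1000 0 100` prints 45: half of a 1000-byte `bar` has been transferred, which is 500 of the 1100 bytes in total. As the comment notes, a real display would need its size set larger than the size of the requested key.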