Commit graph

46609 commits

Author SHA1 Message Date
yarikoptic
1e6324c179 Added a comment: Any way to annotate what are input files? 2025-03-08 14:51:20 +00:00
Joey Hess
9d6c052c27
symlink, don't hardlink
hardlink can cause problems with unlocked files
2025-03-07 17:15:54 -04:00
Joey Hess
45d7f3ca4b
disconnect stdio for wasm binaries 2025-03-07 17:15:21 -04:00
Joey Hess
18be4910d8
use pwd and quote it
Seems more portable and safe
2025-03-07 16:06:37 -04:00
Joey Hess
5ef1c44e07
case 2025-03-07 16:03:35 -04:00
Joey Hess
10e36759bf
layout 2025-03-07 16:03:09 -04:00
Joey Hess
dcd7c207a8
layout 2025-03-07 16:02:43 -04:00
Joey Hess
2391c2802a
add git-annex-compute-wasmedge 2025-03-07 16:02:11 -04:00
Joey Hess
ed51924211
redirect command stdout to stderr
Otherwise it will be interpreted as compute program protocol
2025-03-07 16:01:27 -04:00
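A minimal sketch of the idea in this commit, not the actual change: a compute program talks to git-annex over its own stdout, so anything a child command prints there would be read as protocol. The helper name `run_with_stdout_on_stderr` and its `stderr` parameter are hypothetical, chosen for illustration.

```python
import subprocess
import sys


def run_with_stdout_on_stderr(argv, stderr=None):
    """Run argv with the child's stdout attached to our stderr.

    Hypothetical helper: keeps the child's output off our own stdout,
    which is assumed to be reserved for protocol messages.
    """
    target = sys.stderr if stderr is None else stderr
    # stdout=target makes the child write to that file object's descriptor.
    return subprocess.run(argv, stdout=target, check=False).returncode
```

In a shell script the same effect would normally come from a `>&2` redirection on the command.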
Joey Hess
2c6dce83de
make OUTPUT subdirs
Simplifies compute programs.
2025-03-07 14:57:12 -04:00
Joey Hess
b4becb7167
Merge branch 'master' of ssh://git-annex.branchable.com 2025-03-07 14:50:11 -04:00
Joey Hess
81ce4264df
compute: add response to OUTPUT
This allows rejecting output filenames that are outside the repository,
and also handles converting eg "-foo" to "./-foo" to prevent a command
that it's passed to interpreting the output filename as a dashed option.
2025-03-07 14:47:34 -04:00
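The two checks described in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not git-annex's implementation: the function name `sanitize_output` and the `repo_top` parameter are hypothetical, and POSIX-style paths are assumed.

```python
import os


def sanitize_output(name, repo_top="."):
    """Return a safe path relative to repo_top, or None to reject.

    Hypothetical sketch: rejects names that resolve outside repo_top and
    prefixes "./" to names that a command could mistake for an option.
    """
    top = os.path.realpath(repo_top)
    full = os.path.realpath(os.path.join(top, name))
    # Reject anything that is not strictly inside the repository.
    if full == top or os.path.commonpath([top, full]) != top:
        return None
    rel = os.path.relpath(full, top)
    # "-foo" would look like a dashed option to many commands.
    if rel.startswith("-"):
        rel = "./" + rel
    return rel
```

With this sketch, `-foo` becomes `./-foo`, while `../secret` and absolute paths outside the repository are rejected.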
Joey Hess
6a8e57f0e9
remove todo I just added
If a compute program does this, it has a security hole. Not git-annex.
2025-03-07 13:29:57 -04:00
Joey Hess
78045f8e4f
todo 2025-03-07 13:24:11 -04:00
jasonb@ab4484d9961a46440958fa1a528e0fc435599057
b0d4fe5dd0 2025-03-07 04:13:24 +00:00
yarikoptic
27ef1a47df initial report on slow thaw 2025-03-06 22:40:35 +00:00
Joey Hess
1f59545ad0
improve 2025-03-06 14:54:05 -04:00
Joey Hess
138421449e
add git-annex-compute-imageconvert 2025-03-06 14:47:22 -04:00
Joey Hess
825a648670
prefix output with ./ in example 2025-03-06 14:42:07 -04:00
Joey Hess
b835c8c937
no longer a draft 2025-03-06 14:29:07 -04:00
Joey Hess
6f78341fbf
Merge branch 'compute' 2025-03-06 14:23:58 -04:00
Joey Hess
e952753846
preparing to merge compute 2025-03-06 14:22:45 -04:00
Joey Hess
4979df54d5
update 2025-03-06 13:34:51 -04:00
jerome.charousset@86fd8ed1bf55902989d7e70a11c38cb3a444b72d
203a730e28 Added a comment: Special use case for Scientific application 2025-03-06 17:02:22 +00:00
Joey Hess
1e9bb30c4e
update 2025-03-06 12:52:12 -04:00
Joey Hess
c6c6e2632d
avoid unnecessary git-annex branch changes for recompute and addcomputed 2025-03-06 12:41:30 -04:00
Joey Hess
ccc454a791
computation progress display 2025-03-05 13:46:06 -04:00
matrss
629ab3f836 Added a comment 2025-03-05 15:40:44 +00:00
bpoldrack
9f045ed494 Added a comment 2025-03-05 14:23:57 +00:00
msz
62ab16aef3 Tag copy_file_range todo with projects/INM7 (came from our cluster) 2025-03-05 13:35:19 +00:00
msz
f1efad3b94 Added a comment: DataLad exploration of the compute on demand space 2025-03-05 13:31:41 +00:00
msz
e4232791dd Added a comment 2025-03-05 11:27:39 +00:00
kenta
5137cb6d16 filled out bug description 2025-03-05 00:00:19 +00:00
Joey Hess
4a4a614b0d
OsPath build fixes 2025-03-04 15:50:15 -04:00
Joey Hess
17ce1b4e7b
mark unused parameter
While unused, it seems to make sense to keep it, since it explains what
the function is doing.
2025-03-04 15:46:30 -04:00
Joey Hess
2e77c2b762
update todo 2025-03-04 15:02:02 -04:00
Joey Hess
a2fc471e14
safer git sha object filename
Rather than use the filename provided by INPUT, which could come from user
input, and so could be something that looks like a dashed parameter,
use a .git/object/<sha> filename.

This avoids user input passing through INPUT and back out, with the file
path then passed to a command, which could do something unexpected with
a dashed parameter, or other special parameter.

Added a note in the design about being careful of passing user input to
commands. They still have to be careful of that in general, just not in
this case.
2025-03-04 14:54:13 -04:00
Joey Hess
1ee4d018f3
cycle detection 2025-03-04 14:06:55 -04:00
Joey Hess
51538fa0a8
improve error message when unable to get an input file
In this case, the compute program is run the same as if addcomputed --fast
were used, so it should succeed, without outputting a computed file.

computeInputsUnavailable is in ComputeState for simplicity, but it is
not serialized with the rest of the ComputeState.
2025-03-04 13:13:18 -04:00
Joey Hess
f4e0d6a043
update location log after getting input file from remote 2025-03-04 12:51:38 -04:00
Joey Hess
4b6fabae65
better wording
Avoids this contradiction:

	(Auto enabling special remote foo...)

	  Not enabling compute special remote c2 because [..]
2025-03-04 12:43:50 -04:00
Joey Hess
4e6324131d
compute remote: get input files from other remotes
This needed some refactoring to avoid cycles, since Remote.Compute
cannot import Remote.List. Instead, it uses Annex.remotes. Which must be
populated by something else, but we know it has been, because something
is using Remote.Compute, which it must have found in the remote list,
which populates that.

In Remote.Compute, keyPossibilities' is called with all loggedLocations,
without the trustExclude DeadTrusted that keyLocations does. There is
another cycle there. This may be a problem if a dead repository is still
a remote.

This is missing cycle prevention, and it's certainly possible to make 2
files in the compute remote co-depend on one another. Hopefully not in a
real world situation, but an attacker could certainly do it. Cycle
prevention will need to be added to this.
2025-03-04 11:06:58 -04:00
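The co-dependency problem described in this commit message is a cycle in the graph of computed files and their inputs. A generic way to detect one is a depth-first search that remembers the current path. The sketch below assumes a hypothetical `inputs_of` mapping from each computed file to its input files; it does not show how git-annex does it.

```python
def find_cycle(inputs_of, start):
    """Return a list of files forming a cycle reachable from start, or None.

    inputs_of is a hypothetical dict mapping a computed file to the list
    of files it takes as input; files that are absent have no inputs.
    """
    path = []      # files on the current depth-first path
    done = set()   # files whose inputs are fully explored

    def visit(f):
        if f in done:
            return None
        if f in path:
            # f depends, possibly indirectly, on itself.
            return path[path.index(f):] + [f]
        path.append(f)
        for dep in inputs_of.get(f, ()):
            cycle = visit(dep)
            if cycle:
                return cycle
        path.pop()
        done.add(f)
        return None

    return visit(start)
```

For two files that take each other as input, `find_cycle({"a": ["b"], "b": ["a"]}, "a")` returns `["a", "b", "a"]`.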
Joey Hess
b395bd4f56
move showOutput into compute remote 2025-03-04 10:02:33 -04:00
Joey Hess
52f51d065a
rename config to annex.security.allowed-compute-programs
And require for enable as well as autoenable.

It seemed asking for trouble for `git-annex enable foo` to use whatever
compute program is stored in the git config, without verifying that the
user wants that program to be used.

Note that it would be good to allow `git-annex enable foo program=...`
to be used without the program being in the git config. Not implemented yet
though.
2025-03-03 16:12:03 -04:00
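The check described in this commit message amounts to consulting an allow list before using a compute program. The sketch below assumes, without confirmation from the source, that the value of annex.security.allowed-compute-programs is a whitespace-separated list of program names; the function name `program_allowed` is hypothetical.

```python
def program_allowed(program, allowed_value):
    """Return True if program appears in the allow list.

    Assumption: allowed_value is the raw config value, holding program
    names separated by whitespace. An unset or empty value allows nothing.
    """
    if not allowed_value:
        return False
    return program in allowed_value.split()
```

Matching whole names, not substrings, matters here: a program whose name merely contains an allowed name must not be accepted.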
Joey Hess
f32d2aecce
autoenable security for compute special remote
Added annex.security.autoenable-compute-programs and only allow
autoenabling special remotes that use compute programs on that list.

The reason this is needed is a user might have some compute programs
that are less safe to use than others. They might want to use an unsafe
one only with one repository, where they are the only committer or other
committers are trusted. They might be ok with others being used by any
repository, and if so they can add them to the list.

Another reason would be a user who has installed a compute program by
accident. Eg, it might be included with git-annex at some point, or
pulled in by some dependency. That user doesn't necessarily want that
compute program to be used in an autoenabled special remote.
2025-03-03 15:52:56 -04:00
Joey Hess
89bfeada87
recompute: display one of the changed files 2025-03-03 15:12:19 -04:00
Joey Hess
b01a0d2323
avoid recomputing every time on git inputs 2025-03-03 14:56:49 -04:00
Joey Hess
a0d6a6ea2a
support git files as input to computations
Using GIT keys, like those used when exporting git files to special
remotes. Except here the GIT key refers to a file checked into the git
repo.

Note that, since the compute remote uses catObject to get the content,
a symlink that is checked into git does not get followed. This is important
for security, because following a symlink and adding the content to the
repo as an annex object would allow exfiltrating content from outside
the repository.

Instead, the behavior with a symlink is to run the computation on the
symlink target. This may turn out to be confusing, and it might be worth
addcomputed checking if the file in git is a symlink and erroring out.
Or it could follow symlinks as long as the destination is a file in the
repository.
2025-03-03 12:09:25 -04:00
Joey Hess
6ebab7fb00
factor out Annex.GitShaKey 2025-03-03 11:09:28 -04:00
Joey Hess
63d73d8d1b
record VURL key hashes in addcomputed and recompute 2025-03-03 10:57:56 -04:00