If an input file has been lost from all repositories, it is no longer
possible to compute the output. This avoids dropping content that was
computed in such a situation, and also makes git-annex fsck --from
the compute remote do its usual thing when content has gone missing.
This implementation avoids recursing forever if there is a cycle,
although a cycle should not be possible anyway.
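A minimal sketch of the shape of such a check (the function names are
illustrative stand-ins, not git-annex's actual code): an output key only
counts as computable when each of its input keys is either still available
in some repository or is itself computable, with a seen-set so a cycle
cannot cause infinite recursion.

```haskell
import qualified Data.Set as S

-- Sketch only: 'inputsOf' and 'availableSomewhere' stand in for
-- lookups into the compute state and the location logs.
computable :: Ord k => (k -> [k]) -> (k -> Bool) -> k -> Bool
computable inputsOf availableSomewhere outputkey =
    go (S.singleton outputkey) (inputsOf outputkey)
  where
    go seen = all ok
      where
        ok k
            | k `S.member` seen = False   -- cycle; give up rather than recurse forever
            | availableSomewhere k = True -- input still present in some repository
            | null (inputsOf k) = False   -- lost and not itself computed
            | otherwise = go (S.insert k seen) (inputsOf k)
```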
Note that the use of the RemoteStateHandle constructor here suggests that
this may not handle sameas remotes right, since usually a
RemoteStateHandle is constructed using the sameas uuid for a sameas
remote. That assumes a compute remote can even have or be a sameas remote,
which doesn't seem to make sense, so I have not thought through in detail
what might happen here.
This avoids a potential problem where the program sends several INPUT
requests before reading responses, so flushing the response to the pipe
could block. It's unlikely, but it seemed worth making sure it can't happen.
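For illustration, here's the kind of compute program behavior
(hypothetical and minimal) that could trigger it: all the INPUT requests
are written before any responses are read, so if git-annex wrote each
response synchronously into a full pipe buffer, both sides could end up
blocked.

```haskell
import System.IO
import Control.Monad (forM, forM_)

main :: IO ()
main = do
    hSetBuffering stdout LineBuffering
    let inputs = ["a", "b", "c"]   -- hypothetical input file names
    -- Send every INPUT request up front...
    forM_ inputs $ \f -> putStrLn ("INPUT " ++ f)
    -- ...and only then read the responses back from git-annex.
    paths <- forM inputs (const getLine)
    mapM_ (hPutStrLn stderr) paths
```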
This improves eg `git-annex move --to` a compute remote that does not
contain the key. Rather than erroring with "Missing compute state" when
it checks whether the key is in the remote, it proceeds to try to store
to it, which produces a nicer error message.
Used by git-annex-compute-singularity to make addcomputed --fast work.
Also, simplified git-annex-compute-singularity; there is no need to hard
link the container into place. singularity does not care about the
extension of the container, so the annex object file can be passed to it directly.
The use case where this came up is a compute program using singularity,
where the process inside the container is allowed to write to the temp
directory, so it could make eg a /etc/shadow symlink, which could then be
used to exfiltrate that file from the system to wherever the annex object
might be pushed.
It seemed better to fix this once in git-annex rather than in any such
compute program.
This allows rejecting output filenames that are outside the repository,
and also handles converting eg "-foo" to "./-foo", to prevent a command
the filename is passed to from interpreting it as a dashed option.
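A rough sketch of the kind of guard this adds (simplified, not the actual
git-annex code):

```haskell
import System.FilePath

-- Reject filenames that could escape the repository, and neutralize
-- names that a downstream command could mistake for a dashed option.
guardOutputFileName :: FilePath -> Maybe FilePath
guardOutputFileName f
    | isAbsolute f = Nothing
    | ".." `elem` splitDirectories f = Nothing
    | otherwise = Just $ case f of
        ('-':_) -> "./" ++ f   -- eg "-foo" becomes "./-foo"
        _ -> f
```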
Rather than use the filename provided by INPUT, which could come from user
input, and so could be something that looks like a dashed parameter,
use a .git/object/<sha> filename.
This avoids user input passing through INPUT and back out, with the file
path then passed to a command, which could do something unexpected with
a dashed parameter, or other special parameter.
Added a note in the design about being careful of passing user input to
commands. They still have to be careful of that in general, just not in
this case.
In this case, the compute program is run the same as if addcomputed --fast
were used, so it should succeed, without outputting a computed file.
computeInputsUnavailable is in ComputeState for simplicity, but it is
not serialized with the rest of the ComputeState.
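Roughly, the shape is something like this (field names other than
computeInputsUnavailable are made up for illustration; they are not the
actual git-annex definitions):

```haskell
-- Only a sketch of the idea: one transient flag rides along with the
-- fields that actually get serialized into the compute state.
data ComputeState = ComputeState
    { computeParams :: [String]          -- serialized
    , computeInputs :: [FilePath]        -- serialized
    , computeInputsUnavailable :: Bool   -- transient, never serialized
    }
```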
This needed some refactoring to avoid import cycles, since Remote.Compute
cannot import Remote.List. Instead, it uses Annex.remotes, which must be
populated by something else. But we know it has been populated, because
something is using Remote.Compute, which it must have found in the remote
list, and generating the remote list is what populates Annex.remotes.
In Remote.Compute, keyPossibilities' is called with all loggedLocations,
without the trustExclude DeadTrusted filtering that keyLocations does,
since using keyLocations would introduce another import cycle. This may
be a problem if a dead repository is still a remote.
This is missing cycle prevention, and it's certainly possible to make 2
files in the compute remote co-depend on one another. Hopefully that won't
happen in a real world situation, but an attacker could certainly do it.
Cycle prevention will need to be added to this.
And require it for enable as well as autoenable.
It seemed asking for trouble for `git-annex enable foo` to use whatever
compute program is stored in the git config, without verifying that the
user wants that program to be used.
Note that it would be good to allow `git-annex enable foo program=...`
to be used without the program being in the git config. Not implemented yet
though.
Added annex.security.autoenable-compute-programs and only allow
autoenabling special remotes that use compute programs on that list.
The reason this is needed is a user might have some compute programs
that are less safe to use than others. They might want to use an unsafe
one only with one repository, where they are the only committer or other
committers are trusted. They might be ok with other programs being used by
any repository, and if so they can add those to the list.
Another reason would be a user who has installed a compute program by
accident. Eg, it might be included with git-annex at some point, or
pulled in by some dependency. That user doesn't necessarily want that
compute program to be used in an autoenabled special remote.
Using GIT keys, like the ones used when exporting git files to special
remotes. Except here the GIT key refers to a file checked into the git
repo.
Note that, since the compute remote uses catObject to get the content,
a symlink that is checked into git does not get followed. This is important
for security, because following a symlink and adding the content to the
repo as an annex object would allow exfiltrating content from outside
the repository.
Instead, the behavior with a symlink is to run the computation on the
symlink target (the path stored in git), rather than on the file it points
to. This may turn out to be confusing, and it might be worth making
addcomputed check whether the file in git is a symlink and error out.
Or it could follow symlinks as long as the destination is a file in the
repository.
Like when getting from the web special remote, when the output of the
computation has changed, record the new hash of the content as an
equivalent key for the VURL key.
This still needs to be done for addcomputed and recompute.
I've lost track of them all, but they include:
* Using the same key backend as was used in the original computation.
* Fixing a bug that prevented updating the source file key in the compute
  state.
* Handling --reproducible and --unreproducible.
* Making recompute --original of a file using VURL, when the result is
  different but the key remains the same, update the object file with the
  new content.
* Detecting some other ways the program behavior can change, just for
  completeness.
* Also added --backend to addcomputed.
The perform action of this still needs work to do the right thing.
In particular, it currently behaves as if --others was always set.
And, it duplicates a lot of code from addcomputed.
This is limited because the remote config is a field/value map, so the
order of parameters is not preserved, and when 2 parameters have the same
field name, only the last one will be passed.
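The limitation is just the usual behavior of building a map from
field/value pairs; a small illustration (not git-annex code):

```haskell
import qualified Data.Map as M

-- When the same field name appears twice, only the last value
-- survives, and the original ordering of fields is lost.
demo :: M.Map String String
demo = M.fromList [("parameter", "first"), ("parameter", "second")]
-- M.lookup "parameter" demo == Just "second"
```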
For these, use VURL and URL keys, with an "annex-compute:" URI prefix.
These URL keys will look something like this:
URL--annex-compute&cbar4,63pconvert,3-f4d3d72cf3f16ac9c3e9a8012bde4462
Generally the URL is too long, so most of it gets md5summed. It's a little
ugly, but it's what fell out of the existing URL key generation
machinery. I did consider special casing to eg
"URL--annex-compute&c4d3d72cf3f16ac9c3e9a8012bde4462". But it seems at
least possibly useful that the name of the file that was computed is
visible, and perhaps one or two words of the git-annex compute command
parameters.
Note that two different output files from the same computation will get
the same URL key. And these keys should remain stable.
Working pretty well. Mostly. But:
* Does not yet support inputs that are non-annexed files checked into git
* --fast is currently broken (will need something like VURL keys)
* --unreproducible still uses a checksumming backend, so drop and get
  again will likely fail (probably needs to use a URL key or something
  like one)
The compute special remote seems to work pretty well too. Eg,
getting from it works, and dropping content that is present in it works.
Eg, a computation might be run in "foo/" and refer to "../bar" as an
input or output.
So, the subdir is part of the computation state.
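A small illustration of why (the helper below is hypothetical, not
git-annex code): an output recorded as "../bar" only resolves to the right
path in the repository when combined with the subdir the computation ran
in.

```haskell
import System.FilePath
import Data.List (foldl')

-- Resolve a path the compute program used, relative to the subdir the
-- computation was run in, collapsing "." and ".." components.
resolveInSubdir :: FilePath -> FilePath -> FilePath
resolveInSubdir subdir p =
    joinPath (foldl' step [] (splitDirectories (subdir </> p)))
  where
    step acc ".." = if null acc then acc else init acc
    step acc "."  = acc
    step acc d    = acc ++ [d]

-- resolveInSubdir "foo" "../bar" == "bar"
```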
Also, prevent input or output of files that are outside the git
repository. Of course, the program can access any file on disk if it
wants to; this is just a guard against mistakes. And it may also be
useful if the program communicates with something less trusted than
itself, eg a container image, so input/output files communicated by it are
not a source of security problems.
Except for some of the hard parts: progress displays, incremental
verification, and getting inputs before running a computation.
Untested! In order to test this, git-annex addcomputed needs to be
implemented.