git-annex

Author	SHA1	Message	Date
Joey Hess	51538fa0a8	improve error message when unable to get an input file In this case, the compute program is run the same as if addcomputed --fast were used, so it should succeed, without outputting a computed file. computeInputsUnavailable is in ComputeState for simplicity, but it is not serialized with the rest of the ComputeState.	2025-03-04 13:13:18 -04:00
Joey Hess	b395bd4f56	move showOutput into compute remote	2025-03-04 10:02:33 -04:00
Joey Hess	a0d6a6ea2a	support git files as input to computations Using GIT keys, like are used when exporting git files to special remotes. Except here the GIT key refers to a file checked into the git repo. Note that, since the compute remote uses catObject to get the content, a symlink that is checked into git does not get followed. This is important for security, because following a symlink and adding the content to the repo as an annex object would allow exfiltrating content from outside the repository. Instead, the behavior with a symlink is to run the computation on the symlink target. This may turn out to be confusing, and it might be worth addcomputed checking if the file in git is a symlink and erroring out. Or it could follow symlinks as long as the destination is a file in the repository.	2025-03-03 12:09:25 -04:00
Joey Hess	63d73d8d1b	record VURL key hashes in addcomputed and recompute	2025-03-03 10:57:56 -04:00
Joey Hess	e6ae5e8d56	many recompute improvements I've lost track of them all, but it includes: * Using the same key backend as was used in the original computation. * Fixing bug that prevented updating the source file key in the compute state * Handling --reproducible and --unreproducible. * recompute --original of a file using VURL, when the result is different, but the key remains the same, makes the object file be updated with the new content * Detecting some other ways the program behavior can change, just for completeness. * Also adds --backend to addcomputed.	2025-02-27 15:18:27 -04:00
Joey Hess	9c2c3002a6	fix recompute of renamed files When a computed file has been renamed, a recompute needs to write to the new filename. I decided to remove --others because it's not clear what it should do in the face of renames. Should it update only other files that have not been renamed? Or update files that use the old key to the new key anywhere in the tree? Or write the other files to the cwd, ignoring renames? Since --others is just a way to save on compute time, adding this complexity at this point seems like a bad idea. May revisit later. Added temporary TODO-compute file	2025-02-27 11:27:26 -04:00
Joey Hess	d6a010a615	recompute closer to working properly Proper behavior without --others implemented. And eliminated most of the code duplication through refactoring. Also, changed it to not stage recomputed files. This way, git diff will show files that have differences.	2025-02-26 15:52:52 -04:00
Joey Hess	53d107ca47	refactor	2025-02-26 14:05:37 -04:00
Joey Hess	3bec89a3c3	started git-annex recompute The perform action of this still needs work to do the right thing. In particular, it currently behaves as if --others was always set. And, it duplicates a lot of code from addcomputed.	2025-02-26 11:54:09 -04:00
Joey Hess	d49f371acc	showOutput when the compute program eg displays usage, it needs to start on its own line	2025-02-26 09:47:56 -04:00
Joey Hess	eed522a0f8	addcomputed inherits extra initremote parameters This is limited because the remote config is a field/value map. So order is not preserved, and when 2 parameters have the same field name, only the last one will be passed.	2025-02-26 09:45:35 -04:00
Joey Hess	a5b53fa98a	todo	2025-02-25 18:45:55 -04:00
Joey Hess	e702cb94ff	add compute remote uuid to compute state url Otherwise, two different compute remotes that happen to take the same input would use the same compute state url. Which seems wrong.	2025-02-25 18:44:40 -04:00
Joey Hess	71e92a509a	use compute program REPRODUCIBLE by default	2025-02-25 17:10:41 -04:00
Joey Hess	233a6954b9	ingest when --unreproducible is used without --fast	2025-02-25 17:04:19 -04:00
Joey Hess	16f529c05f	addcomputed --fast and --unreproducible working For these, use VURL and URL keys, with an "annex-compute:" URI prefix. These URL keys will look something like this: URL--annex-compute&cbar4,63pconvert,3-f4d3d72cf3f16ac9c3e9a8012bde4462 Generally it's too long so most of it gets md5summed. It's a little ugly, but it's what fell out of the existing URL key generation machinery. I did consider special casing to eg "URL--annex-compute&c4d3d72cf3f16ac9c3e9a8012bde4462". But it seems at least possibly useful that the name of the file that was computed is visible and perhaps one or two words of the git-annex compute command parameters. Note that two different output files from the same computation will get the same URL key. And these keys should remain stable.	2025-02-25 16:43:15 -04:00
Joey Hess	a154e91513	add git-annex addcomputed Working pretty well. Mostly. But: * Does not yet support inputs that are non-annexed files checked into git * --fast is currently broken (will need something like VURL keys) * --unreproducible still uses a checksumming backend, so drop and get again will likely fail (needs probably to use an URL key or something like one) The compute special remote seems to work pretty well too. Eg, getting from it works, and dropping content that is present in it works.	2025-02-25 15:50:08 -04:00

17 commits