Commit graph

1690 commits

Author SHA1 Message Date
Joey Hess
a2fc471e14
safer git sha object filename
Rather than use the filename provided by INPUT, which could come from user
input, and so could be something that looks like a dashed parameter,
use a .git/object/<sha> filename.

This avoids user input passing through INPUT and back out, with the file
path then passed to a command, which could do something unexpected with
a dashed parameter, or other special parameter.

Added a note in the design about being careful of passing user input to
commands. They still have to be careful of that in general, just not in
this case.
2025-03-04 14:54:13 -04:00
Joey Hess
1ee4d018f3
cycle detection 2025-03-04 14:06:55 -04:00
Joey Hess
51538fa0a8
improve error message when unable to get an input file
In this case, the compute program is run the same as if addcomputed --fast
were used, so it should succeed, without outputting a computed file.

computeInputsUnavailable is in ComputeState for simplicity, but it is
not serialized with the rest of the ComputeState.
2025-03-04 13:13:18 -04:00
Joey Hess
f4e0d6a043
update location log after getting input file from remote 2025-03-04 12:51:38 -04:00
Joey Hess
4b6fabae65
better wording
Avoids this contradiction:

	(Auto enabling special remote foo...)

	  Not enabling compute special remote c2 because [..]
2025-03-04 12:43:50 -04:00
Joey Hess
4e6324131d
compute remote: get input files from other remotes
This needed some refactoring to avoid cycles, since Remote.Compute
cannot import Remote.List. Instead, it uses Annex.remotes. Which must be
populated by something else, but we know it has been, because something
is using Remote.Compute, which it must have found in the remote list,
which populates that.

In Remote.Compute, keyPossibilities' is called with all loggedLocations,
without the trustExclude DeadTrusted that keyLocations does. There is
another cycle there. This may be a problem if a dead repository is still
a remote.

This is missing cycle prevention, and it's certianly possible to make 2
files in the compute remote co-depend on one-another. Hopefully not in a
real world situation, but it an attacker could certainly do it. Cycle
prevention will need to be added to this.
2025-03-04 11:06:58 -04:00
Joey Hess
b395bd4f56
move showOutput into compute remote 2025-03-04 10:02:33 -04:00
Joey Hess
52f51d065a
rename config to annex.security.allowed-compute-programs
And require for enable as well as autoenable.

It seemed asking for trouble for `git-annex enable foo` to use whatever
compute program is stored in the git config, without verifying that the
user wants that program to be used.

Note that it would be good to allow `git-annex enable foo program=...`
to be used without the program being in the git config. Not implemented yet
though.
2025-03-03 16:12:03 -04:00
Joey Hess
f32d2aecce
autoenable security for compute special remote
Added annex.security.autoenable-compute-programs and only allow
autoenabling special remotes that use compute programs on that list.

The reason this is needed is a user might have some compute programs
that are less safe to use than others. They might want to use an unsafe
one only with one repository, where they are the only committer or other
committers are trusted. They might be ok with others being used by any
repository, and if so they can add them to the list.

Another reason would be a user who has installed a compute program by
accident. Eg, it might be included with git-annex at some point, or
pulled in by some dependency. That user doesn't necessarily want that
compute program to be used in an autoenabled special remote.
2025-03-03 15:52:56 -04:00
Joey Hess
a0d6a6ea2a
support git files as input to computations
Using GIT keys, like are used when exporting git files to special
remotes. Except here the GIT key refers to a file checked into the git
repo.

Note that, since the compute remote uses catObject to get the content,
a symlink that is checked into git does not get followed. This is important
for security, because following a symlink and adding the content to the
repo as an annex object would allow exfiltrating content from outside
the repository.

Instead, the behavior with a symlink is to run the computation on the
symlink target. This may turn out to be confusing, and it might be worth
addcomputed checking if the file in git is a symlink and erroring out.
Or it could follow symlinks as long as the destination is a file in the
repisitory.
2025-03-03 12:09:25 -04:00
Joey Hess
63d73d8d1b
record VURL key hashes in addcomputed and recompute 2025-03-03 10:57:56 -04:00
Joey Hess
2bd64059f1
record VURL key hashes when getting from compute remote
Like when getting from the web special remote, when the output of the
computation has changed, record the new hash of the content as an
equivilant key for the VURL key.

Still needs to be done for addcomputed and recompute.
2025-02-27 16:21:42 -04:00
Joey Hess
d2091730e9
refactor 2025-02-27 16:17:42 -04:00
Joey Hess
e6ae5e8d56
many recompute improvements
I've lost track of them all, but it includes:

* Using the same key backend as was used in the original computation.
* Fixing bug that prevented updating the source file key in the compute
  state
* Handling --reproducible and --unreproducible.
* recompute --original of a file using VURL, when the result is
  different, but the key remains the same, makes the object file
  be updated with the new content
* Detecting some other ways the program behavior can change, just for
  completeness.
* Also adds --backend to addcomputed.
2025-02-27 15:18:27 -04:00
Joey Hess
1704b5e327
refactoring 2025-02-27 14:54:03 -04:00
Joey Hess
53d107ca47
refactor 2025-02-26 14:05:37 -04:00
Joey Hess
3bec89a3c3
started git-annex recompute
The perform action of this still needs work to do the right thing.
In particular, it currently behaves as if --others was always set.
And, it duplicates a lot of code from addcomputed.
2025-02-26 11:54:09 -04:00
Joey Hess
eed522a0f8
addcomputed inherits extra initremote parameters
This is limited because the remote config is a field/value map. So order
is not preserved, and when 2 parameters have the same field name, only
the last one will be passed.
2025-02-26 09:45:35 -04:00
Joey Hess
e702cb94ff
add compute remote uuid to compute state url
Otherwise, two different compute remotes that happen to take the same
input would use the same compute state url. Which seems wrong.
2025-02-25 18:44:40 -04:00
Joey Hess
16f529c05f
addcomputed --fast and --unreproducible working
For these, use VURL and URL keys, with an "annex-compute:" URI prefix.

These URL keys will look something like this:

	URL--annex-compute&cbar4,63pconvert,3-f4d3d72cf3f16ac9c3e9a8012bde4462

Generally it's too long so most of it gets md5summed. It's a little
ugly, but it's what fell out of the existing URL key generation
machinery. I did consider special casing to eg
"URL--annex-compute&c4d3d72cf3f16ac9c3e9a8012bde4462". But it seems at
least possibly useful that the name of the file that was computed is
visible and perhaps one or two words of the git-annex compute command
parameters.

Note that two different output files from the same computation will get
the same URL key. And these keys should remain stable.
2025-02-25 16:43:15 -04:00
Joey Hess
a154e91513
add git-annex addcomputed
Working pretty well. Mostly. But:

* Does not yet support inputs that are non-annexed files checked into git
* --fast is currently broken (will need something like VURL keys)
* --unreproducible still uses a checksumming backend, so drop and get
  again will likely fail (needs probably to use an URL key or something
  like one)

The compute special remote seems to work pretty well too. Eg,
getting from it works, and dropping content that is present in it works.
2025-02-25 15:50:08 -04:00
Joey Hess
2e1fe1620e
handle comutations in subdirs of the git repository
Eg, a computation might be run in "foo/" and refer to "../bar" as an
input or output.

So, the subdir is part of the computation state.

Also, prevent input or output of files that are outside the git
repository. Of course, the program can access any file on disk if it
wants to; this is just a guard against mistakes. And it may also be
useful if the program comunicates with something less trusted than it,
eg a container image, so input/output files communicated by that are not
the source of security problems.
2025-02-25 15:08:38 -04:00
Joey Hess
ce05a92ee7
add field desc 2025-02-24 16:41:02 -04:00
Joey Hess
40be51c98a
reimplement using new compute program interface 2025-02-24 16:01:03 -04:00
Joey Hess
e0b46ef7ad
compute special remote mostly implemented
Except for some of the hard parts: progress displays, incremental
verification, and getting inputs before running a computation.

Untested! In order to test this, git-annex addcomputed needs to be
implemented.
2025-02-21 15:02:53 -04:00
Joey Hess
4f1eea9061
remove unused adjustedBranchRefresh associated file parameter 2025-02-21 14:51:02 -04:00
Joey Hess
e897229088
wip 2025-02-20 17:23:15 -04:00
Joey Hess
c1b53dbbd0
wip 2025-02-20 13:27:47 -04:00
Joey Hess
9b42f5fe89
improve apiurl description 2025-02-18 14:46:10 -04:00
Joey Hess
d394f0b020
git-lfs apiurl parameter
git-lfs: Added an optional apiurl parameter.

This needs version 1.2.5 of the haskell git-lfs library to be used.
stack.yaml updated to use that.

Note that git-annex enableremote can be used to add apiurl= to an existing
git-lfs special remote. To allow unsetting the apiurl and instead use
the probed url, support enableremote with apiurl set to an empty string.

Sponsored-by: Luke T. Shumaker
2025-02-18 14:11:21 -04:00
Joey Hess
25e4f84e8f
push down OsPath into CopyFile 2025-02-12 13:11:27 -04:00
Joey Hess
a149336a59
OsPath transition Windows build fixes
This gets it building on Windows again, with 1 test suite failure
(addurl).

Sponsored-by: Kevin Mueller
2025-02-11 15:40:53 -04:00
Joey Hess
5eef09a3cc
more OsPath conversion (650/749)
Sponsored-by: Nicholas Golder-Manning
2025-02-07 17:03:31 -04:00
Joey Hess
0b9e9cbf70
more OsPath conversion (502/749)
Sponsored-by: Kevin Mueller on Patreon
2025-02-05 13:29:58 -04:00
Joey Hess
b28433072c
more OsPath conversion (475/749)
Sponsored-by: Nicholas Golder-Manning
2025-02-05 12:14:56 -04:00
Joey Hess
85fa337f61
OsPath conversion of Remote.Adb
Note that the additional use of System.FilePath.Posix likely fixes a
problem if this were used on windows. The AndroidPath uses / directory
separators. Before this, on windows, \ would have been used.

The change to newtype AndroidPath is only documentation.
2025-02-05 11:16:20 -04:00
Joey Hess
4dc904bbad
more OsPath conversion
Sponsored-by: Leon Schuermann
2025-02-04 16:09:47 -04:00
Joey Hess
54f0710fd2
more OsPath conversion (464/749)
Sponsored-by: unqueued
2025-02-04 13:35:17 -04:00
Joey Hess
5cc8d9d03b
replace removeLink with removeFile
removeFile calls unlink so removes anything not a directory. So these
are replaceable in order to convert to OsPath.
2025-02-02 14:16:58 -04:00
Joey Hess
71195cce13
more OsPath conversion
Sponsored-by: k0ld
2025-02-01 14:06:38 -04:00
Joey Hess
474cf3bc8b
more OsPath conversion
Sponsored-by: Brock Spratlen
2025-02-01 11:54:19 -04:00
Joey Hess
c69e57aede
more OsPath conversion
Sponsored-by: Jack Hill
2025-01-30 15:46:32 -04:00
Joey Hess
a9f3a31a52
more OsPath conversion
Sponsored-by: Kevin Mueller
2025-01-29 16:24:51 -04:00
Joey Hess
0376bc5ee0
more OsPath conversion
Sponsored-by: Luke T. Shumaker
2025-01-28 16:31:19 -04:00
Joey Hess
7da6f83582
Merge branch 'master' into ospath 2025-01-28 16:00:03 -04:00
Joey Hess
2b12f9f4b7
windows build fix
and a little more bonus RawFilePath conversion
2025-01-28 15:59:45 -04:00
Joey Hess
22c2451e26
more OsPath conversion
Sponsored-by: mycroft
2025-01-28 15:46:00 -04:00
Joey Hess
9b79f0f43d
use file-io for readFile/writeFile/appendFile on ByteStrings
These are all straightforward, and easy small performance wins.

Sponsored-by: Nicholas Golder-Manning
2025-01-22 14:30:25 -04:00
Joey Hess
793ddecd4b
use openTempFile from file-io
And follow-on changes.

Note that relatedTemplate was changed to operate on a RawFilePath, and
so when it counts the length, it is now the number of bytes, not the
number of code points. This will just make it truncate shorter strings
in some cases, the truncation is still unicode aware.

When not building with the OsPath flag, toOsPath . fromRawFilePath and
fromRawFilePath . fromOsPath do extra conversions back and forth between
String and ByteString. That overhead could be avoided, but that's the
non-optimised build mode, so didn't bother.

Sponsored-by: unqueued
2025-01-22 11:41:43 -04:00
Joey Hess
1ceece3108
RawFilePath conversion of System.Directory
By using System.Directory.OsPath, which takes and returns OsString,
which is a ShortByteString. So, things like dirContents currently have the
overhead of copying that to a ByteString, but that should be less than
the overhead of using Strings which often in turn were converted to
RawFilePaths.

Added Utility.OsString and the OsString build flag. That flag is turned
on in the stack.yaml, and will be turned on automatically by cabal when
built with new enough libraries. The stack.yaml change is a bit ugly,
and that could be reverted for now if it causes any problems.

Note that Utility.OsString.toOsString on windows is avoiding only a
check of encoding that is documented as being unlikely to fail. I don't
think it can fail in git-annex; if it could, git-annex didn't contain
such an encoding check before, so at worst that should be a wash.
2025-01-20 19:17:33 -04:00