Commit graph

1724 commits

Author SHA1 Message Date
Joey Hess
6c2e84f6ec
Bump aws build dependency to 0.24.1
That's the version in Debian stable now. And this removes a lot of ifdefs.

Also I'm pretty sure a recent commit broke building with older versions of
aws, although that could be fixed with sufficent testing.
2025-08-13 15:32:39 -04:00
Joey Hess
3b1702e658
probe AWS datacenter
S3: When initremote is given the name of a bucket that already exists,
automatically set datacenter to the right value, rather than needing it to
be explicitly set.

This needs aws-0.23. But, initremote stores the datacenter value, so
a remote set up this way can be used with git-annex built with an older aws.

This is not done when signature=anonymous, because in that case,
using AWS.defaultRegion works fine for accessing buckets on other
datacenters.

It feels a bit round-about to need to do this probing. But without it,
the problem seems to be that, with a v4 signature, the location constraint
is included in the Authorization header. When that is the wrong location,
AWS S3 rejects it. I do wonder though if there is an easier way that I
am currently missing.

Sponsored-by: Dartmouth College's DANDI project
2025-08-13 15:23:31 -04:00
Joey Hess
fed5e00a18
fix default region reversion
Commit 215640096f caused the default
region for S3 to change to us-east-2. This was due to regionInfo having
an undocumented property that the first item in the list is for the
default region.

Avoid relying on regionInfo for defaultRegion.

Sponsored-by: Dartmouth College's DANDI project
2025-08-13 14:19:36 -04:00
Joey Hess
215640096f
S3: Default to signature=v4 when using an AWS endpoint
* S3: Default to signature=v4 when using an AWS endpoint, since some
  AWS regions need v4 and all support it. When host= is used to specify
  a different S3 host, the default remains signature=v2.
* webapp: Support setting up S3 buckets in regions that need v4
  signatures.

For the webapp, went ahead and added all current S3 regions
(except govcloud, which  is not usable by everyone).

Sponsored-by: Dartmouth College's DANDI project
2025-08-13 13:18:35 -04:00
Joey Hess
edc1d92059
document "anonymous" in ValueDesc 2025-08-13 12:51:27 -04:00
Joey Hess
d3fbda13e4
p2p --enable
p2p: Added --enable option, which can be used to enable P2P networks
provided by external commands git-annex-p2p-<netname>

Made git-annex p2p --enable tor behave the same as git-annex enable-tor,
to make tor a bit less of a special case. However, it canot be run as root,
since it cannot take the user id parameter.
2025-07-30 14:08:59 -04:00
Joey Hess
a6f8248465
add connProcess to P2PConnection
When using the new generic P2P transport to open an outgoing connection
to a peer, this will hold the pid of the git-annex-p2p-<netname>
command.

closeConnection simply waits for it. Rather than relying on garbage
collection of the closed handles to close it.

In Remote.Helper.Ssh, connProcess is set to Nothing, even though there
is a similar process being used there. That code stores the pid in
OpenConnection instead, and handles waiting for it itself. A bit ugly,
but not worth cleaning up at this point, maybe later.
2025-07-30 12:35:16 -04:00
Joey Hess
6a9e923c74
fix handling of linked worktrees on filesystems w/o symlinks
Fix bug in handling of linked worktrees on filesystems not supporting
symlinks, that caused annexed file content to be stored in the wrong
location inside the git directory, and also caused pointer files to not get
populated.

This parameterizes functions in Annex.Locations with a GitLocationMaker.
The uses of standardGitLocationMaker are in cases where the path returned
by a function should not change when in a linked worktree. For example,
gitAnnexLink uses standardGitLocationMaker because symlink targets should
always be to ".git/annex/objects" paths, even when in a linked worktree.
Hopefully I have gotten all uses of standardGitLocationMaker right.

This also assumes that all path construction to the annex directory
is done via the functions in Annex.Locations, and there is no other,
ad-hoc construction elsewhere. Thankfully, Annex.Locations has been around
since the beginning, and has been used consistently. I think.

---

In fixupUnusualRepos, when symlinks are supported, the .git file is replaced
with a symlink to the linked worktree git directory. And in that directory,
an "annex" symlink points to the main annex directory. In that case,
it's not necessary to set mainWorkTreePath. It would be ok to set it,
but not setting it in that case allows an optimisation of avoiding reading
the "commondir" file.

The change to make fixupUnusualRepos set mainWorkTreePath when the
repository is not initialized yet is done in case the initialization itself
writes to the annex directory. If that were the case, without setting
mainWorkTreePath, the annex symlink would not be set up yet, and so
it might have created the annex directory in the wrong place. Currently
that didn't happen, but now that mainWorkTreePath is available, using it
here avoids any such later problem.

---

This commit does not deal with the mess of a worktree that has
experienced this bug before. In particular, if `git-annex get` were
run in such a worktree, it would have stored the object files in the
linked worktree's git directory, rather than in the main git directory.
Such misplaced object files need to be dealt with; the plan is to make
git-annex fsck notice and fix them.

A worktree that has experienced this bug before will contain unpopulated
pointer files. Those may eventually get fixed up in regular usage of
git-annex, but git-annex fsck will also fix them up.

---

Finally, this has me pondering if all of git-annex's state files should
really be stored in one common place across all linked worktrees. Should
perhaps state files that are specific to the worktree be stored per-worktree?
That has not been the case when using git-annex on filesystems supporting
symlinks, but it *has* been the case on filesystems not supporting
symlinks. Perhaps this leads to some other buggy behavior in some cases.
Or perhaps to extra work being done.

For example, the keys database has an associated files table. Which depends
on the worktree. But reconcileStaged updates that table, so when git-annex
is used first in one worktree and then in another one, reconcileStaged will
update the table to reflect the current worktree. Which is extra work each
time a different worktree is used. But also, what if two git-annex
processes are running at the same time, in separate worktrees? Probably
this needs more thought and investigation.

So there is a risk that this commit exposes such buggy behavior in a
situation where it didn't happen before, due to the filesystem not
supporting symlinks. But, given how much this bug crippled using linked
worktrees in such a situation, I doubt that many people have been doing
that.
2025-07-14 13:20:39 -04:00
Joey Hess
73060eea51
annex.fastcopy
Added annex.fastcopy and remote.name.annex-fastcopy config setting. When
set, this allows the copy_file_range syscall to be used, which can eg allow
for server-side copies on NFS. (For fastest copying, also disable
annex.verify or remote.name.annex-verify.)

This is a simple implementation, that does not handle resuming as well as
it possibly could.

It can be used with both local git remotes (including on NFS), and
directory special remotes. Other types of remotes could in theory also
support it, so I've left the config documented as a general thing.
2025-06-03 15:01:38 -04:00
Joey Hess
2ee6c25c72
map: Fix buggy handling of remotes that are bare git repositories accessed via ssh
It was treating remote paths of a remote repo as if they were local paths,
and so trying to expand git directories and so forth on them. That led to
bad results, including a path like "foo.git" getting turned into
"foo.git.git"

Sponsored-by: Dartmouth College's OpenNeuro project
2025-04-22 15:21:01 -04:00
Joey Hess
9024d8e2d1
fixes for enabling and autoenabling mask special remotes 2025-04-11 13:18:23 -04:00
Joey Hess
f553cf411b
avoid cycles 2025-04-11 12:49:32 -04:00
Joey Hess
b81126ca48
does not support export or import 2025-04-11 12:42:49 -04:00
Joey Hess
90c502e675
mask special remote working
Still needs some handling of edge cases, cycles, etc.
2025-04-11 11:18:05 -04:00
Joey Hess
1313cc4d60
mask remotes, partial implementation
Everything implemented except for passing through to the masked remote.
Which should be trivial.
2025-04-10 13:10:07 -04:00
Joey Hess
e81fd72018
Added remote.name.annex-web-options config
Which is a per-remote version of the annex.web-options config.

Had to plumb RemoteGitConfig through to getUrlOptions. In cases where a
special remote does not use curl, there was no need to do that and I used
Nothing instead.

In the case of the addurl and importfeed commands, it seemed best to say
that running these commands is not using the web special remote per se,
so the config is not used for those commands.
2025-04-01 10:17:38 -04:00
Joey Hess
d06bb4b540
httpalso: Windows url fix 2025-03-26 11:42:58 -04:00
Joey Hess
e37bf6351f
avoid shadowing warning 2025-03-19 14:46:24 -04:00
Joey Hess
b158e067c0
avoid reloading trust log 2025-03-19 09:44:44 -04:00
Joey Hess
70cb93a66b
checkPresent of compute remote checks inputs are available
If an input file has been lost from all repositories, it is no longer
possible to compute the output. This will avoid dropping content that
was computed in such a situation, as well as making git-annex fsck --from
the compute remote do its usual thing when content has gone missing.

This implementation avoids recursing forever if there is a cycle,
which should not be possible anyway.

Note the use of RemoteStateHandle as a constructor here suggests that
this may not handle sameas remotes right, since usually a
RemoteStateHandle is constructed using the sameas uuid for a sameas
remote. That assumes a compute remote can even have or be a sameas remote.
Which doesn't seem to make sense, so I have not thought through what might
happen here in detail.
2025-03-18 14:13:13 -04:00
Joey Hess
bcfd554a0f
findcomputed: New command, displays information about computed files. 2025-03-18 12:55:48 -04:00
Joey Hess
5f269513af
buffer responses to compute programs in a TQueue
This avoids a potential problem where the program sends several INPUT
before reading responses, so flushing the respose to the pipe could
block. It's unlikely, but seemed worth making sure it can't happen.
2025-03-11 12:40:21 -04:00
Joey Hess
0ee644b417
close off newline injection attacks against compute special remote protocol 2025-03-11 12:04:58 -04:00
Joey Hess
5760a15c7c
avoid error on missing compute state in checkKey
This improves eg `git-annex move --to` a compute remote that does not
contain the key. Rather than erroring with "Missing compute state" when
it checks if the key is in the remote, it proceeds to trying to store to
it, which has a nice error message.
2025-03-11 11:49:47 -04:00
Joey Hess
0477a8d098
add INPUT-REQUIRED
Used by git-annex-compute-singularity to make addcomputed --fast work.

Also, simplified git-annex-compute-singularity; there is no need to hard
link the container into place. singularity does not care about the
extension of the container, so can just pass it the annex object file.
2025-03-11 11:46:31 -04:00
Joey Hess
e0b7653495
added git-annex-compute-singularity
And implemented SANDBOX, which it needs.
2025-03-10 16:41:26 -04:00
Joey Hess
657ff9a32e
compute protocol debugging 2025-03-10 15:14:59 -04:00
Joey Hess
9d9e34c187
compute: disallow output files that are not regular files
Use case where this came up is a compute program using singularity,
where the process inside the container will be allowed to write to the temp
directory, so could make eg a /etc/shadow symlink, which could then be
used to exfiltrate that from the system to wherever the annex object
might be pushed to.

It seemed better to fix this once in git-annex rather than in any such
compute program.
2025-03-10 12:55:03 -04:00
Joey Hess
2c6dce83de
make OUTPUT subdirs
Simplifies compute programs.
2025-03-07 14:57:12 -04:00
Joey Hess
81ce4264df
compute: add response to OUTPUT
This allows rejecting output filenames that are outside the repository,
and also handles converting eg "-foo" to "./-foo" to prevent a command
that it's passed to interpreting the output filename as a dashed option.
2025-03-07 14:47:34 -04:00
Joey Hess
c6c6e2632d
avoid unncessary git-annex branch changes for recompute and addcomputed 2025-03-06 12:41:30 -04:00
Joey Hess
ccc454a791
computation progress display 2025-03-05 13:46:06 -04:00
Joey Hess
4a4a614b0d
OsPath build fixes 2025-03-04 15:50:15 -04:00
Joey Hess
17ce1b4e7b
mark unused parameter
While unused, it seems to make sense to keep it, since it explains what
the function is doing.
2025-03-04 15:46:30 -04:00
Joey Hess
a2fc471e14
safer git sha object filename
Rather than use the filename provided by INPUT, which could come from user
input, and so could be something that looks like a dashed parameter,
use a .git/object/<sha> filename.

This avoids user input passing through INPUT and back out, with the file
path then passed to a command, which could do something unexpected with
a dashed parameter, or other special parameter.

Added a note in the design about being careful of passing user input to
commands. They still have to be careful of that in general, just not in
this case.
2025-03-04 14:54:13 -04:00
Joey Hess
1ee4d018f3
cycle detection 2025-03-04 14:06:55 -04:00
Joey Hess
51538fa0a8
improve error message when unable to get an input file
In this case, the compute program is run the same as if addcomputed --fast
were used, so it should succeed, without outputting a computed file.

computeInputsUnavailable is in ComputeState for simplicity, but it is
not serialized with the rest of the ComputeState.
2025-03-04 13:13:18 -04:00
Joey Hess
f4e0d6a043
update location log after getting input file from remote 2025-03-04 12:51:38 -04:00
Joey Hess
4b6fabae65
better wording
Avoids this contradiction:

	(Auto enabling special remote foo...)

	  Not enabling compute special remote c2 because [..]
2025-03-04 12:43:50 -04:00
Joey Hess
4e6324131d
compute remote: get input files from other remotes
This needed some refactoring to avoid cycles, since Remote.Compute
cannot import Remote.List. Instead, it uses Annex.remotes. Which must be
populated by something else, but we know it has been, because something
is using Remote.Compute, which it must have found in the remote list,
which populates that.

In Remote.Compute, keyPossibilities' is called with all loggedLocations,
without the trustExclude DeadTrusted that keyLocations does. There is
another cycle there. This may be a problem if a dead repository is still
a remote.

This is missing cycle prevention, and it's certianly possible to make 2
files in the compute remote co-depend on one-another. Hopefully not in a
real world situation, but it an attacker could certainly do it. Cycle
prevention will need to be added to this.
2025-03-04 11:06:58 -04:00
Joey Hess
b395bd4f56
move showOutput into compute remote 2025-03-04 10:02:33 -04:00
Joey Hess
52f51d065a
rename config to annex.security.allowed-compute-programs
And require for enable as well as autoenable.

It seemed asking for trouble for `git-annex enable foo` to use whatever
compute program is stored in the git config, without verifying that the
user wants that program to be used.

Note that it would be good to allow `git-annex enable foo program=...`
to be used without the program being in the git config. Not implemented yet
though.
2025-03-03 16:12:03 -04:00
Joey Hess
f32d2aecce
autoenable security for compute special remote
Added annex.security.autoenable-compute-programs and only allow
autoenabling special remotes that use compute programs on that list.

The reason this is needed is a user might have some compute programs
that are less safe to use than others. They might want to use an unsafe
one only with one repository, where they are the only committer or other
committers are trusted. They might be ok with others being used by any
repository, and if so they can add them to the list.

Another reason would be a user who has installed a compute program by
accident. Eg, it might be included with git-annex at some point, or
pulled in by some dependency. That user doesn't necessarily want that
compute program to be used in an autoenabled special remote.
2025-03-03 15:52:56 -04:00
Joey Hess
a0d6a6ea2a
support git files as input to computations
Using GIT keys, like are used when exporting git files to special
remotes. Except here the GIT key refers to a file checked into the git
repo.

Note that, since the compute remote uses catObject to get the content,
a symlink that is checked into git does not get followed. This is important
for security, because following a symlink and adding the content to the
repo as an annex object would allow exfiltrating content from outside
the repository.

Instead, the behavior with a symlink is to run the computation on the
symlink target. This may turn out to be confusing, and it might be worth
addcomputed checking if the file in git is a symlink and erroring out.
Or it could follow symlinks as long as the destination is a file in the
repisitory.
2025-03-03 12:09:25 -04:00
Joey Hess
63d73d8d1b
record VURL key hashes in addcomputed and recompute 2025-03-03 10:57:56 -04:00
Joey Hess
2bd64059f1
record VURL key hashes when getting from compute remote
Like when getting from the web special remote, when the output of the
computation has changed, record the new hash of the content as an
equivilant key for the VURL key.

Still needs to be done for addcomputed and recompute.
2025-02-27 16:21:42 -04:00
Joey Hess
d2091730e9
refactor 2025-02-27 16:17:42 -04:00
Joey Hess
e6ae5e8d56
many recompute improvements
I've lost track of them all, but it includes:

* Using the same key backend as was used in the original computation.
* Fixing bug that prevented updating the source file key in the compute
  state
* Handling --reproducible and --unreproducible.
* recompute --original of a file using VURL, when the result is
  different, but the key remains the same, makes the object file
  be updated with the new content
* Detecting some other ways the program behavior can change, just for
  completeness.
* Also adds --backend to addcomputed.
2025-02-27 15:18:27 -04:00
Joey Hess
1704b5e327
refactoring 2025-02-27 14:54:03 -04:00
Joey Hess
53d107ca47
refactor 2025-02-26 14:05:37 -04:00