git-annex

Author	SHA1	Message	Date
Joey Hess	640bc43c38	reject an insecure configuration A user might expect onlyencryptcreds=yes to do some useful encryption of the creds despite using encryption=shared. Prevent them from thinking they have somehow secured the creds in their repository in that case. Also reject onlyencryptcreds=yes encryption=none in case the user somehow thinks there is creds-only encryption going on in that case. Sponsored-by: Graham Spencer	2025-08-22 13:38:14 -04:00
Joey Hess	70b4220ddf	fix specialRemote confusion with tahoe tahoe: Fix bug that made initremote require an encryption= parameter, despite git-annex encryption not being used with this special remote, since tahoe handles encryption itself. The chunking parameters were also accepted and won't be any longer either. They were also not actually used. `c4ea3ca40a` was the commit. At that point specialRemote was being added to most remotes and I forgot tahoe doesn't need these parameters. Turns out that, when embedcreds=yes was used, it did not cause the introducer-furl and shared-convergence-secret to be encrypted, even though encryption= was specified. Which is only not a security hole because encryption= was not documented to work with the tahoe special remote at all! It might be nice to support onlyencryptcreds=yes with tahoe, and it would make sense to accept the encryption= parameter then, and only use it for encrypting the creds. That would take some work, since the encryption= parameter would need to be optional, and the usual encrypted special remote code couldn't be used. Sponsored-by: unqueued	2025-08-22 13:24:53 -04:00
Joey Hess	fe973a4833	error message typo	2025-08-22 12:59:06 -04:00
Joey Hess	d8127709e3	showOutput tahoe store and retrieve both output messages which cannot be shut up with --quiet. The messages go to stderr, which makes it hard to avoid displaying them without hiding problems. This kinda sucks. Using showOutput helps with output formatting, but with -J this output is still scrambled in with the progress output.	2025-08-22 12:54:43 -04:00
Joey Hess	98d4f07057	tahoe: Support tahoe-lafs command versions newer than 1.16 tahoe start was deprecated and removed in 2020. This feels like a very janky way to run a daemon, but it does work. Sponsored-by: k0ld	2025-08-22 12:35:53 -04:00
Joey Hess	bc18b11cb8	prevent changing onlyencryptcreds of existing remote That would break accessing data already stored in the remote, the same as changing encryption type would do. Sponsored-by: Jack Hill	2025-08-21 13:50:39 -04:00
Joey Hess	6b63fb7ea2	Don't allow the type of encryption of an existing special remote to be changed. eg, git-annex enableremote foo encryption=none will not remove encryption, and other encryption= settings don't change the type of encryption used. Either of which would render data stored in a special remote inaccessible. Probably fixes reversion introduced in `71f78fe45d`. That commit got rid of the hasEncryptionConfig check, which I think would have detected this before. I've not gone back to verify that. Sponsored-by: mycroft	2025-08-21 13:41:00 -04:00
Joey Hess	afff2bb47d	onlyencryptcreds=yes initremote: When onlyencryptcreds=yes is used along with embedcreds=yes, and encryption is enabled, only encrypt the embedded creds, without encrypting the content of the special remote. Useful for exporttree=yes/importtree=yes remotes. Sponsored-by: Joshua Antonishen	2025-08-20 15:14:01 -04:00
Joey Hess	c5bfbe07bc	remove now unused parameters	2025-08-20 14:23:03 -04:00
Joey Hess	6c2e84f6ec	Bump aws build dependency to 0.24.1 That's the version in Debian stable now. And this removes a lot of ifdefs. Also I'm pretty sure a recent commit broke building with older versions of aws, although that could be fixed with sufficient testing.	2025-08-13 15:32:39 -04:00
Joey Hess	3b1702e658	probe AWS datacenter S3: When initremote is given the name of a bucket that already exists, automatically set datacenter to the right value, rather than needing it to be explicitly set. This needs aws-0.23. But, initremote stores the datacenter value, so a remote set up this way can be used with git-annex built with an older aws. This is not done when signature=anonymous, because in that case, using AWS.defaultRegion works fine for accessing buckets on other datacenters. It feels a bit round-about to need to do this probing. But without it, the problem seems to be that, with a v4 signature, the location constraint is included in the Authorization header. When that is the wrong location, AWS S3 rejects it. I do wonder though if there is an easier way that I am currently missing. Sponsored-by: Dartmouth College's DANDI project	2025-08-13 15:23:31 -04:00
Joey Hess	fed5e00a18	fix default region reversion Commit `215640096f` caused the default region for S3 to change to us-east-2. This was due to regionInfo having an undocumented property that the first item in the list is for the default region. Avoid relying on regionInfo for defaultRegion. Sponsored-by: Dartmouth College's DANDI project	2025-08-13 14:19:36 -04:00
Joey Hess	215640096f	S3: Default to signature=v4 when using an AWS endpoint * S3: Default to signature=v4 when using an AWS endpoint, since some AWS regions need v4 and all support it. When host= is used to specify a different S3 host, the default remains signature=v2. * webapp: Support setting up S3 buckets in regions that need v4 signatures. For the webapp, went ahead and added all current S3 regions (except govcloud, which is not usable by everyone). Sponsored-by: Dartmouth College's DANDI project	2025-08-13 13:18:35 -04:00
Joey Hess	edc1d92059	document "anonymous" in ValueDesc	2025-08-13 12:51:27 -04:00
Joey Hess	d3fbda13e4	p2p --enable p2p: Added --enable option, which can be used to enable P2P networks provided by external commands git-annex-p2p-<netname> Made git-annex p2p --enable tor behave the same as git-annex enable-tor, to make tor a bit less of a special case. However, it cannot be run as root, since it cannot take the user id parameter.	2025-07-30 14:08:59 -04:00
Joey Hess	a6f8248465	add connProcess to P2PConnection When using the new generic P2P transport to open an outgoing connection to a peer, this will hold the pid of the git-annex-p2p-<netname> command. closeConnection simply waits for it. Rather than relying on garbage collection of the closed handles to close it. In Remote.Helper.Ssh, connProcess is set to Nothing, even though there is a similar process being used there. That code stores the pid in OpenConnection instead, and handles waiting for it itself. A bit ugly, but not worth cleaning up at this point, maybe later.	2025-07-30 12:35:16 -04:00
Joey Hess	6a9e923c74	fix handling of linked worktrees on filesystems w/o symlinks Fix bug in handling of linked worktrees on filesystems not supporting symlinks, that caused annexed file content to be stored in the wrong location inside the git directory, and also caused pointer files to not get populated. This parameterizes functions in Annex.Locations with a GitLocationMaker. The uses of standardGitLocationMaker are in cases where the path returned by a function should not change when in a linked worktree. For example, gitAnnexLink uses standardGitLocationMaker because symlink targets should always be to ".git/annex/objects" paths, even when in a linked worktree. Hopefully I have gotten all uses of standardGitLocationMaker right. This also assumes that all path construction to the annex directory is done via the functions in Annex.Locations, and there is no other, ad-hoc construction elsewhere. Thankfully, Annex.Locations has been around since the beginning, and has been used consistently. I think. --- In fixupUnusualRepos, when symlinks are supported, the .git file is replaced with a symlink to the linked worktree git directory. And in that directory, an "annex" symlink points to the main annex directory. In that case, it's not necessary to set mainWorkTreePath. It would be ok to set it, but not setting it in that case allows an optimisation of avoiding reading the "commondir" file. The change to make fixupUnusualRepos set mainWorkTreePath when the repository is not initialized yet is done in case the initialization itself writes to the annex directory. If that were the case, without setting mainWorkTreePath, the annex symlink would not be set up yet, and so it might have created the annex directory in the wrong place. Currently that didn't happen, but now that mainWorkTreePath is available, using it here avoids any such later problem. --- This commit does not deal with the mess of a worktree that has experienced this bug before. 
In particular, if `git-annex get` were run in such a worktree, it would have stored the object files in the linked worktree's git directory, rather than in the main git directory. Such misplaced object files need to be dealt with; the plan is to make git-annex fsck notice and fix them. A worktree that has experienced this bug before will contain unpopulated pointer files. Those may eventually get fixed up in regular usage of git-annex, but git-annex fsck will also fix them up. --- Finally, this has me pondering if all of git-annex's state files should really be stored in one common place across all linked worktrees. Should perhaps state files that are specific to the worktree be stored per-worktree? That has not been the case when using git-annex on filesystems supporting symlinks, but it has been the case on filesystems not supporting symlinks. Perhaps this leads to some other buggy behavior in some cases. Or perhaps to extra work being done. For example, the keys database has an associated files table. Which depends on the worktree. But reconcileStaged updates that table, so when git-annex is used first in one worktree and then in another one, reconcileStaged will update the table to reflect the current worktree. Which is extra work each time a different worktree is used. But also, what if two git-annex processes are running at the same time, in separate worktrees? Probably this needs more thought and investigation. So there is a risk that this commit exposes such buggy behavior in a situation where it didn't happen before, due to the filesystem not supporting symlinks. But, given how much this bug crippled using linked worktrees in such a situation, I doubt that many people have been doing that.	2025-07-14 13:20:39 -04:00
Joey Hess	73060eea51	annex.fastcopy Added annex.fastcopy and remote.name.annex-fastcopy config setting. When set, this allows the copy_file_range syscall to be used, which can eg allow for server-side copies on NFS. (For fastest copying, also disable annex.verify or remote.name.annex-verify.) This is a simple implementation, that does not handle resuming as well as it possibly could. It can be used with both local git remotes (including on NFS), and directory special remotes. Other types of remotes could in theory also support it, so I've left the config documented as a general thing.	2025-06-03 15:01:38 -04:00
Joey Hess	2ee6c25c72	map: Fix buggy handling of remotes that are bare git repositories accessed via ssh It was treating remote paths of a remote repo as if they were local paths, and so trying to expand git directories and so forth on them. That led to bad results, including a path like "foo.git" getting turned into "foo.git.git" Sponsored-by: Dartmouth College's OpenNeuro project	2025-04-22 15:21:01 -04:00
Joey Hess	9024d8e2d1	fixes for enabling and autoenabling mask special remotes	2025-04-11 13:18:23 -04:00
Joey Hess	f553cf411b	avoid cycles	2025-04-11 12:49:32 -04:00
Joey Hess	b81126ca48	does not support export or import	2025-04-11 12:42:49 -04:00
Joey Hess	90c502e675	mask special remote working Still needs some handling of edge cases, cycles, etc.	2025-04-11 11:18:05 -04:00
Joey Hess	1313cc4d60	mask remotes, partial implementation Everything implemented except for passing through to the masked remote. Which should be trivial.	2025-04-10 13:10:07 -04:00
Joey Hess	e81fd72018	Added remote.name.annex-web-options config Which is a per-remote version of the annex.web-options config. Had to plumb RemoteGitConfig through to getUrlOptions. In cases where a special remote does not use curl, there was no need to do that and I used Nothing instead. In the case of the addurl and importfeed commands, it seemed best to say that running these commands is not using the web special remote per se, so the config is not used for those commands.	2025-04-01 10:17:38 -04:00
Joey Hess	d06bb4b540	httpalso: Windows url fix	2025-03-26 11:42:58 -04:00
Joey Hess	e37bf6351f	avoid shadowing warning	2025-03-19 14:46:24 -04:00
Joey Hess	b158e067c0	avoid reloading trust log	2025-03-19 09:44:44 -04:00
Joey Hess	70cb93a66b	checkPresent of compute remote checks inputs are available If an input file has been lost from all repositories, it is no longer possible to compute the output. This will avoid dropping content that was computed in such a situation, as well as making git-annex fsck --from the compute remote do its usual thing when content has gone missing. This implementation avoids recursing forever if there is a cycle, which should not be possible anyway. Note the use of RemoteStateHandle as a constructor here suggests that this may not handle sameas remotes right, since usually a RemoteStateHandle is constructed using the sameas uuid for a sameas remote. That assumes a compute remote can even have or be a sameas remote. Which doesn't seem to make sense, so I have not thought through what might happen here in detail.	2025-03-18 14:13:13 -04:00
Joey Hess	bcfd554a0f	findcomputed: New command, displays information about computed files.	2025-03-18 12:55:48 -04:00
Joey Hess	5f269513af	buffer responses to compute programs in a TQueue This avoids a potential problem where the program sends several INPUT before reading responses, so flushing the response to the pipe could block. It's unlikely, but seemed worth making sure it can't happen.	2025-03-11 12:40:21 -04:00
Joey Hess	0ee644b417	close off newline injection attacks against compute special remote protocol	2025-03-11 12:04:58 -04:00
Joey Hess	5760a15c7c	avoid error on missing compute state in checkKey This improves eg `git-annex move --to` a compute remote that does not contain the key. Rather than erroring with "Missing compute state" when it checks if the key is in the remote, it proceeds to trying to store to it, which has a nice error message.	2025-03-11 11:49:47 -04:00
Joey Hess	0477a8d098	add INPUT-REQUIRED Used by git-annex-compute-singularity to make addcomputed --fast work. Also, simplified git-annex-compute-singularity; there is no need to hard link the container into place. singularity does not care about the extension of the container, so can just pass it the annex object file.	2025-03-11 11:46:31 -04:00
Joey Hess	e0b7653495	added git-annex-compute-singularity And implemented SANDBOX, which it needs.	2025-03-10 16:41:26 -04:00
Joey Hess	657ff9a32e	compute protocol debugging	2025-03-10 15:14:59 -04:00
Joey Hess	9d9e34c187	compute: disallow output files that are not regular files Use case where this came up is a compute program using singularity, where the process inside the container will be allowed to write to the temp directory, so could make eg a /etc/shadow symlink, which could then be used to exfiltrate that from the system to wherever the annex object might be pushed to. It seemed better to fix this once in git-annex rather than in any such compute program.	2025-03-10 12:55:03 -04:00
Joey Hess	2c6dce83de	make OUTPUT subdirs Simplifies compute programs.	2025-03-07 14:57:12 -04:00
Joey Hess	81ce4264df	compute: add response to OUTPUT This allows rejecting output filenames that are outside the repository, and also handles converting eg "-foo" to "./-foo" to prevent a command that it's passed to interpreting the output filename as a dashed option.	2025-03-07 14:47:34 -04:00
Joey Hess	c6c6e2632d	avoid unnecessary git-annex branch changes for recompute and addcomputed	2025-03-06 12:41:30 -04:00
Joey Hess	ccc454a791	computation progress display	2025-03-05 13:46:06 -04:00
Joey Hess	4a4a614b0d	OsPath build fixes	2025-03-04 15:50:15 -04:00
Joey Hess	17ce1b4e7b	mark unused parameter While unused, it seems to make sense to keep it, since it explains what the function is doing.	2025-03-04 15:46:30 -04:00
Joey Hess	a2fc471e14	safer git sha object filename Rather than use the filename provided by INPUT, which could come from user input, and so could be something that looks like a dashed parameter, use a .git/object/<sha> filename. This avoids user input passing through INPUT and back out, with the file path then passed to a command, which could do something unexpected with a dashed parameter, or other special parameter. Added a note in the design about being careful of passing user input to commands. They still have to be careful of that in general, just not in this case.	2025-03-04 14:54:13 -04:00
Joey Hess	1ee4d018f3	cycle detection	2025-03-04 14:06:55 -04:00
Joey Hess	51538fa0a8	improve error message when unable to get an input file In this case, the compute program is run the same as if addcomputed --fast were used, so it should succeed, without outputting a computed file. computeInputsUnavailable is in ComputeState for simplicity, but it is not serialized with the rest of the ComputeState.	2025-03-04 13:13:18 -04:00
Joey Hess	f4e0d6a043	update location log after getting input file from remote	2025-03-04 12:51:38 -04:00
Joey Hess	4b6fabae65	better wording Avoids this contradiction: (Auto enabling special remote foo...) Not enabling compute special remote c2 because [..]	2025-03-04 12:43:50 -04:00
Joey Hess	4e6324131d	compute remote: get input files from other remotes This needed some refactoring to avoid cycles, since Remote.Compute cannot import Remote.List. Instead, it uses Annex.remotes. Which must be populated by something else, but we know it has been, because something is using Remote.Compute, which it must have found in the remote list, which populates that. In Remote.Compute, keyPossibilities' is called with all loggedLocations, without the trustExclude DeadTrusted that keyLocations does. There is another cycle there. This may be a problem if a dead repository is still a remote. This is missing cycle prevention, and it's certainly possible to make 2 files in the compute remote co-depend on one-another. Hopefully not in a real world situation, but an attacker could certainly do it. Cycle prevention will need to be added to this.	2025-03-04 11:06:58 -04:00
Joey Hess	b395bd4f56	move showOutput into compute remote	2025-03-04 10:02:33 -04:00

1,733 commits