git-annex

Author	SHA1	Message	Date
Joey Hess	8b5fc94d50	add optional object file location to storeKey This will be used by the next commit to simplify the proxy.	2024-07-01 10:42:27 -04:00
Joey Hess	f833a28844	Merge branch 'master' into proxy-specialremotes	2024-06-30 11:16:20 -04:00
Joey Hess	3d646703ee	list proxied remotes and cluster gateways in git-annex info Wanted to also list a cluster's nodes when showing info for the cluster, but that's hard because it needs getting the name of the proxying remote, which is some prefix of the cluster's name, but if the names contain dashes there's no good way to know which prefix it is.	2024-06-30 11:14:13 -04:00
Joey Hess	158d7bc933	fix handling of ERROR in response to REMOVE This allows an error message from a proxied special remote to be displayed to the client. In the case where removal from several nodes of a cluster fails, there can be several errors. What to do? I decided to only show the first error to the user. Probably in this case the user is not in a position to do anything about an error message, so best keep it simple. If the problem with the first node is fixed, they'll see the error from the next node.	2024-06-28 14:10:25 -04:00
Joey Hess	a6ea057f6b	fix handling of ERROR in response to CHECKPRESENT That error is now rethrown on the client, so it will be displayed. For example: $ git-annex fsck x --fast --from AMS-dir fsck x (special remote reports: directory /home/joey/tmp/bench2/dir is not accessible) failed No protocol version check is needed. Because in order to talk to a proxied special remote, the client has to be running the upcoming git-annex release. Which has this fix in it.	2024-06-28 13:46:27 -04:00
Joey Hess	c3a785204e	support a P2PConnection that uses TMVars rather than Handles This will allow having an internal thread speaking P2P protocol, which will be needed to support proxying to external special remotes. No serialization is done on the internal P2P protocol of course. When a ByteString is being exchanged, it may or may not be exactly the length indicated by DATA. While that has to be carefully managed for the serialized P2P protocol, here it would require buffering the whole lazy bytestring in memory to check its length when sending, so it's better to do length checks on the receiving side.	2024-06-28 11:22:29 -04:00
Joey Hess	20ef1262df	give proxied cluster nodes a higher cost than the cluster gateway This makes eg git-annex get default to using the cluster rather than an arbitrary node, which is better UI. The actual cost of accessing a proxied node vs using the cluster is basically the same. But using the cluster allows smarter load-balancing to be done on the cluster.	2024-06-27 15:21:03 -04:00
Joey Hess	0ef4183b00	Merge branch 'master' into proxy	2024-06-27 12:41:57 -04:00
Joey Hess	19137ae780	avoid unfiltered debugging from git-annex-shell When --debugfilter or annex.debugfilter is set, avoid propigating debug output from git-annex-shell, since it cannot be filtered. It would be possible to pass --debugfilter on to git-annex-shell, but it only started accepting that option in 2022. So it would break interop with older versions.	2024-06-27 12:37:25 -04:00
Joey Hess	3dad9446ce	distributed cluster cycle prevention Added BYPASS to P2P protocol, and use it to avoid cycling between cluster gateways. Distributed clusters are working well now!	2024-06-27 12:20:22 -04:00
Joey Hess	07e899c9d3	git-annex-shell: proxy nodes located beyond remote cluster gateways Walking a tightrope between security and convenience here, because git-annex-shell needs to only proxy for things when there has been an explicit, local action to configure them. In this case, the user has to have run `git-annex extendcluster`, which now sets annex-cluster-gateway on the remote. Note that any repositories that the gateway is recorded to proxy for will be proxied onward. This is not limited to cluster nodes, because checking the node log would not add any security; someone could add any uuid to it. The gateway of course then does its own checking to determine if it will allow proxying for the remote.	2024-06-26 12:56:16 -04:00
Joey Hess	202ea3ff2a	don't sync with cluster nodes by default Avoid `git-annex sync --content` etc from operating on cluster nodes by default since syncing with a cluster implicitly syncs with its nodes. This avoids a lot of unncessary work when a cluster has a lot of nodes just in checking if each node's preferred content is satisfied. And it avoids content being sent to nodes individually, so instead syncing with clusters always fanout uploads to nodes. The downside is that there are situations where a cluster's preferred content settings can be met, but those of its nodes are not. Or where a node does not contain a key, but the cluster does, and there are not enough copies of the key yet, so it would be desirable the send it there. I think that's an acceptable tradeoff. These kind of situations are ones where the cluster itself should probably be responsible for copying content to the node. Which it can do much less expensively than a client can. Part of the balanced preferred content design that I will be working on in a couple of months involves rebalancing clusters, so I expect to revisit this. The use of annex-sync config does allow running git-annex sync with a specific node, or nodes, and it will sync with it. And it's also possible to set annex-sync git configs to make it sync with a node by default. (Although that will require setting up an explicit git remote for the node rather than relying on the proxied remote.) Logs.Cluster.Basic is needed because Remote.Git cannot import Logs.Cluster due to a cycle. And the Annex.Startup load of clusters happens too late for Remote.Git to use that. This does mean one redundant load of the cluster log, though only when there is a proxy.	2024-06-25 10:24:38 -04:00
Joey Hess	b8016eeb65	add annex-proxied This makes git-annex sync and similar not treat proxied remotes as git syncable remotes. Also, display in git-annex info remote when the remote is proxied.	2024-06-24 10:16:59 -04:00
Joey Hess	0c111fc96a	fix git-annex sync --content with proxied remotes Loading the remote list a second time was removing all proxied remotes. That happened because setting up the proxied remote added some config fields to the in-memory git config, and on the second load, it saw those configs and decided not to overwrite them with the proxy. Now on the second load, that still happens. But now, the proxied git configs are used to generate a remote same as if those configs were all set. The reason that didn't happen before was twofold, the gitremotes cache was not dropped, and the remote's url field was not set correctly. The problem with the remote's url field is that while it was marked as proxy inherited, all other proxy inherited fields are annex- configs. And the code to inherit didn't work for the url field. Now it all works, but git-annex sync is left running git push/pull on the proxied remote, which doesn't work. That still needs to be fixed.	2024-06-24 09:45:51 -04:00
Joey Hess	5b332a87be	dropping from clusters Dropping from a cluster drops from every node of the cluster. Including nodes that the cluster does not think have the content. This is different from GET and CHECKPRESENT, which do trust the cluster's location log. The difference is that removing from a cluster should make 100% the content is gone from every node. So doing extra work is ok. Compare with CHECKPRESENT where checking every node could make it very expensive, and the worst that can happen in a false negative is extra work being done. Extended the P2P protocol with FAILURE-PLUS to handle the case where a drop from one node succeeds, but a drop from another node fails. In that case the entire cluster drop has failed. Note that SUCCESS-PLUS is returned when dropping from a proxied remote that is not a cluster, when the protocol version supports it. This is because P2P.Proxy does not know when it's proxying for a single node cluster vs for a remote that is not a cluster.	2024-06-23 09:43:40 -04:00
Joey Hess	a6a04b7e5e	avoid storing SUCCESS-PLUS uuid when it is the remote uuid This is slightly belt and suspenders, but nothing guarantees that the peer avoids including its uuid in the SUCCESS-PLUS list as it's supposed to. And while it probably doesn't matter if the location log is updated redundantly, let's not find out.	2024-06-23 08:21:11 -04:00
Joey Hess	6eac3112e5	be quiet when reading cluster and proxy information at startup I had a transfer of 3 files fail like this: git-annex: transferrer protocol error: "(recording state in git...)" The remote had stalldetection enabled, although I didn't see it stall. So git-annex transferrer would have been started up. I guess that one of these new git-annex branch reads, that happens early, caused that message due to perhaps an uncommitted git-annex branch change. Since the transferrer speaks a protocol over stdout, it needs to be prevented from outputting other messages to stdout. Interestingly, startupAnnex is run after prepRunCommand, so if a command requests quiet output it would already be quiet. But the transferrer does not, instead it calls Annex.setOutput SerializedOutput in its start action.	2024-06-18 21:31:32 -04:00
Joey Hess	f18740699e	P2P protocol version 2, adding SUCCESS-PLUS and ALREADY-HAVE-PLUS Client side support for SUCCESS-PLUS and ALREADY-HAVE-PLUS is complete, when a PUT stores to additional repositories than the expected on, the location log is updated with the additional UUIDs that contain the content. Started implementing PUT fanout to multiple remotes for clusters. It is untested, and I fear fencepost errors in the relative offset calculations. And it is missing proxying for the protocol after DATA.	2024-06-18 16:21:40 -04:00
Joey Hess	780367200b	remove dead nodes when loading the cluster log This is to avoid inserting a cluster uuid into the location log when only dead nodes in the cluster contain the content of a key. One reason why this is necessary is Remote.keyLocations, which excludes dead repositories from the list. But there are probably many more. Implementing this was challenging, because Logs.Location importing Logs.Cluster which imports Logs.Trust which imports Remote.List resulted in an import cycle through several other modules. Resorted to making Logs.Location not import Logs.Cluster, and instead it assumes that Annex.clusters gets populated when necessary before it's called. That's done in Annex.Startup, which is run by the git-annex command (but not other commands) at early startup in initialized repos. Or, is run after initialization. Note that is Remote.Git, it is unable to import Annex.Startup, because Remote.Git importing Logs.Cluster leads the the same import cycle. So ensureInitialized is not passed annexStartup in there. Other commands, like git-annex-shell currently don't run annexStartup either. So there are cases where Logs.Location will not see clusters. So it won't add any cluster UUIDs when loading the log. That's ok, the only reason to do that is to make display of where objects are located include clusters, and to make commands like git-annex get --from treat keys as being located in a cluster. git-annex-shell certainly does not do anything like that, and I'm pretty sure Remote.Git (and callers to Remote.Git.onLocalRepo) don't either.	2024-06-16 14:39:44 -04:00
Joey Hess	a4c9d4424c	remove Logs.Presence imports When imported along with Logs.Location, it can be an unused import and it won't warn, due to reexports. The point if this is really to show that Logs.Presence is not widely used, outside Logs/	2024-06-14 17:27:34 -04:00
Joey Hess	e224b99f36	whitespace	2024-06-12 13:24:25 -04:00
Joey Hess	f98605bce7	a local git remote cannot proxy Prevent listProxied from listing anything when the proxy remote's url is a local directory. Proxying does not work in that situation, because the proxied remotes have the same url, and so git-annex-shell is not run when accessing them, instead the proxy remote is accessed directly. I don't think there is any good way to support this. Even if the instantiated git repos for the proxied remotes somehow used an url that caused it to use git-annex-shell to access them, planned features like `git-annex copy --to proxy` accepting a key and sending it on to nodes behind the proxy would not work, since git-annex-shell is not used to access the proxy. So it would need to use something to access the proxy that causes git-annex-shell to be run and speaks P2P protocol over it. And we have that. It's a ssh connection to localhost. Of course, it would be possible to take ssh out of that mix, and swap in something that does not have encryption overhead and authentication complications, but otherwise behaves the same as ssh. And if the user wants to do that, GIT_SSH does exist.	2024-06-12 10:16:04 -04:00
Joey Hess	c6e0710281	proxying to local git remotes works This just happened to work correctly. Rather surprisingly. It turns out that openP2PSshConnection actually also supports local git remotes, by just running git-annex-shell with the path to the remote. Renamed "P2PSsh" to "P2PShell" to make this clear.	2024-06-12 10:10:11 -04:00
Joey Hess	09b5e53f49	set annex.uuid in proxy's Repo getRepoUUID looks at that, and was seeing the annex.uuid of the proxy. Which caused it to unncessarily set the git config. Probably also would have led to other problems.	2024-06-11 13:40:50 -04:00
Joey Hess	501d65eeab	started implementing git-annex-shell proxy So far, it negotiates VERSION with both parties. This is a tricky dance. Untested.	2024-06-10 18:01:36 -04:00
Joey Hess	649b87bedd	Merge branch 'master' into proxy	2024-06-10 14:26:18 -04:00
Joey Hess	9a8391078a	git-annex-shell: block relay requests connRepo is only used when relaying git upload-pack and receive-pack. That's only supposed to be used when git-annex-remotedaemon is serving git-remote-tor-annex connections over tor. But, it was always set, and so could be used in other places possibly. Fixed by making connRepo optional in the P2P protocol interface. In Command.EnableTor, it's not needed, because it only speaks the protocol in order to check that it's able to connect back to itself via the hidden service. So changed that to pass Nothing rather than the git repo. In Remote.Helper.Ssh, it's connecting to git-annex-shell p2pstdio, so is making the requests, so will never need connRepo. In git-annex-shell p2pstdio, it was accepting git upload-pack and receive-pack requests over the P2P protocol, even though nothing sent them. This is arguably a security hole, particularly if the user has set environment variables like GIT_ANNEX_SHELL_LIMITED to prevent git push/pull via git-annex-shell.	2024-06-10 14:16:27 -04:00
Joey Hess	4b940c92bb	proxied remotes working on client side Got the right git config settings inherited now. Note that the url config is not passed on to git, so it won't be able to access the proxied remote. That would need some kind of git-remote-annex but for proxied remotes anyway. Unsure yet if that will be needed.	2024-06-07 12:11:46 -04:00
Joey Hess	b43c835def	instantiate remotes that are behind a proxy remote Untested, but this should be close to working. The proxied remotes have the same url but a different uuid. When talking to current git-annex-shell, it will fail due to a uuid mismatch. Once it supports proxies, it will know that the presented uuid is for a remote that it proxies for. The check for any git config settings for a remote with the same name as the proxied remote is there for several reasons. One is security: Writing a name to the proxy log should not cause changes to how an existing, configured git remote operates in a different clone of the repo. It's possible that the user has been using a proxied remote, and decides to set a git config for it. We can't tell the difference between that scenario and an evil remote trying to eg, intercept a file upload by replacing their remote with a proxied remote. Also, if the user sets some git config, does it override the config inherited from the proxy remote? Seems a difficult question. Luckily, the above means we don't need to think through it. This does mean though, that in order for a user to change the config of a proxy remote, they have to manually set its annex-uuid and url, as well as the config they want to change. They may also have to set any of the inherited configs that they were relying on.	2024-06-06 17:15:32 -04:00
Joey Hess	f3f40e03b4	clarify comment Since remotes not being available is a thing, this was confusing.	2024-06-04 14:29:24 -04:00
Joey Hess	e64add7cdf	git-remote-annex: support importrree=yes remotes When exporttree=yes is also set. Probably it would also be possible to support ones with only importtree=yes, by enabling exporttree=yes for the remote only when using git-remote-annex, but let's keep this simple... I'm not sure what gets recorded in .git/annex/ state differently in the two cases that might cause a problem when doing that. Note that the full annex:: urls generated and displayed for such a remote omit the importree=yes. Which is ok, cloning from such an url uses an exporttree=remote, but the git-annex branch doesn't get written by this program, so once the real config is available from the git-annex branch, it will still function as an importree=yes remote.	2024-05-27 12:35:42 -04:00
Joey Hess	19418e81ee	git-remote-annex: Display full url when using remote with the shorthand url	2024-05-24 17:15:31 -04:00
Joey Hess	58301e40d2	sync with special remotes with an annex:: url Check explicitly for an annex:: url, not just any url. While no built-in special remotes set an url, except ones that can be synced with, it seems possible that some external special remote sets an url for its own use, but did not expect it to be used by git-annex sync et al. The assistant also syncs with them.	2024-05-24 14:57:29 -04:00
Joey Hess	6f1039900d	prevent using git-remote-annex with unsuitable special remote configs I hope to support importtree=yes eventually, but it does not currently work. Added remote.<name>.allow-encrypted-gitrepo that needs to be set to allow using it with encrypted git repos. Note that even encryption=pubkey uses a cipher stored in the git repo to encrypt the keys stored in the remote. While it would be possible to not encrypt the GITBUNDLE and GITMANIFEST keys, and then allow using encryption=pubkey, it doesn't currently work, and that would be a complication that I doubt is worth it.	2024-05-14 13:52:20 -04:00
Joey Hess	ff5193c6ad	Merge branch 'master' into git-remote-annex	2024-05-10 14:20:36 -04:00
Joey Hess	a89e8f6bad	skip remotes with an annex:: url These remotes are not regular git remotes, they are special remotes that git uses git-remote-annex to access. Sponsored-by: Jack Hill on Patreon	2024-05-07 15:02:20 -04:00
Yaroslav Halchenko	87e2ae2014	run codespell throughout fixing typos automagically === Do not change lines below === { "chain": [], "cmd": "codespell -w", "exit": 0, "extra_inputs": [], "inputs": [], "outputs": [], "pwd": "." } ^^^ Do not change lines above ^^^	2024-05-01 15:46:21 -04:00
Joey Hess	d2709c6862	avoid accepting externaltype= and readonly= parameters for rclone I think readonly= doesn't make sense here? externaltype= certianly doesn't.	2024-04-17 15:41:55 -04:00
Joey Hess	d372553540	rclone special remote Added rclone special remote, which can be used without needing to install the git-annex-remote-rclone program. This needs a new version of rclone, which supports "rclone gitannex". This is implemented as a variant of an external special remote, that runs "rclone gitannex" instead of the usual git-annex-remote- command. Parameterized Remote.External to support that. Sponsored-by: Luke T. Shumaker on Patreon	2024-04-17 15:20:37 -04:00
Joey Hess	c64a73c7ea	startExternalAddonProcess add parameters Not used yet but intended to support eg running "rclone gitannex"	2024-04-17 13:09:10 -04:00
Joey Hess	631a3ca05d	typo fix	2024-04-17 12:59:22 -04:00
Joey Hess	7cef5e8f35	export tree: avoid confusing output about renaming files When a file in the export is renamed, and the remote's renameExport returned Nothing, renaming to the temp file would first say it was renaming, and appear to succeed, but actually what it did was delete the file. Then renaming from the temp file would not do anything, since the temp file is not present on the remote. This appeared as if a file got renamed to a temp file and left there. Note that exporttree=yes importree=yes remotes have their usual renameExport replaced with one that returns Nothing. (For reasons explained in Remote.Helper.ExportImport.) So this happened even with remotes that support renameExport. Fix by letting renameExport = Nothing when it's not supported at all. This avoids displaying the rename. Sponsored-by: Graham Spencer on Patreon	2024-03-09 13:50:26 -04:00
Joey Hess	e7652b0997	implement URL to VURL migration This needs the content to be present in order to hash it. But it's not possible for a module used by Backend.URL to call inAnnex because that would entail a dependency loop. So instead, rely on the fact that Command.Migrate calls inAnnex before performing a migration. But, Command.ExamineKey calls fastMigrate and the key may or may not exist, and it's not wanting to actually perform a migration in any case. To handle that, had to add an additional value to fastMigrate to indicate whether the content is inAnnex. Factored generateEquivilantKey out of Remote.Web. Note that migrateFromURLToVURL hardcodes use of the SHA256E backend. It would have been difficult not to, given all the dependency loop issues. But --backend and annex.backend are used to tell git-annex migrate to use VURL in any case, so there's no config knob that the user could expect to configure that. Sponsored-by: Brock Spratlen on Patreon	2024-03-01 16:42:02 -04:00
Joey Hess	1b0de3021a	avoid double checksum when downloading VURL from web for 1st time Sponsored-by: Jack Hill on Patreon	2024-03-01 13:44:40 -04:00
Joey Hess	cc17ac423b	implement isCryptographicallySecureKey for VURL Considerable difficulty to work around an import cycle. Had to move the list of backends (except for VURL) to Backend.Variety to VURL could use it. Sponsored-by: Kevin Mueller on Patreon	2024-02-29 17:26:35 -04:00
Joey Hess	e7b7ea78af	lift isCryptographicallySecure to Annex Needed for VURL backend. Sponsored-by: Nicholas Golder-Manning on Patreon	2024-02-29 16:14:13 -04:00
Joey Hess	55bf01b788	add equivilant key log for VURL keys When downloading a VURL from the web, make sure that the equivilant key log is populated. Unfortunately, this does not hash the content while it's being downloaded from the web. There is not an interface in Backend currently for incrementally hash generation, only for incremental verification of an existing hash. So this might add a noticiable delay, and it has to show a "(checksum...") message. This could stand to be improved. But, that separate hashing step only has to happen on the first download of new content from the web. Once the hash is known, the VURL key can have its hash verified incrementally while downloading except when the content in the web has changed. (Doesn't happen yet because verifyKeyContentIncrementally is not implemented yet for VURL keys.) Note that the equivilant key log file is formatted as a presence log. This adds a tiny bit of overhead (eg "1 ") per line over just listing the urls. The reason I chose to use that format is it seems possible that there will need to be a way to remove an equivilant key at some point in the future. I don't know why that would be necessary, but it seemed wise to allow for the possibility. Downloads of VURL keys from other special remotes that claim urls, like bittorrent for example, does not popilate the equivilant key log. So for now, no checksum verification will be done for those. Sponsored-by: Nicholas Golder-Manning on Patreon	2024-02-29 16:01:49 -04:00
Joey Hess	0f7143d226	support VURL backend Not yet implemented is recording hashes on download from web and verifying hashes. addurl --verifiable option added with -V short option because I expect a lot of people will want to use this. It seems likely that --verifiable will become the default eventually, and possibly rather soon. While old git-annex versions don't support VURL, that doesn't prevent using them with keys that use VURL. Of course, they won't verify the content on transfer, and fsck will warn that it doesn't know about VURL. So there's not much problem with starting to use VURL even when interoperating with old versions. Sponsored-by: Joshua Antonishen on Patreon	2024-02-29 13:48:51 -04:00
Joey Hess	20567e605a	add directional stalldetection and bwlimit configs Sponsored-by: Dartmouth College's DANDI project	2024-01-19 15:27:53 -04:00
Joey Hess	c02df79248	use watchFileSize in Remote.External.retrieveKeyFile external: Monitor file size when getting content from external special remotes and use that to update the progress meter, in case the external special remote program does not report progress. This relies on `703a70cafa` to prevent ever running the meter backwards. Sponsored-by: Dartmouth College's DANDI project	2024-01-19 14:34:30 -04:00

1 2 3 4 5 ...

1,591 commits