git-annex

Author	SHA1	Message	Date
Joey Hess	62750f0102	shut down RemoteSides cleanly Before it just exited without actually shutting down the RemoteSides, when the client hung up.	2024-06-28 13:19:57 -04:00
Joey Hess	c3a785204e	support a P2PConnection that uses TMVars rather than Handles This will allow having an internal thread speaking P2P protocol, which will be needed to support proxying to external special remotes. No serialization is done on the internal P2P protocol of course. When a ByteString is being exchanged, it may or may not be exactly the length indicated by DATA. While that has to be carefully managed for the serialized P2P protocol, here it would require buffering the whole lazy bytestring in memory to check its length when sending, so it's better to do length checks on the receiving side.	2024-06-28 11:22:29 -04:00
Joey Hess	41a0817188	make extendcluster also updatecluster This avoids the user forgetting to do it and simplifies the documentation.	2024-06-27 15:34:45 -04:00
Joey Hess	cf59d7f92c	GET and CHECKPRESENT among lowest cost cluster nodes Before it was using a node that might have had a higher cost. Also threw in a random selection from among the low cost nodes. Of course this is a poor excuse for load balancing, but it's better than nothing. Most of the time...	2024-06-27 14:36:55 -04:00
Joey Hess	3dad9446ce	distributed cluster cycle prevention Added BYPASS to P2P protocol, and use it to avoid cycling between cluster gateways. Distributed clusters are working well now!	2024-06-27 12:20:22 -04:00
Joey Hess	923953c9fe	fix cycle prevention code	2024-06-26 13:21:51 -04:00
Joey Hess	07e899c9d3	git-annex-shell: proxy nodes located beyond remote cluster gateways Walking a tightrope between security and convenience here, because git-annex-shell needs to only proxy for things when there has been an explicit, local action to configure them. In this case, the user has to have run `git-annex extendcluster`, which now sets annex-cluster-gateway on the remote. Note that any repositories that the gateway is recorded to proxy for will be proxied onward. This is not limited to cluster nodes, because checking the node log would not add any security; someone could add any uuid to it. The gateway of course then does its own checking to determine if it will allow proxying for the remote.	2024-06-26 12:56:16 -04:00
Joey Hess	1ec2fecf3f	set up proxies for cluster nodes that are themselves proxied via a remote When there are multiple gateways to a cluster, this sets up proxying for nodes that are accessed via a remote gateway. Eg, when running in nyc and amsterdam is the remote gateway, and it has node1 and node2, this sets up proxying for amsterdam-node1 and amsterdam-node2. A client that has nyc as a remote will see proxied remotes nyc-amsterdam-node1 and nyc-amsterdam-node2.	2024-06-26 11:24:55 -04:00
Joey Hess	02bf3ddc3f	updatecluster: support multiple gateways Just look at the existing proxied remotes that correspond to already existing nodes of the cluster, and keep those nodes in the cluster, while adding any remotes of the local repo that are configured as cluster nodes. This allows removing cluster nodes from the local repo and updating, without it also removing nodes provided by other gateways.	2024-06-26 10:51:14 -04:00
Joey Hess	0b72b85df5	added git-annex extendcluster This works, but updatecluster does not work yet in multi-gateway clusters, nor do gateways relay to other gateways.	2024-06-26 10:26:54 -04:00
Joey Hess	cec2848e8a	support annex.jobs for clusters	2024-06-25 14:54:20 -04:00
Joey Hess	b8016eeb65	add annex-proxied This makes git-annex sync and similar not treat proxied remotes as git syncable remotes. Also, display in git-annex info remote when the remote is proxied.	2024-06-24 10:16:59 -04:00
Joey Hess	bf6b309917	remove attempt to avoid git syncing with instantiate proxied remotes It didn't work. Actually, sync was skipping those remotes due to a bug.	2024-06-24 09:35:24 -04:00
Joey Hess	d0aec8f623	always check numcopies when moving from cluster When the destination does not start with a copy, the cluster has one or more copies. If more, dropping would reduce the number of copies, so numcopies must be checked. Considered checking how many nodes of the cluster contain a copy. If only 1 node does, it could allow a move without checking numcopies. The problem with that, though, is that other nodes of the cluster could have copies that we don't know about. And dropping from a cluster tries to drop from all nodes, so will drop even from those. So any drop from a cluster can remove more than 1 copy.	2024-06-23 12:00:50 -04:00
Joey Hess	5b332a87be	dropping from clusters Dropping from a cluster drops from every node of the cluster. Including nodes that the cluster does not think have the content. This is different from GET and CHECKPRESENT, which do trust the cluster's location log. The difference is that removing from a cluster should make 100% sure the content is gone from every node. So doing extra work is ok. Compare with CHECKPRESENT where checking every node could make it very expensive, and the worst that can happen with a false negative is extra work being done. Extended the P2P protocol with FAILURE-PLUS to handle the case where a drop from one node succeeds, but a drop from another node fails. In that case the entire cluster drop has failed. Note that SUCCESS-PLUS is returned when dropping from a proxied remote that is not a cluster, when the protocol version supports it. This is because P2P.Proxy does not know when it's proxying for a single node cluster vs for a remote that is not a cluster.	2024-06-23 09:43:40 -04:00
Joey Hess	f18740699e	P2P protocol version 2, adding SUCCESS-PLUS and ALREADY-HAVE-PLUS Client side support for SUCCESS-PLUS and ALREADY-HAVE-PLUS is complete; when a PUT stores to more repositories than the expected one, the location log is updated with the additional UUIDs that contain the content. Started implementing PUT fanout to multiple remotes for clusters. It is untested, and I fear fencepost errors in the relative offset calculations. And it is missing proxying for the protocol after DATA.	2024-06-18 16:21:40 -04:00
Joey Hess	d34326ab76	factor out Annex.Proxy	2024-06-18 10:51:37 -04:00
Joey Hess	f0d6114286	refactor cluster code into own module	2024-06-18 10:36:04 -04:00
Joey Hess	ef26470810	ProxySelector data type	2024-06-17 19:19:15 -04:00
Joey Hess	7a839a983a	preparing for cluster node selection Support selecting what remote to proxy for each top-level P2P protocol message. This only needs to be extended now to support fanout to multiple nodes for PUT and REMOVE, and with a remote that fails for LOCKCONTENT and UNLOCKCONTENT. But a good first step would be to implement CHECKPRESENT and GET for clusters. Both should select a node that actually does have the content. That will allow a cluster to work for GET even when location tracking is out of date.	2024-06-17 15:51:10 -04:00
Joey Hess	291280ced2	started on git-annex-shell cluster support Works down to P2P protocol. The question now is, how to handle protocol version negotiation for clusters? Connecting to each node to find their protocol versions and using the lowest would be too expensive with a lot of nodes. So it seems that the cluster needs to pick its own protocol version to use with the client. Then it can either negotiate that same version with the nodes when it comes time to use them, or it can translate between multiple protocol versions. That seems complicated. Thinking it would be ok to refuse to use a node if it is not able to negotiate the same protocol version with it as with the client. That will mean that nodes sometimes need to be upgraded when upgrading the cluster's proxy. But protocol versions rarely change.	2024-06-17 15:10:04 -04:00
Joey Hess	c7ad44e4d1	work toward supporting proxying to multiple remotes at once For eg, upload fanout. Delay connecting to a remote until it's needed. When there are many proxied remotes, it would not do for the proxy to connect to each of them on startup; that could take a long time.	2024-06-17 14:16:44 -04:00
Joey Hess	b72ccc6f0c	improve types	2024-06-17 12:44:08 -04:00
Joey Hess	64afbb0b93	don't count clusters as copies, continued Handled limitCopies, as well as everything using fromNumCopies and fromMinCopies. This should be everything, probably. Note that, git-annex info displays a count of repositories, which still includes cluster. I think that's ok. It would be possible to filter out clusters there, but to the user they're pretty much just another repository. The numcopies displayed by eg `git-annex info .` does not include clusters.	2024-06-16 15:14:53 -04:00
Joey Hess	780367200b	remove dead nodes when loading the cluster log This is to avoid inserting a cluster uuid into the location log when only dead nodes in the cluster contain the content of a key. One reason why this is necessary is Remote.keyLocations, which excludes dead repositories from the list. But there are probably many more. Implementing this was challenging, because Logs.Location importing Logs.Cluster which imports Logs.Trust which imports Remote.List resulted in an import cycle through several other modules. Resorted to making Logs.Location not import Logs.Cluster, and instead it assumes that Annex.clusters gets populated when necessary before it's called. That's done in Annex.Startup, which is run by the git-annex command (but not other commands) at early startup in initialized repos. Or, is run after initialization. Note that in Remote.Git, it is unable to import Annex.Startup, because Remote.Git importing Logs.Cluster leads to the same import cycle. So ensureInitialized is not passed annexStartup in there. Other commands, like git-annex-shell currently don't run annexStartup either. So there are cases where Logs.Location will not see clusters. So it won't add any cluster UUIDs when loading the log. That's ok, the only reason to do that is to make display of where objects are located include clusters, and to make commands like git-annex get --from treat keys as being located in a cluster. git-annex-shell certainly does not do anything like that, and I'm pretty sure Remote.Git (and callers to Remote.Git.onLocalRepo) don't either.	2024-06-16 14:39:44 -04:00
Joey Hess	36c6d8da69	don't count clusters as copies Since the cluster UUID is inserted into the location log when the location log lists a node as containing content. Also avoid trying to lock content on cluster remotes. The cluster nodes are also proxied, so that content can be locked on individual nodes, and locking content on a cluster as a whole probably won't be implemented. And made git-annex whereis use numcopies machinery for displaying its count, so it won't count cluster UUIDs redundantly to nodes. Other commands, like git-annex info that also display numcopies information already used the numcopies machinery. There is more to be done, fromNumCopies is sometimes used to get a number that is compared with a list of UUIDs. And limitCopies doesn't use numcopies machinery.	2024-06-16 14:17:56 -04:00
Joey Hess	a4c9d4424c	remove Logs.Presence imports When imported along with Logs.Location, it can be an unused import and it won't warn, due to reexports. The point of this is really to show that Logs.Presence is not widely used, outside Logs/	2024-06-14 17:27:34 -04:00
Joey Hess	570ceffe8d	broke out initcluster One benefit of this is that a typo in annex-cluster-node config won't init a new cluster. Also it gets the cluster description set and is consistent with initremote.	2024-06-14 17:23:11 -04:00
Joey Hess	bfe7f488d9	forgot to add	2024-06-14 16:37:17 -04:00
Joey Hess	2028ad02b8	add clusters to proxy log Note that it's not defined what will happen if a cluster has the same name as a remote that has proxying enabled.	2024-06-14 15:03:42 -04:00
Joey Hess	bbf261487d	add git-annex updatecluster command Seems to work fine, making the right changes to the git-annex branch.	2024-06-14 15:02:01 -04:00
Joey Hess	46a1fcb3ea	avoid git syncing with instantiate proxied remotes These remotes have no url configured, so git pull and push will fail. git-annex sync --content etc can still sync with them otherwise. Also, avoid git syncing twice with the same url. This is for cases where a proxied remote has been manually configured and so does have a url. Or perhaps proxied remotes will get configured like that automatically later.	2024-06-12 15:10:03 -04:00
Joey Hess	dfdda95053	proxy updates location tracking information This does mean a redundant write to the git-annex branch. But, it means that two clients can be using the same proxy, and after one sends a file to a proxied remote, the other only has to pull from the proxy to learn about that. It does not need to pull from every remote behind the proxy (which it couldn't do anyway as git repo access is not currently proxied). Anyway, the overhead of this in git-annex branch writes is no worse than eg, sending a file to a repository where git-annex assistant is running, which then sends the file on to a remote, and updates the git-annex branch then. Indeed, when the assistant also drops the local copy, that results in more writes to the git-annex branch.	2024-06-12 11:37:14 -04:00
Joey Hess	c6e0710281	proxying to local git remotes works This just happened to work correctly. Rather surprisingly. It turns out that openP2PSshConnection actually also supports local git remotes, by just running git-annex-shell with the path to the remote. Renamed "P2PSsh" to "P2PShell" to make this clear.	2024-06-12 10:10:11 -04:00
Joey Hess	58d8ba5a4f	implement simple proxy actions (untested) Still need to implement GET and PUT, and will implement CONNECT and NOTIFYCHANGE for completeness. All ServerMode checking is implemented for the proxy. There are two possible approaches for how the proxy sends back messages from the remote to the client. One would be to have a background thread that reads messages and sends them back as they come in. The other, which is being implemented so far, is to read messages from the remote at points where it is expected to send them, and relay back to the client before reading the next message from the client. At this point, I'm unsure which approach would be better. The need for proxynoresponse to be used by UNLOCKCONTENT, for example, builds protocol knowledge into the proxy which it would not need with the other method.	2024-06-11 12:56:20 -04:00
Joey Hess	92c83a417f	refactoring	2024-06-11 10:22:05 -04:00
Joey Hess	501d65eeab	started implementing git-annex-shell proxy So far, it negotiates VERSION with both parties. This is a tricky dance. Untested.	2024-06-10 18:01:36 -04:00
Joey Hess	649b87bedd	Merge branch 'master' into proxy	2024-06-10 14:26:18 -04:00
Joey Hess	9a8391078a	git-annex-shell: block relay requests connRepo is only used when relaying git upload-pack and receive-pack. That's only supposed to be used when git-annex-remotedaemon is serving git-remote-tor-annex connections over tor. But, it was always set, and so could be used in other places possibly. Fixed by making connRepo optional in the P2P protocol interface. In Command.EnableTor, it's not needed, because it only speaks the protocol in order to check that it's able to connect back to itself via the hidden service. So changed that to pass Nothing rather than the git repo. In Remote.Helper.Ssh, it's connecting to git-annex-shell p2pstdio, so is making the requests, so will never need connRepo. In git-annex-shell p2pstdio, it was accepting git upload-pack and receive-pack requests over the P2P protocol, even though nothing sent them. This is arguably a security hole, particularly if the user has set environment variables like GIT_ANNEX_SHELL_LIMITED to prevent git push/pull via git-annex-shell.	2024-06-10 14:16:27 -04:00
Joey Hess	f97f4b8bdb	Added updateproxy command and remote.name.annex-proxy configuration So far this only records proxy information on the git-annex branch.	2024-06-04 14:52:03 -04:00
Joey Hess	98762a2f96	group: Added --list option Seemed to make sense to exclude groups used only by dead repositories.	2024-05-29 13:37:35 -04:00
Joey Hess	19418e81ee	git-remote-annex: Display full url when using remote with the shorthand url	2024-05-24 17:15:31 -04:00
Joey Hess	58301e40d2	sync with special remotes with an annex:: url Check explicitly for an annex:: url, not just any url. While no built-in special remotes set an url, except ones that can be synced with, it seems possible that some external special remote sets an url for its own use, but did not expect it to be used by git-annex sync et al. The assistant also syncs with them.	2024-05-24 14:57:29 -04:00
Joey Hess	22bf23782f	initremote, enableremote: Added --with-url to enable using git-remote-annex Also sets remote.name.fetch to a typical value, same as git remote add does.	2024-05-24 14:29:36 -04:00
Joey Hess	434a88c368	Merge branch 'git-remote-annex'	2024-05-15 17:57:50 -04:00
Joey Hess	768cdee461	testremote: Really fsck downloaded objects `8844372c23` exposed a bug in testremote, it was passing the serialized key, not the object file, to be checksummed.	2024-05-15 17:57:27 -04:00
Joey Hess	468de43d66	Merge branch 'master' into git-remote-annex	2024-05-15 17:49:12 -04:00
Joey Hess	24af51e66d	git-annex unused --from remote skips its git-remote-annex keys This turns out to only be necessary in edge cases. Most of the time, git-annex unused --from remote doesn't see git-remote-annex keys at all, because it does not record a location log for them. On the other hand, git-annex unused does find them, since it does not rely on the location log. And that's good because they're a local cache that the user should be able to drop. If, however, the user ran git-annex unused and then git-annex move --unused --to remote, the keys would have a location log for that remote. Then git-annex unused --from remote would see them, and would consider them unused. Even when they are present on the special remote they belong to. And that risks losing data if they drop the keys from the special remote, but didn't expect it would delete git branches they had pushed to it. So, make git-annex unused --from skip git-remote-annex keys whose uuid is the same as the remote.	2024-05-14 15:17:40 -04:00
Joey Hess	0281f7f23e	Avoid the --fast option preventing checksumming in some cases it was not supposed to. fsck --fast was intended to disable checksumming, but checksumming is done after transfers too. Due to the check being in the non-incremental path, it would only affect non-incremental checksumming during a transfer, and I'm not 100% sure that it was a problem. Also, when using an external backend that does checksumming, fsck --fast didn't disable it and now does.	2024-05-12 21:36:48 -04:00
Joey Hess	05684bdd6c	fsck: Fix recent reversion that made it say it was checksumming files whose content is not present Did not track down the commit that caused the problem, but git-annex version 10.20240431 didn't behave that way.	2024-05-12 21:23:27 -04:00
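
The node selection described in cf59d7f92c (GET and CHECKPRESENT among lowest cost cluster nodes) can be sketched roughly as below. This is an illustrative Python sketch, not git-annex's Haskell code: the function name `pick_node`, the `costs` mapping, and the `have_content` set are all hypothetical, and it assumes the candidates are the nodes that the location log lists as holding the content.

```python
import random
from typing import Dict, Optional, Set


def pick_node(costs: Dict[str, float],
              have_content: Set[str],
              rng: Optional[random.Random] = None) -> Optional[str]:
    """Pick one node at random from the lowest-cost nodes believed to hold the content.

    costs        -- hypothetical mapping of node name to its configured cost
    have_content -- nodes the location log lists as containing the key
    """
    rng = rng or random.Random()
    candidates = [n for n in costs if n in have_content]
    if not candidates:
        return None
    lowest = min(costs[n] for n in candidates)
    # Several nodes may tie for the lowest cost; choose among them at random,
    # which spreads requests a little without being real load balancing.
    cheapest = sorted(n for n in candidates if costs[n] == lowest)
    return rng.choice(cheapest)
```

With `costs = {"node1": 100.0, "node2": 100.0, "node3": 200.0}`, a key present on all three nodes is fetched from node1 or node2, and node3 is used only when it is the sole node listed as having the content.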


2856 commits
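
The drop behaviour described in 5b332a87be (dropping from clusters) can be illustrated with a small sketch. It is written in Python for brevity and is not the actual implementation: `drop_from_cluster` and the `drop` callback are hypothetical names, and the shape of the returned tuple is an assumption. Only the reply names SUCCESS-PLUS and FAILURE-PLUS, and the rule that one failed node fails the whole cluster drop, come from the commit message.

```python
from typing import Callable, List, Tuple


def drop_from_cluster(nodes: List[str],
                      drop: Callable[[str], bool]) -> Tuple[str, List[str]]:
    """Attempt a drop on every node and summarize the outcome.

    nodes -- every node of the cluster, including nodes not thought to
             have the content (the drop is attempted on all of them)
    drop  -- hypothetical callback returning True when the node's drop succeeded
    """
    # The list comprehension calls drop() for every node; it does not stop
    # at the first failure, so every node gets a drop attempt.
    dropped = [n for n in nodes if drop(n)]
    if len(dropped) == len(nodes):
        return ("SUCCESS-PLUS", dropped)
    # At least one node failed, so the cluster drop as a whole has failed,
    # even though some nodes no longer have the content.
    return ("FAILURE-PLUS", dropped)
```

Reporting the nodes where the drop did succeed alongside the failure is what would let a caller keep its location tracking accurate after a partial drop.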