git-annex

Author	SHA1	Message	Date
Joey Hess	2e5af38f86	GET from proxied special remote Working, but lots of room for improvement... Without streaming, so there is a delay before download begins as the file is retrieved from the special remote. And when resuming it retrieves the whole file from the special remote again. Also, if the special remote throws an exception, currently it shows as "protocol error".	2024-06-28 15:44:48 -04:00
Joey Hess	158d7bc933	fix handling of ERROR in response to REMOVE This allows an error message from a proxied special remote to be displayed to the client. In the case where removal from several nodes of a cluster fails, there can be several errors. What to do? I decided to only show the first error to the user. Probably in this case the user is not in a position to do anything about an error message, so best keep it simple. If the problem with the first node is fixed, they'll see the error from the next node.	2024-06-28 14:10:25 -04:00
Joey Hess	a6ea057f6b	fix handling of ERROR in response to CHECKPRESENT That error is now rethrown on the client, so it will be displayed. For example: $ git-annex fsck x --fast --from AMS-dir fsck x (special remote reports: directory /home/joey/tmp/bench2/dir is not accessible) failed No protocol version check is needed, because in order to talk to a proxied special remote, the client has to be running the upcoming git-annex release, which has this fix in it.	2024-06-28 13:46:27 -04:00
Joey Hess	d3c75c003a	proxying special remotes This is early, but already working for CHECKPRESENT. However, when the special remote throws an exception on checkPresent, this happens: [2024-06-28 13:22:18.520884287] (P2P.IO) [ThreadId 4] P2P > ERROR directory /home/joey/tmp/bench2/dir is not accessible [2024-06-28 13:22:18.521053135] (P2P.IO) [ThreadId 4] P2P < ERROR expected SUCCESS or FAILURE git-annex: client error: expected SUCCESS or FAILURE (fixing location log) p2pstdio: 1 failed Based on the location log, x was expected to be present, but its content is missing. failed	2024-06-28 13:31:19 -04:00
Joey Hess	62750f0102	shut down RemoteSides cleanly Before it just exited without actually shutting down the RemoteSides, when the client hung up.	2024-06-28 13:19:57 -04:00
Joey Hess	c3a785204e	support a P2PConnection that uses TMVars rather than Handles This will allow having an internal thread speaking P2P protocol, which will be needed to support proxying to external special remotes. No serialization is done on the internal P2P protocol of course. When a ByteString is being exchanged, it may or may not be exactly the length indicated by DATA. While that has to be carefully managed for the serialized P2P protocol, here it would require buffering the whole lazy bytestring in memory to check its length when sending, so it's better to do length checks on the receiving side.	2024-06-28 11:22:29 -04:00
Joey Hess	28f5c47b5a	remove mention of XMPP which is no longer used	2024-06-27 15:56:30 -04:00
Joey Hess	9305d62b54	layout	2024-06-27 15:52:58 -04:00
Joey Hess	a367e8a9a1	layout	2024-06-27 15:52:10 -04:00
Joey Hess	5ed690b690	improve	2024-06-27 15:50:27 -04:00
Joey Hess	5b1971e2f8	merged the proxy branch into master!	2024-06-27 15:44:11 -04:00
Joey Hess	c3f88923c0	Merge branch 'proxy'	2024-06-27 15:43:45 -04:00
Joey Hess	bd2507de17	Merge branch 'master' of ssh://git-annex.branchable.com	2024-06-27 15:43:42 -04:00
Joey Hess	591f79a9c3	move clusters page to tips also add a section on the front page highlighting major new features	2024-06-27 15:41:38 -04:00
Joey Hess	41a0817188	make extendcluster also updatecluster This avoids the user forgetting to do it and simplifies the documentation.	2024-06-27 15:34:45 -04:00
Joey Hess	85f4527d74	update	2024-06-27 15:28:10 -04:00
Joey Hess	20ef1262df	give proxied cluster nodes a higher cost than the cluster gateway This makes eg git-annex get default to using the cluster rather than an arbitrary node, which is better UI. The actual cost of accessing a proxied node vs using the cluster is basically the same. But using the cluster allows smarter load-balancing to be done on the cluster.	2024-06-27 15:21:03 -04:00
Joey Hess	cf59d7f92c	GET and CHECKPRESENT among lowest cost cluster nodes Before it was using a node that might have had a higher cost. Also threw in a random selection from among the low cost nodes. Of course this is a poor excuse for load balancing, but it's better than nothing. Most of the time...	2024-06-27 14:36:55 -04:00
Joey Hess	dceb8dc776	update	2024-06-27 13:40:09 -04:00
Joey Hess	dabd05e547	remove a TODO marker I have a todo item for this outside the code	2024-06-27 13:36:04 -04:00
Joey Hess	c9d63d74d8	remove viconfig item it works when run on a client that has the cluster gateway as a remote, just not when on the cluster gateway	2024-06-27 13:34:24 -04:00
Joey Hess	87a7eeac33	document various multi-gateway cluster considerations Perhaps this will avoid me needing to eg, implement spanning tree protocol. ;-)	2024-06-27 13:33:19 -04:00
Joey Hess	8e322f76bc	updates	2024-06-27 12:57:08 -04:00
Joey Hess	dbfff04fb6	update for clusters	2024-06-27 12:47:26 -04:00
Joey Hess	0ef4183b00	Merge branch 'master' into proxy	2024-06-27 12:41:57 -04:00
Joey Hess	ea8c50ec8a	remove unused import	2024-06-27 12:38:32 -04:00
Joey Hess	19137ae780	avoid unfiltered debugging from git-annex-shell When --debugfilter or annex.debugfilter is set, avoid propagating debug output from git-annex-shell, since it cannot be filtered. It would be possible to pass --debugfilter on to git-annex-shell, but it only started accepting that option in 2022. So it would break interop with older versions.	2024-06-27 12:37:25 -04:00
Joey Hess	3dad9446ce	distributed cluster cycle prevention Added BYPASS to P2P protocol, and use it to avoid cycling between cluster gateways. Distributed clusters are working well now!	2024-06-27 12:20:22 -04:00
lykos@d125a37d89b1cfac20829f12911656c40cb70018	bc451b6aa8		2024-06-27 10:47:43 +00:00
Joey Hess	effaf51b1f	avoid loop between cluster gateways The VIA extension is still needed to avoid some extra work and ugly messages, but this is enough that it actually works. This filters out the RemoteSides that are a proxied connection via a remote gateway to the cluster. The VIA extension will not filter those out, but will send VIA to them on connect, which will cause the ones that are accessed via the listed gateways to be filtered out.	2024-06-26 15:29:59 -04:00
Joey Hess	4172109c8d	support multi-gateway clusters VIA extension still needed otherwise a copy to a cluster can loop forever.	2024-06-26 15:07:03 -04:00
Joey Hess	8b6708e745	update for multi-gateway clusters	2024-06-26 14:40:25 -04:00
Joey Hess	923953c9fe	fix cycle prevention code	2024-06-26 13:21:51 -04:00
Joey Hess	07e899c9d3	git-annex-shell: proxy nodes located beyond remote cluster gateways Walking a tightrope between security and convenience here, because git-annex-shell needs to only proxy for things when there has been an explicit, local action to configure them. In this case, the user has to have run `git-annex extendcluster`, which now sets annex-cluster-gateway on the remote. Note that any repositories that the gateway is recorded to proxy for will be proxied onward. This is not limited to cluster nodes, because checking the node log would not add any security; someone could add any uuid to it. The gateway of course then does its own checking to determine if it will allow proxying for the remote.	2024-06-26 12:56:16 -04:00
Joey Hess	1ec2fecf3f	set up proxies for cluster nodes that are themselves proxied via a remote When there are multiple gateways to a cluster, this sets up proxying for nodes that are accessed via a remote gateway. Eg, when running in nyc and amsterdam is the remote gateway, and it has node1 and node2, this sets up proxying for amsterdam-node1 and amsterdam-node2. A client that has nyc as a remote will see proxied remotes nyc-amsterdam-node1 and nyc-amsterdam-node2.	2024-06-26 11:24:55 -04:00
Joey Hess	02bf3ddc3f	updatecluster: support multiple gateways Just look at the existing proxied remotes that correspond to already existing nodes of the cluster, and keep those nodes in the cluster, while also adding any remotes of the local repo that are configured as cluster nodes. This allows removing cluster nodes from the local repo and updating, without it also removing nodes provided by other gateways.	2024-06-26 10:51:14 -04:00
Joey Hess	0b72b85df5	added git-annex extendcluster This works, but updatecluster does not work yet in multi-gateway clusters, nor do gateways relay to other gateways.	2024-06-26 10:26:54 -04:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92	f9ce7a452c	Added a comment	2024-06-26 10:20:29 +00:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92	6e6811c72f	Do checkpresentkey with --debug set	2024-06-26 10:11:58 +00:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92	b1e36c5ddf		2024-06-26 08:06:37 +00:00
Joey Hess	798d6f6a46	todo	2024-06-25 17:58:45 -04:00
Joey Hess	e3dd29409b	improve docs	2024-06-25 17:50:22 -04:00
Joey Hess	0a1001dbfb	update	2024-06-25 17:26:26 -04:00
Joey Hess	9a8dcb58cd	design for distributed clusters	2024-06-25 17:20:49 -04:00
Joey Hess	b9889917a3	thoughts on cycles Rejected the idea of automatically instantiating remotes for proxies-of-proxies. That needs cycle protection, while the current behavior, which happened for free, is that running git-annex updateproxy on the proxy can be used to configure it, but only for topologies that actually exist.	2024-06-25 15:32:11 -04:00
Joey Hess	cec2848e8a	support annex.jobs for clusters	2024-06-25 14:54:20 -04:00
Joey Hess	818030e4d3	improve handling of cluster nodes disconnecting	2024-06-25 14:10:06 -04:00
Joey Hess	5ede109ae5	gave up on upload fanout to cluster's proxy The problem with that idea is that the cluster's proxy is necessarily a remote, and necessarily one that we'll want to sync with, since the git repository is stored there. So when its preferred content wants a file, and the cluster does too, the file will get uploaded to it as well as to the cluster. With fanout, the upload to the cluster will populate the proxy as well, avoiding a second upload. But only if the file is sent to the cluster first. If it's sent to the proxy first, there will be two uploads. Another, lesser problem is that a repository can proxy for more than one cluster. So when does it make sense to drop content from the repository? It could be done when dropping from one cluster, but what of the other one? This complication was not necessary anyway. Instead, if it's desirable to have some content accessed from close to the proxy, one of the cluster nodes can just be put on the same filesystem as it. That will be just as fast as storing the content on the proxy.	2024-06-25 13:35:12 -04:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92	bbdfe6b910		2024-06-25 15:59:36 +00:00
Joey Hess	1bfe7f8a53	honor preferred content settings of cluster nodes Except when no nodes want a file, it has to be stored somewhere, so store it on all. Which is not really desirable, but neither is having to pick one. ProtoAssociatedFile deserialization is rather broken, and this could possibly affect preferred content expressions that match on filenames. The inability to roundtrip whitespace like tabs and newlines through is not a problem because preferred content expressions can't be written that match on whitespace such as a tab. For example: joey@darkstar:~/tmp/bench/z>git-annex wanted origin-node2 'exclude=CTRL-VTab' wanted origin-node2 git-annex: Parse error: Parse failure: near "*" But, the filtering of control characters could perhaps be a problem. I think that filtering is now obsolete, git-annex has comprehensive filtering of control characters when displaying filenames, that happens at a higher level. However, I don't want to risk a security hole so am leaving in that filtering in ProtoAssociatedFile deserialization for now.	2024-06-25 11:43:09 -04:00
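
The node selection described in commit cf59d7f92c (use only the lowest cost cluster nodes for GET and CHECKPRESENT, then pick one of them at random) can be sketched as follows. This is a minimal Python illustration, not git-annex's actual Haskell code: the function names `lowest_cost_nodes` and `pick_node`, the node names, and the cost values are all hypothetical.

```python
import random
from typing import Dict, List, Optional


def lowest_cost_nodes(costs: Dict[str, float]) -> List[str]:
    """Return the names of all nodes that share the minimum cost.

    `costs` maps a (hypothetical) node name to its configured cost.
    """
    if not costs:
        return []
    lowest = min(costs.values())
    return [name for name, cost in costs.items() if cost == lowest]


def pick_node(costs: Dict[str, float],
              rng: Optional[random.Random] = None) -> Optional[str]:
    """Pick one node at random from among the lowest cost nodes.

    Returns None when there are no nodes to choose from. As the commit
    message says, this is only a crude stand-in for load balancing.
    """
    candidates = lowest_cost_nodes(costs)
    if not candidates:
        return None
    chooser = rng if rng is not None else random
    return chooser.choice(candidates)


# Illustrative costs only; these are not git-annex's real defaults.
example_costs = {"node1": 200.0, "node2": 200.0, "node3": 250.0}
chosen = pick_node(example_costs)
```

With the example costs, `chosen` is always `node1` or `node2`, never the more expensive `node3`. Passing a seeded `random.Random` as `rng` makes the choice reproducible, which is convenient in tests.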

45024 commits