git-annex

Author	SHA1	Message	Date
Joey Hess	b8016eeb65	add annex-proxied This makes git-annex sync and similar not treat proxied remotes as git syncable remotes. Also, display in git-annex info remote when the remote is proxied.	2024-06-24 10:16:59 -04:00
Joey Hess	0c111fc96a	fix git-annex sync --content with proxied remotes Loading the remote list a second time was removing all proxied remotes. That happened because setting up the proxied remote added some config fields to the in-memory git config, and on the second load, it saw those configs and decided not to overwrite them with the proxy. Now on the second load, that still happens. But now, the proxied git configs are used to generate a remote same as if those configs were all set. The reason that didn't happen before was twofold, the gitremotes cache was not dropped, and the remote's url field was not set correctly. The problem with the remote's url field is that while it was marked as proxy inherited, all other proxy inherited fields are annex- configs. And the code to inherit didn't work for the url field. Now it all works, but git-annex sync is left running git push/pull on the proxied remote, which doesn't work. That still needs to be fixed.	2024-06-24 09:45:51 -04:00
Joey Hess	6f94062c53	drop gitremotes cache when config is changed	2024-06-24 09:36:21 -04:00
Joey Hess	bf6b309917	remove attempt to avoid git syncing with instantiate proxied remotes It didn't work. Actually, sync was skipping those remotes due to a bug.	2024-06-24 09:35:24 -04:00
Joey Hess	60413a2557	update	2024-06-23 16:38:01 -04:00
Joey Hess	5d8bdac38e	upload fanout resume seems free of fenceposts Tested it with small chunk sizes (like 2) and resumes that were eg 1 byte from the end of the file or beginning of file. Also, git-annex testremote passes now against a cluster!	2024-06-23 16:22:39 -04:00
Joey Hess	8a341cd195	fix comparison With this a PUT to two remotes that have different partial amounts transferred works reliably. I'm not sure though that it doesn't have fencepost errors.	2024-06-23 16:01:58 -04:00
Joey Hess	9e070470f4	update	2024-06-23 12:48:22 -04:00
Joey Hess	3cd7969823	update	2024-06-23 12:31:00 -04:00
Joey Hess	d0aec8f623	always check numcopies when moving from cluster When the destination does not start with a copy, the cluster has one or more copies. If more, dropping would reduce the number of copies, so numcopies must be checked. Considered checking how many nodes of the cluster contain a copy. If only 1 node does, it could allow a move without checking numcopies. The problem with that, though, is that other nodes of the cluster could have copies that we don't know about. And dropping from a cluster tries to drop from all nodes, so will drop even from those. So any drop from a cluster can remove more than 1 copy.	2024-06-23 12:00:50 -04:00
Joey Hess	ec5b6454f4	todo	2024-06-23 10:09:35 -04:00
Joey Hess	466c972913	don't use SUCCESS-PLUS unnecessarily When dropping from a proxied remote that is not a cluster, SUCCESS-PLUS is not needed, so don't use it.	2024-06-23 10:07:26 -04:00
Joey Hess	2762f9c4ce	fix location log update for copy to 1-node cluster	2024-06-23 09:53:33 -04:00
Joey Hess	5b332a87be	dropping from clusters Dropping from a cluster drops from every node of the cluster. Including nodes that the cluster does not think have the content. This is different from GET and CHECKPRESENT, which do trust the cluster's location log. The difference is that removing from a cluster should make 100% sure the content is gone from every node. So doing extra work is ok. Compare with CHECKPRESENT where checking every node could make it very expensive, and the worst that can happen in a false negative is extra work being done. Extended the P2P protocol with FAILURE-PLUS to handle the case where a drop from one node succeeds, but a drop from another node fails. In that case the entire cluster drop has failed. Note that SUCCESS-PLUS is returned when dropping from a proxied remote that is not a cluster, when the protocol version supports it. This is because P2P.Proxy does not know when it's proxying for a single node cluster vs for a remote that is not a cluster.	2024-06-23 09:43:40 -04:00
Joey Hess	a6a04b7e5e	avoid storing SUCCESS-PLUS uuid when it is the remote uuid This is slightly belt and suspenders, but nothing guarantees that the peer avoids including its uuid in the SUCCESS-PLUS list as it's supposed to. And while it probably doesn't matter if the location log is updated redundantly, let's not find out.	2024-06-23 08:21:11 -04:00
Joey Hess	7bbd822a17	avoid using cluster nodes in drop proof when dropping from cluster This is obviously necessary in order for dropping from a cluster to be able to drop from all nodes. It also avoids violating numcopies when a cluster node is a special remote. If it were used in the drop proof, nothing would prevent the cluster from dropping from it.	2024-06-23 06:20:11 -04:00
Joey Hess	5a4b4b59b9	update	2024-06-23 05:26:45 -04:00
Joey Hess	53674e8abb	Merge branch 'master' into proxy	2024-06-20 11:20:26 -04:00
Joey Hess	53598e5154	merge from proxy branch	2024-06-20 11:20:16 -04:00
Joey Hess	d89ac8c6ee	Merge branch 'master' of ssh://git-annex.branchable.com	2024-06-20 11:03:30 -04:00
Joey Hess	9173095d11	add my distribits talk	2024-06-20 11:03:19 -04:00
Joey Hess	ff5fe4e759	clusters documentation	2024-06-20 10:57:43 -04:00
Joey Hess	032d3902d8	wording	2024-06-20 10:15:24 -04:00
Joey Hess	ecab2e03b9	working PUT fanout to multiple remotes for clusters Still need to check for fencepost errors on resume when different nodes have different amounts of data.	2024-06-20 10:04:26 -04:00
joris	b35be4b656	Added a comment	2024-06-20 09:58:05 +00:00
jochen.keil@38b1f86ab65128dab3e62e726403ceee4f5141bf	4da453e30c		2024-06-19 15:46:26 +00:00
Joey Hess	54307af8c0	more on proxying special remotes	2024-06-19 06:40:19 -04:00
Joey Hess	097ef9979c	towards a design for proxying to special remotes	2024-06-19 06:15:03 -04:00
Joey Hess	6eac3112e5	be quiet when reading cluster and proxy information at startup I had a transfer of 3 files fail like this: git-annex: transferrer protocol error: "(recording state in git...)" The remote had stalldetection enabled, although I didn't see it stall. So git-annex transferrer would have been started up. I guess that one of these new git-annex branch reads, that happens early, caused that message due to perhaps an uncommitted git-annex branch change. Since the transferrer speaks a protocol over stdout, it needs to be prevented from outputting other messages to stdout. Interestingly, startupAnnex is run after prepRunCommand, so if a command requests quiet output it would already be quiet. But the transferrer does not, instead it calls Annex.setOutput SerializedOutput in its start action.	2024-06-18 21:31:32 -04:00
Joey Hess	f916ce4b68	allow proxying to remotes that are nodes of clusters fixes reversion in `ca08f3fcc2`	2024-06-18 17:02:23 -04:00
Joey Hess	f18740699e	P2P protocol version 2, adding SUCCESS-PLUS and ALREADY-HAVE-PLUS Client side support for SUCCESS-PLUS and ALREADY-HAVE-PLUS is complete, when a PUT stores to more repositories than the expected one, the location log is updated with the additional UUIDs that contain the content. Started implementing PUT fanout to multiple remotes for clusters. It is untested, and I fear fencepost errors in the relative offset calculations. And it is missing proxying for the protocol after DATA.	2024-06-18 16:21:40 -04:00
Joey Hess	ca08f3fcc2	only proxy to a remote when remote.name.annex-proxy is set Avoids someone writing to proxy.log and gaining access to remotes of someone else's repository that they were not intended to be able to proxy to.	2024-06-18 11:43:10 -04:00
Joey Hess	fb0fd78485	only use a remote as a node when git configuration is set Avoids someone writing to cluster.log and nominating remotes of someone else's repository as a cluster.	2024-06-18 11:37:38 -04:00
Joey Hess	f049156a03	checkpresent support for clusters This assumes that the proxy for a cluster has up-to-date location logs. If it didn't, it might proxy the checkpresent to a node that no longer has the content, while some other node still does, and so it would incorrectly appear that the cluster no longer contains the content. Since cluster UUIDs are not stored to location logs, git-annex fsck --fast when claiming to fix a location log when that occurred would not cause any problems. And presumably the location tracking would later get sorted out. At least usually, changes to the content of nodes goes via the proxy, and it will update its location logs, so they will be accurate. However, if there were multiple proxies to the same cluster, or nodes were accessed directly (or via proxy to the node and not the cluster), the proxy's location log could certainly be wrong. (The location log access for GET has the same issues.)	2024-06-18 11:16:16 -04:00
Joey Hess	88d9a02f7c	initial, working support for getting from clusters Currently tends to put all the load on a single node, which will need to be improved.	2024-06-18 11:01:10 -04:00
Joey Hess	d34326ab76	factor out Annex.Proxy	2024-06-18 10:51:37 -04:00
Joey Hess	f0d6114286	refactor cluster code into own module	2024-06-18 10:36:04 -04:00
Joey Hess	8290f70978	update	2024-06-18 10:08:15 -04:00
yarikoptic	28029d6668	original report / question	2024-06-18 13:57:23 +00:00
Joey Hess	ef26470810	ProxySelector data type	2024-06-17 19:19:15 -04:00
Joey Hess	7a839a983a	preparing for cluster node selection Support selecting what remote to proxy for each top-level P2P protocol message. This only needs to be extended now to support fanout to multiple nodes for PUT and REMOVE, and with a remote that fails for LOCKCONTENT and UNLOCKCONTENT. But a good first step would be to implement CHECKPRESENT and GET for clusters. Both should select a node that actually does have the content. That will allow a cluster to work for GET even when location tracking is out of date.	2024-06-17 15:51:10 -04:00
Joey Hess	291280ced2	started on git-annex-shell cluster support Works down to P2P protocol. The question now is, how to handle protocol version negotiation for clusters? Connecting to each node to find their protocol versions and using the lowest would be too expensive with a lot of nodes. So it seems that the cluster needs to pick its own protocol version to use with the client. Then it can either negotiate that same version with the nodes when it comes time to use them, or it can translate between multiple protocol versions. That seems complicated. Thinking it would be ok to refuse to use a node if it is not able to negotiate the same protocol version with it as with the client. That will mean that nodes sometimes need to be upgraded when upgrading the cluster's proxy. But protocol versions rarely change.	2024-06-17 15:10:04 -04:00
Joey Hess	c7ad44e4d1	work toward supporting proxying to multiple remotes at once For eg, upload fanout. Delay connecting to a remote until it's needed. When there are many proxied remotes, it would not do for the proxy to connect to each of them on startup; that could take a long time.	2024-06-17 14:16:44 -04:00
Joey Hess	83a1db8d17	more specific type	2024-06-17 13:04:40 -04:00
Joey Hess	b72ccc6f0c	improve types	2024-06-17 12:44:08 -04:00
Joey Hess	e2fd2ee2bd	update	2024-06-17 09:31:44 -04:00
Joey Hess	3970bbb03b	Merge branch 'master' into proxy	2024-06-17 09:29:34 -04:00
Joey Hess	af79728ac3	tab complete special remotes An oversight.. And with the work in progress proxy and cluster, there can be additional remotes that are not listed in .git/config, but are available. Making those more discoverable is another big benefit of this.	2024-06-17 09:26:03 -04:00
Joey Hess	64afbb0b93	don't count clusters as copies, continued Handled limitCopies, as well as everything using fromNumCopies and fromMinCopies. This should be everything, probably. Note that, git-annex info displays a count of repositories, which still includes cluster. I think that's ok. It would be possible to filter out clusters there, but to the user they're pretty much just another repository. The numcopies displayed by eg `git-annex info .` does not include clusters.	2024-06-16 15:14:53 -04:00
Joey Hess	780367200b	remove dead nodes when loading the cluster log This is to avoid inserting a cluster uuid into the location log when only dead nodes in the cluster contain the content of a key. One reason why this is necessary is Remote.keyLocations, which excludes dead repositories from the list. But there are probably many more. Implementing this was challenging, because Logs.Location importing Logs.Cluster which imports Logs.Trust which imports Remote.List resulted in an import cycle through several other modules. Resorted to making Logs.Location not import Logs.Cluster, and instead it assumes that Annex.clusters gets populated when necessary before it's called. That's done in Annex.Startup, which is run by the git-annex command (but not other commands) at early startup in initialized repos. Or, is run after initialization. Note that in Remote.Git, it is unable to import Annex.Startup, because Remote.Git importing Logs.Cluster leads to the same import cycle. So ensureInitialized is not passed annexStartup in there. Other commands, like git-annex-shell currently don't run annexStartup either. So there are cases where Logs.Location will not see clusters. So it won't add any cluster UUIDs when loading the log. That's ok, the only reason to do that is to make display of where objects are located include clusters, and to make commands like git-annex get --from treat keys as being located in a cluster. git-annex-shell certainly does not do anything like that, and I'm pretty sure Remote.Git (and callers to Remote.Git.onLocalRepo) don't either.	2024-06-16 14:39:44 -04:00
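
The cluster drop semantics described in commit 5b332a87be (drop from every node, and report partial failure as FAILURE-PLUS) can be illustrated with a small sketch. This is not git-annex's Haskell implementation. The function name `drop_from_cluster`, the per-node drop callables, and the `(reply, uuids)` return shape are all hypothetical, chosen only to show the decision logic. It also assumes that FAILURE-PLUS carries the UUIDs of the nodes that did drop, and that a drop failing on every node is reported as a plain FAILURE; the log does not spell out either detail.

```python
from typing import Callable, Dict, List, Tuple


def drop_from_cluster(
    nodes: Dict[str, Callable[[str], bool]], key: str
) -> Tuple[str, List[str]]:
    """Try to drop `key` from every node, even nodes not believed to hold it.

    `nodes` maps a node UUID to a hypothetical callable that returns True
    when the drop from that node succeeded.
    """
    dropped: List[str] = []
    failed: List[str] = []
    for uuid, drop in nodes.items():
        # Every node is attempted; the location log is not consulted here.
        (dropped if drop(key) else failed).append(uuid)

    if not failed:
        # All nodes dropped the content: the cluster drop succeeded.
        return ("SUCCESS-PLUS", dropped)
    if dropped:
        # Some nodes dropped, some did not: the cluster drop as a whole
        # failed, but the caller learns which nodes no longer have the key.
        return ("FAILURE-PLUS", dropped)
    # Assumption: nothing was dropped anywhere, so a plain failure suffices.
    return ("FAILURE", [])
```

For example, with two nodes where only the first drop succeeds, the sketch returns `("FAILURE-PLUS", ["n1"])`, so the caller could still update the location log for `n1` while treating the overall drop as failed.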


44970 commits