git-annex

Author	SHA1	Message	Date
Joey Hess	3b37b9e53f	fix serveGet hang This came down to SendBytes waiting on the waitv. Nothing ever filled it. Only Annex.Proxy needs the waitv, and it handles filling it. So make it optional.	2024-07-11 07:46:52 -04:00
Joey Hess	8cb1332407	update	2024-07-10 16:10:08 -04:00
Joey Hess	b5b3d8cde2	update	2024-07-09 14:30:50 -04:00
Joey Hess	751b8e0baf	implemented serveCheckPresent Still need a way to run Proto though	2024-07-09 09:08:42 -04:00
Joey Hess	3f402a20a8	implement Locker	2024-07-08 21:00:10 -04:00
Joey Hess	838169ee86	status	2024-07-07 16:16:11 -04:00
Joey Hess	40306d3fcf	finalizing HTTP P2p protocol some more Added v2-v0 endpoints. These are tedious, but will be needed in order to use the HTTP protocol to proxy to repositories with older git-annex, where git-annex-shell will be speaking an older version of the protocol. Changed GET to use 422 when the content is not present. 404 is needed to detect when a protocol version is not supported.	2024-07-05 15:34:58 -04:00
Joey Hess	0bfdc57d25	update	2024-07-04 15:18:06 -04:00
Joey Hess	f452bd448a	REMOVE-BEFORE and GETTIMESTAMP proxying For clusters, the timestamps have to be translated, since each node can have its own idea about what time it is. To translate a timestamp, the proxy remembers what time it asked the node for a timestamp in GETTIMESTAMP, and applies the delta as an offset in REMOVE-BEFORE. This does mean that a remove from a cluster has to call GETTIMESTAMP on every node before dropping from nodes. Not very efficient. Although currently it tries to drop from every single node anyway, which is also not very efficient. I thought about caching the GETTIMESTAMP from the nodes on the first call. That would improve efficiency. But, since monotonic clocks on !Linux don't advance when the computer is suspended, consider what might happen if one node was suspended for a while, then came back. Its monotonic timestamp would end up behind where the proxying expects it to be. Would that result in removing when it shouldn't, or refusing to remove when it should? Have not thought it through. Either way, a cluster behaving strangely for an extended period of time because one of its nodes was briefly asleep doesn't seem like good behavior.	2024-07-04 15:09:34 -04:00
Joey Hess	99b7a0cfe9	use REMOVE-BEFORE in P2P protocol Only clusters still need to be fixed to close this todo.	2024-07-04 13:47:38 -04:00
Joey Hess	1243af4a18	toward SafeDropProof expiry checking Added Maybe POSIXTime to SafeDropProof, which gets set when the proof is based on a LockedCopy. If there are several LockedCopies, it uses the closest expiry time. That is not optimal, it may be that the proof expires based on one LockedCopy but another one has not expired. But that seems unlikely to really happen, and anyway the user can just re-run a drop if it fails due to expiry. Pass the SafeDropProof to removeKey, which is responsible for checking it for expiry in situations where that could be a problem. Which really only means in Remote.Git. Made Remote.Git check expiry when dropping from a local remote. Checking expiry when dropping from a P2P remote is not yet implemented. P2P.Protocol.remove has SafeDropProof plumbed through to it for that purpose. Fixing the remaining 2 build warnings should complete this work. Note that the use of a POSIXTime here means that if the clock gets set forward while git-annex is in the middle of a drop, it may say that dropping took too long. That seems ok. Less ok is that if the clock gets turned back a sufficient amount (eg 5 minutes), proof expiry won't be noticed. It might be better to use the Monotonic clock, but that doesn't advance when a laptop is suspended, and while there is the linux Boottime clock, that is not available on other systems. Perhaps a combination of POSIXTime and the Monotonic clock could detect laptop suspension and also detect clock being turned back? There is a potential future flag day where p2pDefaultLockContentRetentionDuration is not assumed, but is probed using the P2P protocol, and peers that don't support it can no longer produce a LockedCopy. Until that happens, when git-annex is communicating with older peers there is a risk of data loss when a ssh connection closes during LOCKCONTENT.	2024-07-04 12:39:06 -04:00
Joey Hess	f69661ab65	status	2024-07-03 17:04:12 -04:00
Joey Hess	44b3136fdf	update	2024-07-03 15:53:25 -04:00
Joey Hess	6a95eb08ce	status	2024-07-03 15:01:34 -04:00
Joey Hess	badcb502a4	todo	2024-07-03 13:15:09 -04:00
Joey Hess	24d63e8c8e	update	2024-07-02 18:04:29 -04:00
Joey Hess	b2a24a1669	update	2024-07-02 16:16:37 -04:00
Joey Hess	fbc4d549f3	reorder	2024-07-01 11:44:54 -04:00
Joey Hess	8db30323b0	update	2024-07-01 11:38:29 -04:00
Joey Hess	1e1584d34b	toc	2024-07-01 11:37:12 -04:00
Joey Hess	d9e66f7754	update	2024-07-01 11:33:07 -04:00
Joey Hess	f58a5f577d	update	2024-07-01 11:29:04 -04:00
Joey Hess	fa5e7463eb	fix display when proxied GET yields ERROR The error message is not displayed to the user, but this mirrors the behavior when a regular get from a special remote fails. At least now there is not a protocol error.	2024-07-01 11:19:02 -04:00
Joey Hess	dce3848ad8	avoid populating proxy's object file when storing on special remote Now that storeKey can have a different object file passed to it, this complication is not needed. This avoids a lot of strange situations, and will also be needed if streaming is eventually supported.	2024-07-01 10:53:49 -04:00
Joey Hess	0dfdc9f951	dup stdio handles for P2P proxy Special remotes might output to stdout, or read from stdin, which would mess up the P2P protocol. So dup the handles to avoid any such problem.	2024-07-01 10:06:29 -04:00
Joey Hess	0e19c1c9fa	todo	2024-06-28 17:14:18 -04:00
Joey Hess	711a5166e2	PUT to proxied special remote working Still needs some work. The reason that the waitv is necessary is because without it, runNet loops back around and reads the next protocol message. But it's not finished reading the whole bytestring yet, and so it reads some part of it.	2024-06-28 17:10:58 -04:00
Joey Hess	2e5af38f86	GET from proxied special remote Working, but lots of room for improvement... Without streaming, so there is a delay before download begins as the file is retrieved from the special remote. And when resuming it retrieves the whole file from the special remote again. Also, if the special remote throws an exception, currently it shows as "protocol error".	2024-06-28 15:44:48 -04:00
Joey Hess	5b1971e2f8	merged the proxy branch into master!	2024-06-27 15:44:11 -04:00
Joey Hess	c3f88923c0	Merge branch 'proxy'	2024-06-27 15:43:45 -04:00
Joey Hess	85f4527d74	update	2024-06-27 15:28:10 -04:00
Joey Hess	20ef1262df	give proxied cluster nodes a higher cost than the cluster gateway This makes eg git-annex get default to using the cluster rather than an arbitrary node, which is better UI. The actual cost of accessing a proxied node vs using the cluster is basically the same. But using the cluster allows smarter load-balancing to be done on the cluster.	2024-06-27 15:21:03 -04:00
Joey Hess	cf59d7f92c	GET and CHECKPRESENT among lowest cost cluster nodes Before it was using a node that might have had a higher cost. Also threw in a random selection from among the low cost nodes. Of course this is a poor excuse for load balancing, but it's better than nothing. Most of the time...	2024-06-27 14:36:55 -04:00
Joey Hess	dceb8dc776	update	2024-06-27 13:40:09 -04:00
Joey Hess	c9d63d74d8	remove viconfig item it works when run on a client that has the cluster gateway as a remote, just not when on the cluster gateway	2024-06-27 13:34:24 -04:00
Joey Hess	87a7eeac33	document various multi-gateway cluster considerations Perhaps this will avoid me needing to eg, implement spanning tree protocol. ;-)	2024-06-27 13:33:19 -04:00
Joey Hess	8e322f76bc	updates	2024-06-27 12:57:08 -04:00
Joey Hess	3dad9446ce	distributed cluster cycle prevention Added BYPASS to P2P protocol, and use it to avoid cycling between cluster gateways. Distributed clusters are working well now!	2024-06-27 12:20:22 -04:00
Joey Hess	effaf51b1f	avoid loop between cluster gateways The VIA extension is still needed to avoid some extra work and ugly messages, but this is enough that it actually works. This filters out the RemoteSides that are a proxied connection via a remote gateway to the cluster. The VIA extension will not filter those out, but will send VIA to them on connect, which will cause the ones that are accessed via the listed gateways to be filtered out.	2024-06-26 15:29:59 -04:00
Joey Hess	4172109c8d	support multi-gateway clusters VIA extension still needed otherwise a copy to a cluster can loop forever.	2024-06-26 15:07:03 -04:00
Joey Hess	07e899c9d3	git-annex-shell: proxy nodes located beyond remote cluster gateways Walking a tightrope between security and convenience here, because git-annex-shell needs to only proxy for things when there has been an explicit, local action to configure them. In this case, the user has to have run `git-annex extendcluster`, which now sets annex-cluster-gateway on the remote. Note that any repositories that the gateway is recorded to proxy for will be proxied onward. This is not limited to cluster nodes, because checking the node log would not add any security; someone could add any uuid to it. The gateway of course then does its own checking to determine if it will allow proxying for the remote.	2024-06-26 12:56:16 -04:00
Joey Hess	798d6f6a46	todo	2024-06-25 17:58:45 -04:00
Joey Hess	e3dd29409b	improve docs	2024-06-25 17:50:22 -04:00
Joey Hess	0a1001dbfb	update	2024-06-25 17:26:26 -04:00
Joey Hess	9a8dcb58cd	design for distributed clusters	2024-06-25 17:20:49 -04:00
Joey Hess	b9889917a3	thoughts on cycles Rejected the idea of automatically instantiating remotes for proxies-of-proxies. That needs cycle protection, while the current behavior, which happened for free, is that running git-annex updateproxy on the proxy can be used to configure it, but only for topologies that actually exist.	2024-06-25 15:32:11 -04:00
Joey Hess	cec2848e8a	support annex.jobs for clusters	2024-06-25 14:54:20 -04:00
Joey Hess	5ede109ae5	gave up on upload fanout to cluster's proxy The problem with that idea is that the cluster's proxy is necessarily a remote, and necessarily one that we'll want to sync with, since the git repository is stored there. So when its preferred content wants a file, and the cluster does too, the file will get uploaded to it as well as to the cluster. With fanout, the upload to the cluster will populate the proxy as well, avoiding a second upload. But only if the file is sent to the cluster first. If it's sent to the proxy first, there will be two uploads. Another, lesser problem is that a repository can proxy for more than one cluster. So when does it make sense to drop content from the repository? It could be done when dropping from one cluster, but what of the other one? This complication was not necessary anyway. Instead, if it's desirable to have some content accessed from close to the proxy, one of the cluster nodes can just be put on the same filesystem as it. That will be just as fast as storing the content on the proxy.	2024-06-25 13:35:12 -04:00
Joey Hess	1bfe7f8a53	honor preferred content settings of cluster nodes Except when no nodes want a file, it has to be stored somewhere, so store it on all. Which is not really desirable, but neither is having to pick one. ProtoAssociatedFile deserialization is rather broken, and this could possibly affect preferred content expressions that match on filenames. The inability to roundtrip whitespace like tabs and newlines through is not a problem because preferred content expressions can't be written that match on whitespace such as a tab. For example: joey@darkstar:~/tmp/bench/z>git-annex wanted origin-node2 'exclude=CTRL-VTab' wanted origin-node2 git-annex: Parse error: Parse failure: near "*" But, the filtering of control characters could perhaps be a problem. I think that filtering is now obsolete, git-annex has comprehensive filtering of control characters when displaying filenames, that happens at a higher level. However, I don't want to risk a security hole so am leaving in that filtering in ProtoAssociatedFile deserialization for now.	2024-06-25 11:43:09 -04:00
Joey Hess	202ea3ff2a	don't sync with cluster nodes by default Avoid `git-annex sync --content` etc from operating on cluster nodes by default since syncing with a cluster implicitly syncs with its nodes. This avoids a lot of unnecessary work when a cluster has a lot of nodes just in checking if each node's preferred content is satisfied. And it avoids content being sent to nodes individually, so instead syncing with clusters always fans out uploads to nodes. The downside is that there are situations where a cluster's preferred content settings can be met, but those of its nodes are not. Or where a node does not contain a key, but the cluster does, and there are not enough copies of the key yet, so it would be desirable to send it there. I think that's an acceptable tradeoff. These kinds of situations are ones where the cluster itself should probably be responsible for copying content to the node. Which it can do much less expensively than a client can. Part of the balanced preferred content design that I will be working on in a couple of months involves rebalancing clusters, so I expect to revisit this. The use of annex-sync config does allow running git-annex sync with a specific node, or nodes, and it will sync with it. And it's also possible to set annex-sync git configs to make it sync with a node by default. (Although that will require setting up an explicit git remote for the node rather than relying on the proxied remote.) Logs.Cluster.Basic is needed because Remote.Git cannot import Logs.Cluster due to a cycle. And the Annex.Startup load of clusters happens too late for Remote.Git to use that. This does mean one redundant load of the cluster log, though only when there is a proxy.	2024-06-25 10:24:38 -04:00
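
The timestamp translation described in commit f452bd448a (REMOVE-BEFORE and GETTIMESTAMP proxying) can be illustrated with a minimal sketch. This is not git-annex code, which is written in Haskell. The class name `TimestampTranslator` and its methods are hypothetical, and the sketch only assumes what the commit message states: the proxy remembers when it asked a node for its timestamp, and applies the resulting delta as an offset to the REMOVE-BEFORE deadline.

```python
from dataclasses import dataclass, field


@dataclass
class TimestampTranslator:
    """Hypothetical helper: maps deadlines on the proxy's clock to a node's clock."""

    # node name -> (proxy time when GETTIMESTAMP was sent, timestamp the node returned)
    seen: dict = field(default_factory=dict)

    def record(self, node: str, proxy_now: float, node_timestamp: float) -> None:
        # Remember what the proxy's clock read when it asked this node.
        self.seen[node] = (proxy_now, node_timestamp)

    def translate(self, node: str, proxy_deadline: float) -> float:
        # Shift the deadline by the difference between the two clocks.
        proxy_then, node_then = self.seen[node]
        return node_then + (proxy_deadline - proxy_then)


translator = TimestampTranslator()
# The proxy's clock read 100.0 when the node reported 5000.0.
translator.record("node1", proxy_now=100.0, node_timestamp=5000.0)
# A deadline of 160.0 on the proxy's clock is 60 seconds after the question was asked.
node_deadline = translator.translate("node1", proxy_deadline=160.0)
print(node_deadline)
```

Because the offset is recorded per node, each node needs its own GETTIMESTAMP round trip before a drop, which is the inefficiency the commit message mentions.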

4624 commits