git-annex

Author	SHA1	Message	Date
Joey Hess	5c39652235	starting support for remote.name.annexUrl set to annex+http In this case, Remote.Git should not use that url for all access to the repository. It will only be used for annex operations, which isn't done yet.	2024-07-23 09:12:21 -04:00
Joey Hess	2acde0152a	fix build	2024-07-22 21:19:20 -04:00
Joey Hess	06de2ad972	change default port to 9417 Port 80 would need root, not a good idea, so pick something that might work by default. 9418 is git protocol's port. 9419 is used by something, but nothing known uses 9417, so it's as good a default as any.	2024-07-22 20:52:17 -04:00
Joey Hess	7f4cff7ae9	locking over http basically working	2024-07-22 19:44:26 -04:00
Joey Hess	e979e85bff	make serveKeepLocked check auth just to be safe	2024-07-22 19:15:52 -04:00
Joey Hess	d5eaf0f567	improve clientKeepLocked	2024-07-22 16:56:44 -04:00
Joey Hess	48eb6671e4	improve clientGet types	2024-07-22 16:23:08 -04:00
Joey Hess	3069e28dd8	implemented servePutOffset and clientPutOffset But, it's buggy: the server hangs without processing the VALIDITY, and I can't seem to work out why. As far as I can see, storefile is getting as far as running the validitycheck, which is supposed to read that, but never does. This is especially strange because what seems like the same protocol doesn't hang when servePut runs it. This made me think that it needed to use inAnnexWorker to be more like servePut, but that didn't help. Another small problem with this is that it does create an empty .git/annex/tmp/ file for the key. Since this will usually be used in combination with servePut, that doesn't seem worth worrying about much.	2024-07-22 15:04:10 -04:00
Joey Hess	b240a11b79	clientPut seeking to offset	2024-07-22 12:50:21 -04:00
Joey Hess	a01426b713	avoid padding in servePut This means that when the client sends a truncated data to indicate invalidity, DATA is not passed the full expected data. That leaves the P2P connection in a state where it cannot be reused. While so far, they are not reused, they will be later when proxies are supported. So, have to close the P2P connection in this situation.	2024-07-22 12:30:30 -04:00
Joey Hess	4826a3745d	servePut and clientPut implementation Made the data-length header required even for v0. This simplifies the implementation, and doesn't preclude extra verification being done for v0. The connectionWaitVar is an ugly hack. In servePut, nothing waits on the waitvar, and I could not find a good way to make anything wait on it.	2024-07-22 10:27:44 -04:00
Joey Hess	97a2d0e4fb	use worker pool in withLocalP2PConnections This allows multiple clients to be handled at the same time.	2024-07-11 14:37:52 -04:00
Joey Hess	2228d56db3	serveGet invalidation	2024-07-11 11:42:32 -04:00
Joey Hess	74c6175795	fix serveGet early handle close Needed that waitv after all..	2024-07-11 09:55:17 -04:00
Joey Hess	1e0f92a5a1	implemented serveGet and clientGet Both are only at bare proof of concept stage. Still need to deal with signaling validity and invalidity, and checking it. And there's a bad bug: After -JN*2 requests, another request hangs! So, I think it's failing to free up the Annex worker and end of request lifetime. Perhaps I need to use this: https://docs.servant.dev/en/stable/cookbook/managed-resource/ManagedResource.html	2024-07-10 16:06:39 -04:00
Joey Hess	f9b7ce7224	add Annex worker pool to P2PHttp This will be needed for get and store, since those need to run Annex actions. withLocalP2PConnections will also probably use it.	2024-07-10 12:19:47 -04:00
Joey Hess	d4b9aea87b	implement gettimestamp	2024-07-10 10:23:10 -04:00
Joey Hess	7c588a5791	implement remove-before The reason to use removeBeforeRemoteEndTime is twofold. First, removeBefore sends two protocol commands. Currently, the HTTP protocol runner only supports sending a single command per invocation. Secondly, the http server gets a monotonic timestamp from the client. So translating back to a POSIXTime would be annoying. The timestamp flow with a proxy will be: - client gets timestamp, which gets the monotonic timestamp from the proxied remote via the proxy. The timestamp is currently not proxied when there is a single proxy. - client calls remove-before - http server calls removeBeforeRemoteEndTime which sends REMOVE-BEFORE to the proxied remote.	2024-07-10 10:03:26 -04:00
Joey Hess	b8a26712c6	implement clientRemove Tested removal.	2024-07-10 09:20:13 -04:00
Joey Hess	48f76cb3e8	implement serveRemove and send WWW-Authenticate header on auth failure	2024-07-10 09:13:01 -04:00
Joey Hess	97d0fc9b65	git-annex p2phttp options	2024-07-10 00:01:55 -04:00
Joey Hess	08371c3745	started on auth	2024-07-09 17:30:55 -04:00
Joey Hess	3d13521479	set up handles for p2phttp Now it fully works.. for the first request. But then it gets stuck waiting for the P2P protocol runner to shut down.	2024-07-09 13:50:42 -04:00
Joey Hess	edf8a3df2d	p2phttp is almost working for checkpresent The server is fully running annex actions, only the P2PConnection is wrong, currently using stdio.	2024-07-09 13:37:55 -04:00
Joey Hess	0bdee626ad	thread in a state	2024-07-08 14:00:23 -04:00
Joey Hess	82d66ede5e	convert lockcontent api to http long polling Websockets would work, but the problem with using them for this is that each lockcontent call is a separate websocket connection. And that's an actual TCP connection. One TCP connection per file dropped would be too expensive. With http long polling, regular http pipelining can be used, so it will reuse a TCP connection. Unfortunately, at least with servant, bi-directional streams with long polling don't result in true bidirectional full duplex communication. Servant processes the whole client body stream before generating the server body stream. I think it's entirely possible to do full bi-directional communication over http, but it would need changes to servant. And, there's no way for the client to tell if the server successfully locked the content, since the server will keep processing the client stream no matter what.: So, added a new api endpoint, keeplocked. lockcontent will lock the key for 10 minutes with retention lock, and then a call to keeplocked will keep it locked for as long as needed. This does mean that there will need to be a Map of locks by key, and I will probably want to add some kind of lock identifier that lockcontent returns.	2024-07-08 12:57:46 -04:00
Joey Hess	522700d1c4	implemented servant-client support for websockets	2024-07-08 07:44:59 -04:00
Joey Hess	1dbb5ec70d	servant API type is complete	2024-07-07 12:59:12 -04:00
Joey Hess	86ce3bf1e4	started servant implementation of HTTP P2P protocol	2024-07-07 12:08:10 -04:00
Joey Hess	f452bd448a	REMOVE-BEFORE and GETTIMESTAMP proxying For clusters, the timestamps have to be translated, since each node can have its own idea about what time it is. To translate a timestamp, the proxy remembers what time it asked the node for a timestamp in GETTIMESTAMP, and applies the delta as an offset in REMOVE-BEFORE. This does mean that a remove from a cluster has to call GETTIMESTAMP on every node before dropping from nodes. Not very efficient. Although currently it tries to drop from every single node anyway, which is also not very efficient. I thought about caching the GETTIMESTAMP from the nodes on the first call. That would improve efficiency. But, since monotonic clocks on !Linux don't advance when the computer is suspended, consider what might happen if one node was suspended for a while, then came back. Its monotonic timestamp would end up behind where the proxying expects it to be. Would that result in removing when it shouldn't, or refusing to remove when it should? Have not thought it through. Either way, a cluster behaving strangly for an extended period of time because one of its nodes was briefly asleep doesn't seem like good behavior.	2024-07-04 15:09:34 -04:00
Joey Hess	1243af4a18	toward SafeDropProof expiry checking Added Maybe POSIXTime to SafeDropProof, which gets set when the proof is based on a LockedCopy. If there are several LockedCopies, it uses the closest expiry time. That is not optimal, it may be that the proof expires based on one LockedCopy but another one has not expired. But that seems unlikely to really happen, and anyway the user can just re-run a drop if it fails due to expiry. Pass the SafeDropProof to removeKey, which is responsible for checking it for expiry in situations where that could be a problem. Which really only means in Remote.Git. Made Remote.Git check expiry when dropping from a local remote. Checking expiry when dropping from a P2P remote is not yet implemented. P2P.Protocol.remove has SafeDropProof plumbed through to it for that purpose. Fixing the remaining 2 build warnings should complete this work. Note that the use of a POSIXTime here means that if the clock gets set forward while git-annex is in the middle of a drop, it may say that dropping took too long. That seems ok. Less ok is that if the clock gets turned back a sufficient amount (eg 5 minutes), proof expiry won't be noticed. It might be better to use the Monotonic clock, but that doesn't advance when a laptop is suspended, and while there is the linux Boottime clock, that is not available on other systems. Perhaps a combination of POSIXTime and the Monotonic clock could detect laptop suspension and also detect clock being turned back? There is a potential future flag day where p2pDefaultLockContentRetentionDuration is not assumed, but is probed using the P2P protocol, and peers that don't support it can no longer produce a LockedCopy. Until that happens, when git-annex is communicating with older peers there is a risk of data loss when a ssh connection closes during LOCKCONTENT.	2024-07-04 12:39:06 -04:00
Joey Hess	d2b27ca136	add content retention files This allows lockContentShared to lock content for eg, 10 minutes and if the process then gets terminated before it can unlock, the content will remain locked for that amount of time. The Windows implementation is not yet tested. In P2P.Annex, a duration of 10 minutes is used. This way, when p2pstdio or remotedaemon is serving the P2P protocol, and is asked to LOCKCONTENT, and that process gets killed, the content will not be subject to deletion. This is not a perfect solution to doc/todo/P2P_locking_connection_drop_safety.mdwn yet, but it gets most of the way there, without needing any P2P protocol changes. This is only done in v10 and higher repositories (or on Windows). It might be possible to backport it to v8 or earlier, but it would complicate locking even further, and without a separate lock file, might be hard. I think that by the time this fix reaches a given user, they will probably have been running git-annex 10.x long enough that their v8 repositories will have upgraded to v10 after the 1 year wait. And it's not as if git-annex hasn't already been subject to this problem (though I have not heard of any data loss caused by it) for 6 years already, so waiting another fraction of a year on top of however long it takes this fix to reach users is unlikely to be a problem.	2024-07-03 14:58:39 -04:00
Joey Hess	8fa4e25c1e	Merge branch 'master' into proxy-specialremotes	2024-07-01 11:23:21 -04:00
Joey Hess	8b5fc94d50	add optional object file location to storeKey This will be used by the next commit to simplify the proxy.	2024-07-01 10:42:27 -04:00
Joey Hess	0dfdc9f951	dup stdio handles for P2P proxy Special remotes might output to stdout, or read from stdin, which would mess up the P2P protocol. So dup the handles to avoid any such problem.	2024-07-01 10:06:29 -04:00
Joey Hess	0033e6c0a6	Tab completion of many commands like info and trust now includes remotes Especially useful with proxied remotes and clusters, where the user may not be entirely familiar with the name and can learn by tab completion.	2024-06-30 12:39:18 -04:00
Joey Hess	62750f0102	shut down RemoteSides cleanly Before it just exited without actually shutting down the RemoteSides, when the client hung up.	2024-06-28 13:19:57 -04:00
Joey Hess	c3a785204e	support a P2PConnection that uses TMVars rather than Handles This will allow having an internal thread speaking P2P protocol, which will be needed to support proxying to external special remotes. No serialization is done on the internal P2P protocol of course. When a ByteString is being exchanged, it may or may not be exactly the length indicated by DATA. While that has to be carefully managed for the serialized P2P protocol, here it would require buffering the whole lazy bytestring in memory to check its length when sending, so it's better to do length checks on the receiving side.	2024-06-28 11:22:29 -04:00
Joey Hess	41a0817188	make extendcluster also updatecluster This avoids the user forgetting to do it and simplifies the documentation.	2024-06-27 15:34:45 -04:00
Joey Hess	cf59d7f92c	GET and CHECKPRESENT amoung lowest cost cluster nodes Before it was using a node that might have had a higher cost. Also threw in a random selection from amoung the low cost nodes. Of course this is a poor excuse for load balancing, but it's better than nothing. Most of the time...	2024-06-27 14:36:55 -04:00
Joey Hess	3dad9446ce	distributed cluster cycle prevention Added BYPASS to P2P protocol, and use it to avoid cycling between cluster gateways. Distributed clusters are working well now!	2024-06-27 12:20:22 -04:00
Joey Hess	923953c9fe	fix cycle prevention code	2024-06-26 13:21:51 -04:00
Joey Hess	07e899c9d3	git-annex-shell: proxy nodes located beyond remote cluster gateways Walking a tightrope between security and convenience here, because git-annex-shell needs to only proxy for things when there has been an explicit, local action to configure them. In this case, the user has to have run `git-annex extendcluster`, which now sets annex-cluster-gateway on the remote. Note that any repositories that the gateway is recorded to proxy for will be proxied onward. This is not limited to cluster nodes, because checking the node log would not add any security; someone could add any uuid to it. The gateway of course then does its own checking to determine if it will allow proxying for the remote.	2024-06-26 12:56:16 -04:00
Joey Hess	1ec2fecf3f	set up proxies for cluster nodes that are themselves proxied via a remote When there are multiple gateways to a cluster, this sets up proxying for nodes that are accessed via a remote gateway. Eg, when running in nyc and amsterdam is the remote gateway, and it has node1 and node2, this sets up proxying for amsterdam-node1 and amsterdam-node2. A client that has nyc as a remote will see proxied remotes nyc-amsterdam-node1 and nyc-amsterdam-node2.	2024-06-26 11:24:55 -04:00
Joey Hess	02bf3ddc3f	updatecluster: support multiple gateways Just look at the existing proxied remotes that correspond to already existing nodes of the cluster, and keep those nodes in the cluster. While adding any remotes of the local repo that are configured as cluster nodes. This allows removing cluster nodes from the local repo and updating, without it also removing nodes provided by other gateways.	2024-06-26 10:51:14 -04:00
Joey Hess	0b72b85df5	added git-annex extendcluster This works, but updatecluster does not work yet in multi-gateway clusters, nor do gateways relay to other gateways.	2024-06-26 10:26:54 -04:00
Joey Hess	cec2848e8a	support annex.jobs for clusters	2024-06-25 14:54:20 -04:00
Joey Hess	b8016eeb65	add annex-proxied This makes git-annex sync and similar not treat proxied remotes as git syncable remotes. Also, display in git-annex info remote when the remote is proxied.	2024-06-24 10:16:59 -04:00
Joey Hess	bf6b309917	remove attempt to avoid git syncing with instantiate proxied remotes It didn't work. Actually, sync was skipping those remotes due to a bug.	2024-06-24 09:35:24 -04:00
Joey Hess	d0aec8f623	always check numcopies when moving from cluster When the destination does not start with a copy, the cluster has one or more copies. If more, dropping would reduce the number of copies, so numcopies must be checked. Considered checking how many nodes of the cluster contain a copy. If only 1 node does, it could allow a move without checking numcopies. The problem with that, though, is that other nodes of the cluster could have copies that we don't know about. And dropping from a cluster tries to drop from all nodes, so will drop even from those. So any drop from a cluster can remove more than 1 copy.	2024-06-23 12:00:50 -04:00

1 2 3 4 5 ...

2892 commits