git-annex

Author	SHA1	Message	Date
Joey Hess	ef8f24f28c	fix PUT to http proxied special remote It was hanging because it never sent FAILURE in the INVALID case. And putoffset always triggers the INVALID case.	2024-07-28 09:14:42 -04:00
Joey Hess	1c0448e33c	update	2024-07-26 20:44:01 -04:00
Joey Hess	0fb86d2916	UNLOCKCONTENT is not a top-level request proxyRequest was treating UNLOCKCONTENT as a separate request. That made it possible for there to be two different connections to the proxied remote, with LOCKCONTENT being sent to one, and UNLOCKCONTENT to the other one. A protocol error. git-annex testremote now passes against a http proxied remote.	2024-07-26 20:39:06 -04:00
Joey Hess	a3dab58be2	fix hang at end of PUT to proxied p2p http remote sendExactly will now be sure to evaluate the whole lazy ByteString. In this case, the lazy ByteString was exactly the right length. But, it seems that L.take caused it to not actually be fully evaluated. In servePut, this manifested as gather never being fully evaluated, which caused the hang. Very, very subtle, and horrible bug. Clearly the use of lazy ByteString (or really just laziness) is at fault, and it would be very worth moving to conduit or whatever to avoid this.	2024-07-26 19:50:15 -04:00
Joey Hess	b431201e1f	update	2024-07-26 17:15:09 -04:00
Joey Hess	d1faa13d6a	implement proxy connection pool removeOldestProxyConnectionPool will be inefficient the larger the pool is. A better data structure could be more efficient. Eg, make each value in the pool include the timestamp of its oldest element, then the oldest value can be found and modified, rather than rebuilding the whole Map. But, for pools of a few hundred items, this should be fine. It's O(n*n log n) or so. Also, when more than 1 connection with the same pool key exists, it's efficient even for larger pools, since removeOldestProxyConnectionPool is not needed. The default of 1 idle connection could perhaps be larger.. like the number of jobs? Otoh, it seems good to ramp up and down the number of connections, which does happen. With 1, there is at most one stale connection, which might cause a request to fail.	2024-07-26 17:03:31 -04:00
Joey Hess	ad025b8e5e	clean up protocol version for proxying The proxy always checks the protocol version of a remote before talking to it in a version-specific way, so the protocol version in the ProxyParams is the client's protocol version. The remote will always be at the same or an older protocol version than the client. Note that in relayDATAFinish, when the client is at protocol version 0, the remote must thus be as well, and that's why its version is not checked in the case for that. With that clarified, it's evident that, in P2P.Http.State, there's no need to look at the proxied remote's protocol version at all.	2024-07-26 13:49:05 -04:00
Joey Hess	576ec6ed71	fix hang in GET from http p2p proxy serverP2PConnection = proxyfromclientconn causes serveGet to signalFullyConsumedByteString to it, which is what it's waiting for	2024-07-26 12:51:00 -04:00
Joey Hess	f052091558	update	2024-07-26 11:01:45 -04:00
Joey Hess	cc1da2d516	http p2p proxy is now largely working	2024-07-26 10:44:10 -04:00
Joey Hess	96ad0ccc5b	wip	2024-07-25 15:39:57 -04:00
Joey Hess	b13c2407af	p2phttp drop supports checking proof timestamps At this point the p2phttp implementation is fully complete!	2024-07-25 10:11:09 -04:00
Joey Hess	f5624a69e3	expire lock after 10 minutes initially Once keeplocked is called, the lock will expire at the end of that call. But if keeplocked never gets called, this avoids the lock persisting forever.	2024-07-24 14:25:40 -04:00
Joey Hess	97836aafba	Remote.Git lockContent works with annex+http urls	2024-07-24 13:42:57 -04:00
Joey Hess	9fa9678585	Remote.Git removeKey works with annex+http urls Does not yet handle drop proof lock timestamp checking.	2024-07-24 12:33:26 -04:00
Joey Hess	fd3bdb2300	update	2024-07-24 12:19:53 -04:00
Joey Hess	0d81d1ee2f	update	2024-07-24 12:18:51 -04:00
Joey Hess	cfdb80cd05	progress meter for p2phttp storeKey	2024-07-24 12:14:56 -04:00
Joey Hess	0280e2dd5e	update	2024-07-24 11:13:37 -04:00
Joey Hess	10f2c23fd7	fix slowloris timeout in hashing resume of download of large file Hash the data that is already present in the file before connecting to the http server.	2024-07-24 11:03:59 -04:00
Joey Hess	7bd616e169	Remote.Git retrieveKeyFile works with annex+http urls This includes a bugfix to serveGet, it hung at the end.	2024-07-24 10:28:44 -04:00
Joey Hess	b4d749cc91	Merge branch 'master' into httpproto	2024-07-23 21:17:06 -04:00
Joey Hess	48657405c6	cache credentials for p2phttp in memory	2024-07-23 18:45:02 -04:00
Joey Hess	b89c784a9b	use git credential when p2phttp needs auth	2024-07-23 18:11:15 -04:00
Joey Hess	73ffb58456	p2phttp support https	2024-07-23 15:37:36 -04:00
Joey Hess	b7454f1eeb	protocol version fallback on 404 and prettified errors	2024-07-23 14:58:49 -04:00
Joey Hess	75b1d50b99	add remoteAnnexP2PHttpUrl to RemoteGitConfig This is always parsed, when building without servant, a Baseurl is not generated, and users of it will need to fail.	2024-07-23 09:57:01 -04:00
Joey Hess	758cff0fde	update	2024-07-22 20:59:45 -04:00
Joey Hess	9984252ab5	P2P protocol is finalized	2024-07-22 19:50:08 -04:00
Joey Hess	e979e85bff	make serveKeepLocked check auth just to be safe	2024-07-22 19:15:52 -04:00
Joey Hess	f5dd7a8bc0	implemented serveLockContent (untested)	2024-07-22 17:38:42 -04:00
Joey Hess	b697c6b9da	fix TMVar left full crash affecting servePutOffset Problem is that whatever is reading from the TMVar may not have read from it yet before the client writes the next thing to it.	2024-07-22 15:48:46 -04:00
Joey Hess	3069e28dd8	implemented servePutOffset and clientPutOffset But, it's buggy: the server hangs without processing the VALIDITY, and I can't seem to work out why. As far as I can see, storefile is getting as far as running the validitycheck, which is supposed to read that, but never does. This is especially strange because what seems like the same protocol doesn't hang when servePut runs it. This made me think that it needed to use inAnnexWorker to be more like servePut, but that didn't help. Another small problem with this is that it does create an empty .git/annex/tmp/ file for the key. Since this will usually be used in combination with servePut, that doesn't seem worth worrying about much.	2024-07-22 15:04:10 -04:00
Joey Hess	b240a11b79	clientPut seeking to offset	2024-07-22 12:50:21 -04:00
Joey Hess	a01426b713	avoid padding in servePut This means that when the client sends truncated data to indicate invalidity, DATA is not passed the full expected data. That leaves the P2P connection in a state where it cannot be reused. While so far, they are not reused, they will be later when proxies are supported. So, have to close the P2P connection in this situation.	2024-07-22 12:30:30 -04:00
Joey Hess	efa0efdc44	avoid padding in clientPut Instead truncate when necessary to indicate invalid content was sent. Very similar to how serveGet handles it.	2024-07-22 11:47:24 -04:00
Joey Hess	72d0769ca5	avoid padding content in serveGet Always truncate instead. The padding risked something not noticing the content was bad and getting a file that was corrupted in a novel way with the padding "X" at the end. A truncated file is better.	2024-07-22 11:19:52 -04:00
Joey Hess	4826a3745d	servePut and clientPut implementation Made the data-length header required even for v0. This simplifies the implementation, and doesn't preclude extra verification being done for v0. The connectionWaitVar is an ugly hack. In servePut, nothing waits on the waitvar, and I could not find a good way to make anything wait on it.	2024-07-22 10:27:44 -04:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92	1ee30b29ee	Added a comment	2024-07-21 12:38:12 +00:00
nobodyinperson	b920655acd	Added a comment: Also Serveo.net	2024-07-19 15:21:19 +00:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92	2878343354		2024-07-19 12:12:56 +00:00
mih	5bc00a55dd		2024-07-16 15:02:46 +00:00
Joey Hess	97a2d0e4fb	use worker pool in withLocalP2PConnections This allows multiple clients to be handled at the same time.	2024-07-11 14:37:52 -04:00
Joey Hess	14e0f778b7	simplify	2024-07-11 11:50:44 -04:00
Joey Hess	2228d56db3	serveGet invalidation	2024-07-11 11:42:32 -04:00
Joey Hess	3b37b9e53f	fix serveGet hang This came down to SendBytes waiting on the waitv. Nothing ever filled it. Only Annex.Proxy needs the waitv, and it handles filling it. So make it optional.	2024-07-11 07:46:52 -04:00
benjamin.poldrack@d09ccff6d42dd20277610b59867cf7462927b8e3	a82a573f75		2024-07-11 07:47:27 +00:00
benjamin.poldrack@d09ccff6d42dd20277610b59867cf7462927b8e3	9ce207532e		2024-07-11 07:23:30 +00:00
Joey Hess	8cb1332407	update	2024-07-10 16:10:08 -04:00
Joey Hess	b5b3d8cde2	update	2024-07-09 14:30:50 -04:00
Joey Hess	751b8e0baf	implemented serveCheckPresent Still need a way to run Proto though	2024-07-09 09:08:42 -04:00
Joey Hess	3f402a20a8	implement Locker	2024-07-08 21:00:10 -04:00
Joey Hess	838169ee86	status	2024-07-07 16:16:11 -04:00
Joey Hess	40306d3fcf	finalizing HTTP P2P protocol some more Added v2-v0 endpoints. These are tedious, but will be needed in order to use the HTTP protocol to proxy to repositories with older git-annex, where git-annex-shell will be speaking an older version of the protocol. Changed GET to use 422 when the content is not present. 404 is needed to detect when a protocol version is not supported.	2024-07-05 15:34:58 -04:00
Joey Hess	0bfdc57d25	update	2024-07-04 15:18:06 -04:00
Joey Hess	f452bd448a	REMOVE-BEFORE and GETTIMESTAMP proxying For clusters, the timestamps have to be translated, since each node can have its own idea about what time it is. To translate a timestamp, the proxy remembers what time it asked the node for a timestamp in GETTIMESTAMP, and applies the delta as an offset in REMOVE-BEFORE. This does mean that a remove from a cluster has to call GETTIMESTAMP on every node before dropping from nodes. Not very efficient. Although currently it tries to drop from every single node anyway, which is also not very efficient. I thought about caching the GETTIMESTAMP from the nodes on the first call. That would improve efficiency. But, since monotonic clocks on !Linux don't advance when the computer is suspended, consider what might happen if one node was suspended for a while, then came back. Its monotonic timestamp would end up behind where the proxying expects it to be. Would that result in removing when it shouldn't, or refusing to remove when it should? Have not thought it through. Either way, a cluster behaving strangely for an extended period of time because one of its nodes was briefly asleep doesn't seem like good behavior.	2024-07-04 15:09:34 -04:00
Joey Hess	99b7a0cfe9	use REMOVE-BEFORE in P2P protocol Only clusters still need to be fixed to close this todo.	2024-07-04 13:47:38 -04:00
Joey Hess	1243af4a18	toward SafeDropProof expiry checking Added Maybe POSIXTime to SafeDropProof, which gets set when the proof is based on a LockedCopy. If there are several LockedCopies, it uses the closest expiry time. That is not optimal, it may be that the proof expires based on one LockedCopy but another one has not expired. But that seems unlikely to really happen, and anyway the user can just re-run a drop if it fails due to expiry. Pass the SafeDropProof to removeKey, which is responsible for checking it for expiry in situations where that could be a problem. Which really only means in Remote.Git. Made Remote.Git check expiry when dropping from a local remote. Checking expiry when dropping from a P2P remote is not yet implemented. P2P.Protocol.remove has SafeDropProof plumbed through to it for that purpose. Fixing the remaining 2 build warnings should complete this work. Note that the use of a POSIXTime here means that if the clock gets set forward while git-annex is in the middle of a drop, it may say that dropping took too long. That seems ok. Less ok is that if the clock gets turned back a sufficient amount (eg 5 minutes), proof expiry won't be noticed. It might be better to use the Monotonic clock, but that doesn't advance when a laptop is suspended, and while there is the linux Boottime clock, that is not available on other systems. Perhaps a combination of POSIXTime and the Monotonic clock could detect laptop suspension and also detect clock being turned back? There is a potential future flag day where p2pDefaultLockContentRetentionDuration is not assumed, but is probed using the P2P protocol, and peers that don't support it can no longer produce a LockedCopy. Until that happens, when git-annex is communicating with older peers there is a risk of data loss when a ssh connection closes during LOCKCONTENT.	2024-07-04 12:39:06 -04:00
Joey Hess	f69661ab65	status	2024-07-03 17:04:12 -04:00
Joey Hess	44b3136fdf	update	2024-07-03 15:53:25 -04:00
Joey Hess	6a95eb08ce	status	2024-07-03 15:01:34 -04:00
Joey Hess	badcb502a4	todo	2024-07-03 13:15:09 -04:00
Joey Hess	24d63e8c8e	update	2024-07-02 18:04:29 -04:00
Joey Hess	b2a24a1669	update	2024-07-02 16:16:37 -04:00
Joey Hess	fbc4d549f3	reorder	2024-07-01 11:44:54 -04:00
Joey Hess	8db30323b0	update	2024-07-01 11:38:29 -04:00
Joey Hess	1e1584d34b	toc	2024-07-01 11:37:12 -04:00
Joey Hess	d9e66f7754	update	2024-07-01 11:33:07 -04:00
Joey Hess	f58a5f577d	update	2024-07-01 11:29:04 -04:00
Joey Hess	fa5e7463eb	fix display when proxied GET yields ERROR The error message is not displayed to the user, but this mirrors the behavior when a regular get from a special remote fails. At least now there is not a protocol error.	2024-07-01 11:19:02 -04:00
Joey Hess	dce3848ad8	avoid populating proxy's object file when storing on special remote Now that storeKey can have a different object file passed to it, this complication is not needed. This avoids a lot of strange situations, and will also be needed if streaming is eventually supported.	2024-07-01 10:53:49 -04:00
Joey Hess	0dfdc9f951	dup stdio handles for P2P proxy Special remotes might output to stdout, or read from stdin, which would mess up the P2P protocol. So dup the handles to avoid any such problem.	2024-07-01 10:06:29 -04:00
Joey Hess	0e19c1c9fa	todo	2024-06-28 17:14:18 -04:00
Joey Hess	711a5166e2	PUT to proxied special remote working Still needs some work. The reason that the waitv is necessary is because without it, runNet loops back around and reads the next protocol message. But it's not finished reading the whole bytestring yet, and so it reads some part of it.	2024-06-28 17:10:58 -04:00
Joey Hess	2e5af38f86	GET from proxied special remote Working, but lots of room for improvement... Without streaming, so there is a delay before download begins as the file is retrieved from the special remote. And when resuming it retrieves the whole file from the special remote again. Also, if the special remote throws an exception, currently it shows as "protocol error".	2024-06-28 15:44:48 -04:00
Joey Hess	5b1971e2f8	merged the proxy branch into master!	2024-06-27 15:44:11 -04:00
Joey Hess	c3f88923c0	Merge branch 'proxy'	2024-06-27 15:43:45 -04:00
Joey Hess	85f4527d74	update	2024-06-27 15:28:10 -04:00
Joey Hess	20ef1262df	give proxied cluster nodes a higher cost than the cluster gateway This makes eg git-annex get default to using the cluster rather than an arbitrary node, which is better UI. The actual cost of accessing a proxied node vs using the cluster is basically the same. But using the cluster allows smarter load-balancing to be done on the cluster.	2024-06-27 15:21:03 -04:00
Joey Hess	cf59d7f92c	GET and CHECKPRESENT among lowest cost cluster nodes Before it was using a node that might have had a higher cost. Also threw in a random selection from among the low cost nodes. Of course this is a poor excuse for load balancing, but it's better than nothing. Most of the time...	2024-06-27 14:36:55 -04:00
Joey Hess	dceb8dc776	update	2024-06-27 13:40:09 -04:00
Joey Hess	c9d63d74d8	remove viconfig item it works when run on a client that has the cluster gateway as a remote, just not when on the cluster gateway	2024-06-27 13:34:24 -04:00
Joey Hess	87a7eeac33	document various multi-gateway cluster considerations Perhaps this will avoid me needing to eg, implement spanning tree protocol. ;-)	2024-06-27 13:33:19 -04:00
Joey Hess	8e322f76bc	updates	2024-06-27 12:57:08 -04:00
Joey Hess	3dad9446ce	distributed cluster cycle prevention Added BYPASS to P2P protocol, and use it to avoid cycling between cluster gateways. Distributed clusters are working well now!	2024-06-27 12:20:22 -04:00
Joey Hess	effaf51b1f	avoid loop between cluster gateways The VIA extension is still needed to avoid some extra work and ugly messages, but this is enough that it actually works. This filters out the RemoteSides that are a proxied connection via a remote gateway to the cluster. The VIA extension will not filter those out, but will send VIA to them on connect, which will cause the ones that are accessed via the listed gateways to be filtered out.	2024-06-26 15:29:59 -04:00
Joey Hess	4172109c8d	support multi-gateway clusters VIA extension still needed otherwise a copy to a cluster can loop forever.	2024-06-26 15:07:03 -04:00
Joey Hess	07e899c9d3	git-annex-shell: proxy nodes located beyond remote cluster gateways Walking a tightrope between security and convenience here, because git-annex-shell needs to only proxy for things when there has been an explicit, local action to configure them. In this case, the user has to have run `git-annex extendcluster`, which now sets annex-cluster-gateway on the remote. Note that any repositories that the gateway is recorded to proxy for will be proxied onward. This is not limited to cluster nodes, because checking the node log would not add any security; someone could add any uuid to it. The gateway of course then does its own checking to determine if it will allow proxying for the remote.	2024-06-26 12:56:16 -04:00
Joey Hess	798d6f6a46	todo	2024-06-25 17:58:45 -04:00
Joey Hess	e3dd29409b	improve docs	2024-06-25 17:50:22 -04:00
Joey Hess	0a1001dbfb	update	2024-06-25 17:26:26 -04:00
Joey Hess	9a8dcb58cd	design for distributed clusters	2024-06-25 17:20:49 -04:00
Joey Hess	b9889917a3	thoughts on cycles Rejected the idea of automatically instantiating remotes for proxies-of-proxies. That needs cycle protection, while the current behavior, which happened for free, is that running git-annex updateproxy on the proxy can be used to configure it, but only for topologies that actually exist.	2024-06-25 15:32:11 -04:00
Joey Hess	cec2848e8a	support annex.jobs for clusters	2024-06-25 14:54:20 -04:00
Joey Hess	5ede109ae5	gave up on upload fanout to cluster's proxy The problem with that idea is that the cluster's proxy is necessarily a remote, and necessarily one that we'll want to sync with, since the git repository is stored there. So when its preferred content wants a file, and the cluster does too, the file will get uploaded to it as well as to the cluster. With fanout, the upload to the cluster will populate the proxy as well, avoiding a second upload. But only if the file is sent to the cluster first. If it's sent to the proxy first, there will be two uploads. Another, lesser problem is that a repository can proxy for more than one cluster. So when does it make sense to drop content from the repository? It could be done when dropping from one cluster, but what of the other one? This complication was not necessary anyway. Instead, if it's desirable to have some content accessed from close to the proxy, one of the cluster nodes can just be put on the same filesystem as it. That will be just as fast as storing the content on the proxy.	2024-06-25 13:35:12 -04:00
Joey Hess	1bfe7f8a53	honor preferred content settings of cluster nodes Except when no nodes want a file, it has to be stored somewhere, so store it on all. Which is not really desirable, but neither is having to pick one. ProtoAssociatedFile deserialization is rather broken, and this could possibly affect preferred content expressions that match on filenames. The inability to roundtrip whitespace like tabs and newlines through is not a problem because preferred content expressions can't be written that match on whitespace such as a tab. For example: joey@darkstar:~/tmp/bench/z>git-annex wanted origin-node2 'exclude=CTRL-VTab' wanted origin-node2 git-annex: Parse error: Parse failure: near "*" But, the filtering of control characters could perhaps be a problem. I think that filtering is now obsolete, git-annex has comprehensive filtering of control characters when displaying filenames, that happens at a higher level. However, I don't want to risk a security hole so am leaving in that filtering in ProtoAssociatedFile deserialization for now.	2024-06-25 11:43:09 -04:00
Joey Hess	202ea3ff2a	don't sync with cluster nodes by default Avoid `git-annex sync --content` etc from operating on cluster nodes by default since syncing with a cluster implicitly syncs with its nodes. This avoids a lot of unnecessary work when a cluster has a lot of nodes just in checking if each node's preferred content is satisfied. And it avoids content being sent to nodes individually, so instead syncing with clusters always fanout uploads to nodes. The downside is that there are situations where a cluster's preferred content settings can be met, but those of its nodes are not. Or where a node does not contain a key, but the cluster does, and there are not enough copies of the key yet, so it would be desirable to send it there. I think that's an acceptable tradeoff. These kinds of situations are ones where the cluster itself should probably be responsible for copying content to the node. Which it can do much less expensively than a client can. Part of the balanced preferred content design that I will be working on in a couple of months involves rebalancing clusters, so I expect to revisit this. The use of annex-sync config does allow running git-annex sync with a specific node, or nodes, and it will sync with it. And it's also possible to set annex-sync git configs to make it sync with a node by default. (Although that will require setting up an explicit git remote for the node rather than relying on the proxied remote.) Logs.Cluster.Basic is needed because Remote.Git cannot import Logs.Cluster due to a cycle. And the Annex.Startup load of clusters happens too late for Remote.Git to use that. This does mean one redundant load of the cluster log, though only when there is a proxy.	2024-06-25 10:24:38 -04:00
Joey Hess	b8016eeb65	add annex-proxied This makes git-annex sync and similar not treat proxied remotes as git syncable remotes. Also, display in git-annex info remote when the remote is proxied.	2024-06-24 10:16:59 -04:00
Joey Hess	0c111fc96a	fix git-annex sync --content with proxied remotes Loading the remote list a second time was removing all proxied remotes. That happened because setting up the proxied remote added some config fields to the in-memory git config, and on the second load, it saw those configs and decided not to overwrite them with the proxy. Now on the second load, that still happens. But now, the proxied git configs are used to generate a remote same as if those configs were all set. The reason that didn't happen before was twofold, the gitremotes cache was not dropped, and the remote's url field was not set correctly. The problem with the remote's url field is that while it was marked as proxy inherited, all other proxy inherited fields are annex- configs. And the code to inherit didn't work for the url field. Now it all works, but git-annex sync is left running git push/pull on the proxied remote, which doesn't work. That still needs to be fixed.	2024-06-24 09:45:51 -04:00
Joey Hess	60413a2557	update	2024-06-23 16:38:01 -04:00
Joey Hess	5d8bdac38e	upload fanout resume seems free of fenceposts Tested it with small chunk sizes (like 2) and resumes that were eg 1 byte from the end of the file or beginning of file. Also, git-annex testremote passes now against a cluster!	2024-06-23 16:22:39 -04:00
Joey Hess	9e070470f4	update	2024-06-23 12:48:22 -04:00
Joey Hess	3cd7969823	update	2024-06-23 12:31:00 -04:00
Joey Hess	d0aec8f623	always check numcopies when moving from cluster When the destination does not start with a copy, the cluster has one or more copies. If more, dropping would reduce the number of copies, so numcopies must be checked. Considered checking how many nodes of the cluster contain a copy. If only 1 node does, it could allow a move without checking numcopies. The problem with that, though, is that other nodes of the cluster could have copies that we don't know about. And dropping from a cluster tries to drop from all nodes, so will drop even from those. So any drop from a cluster can remove more than 1 copy.	2024-06-23 12:00:50 -04:00
Joey Hess	ec5b6454f4	todo	2024-06-23 10:09:35 -04:00
Joey Hess	2762f9c4ce	fix location log update for copy to 1-node cluster	2024-06-23 09:53:33 -04:00
Joey Hess	5b332a87be	dropping from clusters Dropping from a cluster drops from every node of the cluster. Including nodes that the cluster does not think have the content. This is different from GET and CHECKPRESENT, which do trust the cluster's location log. The difference is that removing from a cluster should make 100% sure the content is gone from every node. So doing extra work is ok. Compare with CHECKPRESENT where checking every node could make it very expensive, and the worst that can happen in a false negative is extra work being done. Extended the P2P protocol with FAILURE-PLUS to handle the case where a drop from one node succeeds, but a drop from another node fails. In that case the entire cluster drop has failed. Note that SUCCESS-PLUS is returned when dropping from a proxied remote that is not a cluster, when the protocol version supports it. This is because P2P.Proxy does not know when it's proxying for a single node cluster vs for a remote that is not a cluster.	2024-06-23 09:43:40 -04:00
Joey Hess	7bbd822a17	avoid using cluster nodes in drop proof when dropping from cluster This is obviously necessary in order for dropping from a cluster to be able to drop from all nodes. It also avoids violating numcopies when a cluster node is a special remote. If it were used in the drop proof, nothing would prevent the cluster from dropping from it.	2024-06-23 06:20:11 -04:00
Joey Hess	5a4b4b59b9	update	2024-06-23 05:26:45 -04:00
nobodyinperson	724eb8a369	Suggest that 'git annex unused' reports total unused size	2024-06-21 16:30:09 +00:00
Joey Hess	53674e8abb	Merge branch 'master' into proxy	2024-06-20 11:20:26 -04:00
Joey Hess	53598e5154	merge from proxy branch	2024-06-20 11:20:16 -04:00
Joey Hess	032d3902d8	wording	2024-06-20 10:15:24 -04:00
joris	b35be4b656	Added a comment	2024-06-20 09:58:05 +00:00
Joey Hess	097ef9979c	towards a design for proxying to special remotes	2024-06-19 06:15:03 -04:00
Joey Hess	f18740699e	P2P protocol version 2, adding SUCCESS-PLUS and ALREADY-HAVE-PLUS Client side support for SUCCESS-PLUS and ALREADY-HAVE-PLUS is complete, when a PUT stores to additional repositories than the expected one, the location log is updated with the additional UUIDs that contain the content. Started implementing PUT fanout to multiple remotes for clusters. It is untested, and I fear fencepost errors in the relative offset calculations. And it is missing proxying for the protocol after DATA.	2024-06-18 16:21:40 -04:00
Joey Hess	f049156a03	checkpresent support for clusters This assumes that the proxy for a cluster has up-to-date location logs. If it didn't, it might proxy the checkpresent to a node that no longer has the content, while some other node still does, and so it would incorrectly appear that the cluster no longer contains the content. Since cluster UUIDs are not stored to location logs, git-annex fsck --fast when claiming to fix a location log when that occurred would not cause any problems. And presumably the location tracking would later get sorted out. At least usually, changes to the content of nodes go via the proxy, and it will update its location logs, so they will be accurate. However, if there were multiple proxies to the same cluster, or nodes were accessed directly (or via proxy to the node and not the cluster), the proxy's location log could certainly be wrong. (The location log access for GET has the same issues.)	2024-06-18 11:16:16 -04:00
Joey Hess	88d9a02f7c	initial, working support for getting from clusters Currently tends to put all the load on a single node, which will need to be improved.	2024-06-18 11:01:10 -04:00
Joey Hess	8290f70978	update	2024-06-18 10:08:15 -04:00
Joey Hess	e2fd2ee2bd	update	2024-06-17 09:31:44 -04:00
Joey Hess	3970bbb03b	Merge branch 'master' into proxy	2024-06-17 09:29:34 -04:00
Joey Hess	64afbb0b93	don't count clusters as copies, continued Handled limitCopies, as well as everything using fromNumCopies and fromMinCopies. This should be everything, probably. Note that, git-annex info displays a count of repositories, which still includes cluster. I think that's ok. It would be possible to filter out clusters there, but to the user they're pretty much just another repository. The numcopies displayed by eg `git-annex info .` does not include clusters.	2024-06-16 15:14:53 -04:00
Joey Hess	780367200b	remove dead nodes when loading the cluster log This is to avoid inserting a cluster uuid into the location log when only dead nodes in the cluster contain the content of a key. One reason why this is necessary is Remote.keyLocations, which excludes dead repositories from the list. But there are probably many more. Implementing this was challenging, because Logs.Location importing Logs.Cluster which imports Logs.Trust which imports Remote.List resulted in an import cycle through several other modules. Resorted to making Logs.Location not import Logs.Cluster, and instead it assumes that Annex.clusters gets populated when necessary before it's called. That's done in Annex.Startup, which is run by the git-annex command (but not other commands) at early startup in initialized repos. Or, is run after initialization. Note that in Remote.Git, it is unable to import Annex.Startup, because Remote.Git importing Logs.Cluster leads to the same import cycle. So ensureInitialized is not passed annexStartup in there. Other commands, like git-annex-shell currently don't run annexStartup either. So there are cases where Logs.Location will not see clusters. So it won't add any cluster UUIDs when loading the log. That's ok, the only reason to do that is to make display of where objects are located include clusters, and to make commands like git-annex get --from treat keys as being located in a cluster. git-annex-shell certainly does not do anything like that, and I'm pretty sure Remote.Git (and callers to Remote.Git.onLocalRepo) don't either.	2024-06-16 14:39:44 -04:00
Joey Hess	b3370a191c	insert cluster UUIDs when loading location logs, and omit when saving Inline isClusterUUID for speed.	2024-06-14 18:06:28 -04:00
Joey Hess	846903e9bb	update todo list for this month whew that's gonna be a lot	2024-06-14 15:23:43 -04:00
Joey Hess	d8daabe9ec	Merge branch 'master' of ssh://git-annex.branchable.com	2024-06-13 06:44:22 -04:00
Joey Hess	22a329c57e	copied over some changes from proxy branch	2024-06-13 06:43:59 -04:00
Joey Hess	555d7e52d3	more thoughts on clusters	2024-06-12 17:30:55 -04:00
Joey Hess	0ebb107974	update	2024-06-12 15:21:23 -04:00
Joey Hess	46a1fcb3ea	avoid git syncing with instantiated proxied remotes These remotes have no url configured, so git pull and push will fail. git-annex sync --content etc can still sync with them otherwise. Also, avoid git syncing twice with the same url. This is for cases where a proxied remote has been manually configured and so does have a url. Or perhaps proxied remotes will get configured like that automatically later.	2024-06-12 15:10:03 -04:00
Joey Hess	a986a20034	designing clusters	2024-06-12 14:57:26 -04:00
Joey Hess	e70e3473b3	on cycles	2024-06-12 13:52:17 -04:00
Joey Hess	44464e4410	update	2024-06-12 12:37:14 -04:00
Joey Hess	67d1e2a459	updates	2024-06-12 12:02:25 -04:00
Joey Hess	dfdda95053	proxy updates location tracking information This does mean a redundant write to the git-annex branch. But, it means that two clients can be using the same proxy, and after one sends a file to a proxied remote, the other only has to pull from the proxy to learn about that. It does not need to pull from every remote behind the proxy (which it couldn't do anyway as git repo access is not currently proxied). Anyway, the overhead of this in git-annex branch writes is no worse than eg, sending a file to a repository where git-annex assistant is running, which then sends the file on to a remote, and updates the git-annex branch then. Indeed, when the assistant also drops the local copy, that results in more writes to the git-annex branch.	2024-06-12 11:37:14 -04:00
Joey Hess	96853cd833	finish P2P protocol proxying CONNECT is not supported by git-annex-shell p2pstdio, but for proxying to tor-annex remotes, it will be supported, and will make a git pull/push to a proxied remote work the same with that as it does over ssh, eg it accesses the proxy's git repo not the proxied remote's git repo. The p2p protocol docs say that NOTIFYCHANGES is not always supported, and it looked annoying to implement it for this, and it also seems pretty useless, so make it be a protocol error. git-annex remotedaemon will already be getting change notifications from the proxy's git repo, so there's no need to get additional redundant change notifications for proxied remotes that would be for changes to the same git repo.	2024-06-12 10:40:51 -04:00
Joey Hess	f98605bce7	a local git remote cannot proxy Prevent listProxied from listing anything when the proxy remote's url is a local directory. Proxying does not work in that situation, because the proxied remotes have the same url, and so git-annex-shell is not run when accessing them, instead the proxy remote is accessed directly. I don't think there is any good way to support this. Even if the instantiated git repos for the proxied remotes somehow used an url that caused it to use git-annex-shell to access them, planned features like `git-annex copy --to proxy` accepting a key and sending it on to nodes behind the proxy would not work, since git-annex-shell is not used to access the proxy. So it would need to use something to access the proxy that causes git-annex-shell to be run and speaks P2P protocol over it. And we have that. It's a ssh connection to localhost. Of course, it would be possible to take ssh out of that mix, and swap in something that does not have encryption overhead and authentication complications, but otherwise behaves the same as ssh. And if the user wants to do that, GIT_SSH does exist.	2024-06-12 10:16:04 -04:00
Joey Hess	c6e0710281	proxying to local git remotes works This just happened to work correctly. Rather surprisingly. It turns out that openP2PSshConnection actually also supports local git remotes, by just running git-annex-shell with the path to the remote. Renamed "P2PSsh" to "P2PShell" to make this clear.	2024-06-12 10:10:11 -04:00
yarikoptic	c6f2a5d372	TODO for log --key	2024-06-12 13:20:29 +00:00
Joey Hess	5beaffb412	proxying PUT now working The almost identical code duplication between relayDATA and relayDATA' is very annoying. I tried quite a few things to parameterize them, but the type checker is having fits when I try it.	2024-06-11 16:56:52 -04:00
Joey Hess	ed4fda098b	todo	2024-06-11 15:15:58 -04:00
Joey Hess	a2f4a8eddf	proxying GET now working Memory use is small and constant; receiveBytes returns a lazy bytestring and it does stream. Comparing speed of a get of a 500 mb file over proxy from origin-origin, vs from the same remote over a direct ssh: joey@darkstar:~/tmp/bench/client>/usr/bin/time git-annex get bigfile --from origin-origin get bigfile (from origin-origin...) ok (recording state in git...) 1.89user 0.67system 0:10.79elapsed 23%CPU (0avgtext+0avgdata 68716maxresident)k 0inputs+984320outputs (0major+10779minor)pagefaults 0swaps joey@darkstar:~/tmp/bench/client>/usr/bin/time git-annex get bigfile --from direct-ssh get bigfile (from direct-ssh...) ok 1.79user 0.63system 0:10.49elapsed 23%CPU (0avgtext+0avgdata 65776maxresident)k 0inputs+1024312outputs (0major+9773minor)pagefaults 0swaps So the proxy doesn't add much overhead even when run on the same machine as the client and remote. Still, piping receiveBytes into sendBytes like this does suggest that the proxy could be made to use less CPU resources by using `sendfile()`.	2024-06-11 15:09:43 -04:00
Joey Hess	09b5e53f49	set annex.uuid in proxy's Repo getRepoUUID looks at that, and was seeing the annex.uuid of the proxy. Which caused it to unnecessarily set the git config. Probably also would have led to other problems.	2024-06-11 13:40:50 -04:00
Joey Hess	657a91527a	update	2024-06-11 13:22:03 -04:00
Joey Hess	d2e3c5c89f	update	2024-06-11 13:07:53 -04:00
Joey Hess	43ff697f25	update status and design work on proxy encryption and chunking	2024-06-07 12:35:04 -04:00
Joey Hess	058726ee86	next step identified	2024-06-06 18:06:45 -04:00
Joey Hess	d59383beaf	update	2024-06-06 17:25:22 -04:00
ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd	ca687413ef	Added a comment	2024-06-05 16:53:51 +00:00
Joey Hess	1761e971ee	status update after day 1 of new project	2024-06-04 14:55:54 -04:00
Joey Hess	3df70c5c0c	implementation plan	2024-06-04 07:51:33 -04:00
Joey Hess	6375e3be3b	received funding to work on this, which comes with a schedule	2024-06-04 06:53:59 -04:00
Joey Hess	5992e1729a	fixed by git release	2024-06-04 06:39:08 -04:00
Joey Hess	adf17f5038	Merge branch 'master' of ssh://git-annex.branchable.com	2024-05-30 13:26:44 -04:00
Joey Hess	0155abfba4	git-remote-annex: Support urls like annex::https://example.com/foo-repo Using the usual url download machinery even allows these urls to need http basic auth, which is prompted for with git-credential. Which opens the possibility for urls that contain a secret to be used, eg the cipher for encryption=shared. Although the user is currently on their own constructing such an url, I do think it would work. Limited to httpalso for now, for security reasons. Since both httpalso (and retrieving this very url) is limited by the usual annex.security.allowed-ip-addresses configs, it's not possible for an attacker to make one of these urls that sets up a httpalso url that opens the garage door. Which is one class of attacks to keep in mind with this thing. It seems that there could be either a git-config that allows other types of special remotes to be set up this way, or special remotes could indicate when they are safe. I do worry that the git-config would encourage users to set it without thinking through the security implications. One remote config might be safe to access this way, but another config, for one with the same type, might not be. This will need further thought, and real-world examples to decide what to do.	2024-05-30 12:24:16 -04:00
yarikoptic	d23ae92da8	Added a comment	2024-05-30 14:34:32 +00:00
yarikoptic	285a7ff3c3	Added a comment	2024-05-30 14:29:43 +00:00
Joey Hess	3f33616068	security	2024-05-29 22:55:06 -04:00
Joey Hess	efa684ab8a	todo	2024-05-29 18:21:17 -04:00
Joey Hess	09a0552489	split off todo, comment	2024-05-29 13:16:36 -04:00
Joey Hess	e19916f54b	add config-uuid to annex:: url for --sameas remotes And use it to set annex-config-uuid in git config. This makes using the origin special remote work after cloning. Without the added Logs.Remote.configSet, instantiating the remote will look at the annex-config-uuid's config in the remote log, which will be empty, and so it will fail to find a special remote. The added deletion of files in the alternatejournaldir is just to make 100% sure they don't get committed to the git-annex branch. Now that they contain things that definitely should not be committed.	2024-05-29 12:50:00 -04:00
Joey Hess	bbf49c9de7	httpalso just worked, with one small issue to fix	2024-05-28 16:26:16 -04:00
Joey Hess	cb9f7b5646	update	2024-05-28 12:50:54 -04:00
Joey Hess	14443fd307	update	2024-05-28 12:46:56 -04:00
Joey Hess	e19f56e7d8	Merge branch 'master' of ssh://git-annex.branchable.com	2024-05-28 10:27:50 -04:00
Joey Hess	c6669990fb	update	2024-05-28 09:19:00 -04:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92	bab6d3e58f	Added a comment: Re: worktree provisioning	2024-05-28 12:06:39 +00:00
Joey Hess	c2483f6e6d	update	2024-05-27 22:44:35 -04:00
Joey Hess	0975e792ea	git-remote-annex: Fix error display on clone cleanupInitialization gets run when an exception is thrown, so needs to avoid throwing exceptions itself, as that would hide the error message that the user needs to see.	2024-05-27 13:28:05 -04:00
Joey Hess	a766475d14	split out a todo	2024-05-27 12:50:46 -04:00
Joey Hess	e64add7cdf	git-remote-annex: support importtree=yes remotes When exporttree=yes is also set. Probably it would also be possible to support ones with only importtree=yes, by enabling exporttree=yes for the remote only when using git-remote-annex, but let's keep this simple... I'm not sure what gets recorded in .git/annex/ state differently in the two cases that might cause a problem when doing that. Note that the full annex:: urls generated and displayed for such a remote omit the importtree=yes. Which is ok, cloning from such an url uses an exporttree=remote, but the git-annex branch doesn't get written by this program, so once the real config is available from the git-annex branch, it will still function as an importtree=yes remote.	2024-05-27 12:35:42 -04:00
Joey Hess	19418e81ee	git-remote-annex: Display full url when using remote with the shorthand url	2024-05-24 17:15:31 -04:00
Joey Hess	04a256a0f8	work around git "defense in depth" breakage with git clone checking for hooks This git bug also broke git-lfs, and I am confident it will be reverted in the next release. For now, cloning from an annex:: url wastes some bandwidth on the next pull by not caching bundles locally. If git doesn't fix this in the next version, I'd be tempted to rethink whether bundle objects need to be cached locally. It would be possible to instead remember which bundles have been seen and their heads, and respond to the list command with the heads, and avoid unbundling them again in fetch. This might even be a useful performance improvement in the latter case. It would be quite a complication to a currently simple implementation though.	2024-05-24 15:49:53 -04:00
Joey Hess	6ccd09298b	convert srcref to a sha This fixes pushing a new ref that is the same as something already pushed. In findotherprereq, it compares two shas, which didn't work when one is actually not a sha but a ref. This is one of those cases where Sha being an alias for Ref makes it hard to catch mistakes. One of these days those need to be differentiated at the type level, but not today..	2024-05-24 15:33:35 -04:00
Joey Hess	96c66a7ca9	bug	2024-05-24 15:15:42 -04:00
Joey Hess	58301e40d2	sync with special remotes with an annex:: url Check explicitly for an annex:: url, not just any url. While no built-in special remotes set an url, except ones that can be synced with, it seems possible that some external special remote sets an url for its own use, but did not expect it to be used by git-annex sync et al. The assistant also syncs with them.	2024-05-24 14:57:29 -04:00
Joey Hess	22bf23782f	initremote, enableremote: Added --with-url to enable using git-remote-annex Also sets remote.name.fetch to a typical value, same as git remote add does.	2024-05-24 14:29:36 -04:00
Joey Hess	7d61a99da3	todo	2024-05-24 13:57:33 -04:00
Joey Hess	2670508b97	also broke git-remote-annex	2024-05-24 13:35:45 -04:00
Joey Hess	b792b128a0	verified checkprereq The case documented in its comment worked in a test push and clone.	2024-05-24 13:06:29 -04:00
Joey Hess	1a3c60cc8e	git-remote-annex: avoid bundle object leakage in push race or interrupted push Locally record the manifest before uploading it or any bundles, and read it on the next push. Any bundles from the push that are not included in the currently being pushed manifest will get added to the outManifest, and so eventually get deleted. This deals with an interrupted push that is not resumed and instead something else is pushed. And it deals with a push race that overwrites the manifest. Of course, this can't help if one of those situations is followed by the local repo being deleted. But that's equivalent to doing a git-annex copy of a new annexed file to a special remote and then deleting the special repo w/o pushing. In either case the special remote ends up with an object in it that git-annex doesn't know about.	2024-05-24 12:47:32 -04:00
Joey Hess	264c51b4f4	comment	2024-05-22 06:06:18 -04:00
Joey Hess	4131e31f5c	PATH_MAX	2024-05-22 04:26:36 -04:00
Joey Hess	5fb307f1c5	comment	2024-05-21 17:47:55 -04:00
Joey Hess	938e714a11	bleh	2024-05-21 17:32:49 -04:00
Joey Hess	10a60183e1	guard pushEmpty	2024-05-21 12:12:44 -04:00
Joey Hess	14c79373c4	update	2024-05-21 12:05:44 -04:00
Joey Hess	b3d7ae51f0	fix edge case where git-annex branch does not have config for enabled special remote One way this could happen is cloning an empty special remote. A later fetch would then fail.	2024-05-21 11:27:49 -04:00
Joey Hess	3e7324bbcb	only delete bundles on pushEmpty This avoids some apparently otherwise unsolvable problems involving races that resulted in the manifest listing bundles that were deleted. Removed the annex-max-git-bundles config because it can't actually result in deleting old bundles. It would still be possible to have a config that controls how often to do a full push, which would avoid needing to download too many bundles on clone, as well as needing to checkpresent too many bundles in verifyManifest. But it would need a different name and description.	2024-05-21 11:13:27 -04:00
Joey Hess	f544946b09	update	2024-05-21 10:20:30 -04:00
Joey Hess	b042dfeb0e	emptying pushes only delete	2024-05-21 09:52:35 -04:00
Joey Hess	5d40759470	formalize problem description	2024-05-21 09:35:46 -04:00
Joey Hess	3a38520aac	avoid interrupted push leaving remote without a manifest Added a backup manifest key, which is used if the main manifest key is not present. When uploading a new Manifest, it makes sure that it never drops one key except when the other key is present. It's entirely possible for the two manifest keys to get out of sync, due to races. The main one wins when it's present, it is possible for the main one being dropped to expose the backup one, which has a different push recorded.	2024-05-20 15:41:09 -04:00
Joey Hess	594ca2fd3a	update	2024-05-20 14:52:06 -04:00
Joey Hess	34a6db4f15	improve recovery from interrupted push On push, first try to drop all outManifest keys listed in the current manifest file, which resumes from an interrupted push that didn't get a chance to delete those keys. The new manifest gets its outManifest populated with the keys that were in the old manifest, plus any of the keys that were unable to be dropped. Note that it would be possible for uploadManifest to skip dropping old keys at all. The old keys would get dropped on the next push. But it seems better to delete stuff immediately rather than waiting. And the extra work is limited to push and typically is small. A remote where dropKey always fails will result in an outManifest that grows longer and longer. It would be possible to check if the remote has appendonly = True and avoid populating the outManifest. Of course, an appendonly remote will grow with every git push anyway. And currently only Remote.GitLFS sets that, which can't be used as a git-remote-annex remote anyway.	2024-05-20 13:49:45 -04:00
Joey Hess	ce60211881	add incremental vs full push race to todo with plan to deal with it	2024-05-16 09:37:28 -04:00
Joey Hess	b1b6e35d4c	reorg todo	2024-05-15 17:41:55 -04:00
Joey Hess	adcebbae47	clean up git-remote-annex git-annex branch handling Implemented alternateJournal, which git-remote-annex uses to avoid any writes to the git-annex branch while setting up a special remote from an annex:: url. That prevents the remote.log from being overwritten with the special remote configuration from the url, which might not be 100% the same as the existing special remote configuration. And it prevents an overwrite deleting of other stuff that was already in the remote.log. Also, when the branch was created by git-remote-annex, only delete it at the end if nothing else has been written to it by another command. This fixes the race condition described in `797f27ab05`, where git-remote-annex set up the branch and git-annex init and other commands were run at the same time and their writes to the branch were lost.	2024-05-15 17:33:38 -04:00
Joey Hess	d24d8870c5	todo	2024-05-15 14:33:13 -04:00
Joey Hess	2dfffa0621	bugfix When pushing branch foo, we don't want to delete other tracking branches. In particular, a full push needs all the tracking branches.	2024-05-14 16:17:27 -04:00


4821 commits