git-annex

Author	SHA1	Message	Date
Joey Hess	3cc48279ad	more thoughts on clusters	2024-06-13 06:41:42 -04:00
Joey Hess	555d7e52d3	more thoughts on clusters	2024-06-12 17:30:55 -04:00
Joey Hess	0ebb107974	update	2024-06-12 15:21:23 -04:00
Joey Hess	46a1fcb3ea	avoid git syncing with instantiated proxied remotes These remotes have no url configured, so git pull and push will fail. git-annex sync --content etc can still sync with them otherwise. Also, avoid git syncing twice with the same url. This is for cases where a proxied remote has been manually configured and so does have a url. Or perhaps proxied remotes will get configured like that automatically later.	2024-06-12 15:10:03 -04:00
Joey Hess	a986a20034	designing clusters	2024-06-12 14:57:26 -04:00
Joey Hess	e70e3473b3	on cycles	2024-06-12 13:52:17 -04:00
Joey Hess	0ffb0a4d25	dash is legal in git remote names	2024-06-12 13:24:31 -04:00
Joey Hess	e224b99f36	whitespace	2024-06-12 13:24:25 -04:00
Joey Hess	5b668f9ef1	add missing spaces	2024-06-12 13:06:14 -04:00
Joey Hess	44464e4410	update	2024-06-12 12:37:14 -04:00
Joey Hess	67d1e2a459	updates	2024-06-12 12:02:25 -04:00
Joey Hess	2e76a4744f	inherit remote.name.annex-bare Since a proxied remote uses the proxy's git repo, this makes sense. Although I don't think this config is ever used when accessing a remote via git-annex-shell.	2024-06-12 11:53:28 -04:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92	c855b50f04		2024-06-12 15:42:42 +00:00
Joey Hess	dfdda95053	proxy updates location tracking information This does mean a redundant write to the git-annex branch. But, it means that two clients can be using the same proxy, and after one sends a file to a proxied remote, the other only has to pull from the proxy to learn about that. It does not need to pull from every remote behind the proxy (which it couldn't do anyway as git repo access is not currently proxied). Anyway, the overhead of this in git-annex branch writes is no worse than eg, sending a file to a repository where git-annex assistant is running, which then sends the file on to a remote, and updates the git-annex branch then. Indeed, when the assistant also drops the local copy, that results in more writes to the git-annex branch.	2024-06-12 11:37:14 -04:00
Joey Hess	96853cd833	finish P2P protocol proxying CONNECT is not supported by git-annex-shell p2pstdio, but for proxying to tor-annex remotes, it will be supported, and will make a git pull/push to a proxied remote work the same with that as it does over ssh, eg it accesses the proxy's git repo not the proxied remote's git repo. The p2p protocol docs say that NOTIFYCHANGE is not always supported, and it looked annoying to implement it for this, and it also seems pretty useless, so make it be a protocol error. git-annex remotedaemon will already be getting change notifications from the proxy's git repo, so there's no need to get additional redundant change notifications for proxied remotes that would be for changes to the same git repo.	2024-06-12 10:40:51 -04:00
Joey Hess	f98605bce7	a local git remote cannot proxy Prevent listProxied from listing anything when the proxy remote's url is a local directory. Proxying does not work in that situation, because the proxied remotes have the same url, and so git-annex-shell is not run when accessing them, instead the proxy remote is accessed directly. I don't think there is any good way to support this. Even if the instantiated git repos for the proxied remotes somehow used an url that caused it to use git-annex-shell to access them, planned features like `git-annex copy --to proxy` accepting a key and sending it on to nodes behind the proxy would not work, since git-annex-shell is not used to access the proxy. So it would need to use something to access the proxy that causes git-annex-shell to be run and speaks P2P protocol over it. And we have that. It's a ssh connection to localhost. Of course, it would be possible to take ssh out of that mix, and swap in something that does not have encryption overhead and authentication complications, but otherwise behaves the same as ssh. And if the user wants to do that, GIT_SSH does exist.	2024-06-12 10:16:04 -04:00
Joey Hess	c6e0710281	proxying to local git remotes works This just happened to work correctly. Rather surprisingly. It turns out that openP2PSshConnection actually also supports local git remotes, by just running git-annex-shell with the path to the remote. Renamed "P2PSsh" to "P2PShell" to make this clear.	2024-06-12 10:10:11 -04:00
Joey Hess	178da0dc99	Merge branch 'master' into proxy	2024-06-12 09:49:30 -04:00
Joey Hess	345494e3b4	expanding on the exporttree=yes design	2024-06-12 09:43:59 -04:00
yarikoptic	c6f2a5d372	TODO for log --key	2024-06-12 13:20:29 +00:00
Joey Hess	6e1df33960	minimized code duplication due to type checker limitations	2024-06-11 17:16:49 -04:00
Joey Hess	5beaffb412	proxying PUT now working The almost identical code duplication between relayDATA and relayDATA' is very annoying. I tried quite a few things to parameterize them, but the type checker is having fits when I try it.	2024-06-11 16:56:52 -04:00
Joey Hess	ed4fda098b	todo	2024-06-11 15:15:58 -04:00
Joey Hess	a2f4a8eddf	proxying GET now working Memory use is small and constant; receiveBytes returns a lazy bytestring and it does stream. Comparing speed of a get of a 500 MB file over proxy from origin-origin, vs from the same remote over a direct ssh: joey@darkstar:~/tmp/bench/client>/usr/bin/time git-annex get bigfile --from origin-origin get bigfile (from origin-origin...) ok (recording state in git...) 1.89user 0.67system 0:10.79elapsed 23%CPU (0avgtext+0avgdata 68716maxresident)k 0inputs+984320outputs (0major+10779minor)pagefaults 0swaps joey@darkstar:~/tmp/bench/client>/usr/bin/time git-annex get bigfile --from direct-ssh get bigfile (from direct-ssh...) ok 1.79user 0.63system 0:10.49elapsed 23%CPU (0avgtext+0avgdata 65776maxresident)k 0inputs+1024312outputs (0major+9773minor)pagefaults 0swaps So the proxy doesn't add much overhead even when run on the same machine as the client and remote. Still, piping receiveBytes into sendBytes like this does suggest that the proxy could be made to use less CPU resources by using `sendfile()`.	2024-06-11 15:09:43 -04:00
Joey Hess	09b5e53f49	set annex.uuid in proxy's Repo getRepoUUID looks at that, and was seeing the annex.uuid of the proxy. Which caused it to unnecessarily set the git config. Probably also would have led to other problems.	2024-06-11 13:40:50 -04:00
yarikoptic	b96ff82871	Added a comment	2024-06-11 17:36:51 +00:00
Joey Hess	657a91527a	update	2024-06-11 13:22:03 -04:00
Joey Hess	dd429ba8fe	Merge branch 'master' of ssh://git-annex.branchable.com	2024-06-11 13:08:45 -04:00
Joey Hess	5bb7f8cd64	Merge branch 'master' into proxy	2024-06-11 13:08:23 -04:00
Joey Hess	d2e3c5c89f	update	2024-06-11 13:07:53 -04:00
Joey Hess	60e63fb85b	enable proxying for git-annex-shell p2pstdio	2024-06-11 13:07:04 -04:00
Joey Hess	58d8ba5a4f	implement simple proxy actions (untested) Still need to implement GET and PUT, and will implement CONNECT and NOTIFYCHANGE for completeness. All ServerMode checking is implemented for the proxy. There are two possible approaches for how the proxy sends back messages from the remote to the client. One would be to have a background thread that reads messages and sends them back as they come in. The other, which is being implemented so far, is to read messages from the remote at points where it is expected to send them, and relay back to the client before reading the next message from the client. At this point, I'm unsure which approach would be better. The need for proxynoresponse to be used by UNLOCKCONTENT, for example, builds protocol knowledge into the proxy which it would not need with the other method.	2024-06-11 12:56:20 -04:00
Joey Hess	373ae49c87	factor out helper functions These will be used by the proxy, which needs to check the ServerMode in the same way.	2024-06-11 12:04:58 -04:00
Joey Hess	92c83a417f	refactoring	2024-06-11 10:22:05 -04:00
NewUser	124c1313bb		2024-06-11 13:31:01 +00:00
Joey Hess	501d65eeab	started implementing git-annex-shell proxy So far, it negotiates VERSION with both parties. This is a tricky dance. Untested.	2024-06-10 18:01:36 -04:00
Joey Hess	7b1548dbfa	correct AUTH-SUCCESS and AUTH-FAILURE It's AUTH_SUCCESS internally in git-annex, but the line based serialization uses AUTH-SUCCESS.	2024-06-10 15:06:27 -04:00
Joey Hess	317786d219	remove dead code	2024-06-10 14:28:58 -04:00
Joey Hess	649b87bedd	Merge branch 'master' into proxy	2024-06-10 14:26:18 -04:00
Joey Hess	9a8391078a	git-annex-shell: block relay requests connRepo is only used when relaying git upload-pack and receive-pack. That's only supposed to be used when git-annex-remotedaemon is serving git-remote-tor-annex connections over tor. But, it was always set, and so could be used in other places possibly. Fixed by making connRepo optional in the P2P protocol interface. In Command.EnableTor, it's not needed, because it only speaks the protocol in order to check that it's able to connect back to itself via the hidden service. So changed that to pass Nothing rather than the git repo. In Remote.Helper.Ssh, it's connecting to git-annex-shell p2pstdio, so is making the requests, so will never need connRepo. In git-annex-shell p2pstdio, it was accepting git upload-pack and receive-pack requests over the P2P protocol, even though nothing sent them. This is arguably a security hole, particularly if the user has set environment variables like GIT_ANNEX_SHELL_LIMITED to prevent git push/pull via git-annex-shell.	2024-06-10 14:16:27 -04:00
Joey Hess	d2576e5f1a	git-annex-shell: accept uuid of remote that proxying is enabled for For NotifyChanges and also for the fallthrough case where git-annex-shell passes a command off to git-shell, proxying is currently ignored. So every remote that is accessed via a proxy will be treated as the same git repository. Every other command listed in cmdsMap will need to check if Annex.proxyremote is set, and if so handle the proxying appropriately. Probably only P2PStdio will need to support proxying. For now, everything else refuses to work when proxying. The part of that I don't like is that there's the possibility a command later gets added to the list that doesn't check proxying. When proxying is not enabled, it's important that git-annex-shell not leak information that it would not have exposed before. Such as the names or uuids of remotes. I decided that, in the case where a repository used to have proxying enabled, but no longer supports any proxies, it's ok to give the user a clear error message indicating that proxying is not configured, rather than a confusing uuid mismatch message. Similarly, if a repository has proxying enabled, but not for the requested repository, give a clear error message. A tricky thing here is how to handle the case where there is more than one remote, with proxying enabled, with the specified uuid. One way to handle that would be to plumb the proxyRemoteName all the way through from the remote git-annex to git-annex-shell, eg as a field, and use only a remote with the same name. That would be very intrusive though. Instead, I decided to let the proxy pick which remote it uses to access a given Remote. And so it picks the least expensive one. The client after all doesn't necessarily know any details about the proxy's configuration. This does mean though, that if the least expensive remote is not accessible, but another remote would have worked, an access via the proxy will fail.	2024-06-10 12:44:35 -04:00
Joey Hess	783eb8879a	notes on behavior	2024-06-10 11:07:04 -04:00
jlueters@79a910340cdff27611c6a650c108afbe2f61c5f6	daa2c6cce1		2024-06-10 14:24:34 +00:00
Joey Hess	b1cc8c6837	Merge branch 'master' of ssh://git-annex.branchable.com	2024-06-07 16:52:04 -04:00
Joey Hess	25a6ab6f11	Avoid grafting in export tree objects that are missing They could be missing due to an interrupted git-annex at just the wrong time during a prior graft, after which the tree objects got garbage collected. Or they could be missing because of manual messing with the git-annex branch, eg resetting it to back before the graft commit. Sponsored-by: Dartmouth College's OpenNeuro project	2024-06-07 16:51:50 -04:00
emilymaers	3947a51cc8	removed	2024-06-07 20:46:34 +00:00
emilymaers	4fbfc5e5ac	Added a comment: blockchain	2024-06-07 20:46:20 +00:00
Joey Hess	b32c4c2e98	atomic git-annex branch update when regrafting in transition Fix a bug where interrupting git-annex while it is updating the git-annex branch could lead to git fsck complaining about missing tree objects. Interrupting git-annex while regraftexports is running in a transition that is forgetting git-annex branch history would leave the repository with a git-annex branch that did not contain the tree shas listed in export.log. That lets those trees be garbage collected. A subsequent run of the same transition then regrafts the trees listed in export.log into the git-annex branch. But those trees have been lost. Note that both sides of `if neednewlocalbranch` are atomic now. I had thought only the True side needed to be, but I do think there may be cases where the False side needs to be as well. Sponsored-by: Dartmouth College's OpenNeuro project	2024-06-07 16:34:10 -04:00
Joey Hess	f5532be954	graft in exported tree before updating the export log It was possible for the export.log to get written and then git-annex was interrupted, before it could graft in the exported tree. Which could result in export.log referencing a tree that got garbage collected.	2024-06-07 15:25:02 -04:00
Joey Hess	6568ba4904	Merge branch 'master' into proxy	2024-06-07 12:35:47 -04:00


45,045 commits
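
Commit d2576e5f1a describes how the proxy chooses which of its remotes serves a request. Among remotes with proxying enabled that have the requested uuid, it uses the least expensive one. It reports a clear error when proxying is not configured at all, or is not enabled for that uuid. Below is a minimal sketch of that policy in Python. It is not git-annex's Haskell implementation. The ProxiedRemote record, pick_proxied_remote, ProxyError, and the error strings are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical record standing in for one of the proxy's git remotes.
@dataclass(frozen=True)
class ProxiedRemote:
    name: str
    uuid: str
    cost: float
    proxying_enabled: bool


class ProxyError(Exception):
    """Hypothetical error type; the messages below are illustrative only."""


def pick_proxied_remote(remotes: list[ProxiedRemote], requested_uuid: str) -> ProxiedRemote:
    """Pick the remote the proxy would use for a client asking for requested_uuid.

    Sketch of the policy in commit d2576e5f1a: prefer the cheapest matching
    remote, with no fallback to a more expensive one if it is unreachable.
    """
    proxied = [r for r in remotes if r.proxying_enabled]
    if not proxied:
        # Proxying used to be enabled but no longer is: say so plainly
        # rather than reporting a confusing uuid mismatch.
        raise ProxyError("proxying is not configured in this repository")
    candidates = [r for r in proxied if r.uuid == requested_uuid]
    if not candidates:
        # Proxying is enabled, just not for the requested repository.
        raise ProxyError(f"proxying is not enabled for {requested_uuid}")
    # Several remotes may share the uuid and the client cannot name one,
    # so the proxy decides by cost. min() keeps the first one on ties.
    return min(candidates, key=lambda r: r.cost)


if __name__ == "__main__":
    remotes = [
        ProxiedRemote("node1-ssh", "uuid-1", 200.0, True),
        ProxiedRemote("node1-local", "uuid-1", 100.0, True),
        ProxiedRemote("node2", "uuid-2", 100.0, True),
    ]
    print(pick_proxied_remote(remotes, "uuid-1").name)  # node1-local
```

Because the choice is made once, by cost alone, the sketch shares the limitation the commit message points out: if node1-local were unreachable, the request would fail even though node1-ssh might have worked.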