From 3df70c5c0cb304359c3c5541458eeb7a39daab92 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Tue, 4 Jun 2024 07:51:33 -0400
Subject: [PATCH] implementation plan

---
 doc/design/passthrough_proxy.mdwn | 25 ++++++++++++++--
 doc/todo/git-annex_proxies.mdwn   | 50 +++++++++++++++++++++++++++++++
 2 files changed, 73 insertions(+), 2 deletions(-)

diff --git a/doc/design/passthrough_proxy.mdwn b/doc/design/passthrough_proxy.mdwn
index 36ffbc1bd4..e943363369 100644
--- a/doc/design/passthrough_proxy.mdwn
+++ b/doc/design/passthrough_proxy.mdwn
@@ -87,6 +87,27 @@ to store data when eg, all the repositories that is knows about are full.
 Just getting the git-annex back in sync should recover from either
 situation.
 
+> This seems like the clear winner.
+
+## UUID discovery security
+
+Are there any security concerns with adding UUID discovery?
+
+Suppose that repository A claims to be a proxy for repository B, but it's
+not connected to B, and is actually evil. Then git-annex would instantiate
+a remote A-B with the UUID of B. If files were sent to A-B, git-annex would
+consider them present on B, and not send them to B by other remotes.
+
+Well, in this situation, A wrote to the git-annex branch (or used a P2P
+protocol extension) in order to pose as B. Without a proxy feature A could
+just as well falsify location logs to claim that B contains things it did
+not. Also, without a proxy feature, A could set its UUID to be the same as
+B, and so trick us into sending files to it rather than B.
+
+The only real difference seems to be that the UUID of a remote is cached,
+so A could only do this the first time we accessed it, and not later.
+With UUID discovery, A can do that at any time.
+
 ## user interface
 
 What to name the instantiated remotes? Probably the best that could
@@ -129,7 +150,7 @@ A command like `git-annex push` would see all the instantiated remotes and
 would pick one to send content to. Seems like the proxy might choose to
 `storeKey` the content on other node(s) than the requested one. Which would
 be fine. But, `git-annex push` would still do considerable extra work in
-interating over all the instantiated remotes. So it might be better to make
+iterating over all the instantiated remotes. So it might be better to make
 such commands not operate on instantiated remotes for sending content but
 only on the proxy.
 
@@ -192,7 +213,7 @@ There's potentially a layering problem here, because exactly how encryption
 What if repo A is a proxy and has repo B as a remote. Meanwhile, repo B is
 a proxy and has repo A as a remote?
 
-An upload to repo A will start by checkin if repo B wants the content and if so,
+An upload to repo A will start by checking if repo B wants the content and if so,
 start an upload to repo B. Then the same happens on repo B, leading it to
 start an upload to repo A.
 
diff --git a/doc/todo/git-annex_proxies.mdwn b/doc/todo/git-annex_proxies.mdwn
index 9c41bdb15a..4ae8bc1f39 100644
--- a/doc/todo/git-annex_proxies.mdwn
+++ b/doc/todo/git-annex_proxies.mdwn
@@ -19,3 +19,53 @@ Planned schedule of work:
 * October: proving behavior of balanced preferred content with proxies
 
 [[!tag projects/openneuro]]
+
+# work notes
+
+For June's work on [[design/passthrough_proxy]], implementation plan:
+
+1. UUID discovery via git-annex branch. Add a log file listing UUIDs
+   accessible via proxy UUIDs. It also will contain the names
+   of the remotes that the proxy is a proxy for,
+   from the perspective of the proxy.
+
+   Note that remote names coming from the git-annex branch need to be
+   limited to what's legal in git remote names.
+   This will also prevent remote names being a security hazard
+   via eg escape characters.
+
+1. Add a command that is run on the proxy to update the proxy log file.
+   This is how the user sets it up as a proxy, and selects the remotes it's
+   proxying for.
+
+2. Remote instantiation for proxies. When a remote "foo" is a proxy,
+   and has a remote "bar", instantiate a remote "foo-bar" that has the UUID
+   of bar but is of the same type and configuration of remote "foo".
+
+3. Implement proxying in git-annex-shell so connections with the UUID
+   of one of the proxy's
+
+4. Let `storeKey` return a list of UUIDs where content was stored,
+   and make proxies accept uploads directed at them, rather than a specific
+   instantiated remote, and fan out the upload to whatever nodes behind
+   the proxy want it. This will need P2P protocol extensions.
+
+5. Make `git-annex copy --from $proxy` pick a node that contains each
+   file, and use the instantiated remote for getting the file. Same for
+   similar commands.
+
+6. Make `git-annex drop --from $proxy` drop, when possible, from every
+   remote accessible by the proxy. Communicate partial drops somehow.
+
+7. Make commands like `git-annex push` not iterate over instantiated
+   remotes, and instead just send content to the proxy for fanout.
+
+8. Optimise proxy speed. See design for idea.
+
+9. Encryption and chunking. See design for issues.
+
+10. Cycle prevention. See design.
+
+11. indirect uploads (to be considered). See design.
+
+
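
To illustrate the UUID discovery and remote instantiation steps of the plan above, here is a minimal Python sketch. It is not part of the patch and not git-annex's implementation (git-annex is written in Haskell). The `Remote` type, the `proxy_log` mapping, the `instantiate_proxied` function, and the `SAFE_NAME` pattern are all hypothetical names chosen for illustration. The name filter is a deliberately conservative allowlist that is stricter than git's actual remote-name rules; it only shows how rejecting unusual names also keeps escape characters out.

```python
import re
from dataclasses import dataclass, replace

# Hypothetical, conservative allowlist: stricter than what git really
# permits in remote names, used here only to show the filtering idea.
SAFE_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]*")


@dataclass(frozen=True)
class Remote:
    """Hypothetical stand-in for a configured remote."""
    name: str
    uuid: str
    remote_type: str
    config: tuple  # e.g. (("url", "ssh://example/foo"),)


def instantiate_proxied(proxy, proxy_log):
    """Build "proxy-name" remotes for each node listed under the proxy's UUID.

    proxy_log maps a proxy UUID to (uuid, name) pairs, standing in for the
    log file on the git-annex branch (assumed shape, not the real format).
    Each instantiated remote takes the UUID of the proxied node but keeps
    the type and configuration of the proxy itself.
    """
    instantiated = []
    for uuid, name in proxy_log.get(proxy.uuid, []):
        if not SAFE_NAME.fullmatch(name):
            continue  # drop names with escape characters or other oddities
        instantiated.append(
            replace(proxy, name=f"{proxy.name}-{name}", uuid=uuid)
        )
    return instantiated


foo = Remote("foo", "uuid-foo", "git", (("url", "ssh://example/foo"),))
log = {"uuid-foo": [("uuid-bar", "bar"), ("uuid-evil", "ev\x1b[31mil")]}
print([r.name for r in instantiate_proxied(foo, log)])  # prints ['foo-bar']
```

The second log entry is skipped because its name contains an escape character, which is the hazard the plan mentions for names read from the git-annex branch.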