implementation plan

This commit is contained in:
Joey Hess 2024-06-04 07:51:33 -04:00
parent 6375e3be3b
commit 3df70c5c0c
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
2 changed files with 73 additions and 2 deletions


@ -87,6 +87,27 @@ to store data when eg, all the repositories that is knows about are full.
Just getting the git-annex back in sync should recover from either
situation.
> This seems like the clear winner.
## UUID discovery security
Are there any security concerns with adding UUID discovery?
Suppose that repository A claims to be a proxy for repository B, but it's
not connected to B, and is actually evil. Then git-annex would instantiate
a remote A-B with the UUID of B. If files were sent to A-B, git-annex would
consider them present on B, and not send them to B via other remotes.
Well, in this situation, A wrote to the git-annex branch (or used a P2P
protocol extension) in order to pose as B. Without a proxy feature A could
just as well falsify location logs to claim that B contains things it did
not. Also, without a proxy feature, A could set its UUID to be the same as
B, and so trick us into sending files to it rather than B.
The only real difference seems to be that the UUID of a remote is cached,
so A could only do this the first time we accessed it, and not later.
With UUID discovery, A can do that at any time.
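
The difference between the two cases can be illustrated with a toy model. This is only a sketch under stated assumptions, not git-annex code: the `UUIDCache` class and the `discovered` function are hypothetical names invented for this example.

```python
class UUIDCache:
    """Toy model (hypothetical) of a client that caches each remote's UUID."""

    def __init__(self):
        self._cached = {}

    def direct(self, remote_name, claimed_uuid):
        # Only the first claim made by a remote is recorded.
        # Later, different claims from the same remote are ignored.
        return self._cached.setdefault(remote_name, claimed_uuid)


def discovered(advertised_uuids):
    # With UUID discovery, whatever the proxy currently advertises is
    # used on every access, so a changed claim takes effect at once.
    return set(advertised_uuids)
```

In this model, A can pose as B through `direct` only if it does so on first access, while a change in what A advertises shows up in `discovered` whenever it is next consulted.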
## user interface
What to name the instantiated remotes? Probably the best that could
@ -129,7 +150,7 @@ A command like `git-annex push` would see all the instantiated remotes and
would pick one to send content to. Seems like the proxy might choose to
`storeKey` the content on other node(s) than the requested one. Which would
be fine. But, `git-annex push` would still do considerable extra work in
interating over all the instantiated remotes. So it might be better to make
iterating over all the instantiated remotes. So it might be better to make
such commands not operate on instantiated remotes for sending content but
only on the proxy.
@ -192,7 +213,7 @@ There's potentially a layering problem here, because exactly how encryption
What if repo A is a proxy and has repo B as a remote. Meanwhile, repo B is
a proxy and has repo A as a remote?
An upload to repo A will start by checkin if repo B wants the content and if so,
An upload to repo A will start by checking if repo B wants the content and if so,
start an upload to repo B. Then the same happens on repo B, leading it to
start an upload to repo A.
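
One possible way to break such a loop is to carry the set of repositories already visited along with the upload. The sketch below is an assumption made for illustration, not necessarily the approach the design settles on; `upload`, `graph`, and `wants` are hypothetical names.

```python
def upload(graph, start, wants, visited=None):
    """Return the repositories that receive the content, in order.

    graph maps a repository to the remotes it proxies for.
    wants(repo) says whether that repository wants the content.
    """
    if visited is None:
        visited = set()
    visited.add(start)
    received = [start]
    for remote in graph.get(start, []):
        # Skip repositories that were already visited; without this
        # check, A -> B -> A -> ... would never terminate.
        if remote in visited or not wants(remote):
            continue
        received += upload(graph, remote, wants, visited)
    return received
```

With `graph = {"A": ["B"], "B": ["A"]}`, an upload that starts at A reaches B once and then stops, because B sees that A has already been visited.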


@ -19,3 +19,53 @@ Planned schedule of work:
* October: proving behavior of balanced preferred content with proxies
[[!tag projects/openneuro]]
# work notes
For June's work on [[design/passthrough_proxy]], implementation plan:
1. UUID discovery via git-annex branch. Add a log file listing UUIDs
accessible via proxy UUIDs. It also will contain the names
of the remotes that the proxy is a proxy for,
from the perspective of the proxy.
Note that remote names coming from the git-annex branch need to be
limited to what's legal in git remote names.
This will also prevent remote names being a security hazard
via eg escape characters.
1. Add a command that is run on the proxy to update the proxy log file.
This is how the user sets it up as a proxy, and selects the remotes it's
proxying for.
2. Remote instantiation for proxies. When a remote "foo" is a proxy,
and has a remote "bar", instantiate a remote "foo-bar" that has the UUID
of bar but is of the same type and configuration as remote "foo".
3. Implement proxying in git-annex-shell so connections with the UUID
of one of the proxy's
4. Let `storeKey` return a list of UUIDs where content was stored,
and make proxies accept uploads directed at them, rather than a specific
instantiated remote, and fan out the upload to whatever nodes behind
the proxy want it. This will need P2P protocol extensions.
5. Make `git-annex copy --from $proxy` pick a node that contains each
file, and use the instantiated remote for getting the file. Same for
similar commands.
6. Make `git-annex drop --from $proxy` drop, when possible, from every
remote accessible by the proxy. Communicate partial drops somehow.
7. Make commands like `git-annex push` not iterate over instantiated
remotes, and instead just send content to the proxy for fanout.
8. Optimise proxy speed. See design for idea.
9. Encryption and chunking. See design for issues.
10. Cycle prevention. See design.
11. indirect uploads (to be considered). See design.
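
Steps 1 and 2 of the plan can be sketched as follows. This is a minimal Python model, not git-annex code: the `Remote` type, the `proxy_log` mapping, and `is_safe_remote_name` are all hypothetical. The name check is an assumption too, and is deliberately stricter than git's own rules: it only allows ASCII letters, digits, `.`, `_` and `-`.

```python
import re
from dataclasses import dataclass, replace

# Hypothetical, conservative rule. Git accepts more names than this; the
# point is only to reject escape characters and similar surprises in
# names that arrive via the git-annex branch.
_SAFE_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def is_safe_remote_name(name: str) -> bool:
    return (
        _SAFE_NAME.fullmatch(name) is not None
        and ".." not in name
        and not name.endswith(".lock")
        and not name.endswith(".")
    )


@dataclass(frozen=True)
class Remote:
    name: str
    uuid: str
    kind: str      # stands in for the remote's type
    config: tuple  # stands in for the remote's configuration


def instantiate_proxied(proxy: Remote, proxy_log: dict) -> list:
    """Build "proxy-remote" remotes from a proxy log.

    proxy_log maps a proxy's UUID to a list of (name, uuid) pairs, where
    each name is the remote's name from the perspective of the proxy.
    """
    instantiated = []
    for name, uuid in proxy_log.get(proxy.uuid, []):
        if not is_safe_remote_name(name):
            continue  # never instantiate a remote with an unsafe name
        # Same type and configuration as the proxy, but the UUID of the
        # remote behind it.
        instantiated.append(
            replace(proxy, name=f"{proxy.name}-{name}", uuid=uuid)
        )
    return instantiated
```

For a proxy "foo" whose log entry lists a remote "bar", this yields a single remote "foo-bar" that carries bar's UUID and foo's type and configuration. A log entry whose name contains an escape character is skipped.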