implementation plan
parent 6375e3be3b
commit 3df70c5c0c
2 changed files with 73 additions and 2 deletions
@@ -87,6 +87,27 @@ to store data when eg, all the repositories that is knows about are full.
 Just getting the git-annex back in sync should recover from either
 situation.
 
+> This seems like the clear winner.
+
+## UUID discovery security
+
+Are there any security concerns with adding UUID discovery?
+
+Suppose that repository A claims to be a proxy for repository B, but it's
+not connected to B, and is actually evil. Then git-annex would instantiate
+a remote A-B with the UUID of B. If files were sent to A-B, git-annex would
+consider them present on B, and not send them to B by other remotes.
+
+Well, in this situation, A wrote to the git-annex branch (or used a P2P
+protocol extension) in order to pose as B. Without a proxy feature A could
+just as well falsify location logs to claim that B contains things it did
+not. Also, without a proxy feature, A could set its UUID to be the same as
+B, and so trick us into sending files to it rather than B.
+
+The only real difference seems to be that the UUID of a remote is cached,
+so A could only do this the first time we accessed it, and not later.
+With UUID discovery, A can do that at any time.
+
 ## user interface
 
 What to name the instantiated remotes? Probably the best that could
@@ -129,7 +150,7 @@ A command like `git-annex push` would see all the instantiated remotes and
 would pick one to send content to. Seems like the proxy might choose to
 `storeKey` the content on other node(s) than the requested one. Which would
 be fine. But, `git-annex push` would still do considerable extra work in
-interating over all the instantiated remotes. So it might be better to make
+iterating over all the instantiated remotes. So it might be better to make
 such commands not operate on instantiated remotes for sending content but
 only on the proxy.
 
@@ -192,7 +213,7 @@ There's potentially a layering problem here, because exactly how encryption
 What if repo A is a proxy and has repo B as a remote. Meanwhile, repo B is
 a proxy and has repo A as a remote?
 
-An upload to repo A will start by checkin if repo B wants the content and if so,
+An upload to repo A will start by checking if repo B wants the content and if so,
 start an upload to repo B. Then the same happens on repo B, leading it to
 start an upload to repo A.
 
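The loop described in the hunk above (repo A and repo B each acting as a proxy with the other as a remote) can be illustrated with a small sketch. This is only one possible way to break such a cycle, not necessarily what the design settles on: each upload remembers which repositories it has already reached and never forwards to one of them again. The function name `upload_targets` and the `remotes_of` mapping are hypothetical and exist only for this illustration.

```python
def upload_targets(start: str, remotes_of: dict) -> list:
    """Return the repositories an upload starting at `start` reaches.

    `remotes_of` maps a repository name to the remotes it proxies for
    (a hypothetical stand-in for real proxy configuration).
    """
    visited = [start]   # kept as a list so the visit order is preserved
    pending = [start]
    while pending:
        repo = pending.pop()
        for remote in remotes_of.get(repo, []):
            # A repository that was already reached is never forwarded
            # to again, so mutual proxies cannot bounce an upload forever.
            if remote not in visited:
                visited.append(remote)
                pending.append(remote)
    return visited


# A proxies for B and B proxies for A: the upload still terminates.
mutual = {"A": ["B"], "B": ["A"]}
reached = upload_targets("A", mutual)
```

Without the `visited` check, the A to B to A forwarding in this example would never stop.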
@@ -19,3 +19,53 @@ Planned schedule of work:
 * October: proving behavior of balanced preferred content with proxies
 
 [[!tag projects/openneuro]]
+
+# work notes
+
+For June's work on [[design/passthrough_proxy]], implementation plan:
+
+1. UUID discovery via git-annex branch. Add a log file listing UUIDs
+accessible via proxy UUIDs. It also will contain the names
+of the remotes that the proxy is a proxy for,
+from the perspective of the proxy.
+
+Note that remote names coming from the git-annex branch need to be
+limited to what's legal in git remote names.
+This will also prevent remote names being a security hazard
+via eg escape characters.
+
+1. Add a command that is run on the proxy to update the proxy log file.
+This is how the user sets it up as a proxy, and selects the remotes it's
+proxying for.
+
+2. Remote instantiation for proxies. When a remote "foo" is a proxy,
+and has a remote "bar", instantiate a remote "foo-bar" that has the UUID
+of bar but is of the same type and configuration as remote "foo".
+
+3. Implement proxying in git-annex-shell so connections with the UUID
+of one of the proxy's
+
+4. Let `storeKey` return a list of UUIDs where content was stored,
+and make proxies accept uploads directed at them, rather than a specific
+instantiated remote, and fan out the upload to whatever nodes behind
+the proxy want it. This will need P2P protocol extensions.
+
+5. Make `git-annex copy --from $proxy` pick a node that contains each
+file, and use the instantiated remote for getting the file. Same for
+similar commands.
+
+6. Make `git-annex drop --from $proxy` drop, when possible, from every
+remote accessible by the proxy. Communicate partial drops somehow.
+
+7. Make commands like `git-annex push` not iterate over instantiated
+remotes, and instead just send content to the proxy for fanout.
+
+8. Optimise proxy speed. See design for idea.
+
+9. Encryption and chunking. See design for issues.
+
+10. Cycle prevention. See design.
+
+11. indirect uploads (to be considered). See design.
+
+
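The first steps of the plan (remote names read from the git-annex branch must be limited to safe names, and a proxy "foo" with a remote "bar" yields an instantiated remote "foo-bar" carrying bar's UUID and foo's configuration) can be sketched roughly as follows. This is a minimal illustration in Python, not git-annex code. The `Remote` class, both function names, and the shape of the `proxied` mapping are hypothetical. The name check is a deliberately conservative allow-list that is stricter than git's own refname rules, which is an assumption made for the sketch.

```python
import re
from dataclasses import dataclass

# Assumption: a conservative allow-list, stricter than what git itself
# accepts. Escape characters, spaces and slashes never match.
_SAFE_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def is_safe_remote_name(name: str) -> bool:
    """Return True if `name` only uses characters known to be harmless."""
    if not _SAFE_NAME.fullmatch(name):
        return False
    # Also reject forms that git refnames disallow.
    return ".." not in name and not name.endswith((".lock", "."))


@dataclass(frozen=True)
class Remote:
    """Hypothetical stand-in for a configured remote."""
    name: str
    uuid: str
    config: dict


def instantiate_proxied(proxy: Remote, proxied: dict) -> list:
    """Build "proxy-name" remotes from a {remote name: UUID} mapping.

    The mapping represents what a proxy log might list, with names
    as seen from the proxy's perspective.
    """
    instantiated = []
    for name, uuid in sorted(proxied.items()):
        if not is_safe_remote_name(name):
            continue  # ignore names that could be a security hazard
        # Same type and configuration as the proxy, but the UUID of
        # the repository behind it.
        instantiated.append(
            Remote(f"{proxy.name}-{name}", uuid, dict(proxy.config))
        )
    return instantiated


foo = Remote("foo", "uuid-foo", {"url": "ssh://example.com/foo"})
remotes = instantiate_proxied(
    foo, {"bar": "uuid-bar", "evil\x1b[31m": "uuid-evil"}
)
```

Here `remotes` contains a single remote named "foo-bar". The entry whose name contains an escape character is skipped rather than sanitized, so a hostile log entry cannot produce a surprising remote name.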