additional design work on proxies
Sponsored-by: Dartmouth College's OpenNeuro project
parent a612fe7299
commit 9cdbcedc37
1 changed file with 53 additions and 4 deletions
@@ -47,8 +47,8 @@ the cluster.
 Could the P2P protocol be extended to let the proxy communicate the UUIDs
 of all the repositories behind it?
 
-Once the client git-annex knows the set of UUIDs behind the proxy, it can
-instantiate a remote object per uuid, each of which accesses the proxy, but
+Once the client git-annex knows the set of UUIDs behind the proxy, it could
+eg instantiate a remote object per UUID, each of which accesses the proxy, but
 with a different UUID.
 
 But, git-annex usually only does UUID discovery the first time a ssh remote
@@ -64,8 +64,7 @@ git-annex branch?
 
 With this approach, git-annex would know as soon as it sees the proxy's
 UUID that this is a proxy for this other set of UUIDs. (Unless its
-git-annex branch is not up-to-date.) And then it can instantiate a UUID for
-each remote.
+git-annex branch is not up-to-date.)
 
 One difficulty with this is that, when the git-annex branch is not up to
 date with changes from the proxy, git-annex may try to access repositories
@@ -76,6 +75,56 @@ to store data when eg, all the repositories that it knows about are full.
 Just getting the git-annex back in sync should recover from either
 situation.
 
+## user interface
+
+What to name the instantiated remotes? Probably the best that could
+be done is to use the proxy's own remote names as suffixes on the client.
+Eg, the proxy's "node1" remote is "proxy-node1".
+
+But the user probably doesn't want to pick which node to send content to.
+They don't necessarily know anything about the nodes. Ideally the user
+would `git-annex copy --to proxy` or `git-annex push` and let it pick
+which instantiated remote(s) to send to.
+
+To make `git-annex copy --to proxy` work, `storeKey` could be changed to
+allow returning a UUID (or UUIDs) where the content was actually stored.
+That would also allow a single upload to the proxy to fan out and be stored
+in multiple nodes. The proxy would use preferred content to pick which of
+its nodes to store on.
+
+Instantiated remotes would still be needed for `git-annex get` and similar
+to work.
+
+To make `git-annex copy --from proxy` work, the proxy would need to pick
+a node and stream content from it. That's doable, but how to handle a case
+where a node gets corrupted? The best it could do is mark that node as no
+longer containing the content (as if a fsck failed) and try another one
+next time. This complication might not be necessary. Consider that
+while `git-annex copy --to foo` followed later by `git-annex copy --from foo`
+will usually work, it doesn't work when eg first copying to a transfer
+remote, which then sends the content elsewhere and drops its copy.
+
+What about dropping? `git-annex drop --from proxy` could be made to work,
+by having `removeKey` return a list of UUIDs that the content was dropped
+from. What should that do if it's able to drop from some nodes but not
+others? Perhaps it would need to be able to return a list of UUIDs that
+content was dropped from but still indicate it overall failed to drop.
+(Note that it's entirely possible that dropping from one node of the proxy
+involves lockContent on another node of the proxy in order to satisfy
+numcopies.)
+
+A command like `git-annex push` would see all the instantiated remotes and
+would pick one to send content to. Seems like the proxy might choose to
+`storeKey` the content on other node(s) than the requested one. Which would
+be fine. But, `git-annex push` would still do considerable extra work in
+iterating over all the instantiated remotes. So it might be better to make
+such commands not operate on instantiated remotes for sending content but
+only on the proxy.
+
+Commands like `git-annex push` and `git-annex pull`
+should also skip the instantiated remotes when pushing or pulling the git
+repo, because that would be extra work that accomplishes nothing.
+
 ## streaming to special remotes
 
 As well as being an intermediary to git-annex repositories, the proxy could
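The return-value changes floated in the "user interface" section of this commit can be modelled in a few lines. What follows is only an illustrative Python sketch under stated assumptions, not git-annex's Haskell code. `DropResult`, `instantiated_remote_names`, `store_key` and `remove_key` are hypothetical names made up here. The per-node callables are stand-ins for preferred content and for whatever a real drop on a node would involve.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DropResult:
    """Hypothetical result of a drop request sent to a proxy."""
    dropped_from: List[str] = field(default_factory=list)  # node UUIDs that dropped the content
    failed_on: List[str] = field(default_factory=list)     # node UUIDs that did not drop it

    @property
    def ok(self) -> bool:
        # Overall success only when no node failed; the partial list of
        # UUIDs in dropped_from is still reported on overall failure.
        return not self.failed_on


def instantiated_remote_names(proxy_name: str, node_names: List[str]) -> List[str]:
    """Name each instantiated remote by suffixing the proxy's own remote name."""
    return [f"{proxy_name}-{node}" for node in node_names]


def store_key(key: str, wanted_by: Dict[str, Callable[[str], bool]]) -> List[str]:
    """Toy model of a proxy-side store that fans out to several nodes.

    wanted_by maps a node UUID to a predicate standing in for that node's
    preferred content. Returns the UUIDs where the key ended up stored.
    """
    return [uuid for uuid, wants in wanted_by.items() if wants(key)]


def remove_key(key: str, nodes: Dict[str, Callable[[str], bool]]) -> DropResult:
    """Toy model of a proxy-side drop that may succeed on only some nodes.

    nodes maps a node UUID to a callable that attempts the drop and
    returns True on success.
    """
    result = DropResult()
    for uuid, try_drop in nodes.items():
        if try_drop(key):
            result.dropped_from.append(uuid)
        else:
            result.failed_on.append(uuid)
    return result
```

In this model the client would record location changes for every UUID in the returned lists, even when `ok` is false, which is the "dropped from some nodes but overall failed" case the design text raises. The sketch does not model numcopies or `lockContent`.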