only use a remote as a node when git configuration is set
Avoids someone writing to cluster.log and nominating remotes of someone else's repository as a cluster.
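
As a rough illustration of the rule described above, the following standalone sketch models the check: a remote counts as a node only when the cluster log lists its UUID and the local git configuration names the cluster for that remote. The types `UUID` and `RemoteInfo` and the function `isNode` are simplified stand-ins invented for this example; they are not git-annex's actual definitions, and the real check lives in `clusterProxySelector` in the diff below.

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

-- Hypothetical stand-in for both cluster and repository UUIDs.
newtype UUID = UUID String
  deriving (Eq, Ord, Show)

-- Hypothetical stand-in for a remote. clusterNodeNames models the value of
-- remote.name.annex-cluster-node: Nothing when unset, otherwise the
-- cluster names listed there.
data RemoteInfo = RemoteInfo
  { remoteUUID :: UUID
  , clusterNodeNames :: Maybe [String]
  }

-- A remote is a node of the cluster only when both hold:
--   1. its git configuration names a cluster that maps to this cluster's UUID
--   2. the cluster log lists its UUID as a node
isNode
  :: UUID               -- UUID of the cluster being proxied
  -> M.Map String UUID  -- locally configured cluster names and their UUIDs
  -> S.Set UUID         -- node UUIDs recorded in the cluster log
  -> RemoteInfo
  -> Bool
isNode clusteruuid clusternames nodes r =
  case clusterNodeNames r of
    Nothing -> False
    Just names ->
      any isClusterName names && S.member (remoteUUID r) nodes
  where
    isClusterName name = M.lookup name clusternames == Just clusteruuid
```

Under these assumptions, an entry written to the cluster log by someone else is not enough on its own: the owner of the repository must also have set the git configuration for that remote.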
parent f049156a03
commit fb0fd78485
2 changed files with 59 additions and 37 deletions
@@ -10,6 +10,7 @@
 module Annex.Cluster where
 
 import Annex.Common
+import qualified Annex
 import Types.Cluster
 import Logs.Cluster
 import P2P.Proxy
@@ -20,6 +21,7 @@ import Logs.Location
 import Types.Command
 import Remote.List
 import qualified Remote
+import qualified Types.Remote as Remote
 
 import qualified Data.Map as M
 import qualified Data.Set as S
@@ -56,8 +58,8 @@ clusterProxySelector :: ClusterUUID -> ProtocolVersion -> Annex ProxySelector
 clusterProxySelector clusteruuid protocolversion = do
 	nodes <- (fromMaybe S.empty . M.lookup clusteruuid . clusterUUIDs)
 		<$> getClusters
-	remotes <- filter (flip S.member nodes . ClusterNodeUUID . Remote.uuid)
-		<$> remoteList
+	clusternames <- annexClusters <$> Annex.getGitConfig
+	remotes <- filter (isnode nodes clusternames) <$> remoteList
 	remotesides <- mapM (proxySshRemoteSide protocolversion) remotes
 	return $ ProxySelector
 		{ proxyCHECKPRESENT = nodecontaining remotesides
@@ -71,6 +73,20 @@ clusterProxySelector clusteruuid protocolversion = do
 		, proxyUNLOCKCONTENT = pure Nothing
 		}
   where
+	-- Nodes of the cluster have remote.name.annex-cluster-node
+	-- containing its name.
+	isnode nodes clusternames r =
+		case remoteAnnexClusterNode (Remote.gitconfig r) of
+			Nothing -> False
+			Just names
+				| any (isclustername clusternames) names ->
+					flip S.member nodes $
+						ClusterNodeUUID $ Remote.uuid r
+				| otherwise -> False
+
+	isclustername clusternames name =
+		M.lookup name clusternames == Just clusteruuid
+
 	nodecontaining remotesides k = do
 		locs <- S.fromList <$> loggedLocations k
 		case filter (flip S.member locs . remoteUUID) remotesides of

@@ -151,41 +151,6 @@ for any number of git remotes. Which might be obnoxious.
 Ah, instead git-annex's tab completion can be made to include instantiated
 remotes, no need to list them in git config.
 
-## single upload with fanout
-
-If we want to send a file to multiple repositories that are behind the same
-proxy, it would be wasteful to upload it through the proxy repeatedly.
-
-Perhaps a good user interface to this is `git-annex copy --to proxy`.
-The proxy could fan out the upload and store it in one or more nodes behind
-it. Using preferred content to select which nodes to use.
-This would need `storeKey` to be changed to allow returning a UUID (or UUIDs)
-where the content was actually stored.
-
-Alternatively, `git-annex copy --to proxy-foo` could notice that proxy-bar
-also wants the content, and fan out a copy to there. Then it could
-record in its git-annex branch that the content is present in proxy-bar.
-If the user later does `git-annex copy --to proxy-bar`, it would avoid
-another upload (and the user would learn at that point that it was in
-proxy-bar). This avoids needing to change the `storeKey` interface.
-
-Should a proxy always fanout? if `git-annex copy --to proxy` is what does
-fanout, and `git-annex copy --to proxy-foo` doesn't, then the user has
-content. But if the latter does fanout, that might be annoying to users who
-want to use proxies, but want full control over what lands where, and don't
-want to use preferred content to do it. So probably fanout should be
-configurable. But it can't be configured client side, because the fanout
-happens on the proxy. Seems like remote.name.annex-fanout could be set to
-false to prevent fanout to a specific remote. (This is analagous to a
-remote having `git-annex assistant` running on it, it might fan out uploads
-to it to other repos, and only the owner of that repo can control it.)
-
-A command like `git-annex push` would see all the instantiated remotes and
-would pick ones to send content to. If the proxy does fanout, this would
-lead to `git-annex push` doing extra work iterating over instantiated
-remotes that have already received content via fanout. Could this extra
-work be avoided?
-
 ## clusters
 
 One way to use a proxy is just as a convenient way to access a group of
@@ -281,6 +246,43 @@ cluster UUIDs.
 
 No other protocol extensions or special cases should be needed.
 
+## single upload with fanout
+
+If we want to send a file to multiple repositories that are behind the same
+proxy, it would be wasteful to upload it through the proxy repeatedly.
+
+Perhaps a good user interface to this is `git-annex copy --to proxy`.
+The proxy could fan out the upload and store it in one or more nodes behind
+it. Using preferred content to select which nodes to use.
+This would need `storeKey` to be changed to allow returning a UUID (or UUIDs)
+where the content was actually stored.
+
+Alternatively, `git-annex copy --to proxy-foo` could notice that proxy-bar
+also wants the content, and fan out a copy to there. Then it could
+record in its git-annex branch that the content is present in proxy-bar.
+If the user later does `git-annex copy --to proxy-bar`, it would avoid
+another upload (and the user would learn at that point that it was in
+proxy-bar). This avoids needing to change the `storeKey` interface.
+
+Should a proxy always fanout? if `git-annex copy --to proxy` is what does
+fanout, and `git-annex copy --to proxy-foo` doesn't, then the user has
+content. But if the latter does fanout, that might be annoying to users who
+want to use proxies, but want full control over what lands where, and don't
+want to use preferred content to do it. So probably fanout should be
+configurable. But it can't be configured client side, because the fanout
+happens on the proxy. Seems like remote.name.annex-fanout could be set to
+false to prevent fanout to a specific remote. (This is analagous to a
+remote having `git-annex assistant` running on it, it might fan out uploads
+to it to other repos, and only the owner of that repo can control it.)
+
+Alternatively, fanout could be limited to clusters.
+
+A command like `git-annex push` would see all the instantiated remotes and
+would pick ones to send content to. If fanout is done, this would
+lead to `git-annex push` doing extra work iterating over instantiated
+remotes that have already received content via fanout. Could this extra
+work be avoided?
+
 ## cluster configuration lockdown
 
 If some organization is running a cluster, and giving others access to it,
@@ -302,6 +304,10 @@ to lock down the proxy configuration.
 Of course, someone with access to a cluster can also drop all data from
 it! Unless git-annex-shell is run with `GIT_ANNEX_SHELL_APPENDONLY` set.
 
+A remote will only be treated as a node of a cluster when the git
+configuration remote.name.annex-cluster-node is set, which will prevent
+creating clusters in places where they are not intended to be.
+
 ## speed
 
 A passthrough proxy should be as fast as possible so as not to add overhead