checkpresent support for clusters

This assumes that the proxy for a cluster has up-to-date location
logs. If it didn't, it might proxy the checkpresent to a node that no
longer has the content, while some other node still does, and so
it would incorrectly appear that the cluster no longer contains the
content.
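A minimal sketch of how the proxy picks a node. It assumes a simplified model outside the `Annex` monad: UUIDs are plain strings, and `Node`, `nodeUUID` and `nodeContaining` are hypothetical stand-ins for `RemoteSide`, `remoteUUID` and the commit's `nodecontaining`. The proxy keeps only the nodes whose UUIDs its location log lists for the key, then takes the first one. `stale` shows the failure mode described above: the log still lists node-a, which has dropped the content, and omits node-b, which still has it.

```haskell
import qualified Data.Set as S

-- Hypothetical stand-in for a cluster node (git-annex's RemoteSide).
-- UUIDs are modelled as plain strings in this sketch.
data Node = Node { nodeName :: String, nodeUUID :: String }
  deriving (Show, Eq)

-- Pick the first node whose UUID the location log lists for the key.
-- Always taking the head of the list matches the behaviour that the
-- "TODO: Avoid always using same remote" comment refers to.
nodeContaining :: S.Set String -> [Node] -> Maybe Node
nodeContaining loggedLocs nodes =
  case filter ((`S.member` loggedLocs) . nodeUUID) nodes of
    (n:_) -> Just n
    []    -> Nothing

nodeA, nodeB :: Node
nodeA = Node "node-a" "uuid-a"
nodeB = Node "node-b" "uuid-b"

-- Accurate log: only node-b has the key, so node-b is selected.
accurate :: Maybe Node
accurate = nodeContaining (S.fromList ["uuid-b"]) [nodeA, nodeB]

-- Stale log: it lists node-a (which has since dropped the key) but not
-- node-b (which has it), so the request would go to node-a.
stale :: Maybe Node
stale = nodeContaining (S.fromList ["uuid-a"]) [nodeA, nodeB]
```

Forwarding CHECKPRESENT to node-a in the stale case would get a negative answer, so the cluster as a whole would appear not to contain the content.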

Since cluster UUIDs are not stored in location logs, if a checkpresent
did wrongly report the content missing, a git-annex fsck --fast that
then claimed to fix the location log would not cause any problems. And
presumably the location tracking would later get sorted out.
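A toy model of why the fsck --fast case is harmless. It is only a sketch under heavy assumptions: a single key's location log is a set of UUID strings, and `recordPresent`, `recordMissing` and `isCluster` are hypothetical helpers, not git-annex functions. Because cluster UUIDs are never added to the log, recording that the cluster lacks the content leaves the log unchanged.

```haskell
import qualified Data.Set as S

-- Toy location log for a single key: the UUIDs recorded as having it.
-- Real git-annex location logs are timestamped entries on the
-- git-annex branch; none of that is modelled here.
type LocLog = S.Set String

-- Hypothetical test for cluster UUIDs, for this sketch only.
isCluster :: String -> Bool
isCluster u = u == "cluster-uuid"

-- Record presence, but never store a cluster UUID.
recordPresent :: String -> LocLog -> LocLog
recordPresent u loc
  | isCluster u = loc
  | otherwise   = S.insert u loc

-- Record absence by removing the UUID if it is there.
recordMissing :: String -> LocLog -> LocLog
recordMissing = S.delete

-- A log built only through recordPresent: the cluster UUID is
-- discarded on the way in, so only the node's UUID is stored.
nodeLog :: LocLog
nodeLog = recordPresent "cluster-uuid" (recordPresent "uuid-b" S.empty)

-- What an fsck --fast "fix" that records the cluster as missing the
-- key amounts to in this model.
afterFsckFix :: LocLog
afterFsckFix = recordMissing "cluster-uuid" nodeLog
```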

Usually, at least, changes to the content of nodes go through the proxy,
which updates its location logs, so they will be accurate. However,
if there were multiple proxies to the same cluster, or nodes were
accessed directly (or via a proxy to the node rather than the cluster),
the proxy's location log could certainly be wrong.

(The location log access for GET has the same issues.)
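A small model of how a proxy's log goes stale. It assumes that one proxy's view of one key is just two sets of UUIDs: the nodes that really have the content, and the nodes its log lists. `View`, `dropViaProxy`, `dropElsewhere` and `logAccurate` are hypothetical names for this sketch.

```haskell
import qualified Data.Set as S

-- One proxy's view of one key: which node UUIDs really have the
-- content, and which ones this proxy's location log says have it.
data View = View { actual :: S.Set String, logged :: S.Set String }
  deriving (Show, Eq)

-- A drop performed through this proxy: it changes the node and
-- updates the proxy's own location log at the same time.
dropViaProxy :: String -> View -> View
dropViaProxy u (View a l) = View (S.delete u a) (S.delete u l)

-- A drop that bypasses this proxy (another proxy to the same cluster,
-- or direct access to the node): the node changes, but this proxy's
-- log is not updated.
dropElsewhere :: String -> View -> View
dropElsewhere u (View a l) = View (S.delete u a) l

-- The log can be trusted only while it matches reality.
logAccurate :: View -> Bool
logAccurate v = actual v == logged v

-- Both nodes have the content, and the log says so.
start :: View
start = View both both
  where
    both = S.fromList ["uuid-a", "uuid-b"]
```

After `dropElsewhere "uuid-a" start`, the log still lists uuid-a, so a proxy that picks the first logged node would send CHECKPRESENT (or GET) to node a, which no longer has the content.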
Joey Hess 2024-06-18 11:10:48 -04:00
parent 88d9a02f7c
commit f049156a03
GPG key ID: DB12DB0FF05F8F38
3 changed files with 26 additions and 17 deletions

@@ -60,13 +60,8 @@ clusterProxySelector clusteruuid protocolversion = do
 		<$> remoteList
 	remotesides <- mapM (proxySshRemoteSide protocolversion) remotes
 	return $ ProxySelector
-		{ proxyCHECKPRESENT = \k -> error "TODO"
-		, proxyGET = \k -> do
-			locs <- S.fromList <$> loggedLocations k
-			case filter (flip S.member locs . remoteUUID) remotesides of
-				-- TODO: Avoid always using same remote
-				(r:_) -> return (Just r)
-				[] -> return Nothing
+		{ proxyCHECKPRESENT = nodecontaining remotesides
+		, proxyGET = nodecontaining remotesides
 		, proxyPUT = \k -> error "TODO"
 		, proxyREMOVE = \k -> error "TODO"
 		-- Content is not locked on the cluster as a whole,
@@ -75,3 +70,11 @@ clusterProxySelector clusteruuid protocolversion = do
 		, proxyLOCKCONTENT = const (pure Nothing)
 		, proxyUNLOCKCONTENT = pure Nothing
 		}
+  where
+	nodecontaining remotesides k = do
+		locs <- S.fromList <$> loggedLocations k
+		case filter (flip S.member locs . remoteUUID) remotesides of
+			-- TODO: Avoid always using same remote
+			(r:_) -> return (Just r)
+			[] -> return Nothing
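The refactoring in this hunk routes both CHECKPRESENT and GET through one helper. A simplified sketch of that shape follows, assuming `IO` in place of `Annex` and a caller-supplied location-log lookup; `Selector`, `clusterSelector`, `Node`, `demoLog` and `demoLookup` are hypothetical names, not git-annex's types.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S

type Key = String
type UUID = String

-- Hypothetical simplified node and selector types. In git-annex the
-- selector fields run in the Annex monad and return Maybe RemoteSide;
-- here they run in IO and return Maybe Node.
data Node = Node { nodeName :: String, nodeUUID :: UUID }
  deriving (Show, Eq)

data Selector = Selector
  { selCheckPresent :: Key -> IO (Maybe Node)
  , selGet          :: Key -> IO (Maybe Node)
  }

-- Build a cluster selector. Both fields share one helper, so
-- CHECKPRESENT and GET choose nodes from the location log the same way.
clusterSelector :: (Key -> IO (S.Set UUID)) -> [Node] -> Selector
clusterSelector loggedLocations nodes = Selector
  { selCheckPresent = nodeContaining
  , selGet = nodeContaining
  }
  where
    nodeContaining k = do
      locs <- loggedLocations k
      pure $ case filter ((`S.member` locs) . nodeUUID) nodes of
        (n:_) -> Just n
        []    -> Nothing

-- Hypothetical in-memory location log used for trying this out.
demoLog :: M.Map Key (S.Set UUID)
demoLog = M.fromList [("key1", S.fromList ["uuid-b"])]

demoLookup :: Key -> IO (S.Set UUID)
demoLookup k = pure (M.findWithDefault S.empty k demoLog)

nodeA, nodeB :: Node
nodeA = Node "node-a" "uuid-a"
nodeB = Node "node-b" "uuid-b"
```

Because both fields share `nodeContaining`, they always agree on which node to use; like the commit, the helper returns the first matching node, which is what the "Avoid always using same remote" TODO is about.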

@@ -54,7 +54,7 @@ closeRemoteSide remoteside =
  - actions.
  - -}
 data ProxySelector = ProxySelector
-	{ proxyCHECKPRESENT :: Key -> Annex RemoteSide
+	{ proxyCHECKPRESENT :: Key -> Annex (Maybe RemoteSide)
 	, proxyLOCKCONTENT :: Key -> Annex (Maybe RemoteSide)
 	, proxyUNLOCKCONTENT :: Annex (Maybe RemoteSide)
 	, proxyREMOVE :: Key -> Annex RemoteSide
@@ -64,7 +64,7 @@ data ProxySelector = ProxySelector
 singleProxySelector :: RemoteSide -> ProxySelector
 singleProxySelector r = ProxySelector
-	{ proxyCHECKPRESENT = const (pure r)
+	{ proxyCHECKPRESENT = const (pure (Just r))
 	, proxyLOCKCONTENT = const (pure (Just r))
 	, proxyUNLOCKCONTENT = pure (Just r)
 	, proxyREMOVE = const (pure r)
@@ -160,9 +160,13 @@ proxy proxydone proxymethods servermode (ClientSide clientrunst clientconn) prox
 	proxyclientmessage Nothing = proxydone
 	proxyclientmessage (Just message) = case message of
-		CHECKPRESENT k -> do
-			remoteside <- proxyCHECKPRESENT proxyselector k
-			proxyresponse remoteside message (const proxynextclientmessage)
+		CHECKPRESENT k -> proxyCHECKPRESENT proxyselector k >>= \case
+			Just remoteside ->
+				proxyresponse remoteside message
+					(const proxynextclientmessage)
+			Nothing ->
+				protoerrhandler proxynextclientmessage $
+					client $ net $ sendMessage FAILURE
 		LOCKCONTENT k -> proxyLOCKCONTENT proxyselector k >>= \case
 			Just remoteside ->
 				proxyresponse remoteside message
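With `proxyCHECKPRESENT` now returning a `Maybe`, the proxy has to handle the case where no node is known to have the key. A sketch of that dispatch, assuming a two-constructor reply type and plain `IO` instead of git-annex's protocol monads; `Reply`, `handleCheckPresent` and its `forward` parameter are hypothetical. When no node is selected, the proxy answers FAILURE itself instead of forwarding the request.

```haskell
{-# LANGUAGE LambdaCase #-}

-- Hypothetical, heavily simplified reply type; these constructors are
-- not git-annex's P2P protocol messages.
data Reply = Success | Failure
  deriving (Show, Eq)

type Key = String

-- Handle a CHECKPRESENT request. If the selector finds a node, forward
-- the request to it and relay its reply; otherwise answer Failure
-- directly, since no node is believed to have the key.
handleCheckPresent
  :: (Key -> IO (Maybe node))   -- selector, playing the role of proxyCHECKPRESENT
  -> (node -> Key -> IO Reply)  -- forwards the request to the chosen node
  -> Key
  -> IO Reply
handleCheckPresent select forward k = select k >>= \case
  Just node -> forward node k
  Nothing   -> pure Failure
```

In the commit, the `Nothing` branch also goes through `protoerrhandler` and then moves on to the next client message; this sketch only models the choice of reply.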

@@ -55,7 +55,7 @@ For June's work on [[design/passthrough_proxy]], implementation plan:
 * Basic proxying to special remote support (non-streaming).
 * Getting a key from a cluster should proxy from one of the nodes that has
-  it, or from the proxy repository itself if it has the key.
+  it. (done)
 * Getting a key from a cluster currently always selects the lowest cost
   remote, and always the same remote if cost is the same. Should
@@ -65,10 +65,6 @@ For June's work on [[design/passthrough_proxy]], implementation plan:
 * Implement upload with fanout and reporting back additional UUIDs over P2P
   protocol.
-* On upload to a cluster, as well as fanout to nodes, if the key is
-  preferred content of the proxy repository, store it there.
-  (But not when preferred content is not configured.)
 * Implement cluster drops, trying to remove from all nodes, and returning
   which UUIDs it was dropped from.
@@ -85,6 +81,12 @@ For June's work on [[design/passthrough_proxy]], implementation plan:
   check may fail to realize that dropping from multiple nodes does in fact
   make it worse.
+* On upload to a cluster, as well as fanout to nodes, if the key is
+  preferred content of the proxy repository, store it there.
+  (But not when preferred content is not configured.)
+  And on download from a cluster, if the proxy repository has the content,
+  get it from there to avoid the overhead of proxying to a node.
 * Support proxies-of-proxies better, eg foo-bar-baz.
   Currently, it does work, but have to run `git-annex updateproxy`
   on foo in order for it to notice the bar-baz proxied remote exists,