dropping from clusters

Dropping from a cluster drops from every node of the cluster.
Including nodes that the cluster does not think have the content.
This is different from GET and CHECKPRESENT, which do trust the
cluster's location log. The difference is that removing from a cluster
should make 100% the content is gone from every node. So doing extra
work is ok. Compare with CHECKPRESENT where checking every node could
make it very expensive, and the worst that can happen in a false
negative is extra work being done.

Extended the P2P protocol with FAILURE-PLUS to handle the case where a
drop from one node succeeds, but a drop from another node fails. In that
case the entire cluster drop has failed.

Note that SUCCESS-PLUS is returned when dropping from a proxied remote
that is not a cluster, when the protocol version supports it. This is
because P2P.Proxy does not know when it's proxying for a single node
cluster vs for a remote that is not a cluster.
This commit is contained in:
Joey Hess 2024-06-23 09:28:18 -04:00
parent a6a04b7e5e
commit 5b332a87be
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
14 changed files with 144 additions and 59 deletions

View file

@ -69,7 +69,11 @@ clusterProxySelector clusteruuid protocolversion = do
, proxyPUT = \k -> do
locs <- S.fromList <$> loggedLocations k
return $ filter (flip S.notMember locs . remoteUUID) remotesides
, proxyREMOVE = \k -> error "TODO"
-- Remove the key from every node that contains it.
-- But, since it's possible the location log for some nodes
-- could be out of date, actually try to remove from every
-- node.
, proxyREMOVE = const (pure remotesides)
-- Content is not locked on the cluster as a whole,
-- instead it can be locked on individual nodes that are
-- proxied to the client.

View file

@ -231,6 +231,7 @@ data UnVerifiedCopy = UnVerifiedRemote Remote | UnVerifiedHere
verifyEnoughCopiesToDrop
:: String -- message to print when there are no known locations
-> Key
-> Maybe UUID -- repo dropping from
-> Maybe ContentRemovalLock
-> NumCopies
-> MinCopies
@ -240,14 +241,14 @@ verifyEnoughCopiesToDrop
-> (SafeDropProof -> Annex a) -- action to perform the drop
-> Annex a -- action to perform when unable to drop
-> Annex a
verifyEnoughCopiesToDrop nolocmsg key removallock neednum needmin skip preverified tocheck dropaction nodropaction =
verifyEnoughCopiesToDrop nolocmsg key dropfrom removallock neednum needmin skip preverified tocheck dropaction nodropaction =
helper [] [] preverified (nub tocheck) []
where
helper bad missing have [] lockunsupported =
liftIO (mkSafeDropProof neednum needmin have removallock) >>= \case
Right proof -> dropaction proof
Left stillhave -> do
notEnoughCopies key neednum needmin stillhave (skip++missing) bad nolocmsg lockunsupported
notEnoughCopies key dropfrom neednum needmin stillhave (skip++missing) bad nolocmsg lockunsupported
nodropaction
helper bad missing have (c:cs) lockunsupported
| isSafeDrop neednum needmin have removallock =
@ -299,8 +300,8 @@ data DropException = DropException SomeException
instance Exception DropException
notEnoughCopies :: Key -> NumCopies -> MinCopies -> [VerifiedCopy] -> [UUID] -> [Remote] -> String -> [Remote] -> Annex ()
notEnoughCopies key neednum needmin have skip bad nolocmsg lockunsupported = do
notEnoughCopies :: Key -> Maybe UUID -> NumCopies -> MinCopies -> [VerifiedCopy] -> [UUID] -> [Remote] -> String -> [Remote] -> Annex ()
notEnoughCopies key dropfrom neednum needmin have skip bad nolocmsg lockunsupported = do
showNote "unsafe"
if length have < fromNumCopies neednum
then showLongNote $ UnquotedString $
@ -319,7 +320,29 @@ notEnoughCopies key neednum needmin have skip bad nolocmsg lockunsupported = do
++ Remote.listRemoteNames lockunsupported
Remote.showTriedRemotes bad
Remote.showLocations True key (map toUUID have++skip) nolocmsg
-- When dropping from a cluster, don't suggest making the nodes of
-- the cluster available
clusternodes <- case mkClusterUUID =<< dropfrom of
Nothing -> pure []
Just cu -> do
clusters <- getClusters
pure $ maybe [] (map fromClusterNodeUUID . S.toList) $
M.lookup cu (clusterUUIDs clusters)
let excludeset = S.fromList $ map toUUID have++skip++clusternodes
-- Don't suggest making a cluster available when dropping from its
-- node.
let exclude u
| u `S.member` excludeset = pure True
| otherwise = case (dropfrom, mkClusterUUID u) of
(Just dropfrom', Just cu) -> do
clusters <- getClusters
pure $ case M.lookup cu (clusterUUIDs clusters) of
Just nodes ->
ClusterNodeUUID dropfrom'
`S.member` nodes
Nothing -> False
_ -> pure False
Remote.showLocations True key exclude nolocmsg
pluralCopies :: Int -> String
pluralCopies 1 = "copy"

View file

@ -205,7 +205,7 @@ doDrop pcc dropfrom contentlock key afile numcopies mincopies skip preverified c
ifM (Annex.getRead Annex.force)
( dropaction Nothing
, ifM (checkRequiredContent pcc dropfrom key afile)
( verifyEnoughCopiesToDrop nolocmsg key
( verifyEnoughCopiesToDrop nolocmsg key (Just dropfrom)
contentlock numcopies mincopies
skip preverified check
(dropaction . Just)

View file

@ -108,7 +108,8 @@ getKey' key afile = dispatch
Remote.showTriedRemotes remotes
showlocs (map Remote.uuid remotes)
return False
showlocs exclude = Remote.showLocations False key exclude
showlocs exclude = Remote.showLocations False key
(\u -> pure (u `elem` exclude))
"No other repository is known to contain the file."
-- This check is to avoid an ugly message if a remote is a
-- drive that is not mounted.

View file

@ -319,7 +319,7 @@ verifyExisting key destfile (yes, no) = do
(needcopies, mincopies) <- getFileNumMinCopies destfile
(tocheck, preverified) <- verifiableCopies key []
verifyEnoughCopiesToDrop [] key Nothing needcopies mincopies [] preverified tocheck
verifyEnoughCopiesToDrop [] key Nothing Nothing needcopies mincopies [] preverified tocheck
(const yes) no
seekRemote :: Remote -> Branch -> Maybe TopFilePath -> Bool -> CheckGitIgnore -> [String] -> CommandSeek

View file

@ -193,7 +193,7 @@ toPerform' mcontentlock dest removewhen key afile fastcheck isthere = do
DropCheckNumCopies -> do
(numcopies, mincopies) <- getSafestNumMinCopies afile key
(tocheck, verified) <- verifiableCopies key [srcuuid]
verifyEnoughCopiesToDrop "" key (Just contentlock)
verifyEnoughCopiesToDrop "" key (Just srcuuid) (Just contentlock)
numcopies mincopies [srcuuid] verified
(UnVerifiedRemote dest : tocheck)
(drophere setpresentremote contentlock . showproof)
@ -299,7 +299,7 @@ fromDrop src destuuid deststartedwithcopy key afile adjusttocheck =
DropCheckNumCopies -> do
(numcopies, mincopies) <- getSafestNumMinCopies afile key
(tocheck, verified) <- verifiableCopies key [Remote.uuid src]
verifyEnoughCopiesToDrop "" key Nothing numcopies mincopies [Remote.uuid src] verified
verifyEnoughCopiesToDrop "" key (Just (Remote.uuid src)) Nothing numcopies mincopies [Remote.uuid src] verified
(adjusttocheck tocheck) (dropremote . showproof) faileddropremote
DropWorse -> faileddropremote
where

View file

@ -2,7 +2,7 @@
-
- See doc/design/p2p_protocol.mdwn
-
- Copyright 2016-2021 Joey Hess <id@joeyh.name>
- Copyright 2016-2024 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@ -89,6 +89,7 @@ data Message
| SUCCESS
| SUCCESS_PLUS [UUID]
| FAILURE
| FAILURE_PLUS [UUID]
| DATA Len -- followed by bytes of data
| VALIDITY Validity
| ERROR String
@ -115,6 +116,7 @@ instance Proto.Sendable Message where
formatMessage SUCCESS = ["SUCCESS"]
formatMessage (SUCCESS_PLUS uuids) = ("SUCCESS-PLUS":map Proto.serialize uuids)
formatMessage FAILURE = ["FAILURE"]
formatMessage (FAILURE_PLUS uuids) = ("FAILURE-PLUS":map Proto.serialize uuids)
formatMessage (VALIDITY Valid) = ["VALID"]
formatMessage (VALIDITY Invalid) = ["INVALID"]
formatMessage (DATA len) = ["DATA", Proto.serialize len]
@ -141,6 +143,7 @@ instance Proto.Receivable Message where
parseCommand "SUCCESS" = Proto.parse0 SUCCESS
parseCommand "SUCCESS-PLUS" = Proto.parseList SUCCESS_PLUS
parseCommand "FAILURE" = Proto.parse0 FAILURE
parseCommand "FAILURE-PLUS" = Proto.parseList FAILURE_PLUS
parseCommand "DATA" = Proto.parse1 DATA
parseCommand "ERROR" = Proto.parse1 ERROR
parseCommand "VALID" = Proto.parse0 (VALIDITY Valid)
@ -355,10 +358,10 @@ lockContentWhile runproto key a = bracket setup cleanup a
cleanup True = runproto () $ net $ sendMessage UNLOCKCONTENT
cleanup False = return ()
remove :: Key -> Proto Bool
remove :: Key -> Proto (Bool, Maybe [UUID])
remove key = do
net $ sendMessage (REMOVE key)
checkSuccess
checkSuccessFailurePlus
get :: FilePath -> Key -> Maybe IncrementalVerifier -> AssociatedFile -> Meter -> MeterUpdate -> Proto (Bool, Verification)
get dest key iv af m p =
@ -565,13 +568,7 @@ sendContent key af offset@(Offset n) p = go =<< local (contentSize key)
ver <- net getProtocolVersion
when (ver >= ProtocolVersion 1) $
net . sendMessage . VALIDITY =<< validitycheck
if ver >= ProtocolVersion 2
then checkSuccessPlus
else do
ok <- checkSuccess
if ok
then return (Just [])
else return Nothing
checkSuccessPlus
receiveContent
:: Observable t
@ -620,15 +617,30 @@ checkSuccess = do
return False
checkSuccessPlus :: Proto (Maybe [UUID])
checkSuccessPlus = do
ack <- net receiveMessage
case ack of
Just SUCCESS -> return (Just [])
Just (SUCCESS_PLUS l) -> return (Just l)
Just FAILURE -> return Nothing
_ -> do
net $ sendMessage (ERROR "expected SUCCESS or SUCCESS-PLUS or FAILURE")
return Nothing
checkSuccessPlus =
checkSuccessFailurePlus >>= return . \case
(True, v) -> v
(False, _) -> Nothing
checkSuccessFailurePlus :: Proto (Bool, Maybe [UUID])
checkSuccessFailurePlus = do
ver <- net getProtocolVersion
if ver >= ProtocolVersion 2
then do
ack <- net receiveMessage
case ack of
Just SUCCESS -> return (True, Just [])
Just (SUCCESS_PLUS l) -> return (True, Just l)
Just FAILURE -> return (False, Nothing)
Just (FAILURE_PLUS l) -> return (False, Just l)
_ -> do
net $ sendMessage (ERROR "expected SUCCESS or SUCCESS-PLUS or FAILURE or FAILURE-PLUS")
return (False, Nothing)
else do
ok <- checkSuccess
if ok
then return (True, Just [])
else return (False, Nothing)
sendSuccess :: Bool -> Proto ()
sendSuccess True = net $ sendMessage SUCCESS

View file

@ -60,11 +60,11 @@ data ProxySelector = ProxySelector
{ proxyCHECKPRESENT :: Key -> Annex (Maybe RemoteSide)
, proxyLOCKCONTENT :: Key -> Annex (Maybe RemoteSide)
, proxyUNLOCKCONTENT :: Annex (Maybe RemoteSide)
, proxyREMOVE :: Key -> Annex RemoteSide
, proxyREMOVE :: Key -> Annex [RemoteSide]
-- ^ remove from all of these remotes
, proxyGET :: Key -> Annex (Maybe RemoteSide)
-- ^ can get from any of these remotes
, proxyPUT :: Key -> Annex [RemoteSide]
-- ^ can put to some/all of these remotes
-- ^ put to some/all of these remotes
}
singleProxySelector :: RemoteSide -> ProxySelector
@ -72,7 +72,7 @@ singleProxySelector r = ProxySelector
{ proxyCHECKPRESENT = const (pure (Just r))
, proxyLOCKCONTENT = const (pure (Just r))
, proxyUNLOCKCONTENT = pure (Just r)
, proxyREMOVE = const (pure r)
, proxyREMOVE = const (pure [r])
, proxyGET = const (pure (Just r))
, proxyPUT = const (pure [r])
}
@ -187,9 +187,9 @@ proxy proxydone proxymethods servermode (ClientSide clientrunst clientconn) remo
proxynextclientmessage
Nothing -> proxynextclientmessage ()
REMOVE k -> do
remoteside <- proxyREMOVE proxyselector k
remotesides <- proxyREMOVE proxyselector k
servermodechecker checkREMOVEServerMode $
handleREMOVE remoteside k message
handleREMOVE remotesides k message
GET _ _ k -> proxyGET proxyselector k >>= \case
Just remoteside -> handleGET remoteside message
Nothing ->
@ -215,6 +215,7 @@ proxy proxydone proxymethods servermode (ClientSide clientrunst clientconn) remo
SUCCESS -> protoerr
SUCCESS_PLUS _ -> protoerr
FAILURE -> protoerr
FAILURE_PLUS _ -> protoerr
DATA _ -> protoerr
VALIDITY _ -> protoerr
-- If the client errors out, give up.
@ -266,14 +267,38 @@ proxy proxydone proxymethods servermode (ClientSide clientrunst clientconn) remo
protoerr = do
_ <- client $ net $ sendMessage (ERROR "protocol error")
giveup "protocol error"
handleREMOVE remoteside k message =
proxyresponse remoteside message $ \resp () -> do
case resp of
SUCCESS -> removedContent proxymethods
(remoteUUID remoteside) k
_ -> return ()
proxynextclientmessage ()
handleREMOVE [] _ _ =
-- When no places are provided to remove from,
-- don't report a successful remote.
protoerrhandler proxynextclientmessage $
client $ net $ sendMessage FAILURE
handleREMOVE remotesides k message = do
v <- forM remotesides $ \r ->
runRemoteSideOrSkipFailed r $ do
net $ sendMessage message
net receiveMessage >>= return . \case
Just SUCCESS ->
Just (True, [remoteUUID r])
Just (SUCCESS_PLUS us) ->
Just (True, remoteUUID r:us)
Just FAILURE ->
Just (False, [])
Just (FAILURE_PLUS us) ->
Just (False, us)
_ -> Nothing
let v' = map join v
let us = concatMap snd $ catMaybes v'
mapM_ (\u -> removedContent proxymethods u k) us
protoerrhandler proxynextclientmessage $
client $ net $ sendMessage $
if all (maybe False fst) v'
then if null us || protocolversion < 2
then SUCCESS
else SUCCESS_PLUS us
else if null us || protocolversion < 2
then FAILURE
else FAILURE_PLUS us
handleGET remoteside message = getresponse (runRemoteSide remoteside) message $
withDATA (relayGET remoteside)

View file

@ -342,11 +342,12 @@ remoteLocations (IncludeIgnored ii) locations trusted = do
{- Displays known locations of a key and helps the user take action
- to make them accessible. -}
showLocations :: Bool -> Key -> [UUID] -> String -> Annex ()
showLocations separateuntrusted key exclude nolocmsg = do
showLocations :: Bool -> Key -> (UUID -> Annex Bool) -> String -> Annex ()
showLocations separateuntrusted key checkexclude nolocmsg = do
u <- getUUID
remotes <- remoteList
uuids <- keyLocations key
exclude <- filterM checkexclude uuids
untrusteduuids <- if separateuntrusted
then trustGet UnTrusted
else pure []

View file

@ -449,7 +449,8 @@ dropKey' repo r st@(State connpool duc _ _ _) key
, giveup "remote does not have expected annex.uuid value"
)
| Git.repoIsHttp repo = giveup "dropping from http remote not supported"
| otherwise = P2PHelper.remove (Ssh.runProto r connpool (return False)) key
| otherwise = P2PHelper.remove (uuid r)
(Ssh.runProto r connpool (return (False, Nothing))) key
lockKey :: Remote -> State -> Key -> (VerifiedCopy -> Annex r) -> Annex r
lockKey r st key callback = do

View file

@ -59,11 +59,20 @@ retrieve gc runner k af dest p verifyconfig = do
Just (False, _) -> giveup "Transfer failed"
Nothing -> remoteUnavail
remove :: ProtoRunner Bool -> Key -> Annex ()
remove runner k = runner (P2P.remove k) >>= \case
Just True -> return ()
Just False -> giveup "removing content from remote failed"
remove :: UUID -> ProtoRunner (Bool, Maybe [UUID]) -> Key -> Annex ()
remove remoteuuid runner k = runner (P2P.remove k) >>= \case
Just (True, alsoremoveduuids) -> note alsoremoveduuids
Just (False, alsoremoveduuids) -> do
note alsoremoveduuids
giveup "removing content from remote failed"
Nothing -> remoteUnavail
where
-- The remote reports removal from other UUIDs than its own,
-- so record those.
note alsoremoveduuids =
forM_ (fromMaybe [] alsoremoveduuids) $ \u ->
when (u /= remoteuuid) $
logChange k u InfoMissing
checkpresent :: ProtoRunner Bool -> Key -> Annex Bool
checkpresent runner k = maybe remoteUnavail return =<< runner (P2P.checkPresent k)

View file

@ -61,7 +61,7 @@ chainGen addr r u rc gc rs = do
, retrieveKeyFile = retrieve gc protorunner
, retrieveKeyFileCheap = Nothing
, retrievalSecurityPolicy = RetrievalAllKeysSecure
, removeKey = remove protorunner
, removeKey = remove u protorunner
, lockContent = Just $ lock withconn runProtoConn u
, checkPresent = checkpresent protorunner
, checkPresentCheap = False

View file

@ -117,6 +117,10 @@ To remove a key's content from the server, the client sends:
The server responds with either SUCCESS or FAILURE.
In protocol version 2, the server can optionally reply with SUCCESS-PLUS
or FAILURE-PLUS. Each has a subsequent list of UUIDs of repositories
that the content was removed from.
## Storing content on the server
To store content on the server, the client sends:
@ -159,9 +163,8 @@ was being sent.
If the server successfully receives the data and stores the content,
it replies with SUCCESS. Otherwise, FAILURE.
In protocol version 2, the server can optionally reply with SUCCESS-PLUS.
The subsequent list of UUIDs are additional UUIDs where the content was
stored, in addition to the UUID where the client was sending it.
In protocol version 2, the server can optionally reply with SUCCESS-PLUS
and a list of UUIDs where the content was stored.
## Getting content from the server

View file

@ -55,6 +55,9 @@ For June's work on [[design/passthrough_proxy]], implementation plan:
* Getting a key from a cluster should proxy from one of the nodes that has
it. (done)
* Implement cluster drops, trying to remove from all nodes, and returning
which UUIDs it was dropped from. (done)
* Getting a key from a cluster currently always selects the lowest cost
remote, and always the same remote if cost is the same. Should
round-robin amoung remotes, and prefer to avoid using remotes that
@ -67,13 +70,16 @@ For June's work on [[design/passthrough_proxy]], implementation plan:
* On upload to cluster, send to nodes where it's preferred content, and not
to other nodes.
* Implement cluster drops, trying to remove from all nodes, and returning
which UUIDs it was dropped from.
Problem: `move --from cluster` in "does this make it worse"
* Problem: `move --from cluster` in "does this make it worse"
check may fail to realize that dropping from multiple nodes does in fact
make it worse.
* Bug: When a cluster has one node, copying a file to it does not update
location log to say the content is present on it. It's returning SUCCESS
rather than SUCCESS-PLUS.
* Support annex.jobs for clusters.
* On upload to a cluster, as well as fanout to nodes, if the key is
preferred content of the proxy repository, store it there.
(But not when preferred content is not configured.)