Fixed some other potential hangs in the P2P protocol
Finishes the start made in 983c9d5a53, by
handling the case where `transfer` fails for some other reason, and so the
ReadContent callback does not get run. I don't know of a case where
`transfer` does fail other than the locking dealt with in that commit, but
it's good to have a guarantee.
StoreContent and StoreContentTo had a similar problem.
Things like `getViaTmp` may decide not to run the transfer action.
And `transfer` could certainly fail, if another transfer of the same
object was in progress. (Or a different object when annex.pidlock is set.)
If the transfer action was not run, the content of the object would
not all get consumed, and so would get interpreted as protocol commands,
which would not go well.
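
To illustrate the failure mode described above, here is a minimal sketch using a simplified, hypothetical framing (a `DATA <n>` line followed by n bytes of payload). The names `nextCommand` and `consumePayload` are made up for this example and are not part of git-annex. It shows what a receiver sees when it skips the payload instead of consuming it.

```haskell
-- Simplified, hypothetical framing: a "DATA <n>" line followed by
-- n bytes of payload, then further line-based protocol messages.

-- Read one line as the next command, returning it with the rest
-- of the stream.
nextCommand :: String -> (String, String)
nextCommand s =
    let (l, rest) = break (== '\n') s
    in (l, drop 1 rest)

-- Consume the payload announced by a DATA line.
consumePayload :: Int -> String -> String
consumePayload = drop

main :: IO ()
main = do
    -- The 12 byte payload happens to contain text that looks like
    -- a protocol message.
    let stream = "DATA 12\nSUCCESS\nabc\nFAILURE\n"
        (cmd, afterHeader) = nextCommand stream
    print cmd
    -- A receiver that consumes the payload sees the real next message.
    print (fst (nextCommand (consumePayload 12 afterHeader)))
    -- A receiver that never ran the transfer action parses payload
    -- bytes as if they were the next message.
    print (fst (nextCommand afterHeader))
```

The last two lines differ only in whether the payload was consumed, which is the situation the fix has to detect.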
My approach to fixing all of these things is to set a TVar only
once all the data in the transfer is known to have been read/written.
This way the internals of `transfer`, `getViaTmp` etc don't matter.
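
The idea can be sketched in isolation, assuming plain `IO` instead of the `Annex` monad. The names `checkTransfer` and `indicateTransferred` below are stand-ins modelled on the helpers added in this commit, not the actual code.

```haskell
import Control.Concurrent.STM (TVar, newTVarIO, readTVarIO, atomically, writeTVar)

-- Hypothetical stand-in for the transfer indicator.
type TransferIndicator = TVar Bool

-- Run an action that may or may not perform the transfer. If it
-- returns without having marked the indicator, run the fallback
-- instead. Exceptions thrown by the action propagate, so the
-- fallback is not run in that case.
checkTransfer :: (TransferIndicator -> IO a) -> IO a -> IO a
checkTransfer action fallback = do
    ti <- newTVarIO False
    r <- action ti
    done <- readTVarIO ti
    if done then return r else fallback

-- To be called only once all the data has been read/written.
indicateTransferred :: TransferIndicator -> IO ()
indicateTransferred ti = atomically (writeTVar ti True)

main :: IO ()
main = do
    -- The wrapped action moves all the data, then marks the indicator.
    a <- checkTransfer
        (\ti -> indicateTransferred ti >> return "transferred")
        (return "fallback")
    putStrLn a
    -- The wrapped action declines to run the transfer at all.
    b <- checkTransfer (\_ -> return "skipped") (return "fallback")
    putStrLn b
```

Because the indicator is only inspected after the action returns, the wrapper does not need to know why the transfer was skipped.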
So in ReadContent, it checks if the transfer completed.
If not, as long as no exception was thrown, it sends empty and Invalid
data to the callback. On an exception the state of the protocol is unknown,
so it has to raise ProtoFailureException and close the connection,
same as before.
In StoreContent, if the transfer did not complete,
some portion of the DATA has been read, so the protocol is in an unknown
state and it has to close the connection as well.
(The ProtoFailureMessage used here matches the one in Annex.Transfer, which
is the most likely reason. Not ideal to duplicate it..)
StoreContent did not ever close the protocol connection before. So this is
a protocol change, but only in an exceptional circumstance, and it's not
going to break anything, because clients already need to deal with the
connection breaking at any point.
The way this new behavior looks (here origin has annex.pidlock = true so will
only accept one upload to it at a time):
    git annex copy --to origin -J2
    copy x (to origin...) ok
    copy y (to origin...)
    Lost connection (fd:25: hGetChar: end of file)
This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
parent 9adc0b3417
commit 6ecd55a9fa
6 changed files with 90 additions and 20 deletions
@@ -3,6 +3,7 @@ git-annex (7.20181106) UNRELEASED; urgency=medium
 * git-annex-shell: Fix hang when transferring the same objects to two
 different clients at the same time. (Or when annex.pidlock is used,
 two different objects.)
+* Fixed some other potential hangs in the P2P protocol.
 
 -- Joey Hess <id@joeyh.name> Tue, 06 Nov 2018 12:44:27 -0400
 
87
P2P/Annex.hs
@@ -26,6 +26,7 @@ import Types.Remote (RetrievalSecurityPolicy(..))
 import Utility.Metered
 
 import Control.Monad.Free
+import Control.Concurrent.STM
 
 -- Full interpreter for Proto, that can receive and send objects.
 runFullProto :: RunState -> P2PConnection -> Proto a -> Annex (Either ProtoFailure a)
@@ -56,28 +57,46 @@ runLocal runst runner a = case a of
 Left e -> return $ Left $ ProtoFailureException e
 Right (Left e) -> return $ Left e
 Right (Right ok) -> runner (next ok)
+-- If the content is not present, or the transfer doesn't
+-- run for any other action, the sender action still must
+-- be run, so is given empty and Invalid data.
+let fallback = runner (sender mempty (return Invalid))
 v <- tryNonAsync $ prepSendAnnex k
 case v of
-Right (Just (f, checkchanged)) -> proceed $
--- Allow multiple uploads of the same key.
-transfer alwaysUpload k af $
-sinkfile f o checkchanged sender
-Right Nothing -> proceed $
-runner (sender mempty (return Invalid))
+Right (Just (f, checkchanged)) -> proceed $ do
+-- alwaysUpload to allow multiple uploads of the same key.
+let runtransfer ti = transfer alwaysUpload k af $ \p ->
+sinkfile f o checkchanged sender p ti
+checktransfer runtransfer fallback
+Right Nothing -> proceed fallback
 Left e -> return $ Left $ ProtoFailureException e
 StoreContent k af o l getb validitycheck next -> do
 -- This is the same as the retrievalSecurityPolicy of
 -- Remote.P2P and Remote.Git.
 let rsp = RetrievalAllKeysSecure
-ok <- flip catchNonAsync (const $ return False) $
-transfer download k af $ \p ->
-getViaTmp rsp DefaultVerify k $ \tmp -> do
-storefile tmp o l getb validitycheck p
-runner (next ok)
+v <- tryNonAsync $ do
+let runtransfer ti =
+Right <$> transfer download k af (\p ->
+getViaTmp rsp DefaultVerify k $ \tmp ->
+storefile tmp o l getb validitycheck p ti)
+let fallback = return $ Left $
+ProtoFailureMessage "transfer already in progress, or unable to take transfer lock"
+checktransfer runtransfer fallback
+case v of
+Left e -> return $ Left $ ProtoFailureException e
+Right (Left e) -> return $ Left e
+Right (Right ok) -> runner (next ok)
 StoreContentTo dest o l getb validitycheck next -> do
-res <- flip catchNonAsync (const $ return (False, UnVerified)) $
-storefile dest o l getb validitycheck nullMeterUpdate
-runner (next res)
+v <- tryNonAsync $ do
+let runtransfer ti = Right
+<$> storefile dest o l getb validitycheck nullMeterUpdate ti
+let fallback = return $ Left $
+ProtoFailureMessage "transfer failed"
+checktransfer runtransfer fallback
+case v of
+Left e -> return $ Left $ ProtoFailureException e
+Right (Left e) -> return $ Left e
+Right (Right ok) -> runner (next ok)
 SetPresent k u next -> do
 v <- tryNonAsync $ logChange k u InfoPresent
 case v of
@@ -123,7 +142,7 @@ runLocal runst runner a = case a of
 UpdateMeterTotalSize m sz next -> do
 liftIO $ setMeterTotalSize m sz
 runner next
-RunValidityCheck check next -> runner . next =<< check
+RunValidityCheck checkaction next -> runner . next =<< checkaction
 where
 transfer mk k af ta = case runst of
 -- Update transfer logs when serving.
@@ -133,8 +152,8 @@ runLocal runst runner a = case a of
 -- Transfer logs are updated higher in the stack when
 -- a client.
 Client _ -> ta nullMeterUpdate
 
-storefile dest (Offset o) (Len l) getb validitycheck p = do
+storefile dest (Offset o) (Len l) getb validitycheck p ti = do
 let p' = offsetMeterUpdate p (toBytesProcessed o)
 v <- runner getb
 case v of
@@ -143,6 +162,8 @@ runLocal runst runner a = case a of
 when (o /= 0) $
 hSeek h AbsoluteSeek o
 meteredWrite p' h b
+indicatetransferred ti
+
 rightsize <- do
 sz <- liftIO $ getFileSize dest
 return (toInteger sz == l + o)
@@ -158,7 +179,7 @@ runLocal runst runner a = case a of
 return (rightsize, MustVerify)
 Left e -> error $ describeProtoFailure e
 
-sinkfile f (Offset o) checkchanged sender p = bracket setup cleanup go
+sinkfile f (Offset o) checkchanged sender p ti = bracket setup cleanup go
 where
 setup = liftIO $ openBinaryFile f ReadMode
 cleanup = liftIO . hClose
@@ -167,8 +188,34 @@ runLocal runst runner a = case a of
 when (o /= 0) $
 liftIO $ hSeek h AbsoluteSeek o
 b <- liftIO $ hGetContentsMetered h p'
+
 let validitycheck = local $ runValidityCheck $
 checkchanged >>= return . \case
 False -> Invalid
 True -> Valid
-runner (sender b validitycheck)
+r <- runner (sender b validitycheck)
+indicatetransferred ti
+return r
+
+-- This allows using actions like download and viaTmp
+-- that may abort a transfer, and clean up the protocol after them.
+--
+-- Runs an action that may make a transfer, passing a transfer
+-- indicator. The action should call indicatetransferred on it,
+-- only after it's actually sent/received the all data.
+--
+-- If the action ends without having called indicatetransferred,
+-- runs the fallback action, which can close the protoocol
+-- connection or otherwise clean up after the transfer not having
+-- occurred.
+--
+-- If the action throws an exception, the fallback is not run.
+checktransfer ta fallback = do
+ti <- liftIO $ newTVarIO False
+r <- ta ti
+ifM (liftIO $ atomically $ readTVar ti)
+( return r
+, fallback
+)
+
+indicatetransferred ti = liftIO $ atomically $ writeTVar ti True
@@ -238,8 +238,11 @@ data LocalF c
 -- present.
 | ReadContent Key AssociatedFile Offset (L.ByteString -> Proto Validity -> Proto Bool) (Bool -> c)
 -- ^ Reads the content of a key and sends it to the callback.
+-- Must run the callback, or terminate the protocol connection.
+--
 -- May send any amount of data, including L.empty if the content is
 -- not available. The callback must deal with that.
+--
 -- And the content may change while it's being sent.
 -- The callback is passed a validity check that it can run after
 -- sending the content to detect when this happened.
@@ -248,6 +251,9 @@ data LocalF c
 -- Once the whole content of the key has been stored, moves the
 -- temp file into place as the content of the key, and returns True.
 --
+-- Must consume the whole lazy ByteString, or if unable to do
+-- so, terminate the protocol connection.
+--
 -- If the validity check is provided and fails, the content was
 -- changed while it was being sent, so verificiation of the
 -- received content should be forced.
@@ -94,3 +94,5 @@ get .heudiconv/qa/ses-20171113/info/filegroup_ses-20171113.json (from origin...)
 so to me smells like some race condition due to high -J value.
 
 [[!meta author=yoh]]
+
+> [[fixed|done]] --[[Joey]]
@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+username="joey"
+subject="""comment 28"""
+date="2018-11-06T18:48:29Z"
+content="""
+I now have a comprehensive fix in place, including fixing similar bad
+behavior when uploading different objects concurrently to a remote with
+annex.pidlock=yes or the same object concurrently to a remote not using
+pidlock.
+"""]]
@@ -81,6 +81,10 @@ If the sender finds itself unable to send as many bytes of data as it
 promised (perhaps because a file got truncated while it was being sent),
 its only option is to close the protocol connection.
 
+And if the receiver finds itself unable to receive all the data for some
+reason (eg, out of disk space), its only option is to close the protocol
+connection.
+
 ## Checking if content is present
 
 To check if a key is currently present on the server, the client sends: