cleanly close proxy connection on interrupted PUT

An interrupted PUT to a cluster that has a node that is a special remote
over http left open the connection to the cluster, so the next request
opened another one. So did an interrupted PUT directly to the proxied
special remote over http.

proxySpecialRemote was stuck waiting for all the DATA. Its connection
remained open so it kept waiting.

In servePut, checktooshort handles closing the P2P connection
when too little data is received from the PUT. But checktooshort was
only called after the protoaction, which is what runs the proxy, and the
proxy is what was getting stuck. Modified it to run in a background
thread that waits for tooshortv to be written to, which gather always
does once it gets to the end of the data received from the http client.
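
A minimal sketch of that background watcher, not the actual servePut
code. Conn, closeConn and runWatcher are hypothetical stand-ins, and
forkIO is used here where the real change uses async. It only shows the
blocking takeTMVar deciding whether to close the connection.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM
import Control.Monad (when)

-- Hypothetical stand-in for a P2P connection: it only records
-- whether it has been closed.
newtype Conn = Conn (TVar Bool)

closeConn :: Conn -> IO ()
closeConn (Conn v) = atomically $ writeTVar v True

-- Blocks until the flag has been written, then closes the connection
-- only when the data was too short.
checkTooShort :: Conn -> TMVar Bool -> IO ()
checkTooShort conn tooshortv = do
    tooshort <- atomically $ takeTMVar tooshortv
    when tooshort $ closeConn conn

-- Starts the watcher in a background thread, then writes the flag
-- (standing in for gather reaching the end of the client's data) and
-- reports whether the connection ended up closed.
runWatcher :: Bool -> IO Bool
runWatcher tooshort = do
    closedv <- newTVarIO False
    tooshortv <- newEmptyTMVarIO
    done <- newEmptyMVar
    _ <- forkIO $ checkTooShort (Conn closedv) tooshortv >> putMVar done ()
    atomically $ putTMVar tooshortv tooshort
    takeMVar done
    readTVarIO closedv

main :: IO ()
main = do
    runWatcher True >>= print   -- too short: connection gets closed
    runWatcher False >>= print  -- complete: connection is left open
```

Because the watcher blocks in takeTMVar rather than polling, it does not
depend on the proxy action returning first.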

That makes proxyConnection's releaseconn run once all data is received
from the http client. Made it close the connection handles before
waiting on the asyncworker thread. This lets proxySpecialRemote finish
processing any data from the handle, and then it will give up,
more or less cleanly, if it didn't receive enough data.
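
The ordering can be illustrated with a small sketch under assumed
names. Pipe, worker, release and interruptedTransfer are hypothetical,
and a TQueue stands in for the connection handle. The worker only
finishes once it sees the closed end, so release closes first and waits
second.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM

-- Hypothetical handle: a queue of chunk lengths, where Nothing marks
-- the closed end.
type Pipe = TQueue (Maybe Int)

-- Sums chunk lengths until the pipe is closed, then reports whether
-- the expected amount of data arrived.
worker :: Int -> Pipe -> IO Bool
worker expected pipe = go 0
  where
    go n = do
        m <- atomically $ readTQueue pipe
        case m of
            Just len -> go (n + len)
            Nothing -> return (n >= expected)

-- Closes the pipe first, and only then waits for the worker's result.
release :: Pipe -> MVar Bool -> IO Bool
release pipe resultv = do
    atomically $ writeTQueue pipe Nothing
    takeMVar resultv

-- Sends the given chunks to a background worker, then releases.
interruptedTransfer :: Int -> [Int] -> IO Bool
interruptedTransfer expected chunks = do
    pipe <- newTQueueIO
    resultv <- newEmptyMVar
    _ <- forkIO $ worker expected pipe >>= putMVar resultv
    mapM_ (atomically . writeTQueue pipe . Just) chunks
    release pipe resultv

main :: IO ()
main = do
    interruptedTransfer 10 [4, 3] >>= print  -- interrupted: too little data
    interruptedTransfer 10 [4, 6] >>= print  -- complete transfer
```

With the two steps in release swapped, the worker would never see the
closed end and the wait would not return.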

I say "more or less cleanly" because with both sides of the P2P
connection taken down, some protocol unhappiness results. That can lead
to some ugly debug messages, and it can also cause the asyncworker
thread to throw an exception. So made withP2PConnections not crash when
it receives an exception from releaseconn.
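
A rough sketch of the idea, assuming a helper shaped like tryNonAsync.
The tryNonAsync' and serviceAll below are hypothetical stand-ins
written against base's Control.Exception, not the real helper or the
real servicer loop.

```haskell
import Control.Exception
    ( SomeAsyncException (..)
    , SomeException
    , fromException
    , throwIO
    , try
    )

-- Assumed equivalent of a tryNonAsync-style helper: catches
-- synchronous exceptions and rethrows asynchronous ones.
tryNonAsync' :: IO a -> IO (Either SomeException a)
tryNonAsync' act = do
    r <- try act
    case r of
        Left e
            | Just (SomeAsyncException _) <- fromException e -> throwIO e
        _ -> return r

-- Runs each release action in turn, continuing past any that throw.
-- Returns how many were serviced and how many of those failed.
serviceAll :: [IO ()] -> IO (Int, Int)
serviceAll = go 0 0
  where
    go serviced failed [] = return (serviced, failed)
    go serviced failed (rel : rest) = do
        r <- tryNonAsync' rel
        case r of
            Left _ -> go (serviced + 1) (failed + 1) rest
            Right () -> go (serviced + 1) failed rest

main :: IO ()
main = do
    counts <- serviceAll
        [ return ()
        , throwIO (userError "protocol unhappiness")
        , return ()
        ]
    print counts  -- (3,1): the failing release did not stop the loop
```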

This did slightly change the behavior of an interrupted PUT when
proxying to a regular remote. proxyConnection has a protoerrorhandler
that closes the proxy connection on a protocol error. But the proxy
connection is also closed by checktooshort when it closes the P2P
connection. Closing the same proxy connection twice is not a problem;
it just results in duplicated debug messages about it.
Joey Hess 2024-07-29 10:33:26 -04:00
parent c8e7231f48
commit 4f3ae96666
GPG key ID: DB12DB0FF05F8F38
3 changed files with 7 additions and 14 deletions

@@ -314,9 +314,9 @@ servePut st resultmangle su apiver (DataLength len) (B64Key k) cu bypass baf mof
     tooshortv <- liftIO newEmptyTMVarIO
     content <- liftIO $ S.unSourceT stream (gather validityv tooshortv)
     res <- withP2PConnection' apiver st cu su bypass sec auth WriteAction
-        (\cst -> cst { connectionWaitVar = False }) $ \conn ->
+        (\cst -> cst { connectionWaitVar = False }) $ \conn -> do
+            liftIO $ void $ async $ checktooshort conn tooshortv
             liftIO (protoaction conn content validitycheck)
-                `finally` checktooshort conn tooshortv
     case res of
         Right (Right (Just plusuuids)) -> return $ resultmangle $
             PutResultPlus True (map B64UUID plusuuids)
@@ -385,8 +385,8 @@ servePut st resultmangle su apiver (DataLength len) (B64Key k) cu bypass baf mof
     -- The connection can no longer be used when too short a DATA has
     -- been written to it.
-    checktooshort conn tooshortv =
-        liftIO $ whenM (atomically $ fromMaybe True <$> tryTakeTMVar tooshortv) $
+    checktooshort conn tooshortv = do
+        liftIO $ whenM (atomically $ takeTMVar tooshortv) $
             closeP2PConnection conn
 servePutOffset

@@ -220,7 +220,7 @@ withP2PConnections workerpool proxyconnectionpoolsize clusterconcurrency a = do
                 >>= atomically . putTMVar respvar
             servicer myuuid myproxies proxypool reqv relv endv
         Left (Right releaseconn) -> do
-            releaseconn
+            void $ tryNonAsync releaseconn
             servicer myuuid myproxies proxypool reqv relv endv
         Left (Left ()) -> return ()
@@ -378,11 +378,11 @@ proxyConnection proxyconnectionpoolsize relv connparams workerpool proxypool pro
         liftIO $ runNetProto proxyfromclientrunst proxyfromclientconn $
             P2P.net P2P.receiveMessage
-    let releaseconn returntopool =
+    let releaseconn returntopool = do
         atomically $ void $ tryPutTMVar relv $ do
-            r <- liftIO $ wait asyncworker
             liftIO $ closeConnection proxyfromclientconn
             liftIO $ closeConnection clientconn
+            r <- liftIO $ wait asyncworker
             if returntopool
                 then liftIO $ do
                     now <- getPOSIXTime

@@ -28,13 +28,6 @@ Planned schedule of work:
 ## work notes
-* An interrupted PUT to cluster that has a node that is a special remote
-  over http leaves open the connection to the cluster, so the next request
-  opens another one.
-  So does an interrupted PUT directly to the proxied
-  special remote over http.
 * When part of a file has been sent to a cluster via the http server,
   the transfer interrupted, and another node is added to the cluster,
   and the transfer of the file performed again, there is a failure