cleanly close proxy connection on interrupted PUT

An interrupted PUT to a cluster that has a node that is a special remote over http left the connection to the cluster open, so the next request opened another one. So did an interrupted PUT directly to the proxied special remote over http.

proxySpecialRemote was stuck waiting for all the DATA. Its connection remained open, so it kept waiting.

In servePut, checktooshort handles closing the P2P connection when too little data is received from the PUT. But checktooshort was only called after the protoaction, which is what runs the proxy, which is what was getting stuck. Modified it to run as a background thread, which waits for tooshortv to be written to, which gather always does once it gets to the end of the data received from the http client.

That makes proxyConnection's releaseconn run once all data is received from the http client. Made it close the connection handles before waiting on the asyncworker thread. This lets proxySpecialRemote finish processing any data from the handle, and then it will give up, more or less cleanly, if it didn't receive enough data.

I say "more or less cleanly" because with both sides of the P2P connection taken down, some protocol unhappiness results. That can lead to some ugly debug messages, but it can also cause the asyncworker thread to throw an exception. So made withP2PConnections not crash when it receives an exception from releaseconn.

This did make a small change to the behavior of an interrupted PUT when proxying to a regular remote. proxyConnection has a protoerrorhandler that closes the proxy connection on a protocol error. But the proxy connection is also closed by checktooshort when it closes the P2P connection. Closing the same proxy connection twice is not a problem; it just results in duplicated debug messages about it.
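The background-thread arrangement described above can be sketched in isolation. This is only an illustration under stated assumptions, not the code from this commit: `watchTooShort` and `simulatePut` are hypothetical names, an `IORef` flag stands in for `closeP2PConnection`, the proxy action that could block is left out, and `forkIO` from base is swapped in for `async` so that only base and stm are needed.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM (TMVar, atomically, newEmptyTMVarIO, putTMVar, takeTMVar)
import Control.Monad (void, when)
import Data.IORef (newIORef, readIORef, writeIORef)

-- Hypothetical watcher: block until the gatherer reports whether the
-- data was too short, then run the close action only if it was.
watchTooShort :: TMVar Bool -> IO () -> IO ()
watchTooShort tooshortv closeconn = do
    tooshort <- atomically (takeTMVar tooshortv)
    when tooshort closeconn

-- Hypothetical simulation of one PUT. Returns True if the connection
-- was closed because fewer bytes arrived than were expected.
simulatePut :: Int -> Int -> IO Bool
simulatePut expected received = do
    tooshortv <- newEmptyTMVarIO
    closed <- newIORef False      -- stand-in for the real connection state
    finished <- newEmptyMVar      -- lets the caller wait for the watcher
    -- Start the watcher before the main action, so that a stuck main
    -- action cannot prevent the connection from being closed.
    void $ forkIO $ do
        watchTooShort tooshortv (writeIORef closed True)
        putMVar finished ()
    -- The gatherer always writes the variable once it reaches the end
    -- of the data from the client, whether or not the data was short.
    atomically $ putTMVar tooshortv (received < expected)
    takeMVar finished
    readIORef closed
```

In GHCi, `simulatePut 10 4` models an interrupted transfer and `simulatePut 10 10` a complete one. Because the watcher uses a blocking `takeTMVar`, it does nothing until the gatherer has written the variable.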
2024-07-29 10:33:26 -04:00 · 2024-07-29 10:33:26 -04:00 · 4f3ae96666
commit 4f3ae96666
parent c8e7231f48
3 changed files with 7 additions and 14 deletions
--- a/P2P/Http/Server.hs
+++ b/P2P/Http/Server.hs
@@ -314,9 +314,9 @@ servePut st resultmangle su apiver (DataLength len) (B64Key k) cu bypass baf mof
 	tooshortv <- liftIO newEmptyTMVarIO
 	content <- liftIO $ S.unSourceT stream (gather validityv tooshortv)
 	res <- withP2PConnection' apiver st cu su bypass sec auth WriteAction
-		(\cst -> cst { connectionWaitVar = False }) $ \conn ->
+		(\cst -> cst { connectionWaitVar = False }) $ \conn -> do
+			liftIO $ void $ async $ checktooshort conn tooshortv
 			liftIO (protoaction conn content validitycheck)
-				`finally` checktooshort conn tooshortv
 	case res of
 		Right (Right (Just plusuuids)) -> return $ resultmangle $
 			PutResultPlus True (map B64UUID plusuuids)
@@ -385,8 +385,8 @@ servePut st resultmangle su apiver (DataLength len) (B64Key k) cu bypass baf mof
 			
 	-- The connection can no longer be used when too short a DATA has
 	-- been written to it.
-	checktooshort conn tooshortv =
-		liftIO $ whenM (atomically $ fromMaybe True <$> tryTakeTMVar tooshortv) $
+	checktooshort conn tooshortv = do
+		liftIO $ whenM (atomically $ takeTMVar tooshortv) $
 			closeP2PConnection conn

 servePutOffset
--- a/P2P/Http/State.hs
+++ b/P2P/Http/State.hs
@@ -220,7 +220,7 @@ withP2PConnections workerpool proxyconnectionpoolsize clusterconcurrency a = do
 					>>= atomically . putTMVar respvar
 				servicer myuuid myproxies proxypool reqv relv endv
 			Left (Right releaseconn) -> do
-				releaseconn
+				void $ tryNonAsync releaseconn
 				servicer myuuid myproxies proxypool reqv relv endv
 			Left (Left ()) -> return ()
 	
@@ -378,11 +378,11 @@ proxyConnection proxyconnectionpoolsize relv connparams workerpool proxypool pro
 				liftIO $ runNetProto proxyfromclientrunst proxyfromclientconn $
 					P2P.net P2P.receiveMessage
 	
-	let releaseconn returntopool =
+	let releaseconn returntopool = do
 		atomically $ void $ tryPutTMVar relv $ do
-			r <- liftIO $ wait asyncworker
 			liftIO $ closeConnection proxyfromclientconn
 			liftIO $ closeConnection clientconn
+			r <- liftIO $ wait asyncworker
 			if returntopool
 				then liftIO $ do
 					now <- getPOSIXTime
--- a/doc/todo/git-annex_proxies.mdwn
+++ b/doc/todo/git-annex_proxies.mdwn
@@ -28,13 +28,6 @@ Planned schedule of work:

 ## work notes

-* An interrupted PUT to cluster that has a node that is a special remote
-  over http leaves open the connection to the cluster, so the next request
-  opens another one.
-
-  So does an interrupted PUT directly to the proxied
-  special remote over http.
-
 * When part of a file has been sent to a cluster via the http server,
  the transfer interrupted, and another node is added to the cluster,
  and the transfer of the file performed again, there is a failure