clean shut down of cluster connection when PUT is interrupted

An interrupted `git-annex copy --to` a cluster via the http server,
when repeated, failed. The http server output "transfer already in
progress, or unable to take transfer lock". Apparently a second
connection was opened to the cluster, because the first connection
never got shut down.

Turned out the problem was that when proxying to a cluster, it would read a
short ByteString from the client, and send that to the nodes. But that left the
nodes wanting more. Meanwhile, the proxy was expecting a SUCCESS/FAILURE
message from the nodes. So it didn't return, and so the cluster connection
stayed open.
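
The fix can be illustrated with a minimal, self-contained sketch. It is not the git-annex code: `Node`, `newNode`, `sendChunk`, `closeNode`, `Outcome` and `relayChunks` are hypothetical stand-ins invented for this example, and the real proxy goes on to collect SUCCESS/FAILURE replies rather than returning an outcome directly. The sketch only shows the idea of counting the bytes relayed and closing every node connection when the client's data ends short of the expected length.

```haskell
import qualified Data.ByteString.Lazy as L
import Control.Monad (forM_, when)
import Data.Int (Int64)
import Data.IORef

-- Hypothetical stand-in for a connection to one cluster node.
data Node = Node
  { nodeReceived :: IORef Int64 -- bytes sent to this node so far
  , nodeOpen :: IORef Bool      -- whether the connection is still open
  }

newNode :: IO Node
newNode = Node <$> newIORef 0 <*> newIORef True

sendChunk :: Node -> L.ByteString -> IO ()
sendChunk n b = modifyIORef' (nodeReceived n) (+ L.length b)

closeNode :: Node -> IO ()
closeNode n = writeIORef (nodeOpen n) False

data Outcome = Completed | Interrupted
  deriving (Eq, Show)

-- Stream a lazy ByteString to all nodes in chunks of chunksize bytes
-- (assumed to be greater than zero). If the stream ends before the
-- expected number of bytes has been relayed, the nodes are still
-- waiting for the rest, so close their connections instead of
-- waiting for a reply that will never come.
relayChunks :: Int64 -> Int64 -> [Node] -> L.ByteString -> IO Outcome
relayChunks chunksize expected nodes = go 0
  where
    go sent b
      | L.null b =
          if sent /= expected
            then do
              mapM_ closeNode nodes
              return Interrupted
            else return Completed
      | otherwise = do
          let (chunk, rest) = L.splitAt chunksize b
          forM_ nodes $ \n -> sendChunk n chunk
          go (sent + L.length chunk) rest
```

With this shape, a short read from the client leaves no node connection open, so a repeated transfer is not blocked by a stale one.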
Joey Hess 2024-07-28 14:15:28 -04:00
parent bdde6d829c
commit 5e205f215d
2 changed files with 13 additions and 10 deletions


@@ -558,7 +558,6 @@ proxyRequest proxydone proxyparams requestcomplete requestmessage protoerrhandle
 (const protoerr)
 relayPUTMulti minoffset remotes k (Len datalen) _ = do
-let totallen = datalen + minoffset
 -- Tell each remote how much data to expect, depending
 -- on the remote's offset.
 rs <- forMC (proxyConcurrencyConfig proxyparams) remotes $ \r@(remoteside, remoteoffset) ->
@@ -569,6 +568,8 @@ proxyRequest proxydone proxyparams requestcomplete requestmessage protoerrhandle
 protoerrhandler (send (catMaybes rs) minoffset) $
 client $ net $ receiveBytes (Len datalen) nullMeterUpdate
 where
+totallen = datalen + minoffset
 chunksize = fromIntegral defaultChunkSize
 -- Stream the lazy bytestring out to the remotes in chunks.
@@ -593,13 +594,21 @@ proxyRequest proxydone proxyparams requestcomplete requestmessage protoerrhandle
 return r
 else return (Just r)
 if L.null b'
-then sent (catMaybes rs')
+then do
+-- If we didn't receive as much
+-- data as expected, close
+-- connections to all the remotes,
+-- because they are still waiting
+-- on the rest of the data.
+when (n' /= totallen) $
+mapM_ (closeRemoteSide . fst) rs
+sent (catMaybes rs')
 else send (catMaybes rs') n' b'
 sent [] = proxydone
 sent rs = relayDATAFinishMulti k (map fst rs)
-runRemoteSideOrSkipFailed remoteside a =
+runRemoteSideOrSkipFailed remoteside a =
 runRemoteSide remoteside a >>= \case
 Right v -> return (Just v)
 Left _ -> do
@@ -640,7 +649,7 @@ proxyRequest proxydone proxyparams requestcomplete requestmessage protoerrhandle
 net receiveMessage
 where
 finish a = do
-storeduuids <- forMC (proxyConcurrencyConfig proxyparams) rs $ \r ->
+storeduuids <- forMC (proxyConcurrencyConfig proxyparams) rs $ \r ->
 runRemoteSideOrSkipFailed r a >>= \case
 Just (Just resp) ->
 relayPUTRecord k r resp


@@ -28,12 +28,6 @@ Planned schedule of work:
 ## work notes
-* An interrupted `git-annex copy --to` a cluster via the http server,
-when repeated, fails. The http server outputs "transfer already in
-progress, or unable to take transfer lock". Apparently a second
-connection gets opened to the cluster, because the first connection
-never got shut down.
 * When part of a file has been sent to a cluster via the http server,
 the transfer interrupted, and another node is added to the cluster,
 and the transfer of the file performed again, there is a failure