fix drop hang reported by musicmatze

Fix hang when dropping content needs to lock the content on a ssh remote,
which occurred when the remote has git-annex version 5.20151019 or newer.

Analysis: `race` runs 2 threads at once, and the hGetLine finishes first.
So, it tries to cancel the waitForProcess, but unfortunately that is making
a foreign call and so cannot be canceled. The remote git-annex-shell
is waiting for a line on stdin before it will exit. Deadlock.

This only occurred sometimes; I reproduced it going from darkstar to
elephant, but not from darkstar to darkstar. Not sure how that fits into
the above analysis -- perhaps a race condition is also involved?

Fixed by not using `race`; now the hGetLine will fail with an exception
if the remote git-annex-shell exits without any output.
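
A minimal standalone sketch of the approach described above, not the actual git-annex code. It assumes a POSIX `sh` is on the PATH, and `firstLineOrExit` and `ignoreIO` are hypothetical names made up for this illustration. Instead of racing `hGetLine` against `waitForProcess`, it only reads, and relies on `hGetLine` throwing an end-of-file exception if the child exits without printing anything:

```haskell
import Control.Exception (IOException, try)
import Control.Monad (void)
import System.IO (hClose, hFlush, hGetLine, hPutStrLn)
import System.Process

-- Hypothetical helper: start a command and wait for its first line of
-- output. Left means the read failed, for example because the command
-- exited without writing anything (end of file on its stdout).
firstLineOrExit :: FilePath -> [String] -> IO (Either IOException String)
firstLineOrExit cmd args = do
    (Just hin, Just hout, _, p) <- createProcess (proc cmd args)
        { std_in = CreatePipe, std_out = CreatePipe }
    -- No `race` with waitForProcess: the read itself reports the exit.
    v <- try (hGetLine hout) :: IO (Either IOException String)
    -- Signal the child to finish and clean up. Each step may fail on its
    -- own (the child may already be gone), so failures are ignored one
    -- by one rather than aborting the remaining steps.
    mapM_ ignoreIO
        [ hPutStrLn hin ""
        , hFlush hin
        , hClose hin
        , hClose hout
        , void (waitForProcess p)
        ]
    return v
  where
    ignoreIO :: IO () -> IO ()
    ignoreIO a = void (try a :: IO (Either IOException ()))

main :: IO ()
main = do
    -- Child prints a line, then waits for a line on stdin before exiting.
    r1 <- firstLineOrExit "sh" ["-c", "echo locked; read _"]
    putStrLn (either (const "exited without output") id r1)
    -- Child exits immediately without printing anything.
    r2 <- firstLineOrExit "sh" ["-c", "exit 1"]
    putStrLn (either (const "exited without output") id r2)
```

Because `waitForProcess` is only called after the blank line has been sent and the handles closed, nothing ever needs to cancel it.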
Joey Hess 2016-04-18 14:04:20 -04:00
parent f24dba7616
commit 2d7e46ea98
2 changed files with 13 additions and 12 deletions
@@ -57,7 +57,6 @@ import Types.NumCopies
 import Control.Concurrent
 import Control.Concurrent.MSampleVar
-import Control.Concurrent.Async
 import qualified Data.Map as M
 import Network.URI
@@ -387,17 +386,14 @@ lockKey r key callback
             , std_out = CreatePipe
             , std_err = UseHandle nullh
             }
-        -- Wait for either the process to exit, or for it to
-        -- indicate the content is locked.
-        v <- liftIO $ race
-            (waitForProcess p)
-            (hGetLine hout)
-        let signaldone = void $ tryNonAsync $ liftIO $ do
-            hPutStrLn hout ""
-            hFlush hout
-            hClose hin
-            hClose hout
-            void $ waitForProcess p
+        v <- liftIO $ tryIO $ hGetLine hout
+        let signaldone = void $ tryNonAsync $ liftIO $ mapM_ tryNonAsync
+            [ hPutStrLn hout ""
+            , hFlush hout
+            , hClose hin
+            , hClose hout
+            , void $ waitForProcess p
+            ]
         let checkexited = not . isJust <$> getProcessExitCode p
         case v of
             Left _exited -> do
@@ -405,6 +401,7 @@ lockKey r key callback
                 liftIO $ do
                     hClose hin
                     hClose hout
+                    void $ waitForProcess p
                 failedlock
             Right l
                 | l == Ssh.contentLockedMarker -> bracket_

debian/changelog
@@ -17,6 +17,10 @@ git-annex (6.20160413) UNRELEASED; urgency=medium
   * fsck: Warn when core.sharedRepository is set and an annex object file's
     write bit is not set and cannot be set due to the file being owned
     by a different user.
+  * Fix hang when dropping content needs to lock the content on a
+    ssh remote, which occurred when the remote has git-annex version
+    5.20151019 or newer. (The bug was in the client side; the remote
+    git-annex-shell does not need to be upgraded.)

  -- Joey Hess <id@joeyh.name> Wed, 13 Apr 2016 13:30:32 -0400