From 7a338031933cbba7b021468ee83bb63fb3d6d42a Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Wed, 27 Apr 2011 22:10:57 -0400 Subject: [PATCH] Avoid pipeline stall when running git annex drop or fsck on a lot of files. When it's stalled, there are 3 processes: git annex git ls-files git check-attr git-annex stalls trying to write to git check-attr, which stalls trying to write to stdout (read by git-annex). git ls-files does not seem to be involved directly; I've seen the stall when it was still streaming out the file list, and after it had exited and zombified. The read and write are supposed to be handled by two different threads, which pipeBoth forks off, thus avoiding deadlock. But it does deadlock. (Certian signals unblock the deadlock for a while, then it stalls again.) So, this is another case of WTF is the ghc IO manager doing today? I avoid the issue by converting the writer to a separate process. Possibly this was caused by some change in ghc 7 -- I'm offline and cannot verify now, but I'm sure I used to be able to run git annex drop w/o it hanging! And the code does not seem to have changed, except for commit c1dc4079419cff94cca72441d5e67a866110ec7e, which I tried reverting without success. In fact, I reverted all the way back to 0.20110316 and still saw the stall. Update: Minimal test case: import System.Cmd.Utils main = do as <- checkAttr "blah" $ map show [1..100000] sequence $ map (putStrLn . show) as checkAttr attr files = do (_, s) <- pipeBoth "git" params $ unlines files return $ lines s where params = ["check-attr", attr, "--stdin"] Bug filed on ghc in debian, #624389 --- GitRepo.hs | 10 +++++++++- debian/changelog | 2 ++ 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/GitRepo.hs b/GitRepo.hs index 2bf320eda2..9ecaa8ffcb 100644 --- a/GitRepo.hs +++ b/GitRepo.hs @@ -78,6 +78,7 @@ import Data.Word (Word8) import Codec.Binary.UTF8.String (encode) import Text.Printf import Data.List (isInfixOf, isPrefixOf) +import System.Exit import Utility @@ -482,7 +483,14 @@ checkAttr repo attr files = do -- in its output back to relative. cwd <- getCurrentDirectory let absfiles = map (absPathFrom cwd) files - (_, s) <- pipeBoth "git" (toCommand params) $ join "\0" absfiles + (_, fromh, toh) <- hPipeBoth "git" (toCommand params) + _ <- forkProcess $ do + hClose fromh + hPutStr toh $ join "\0" absfiles + hClose toh + exitSuccess + hClose toh + s <- hGetContents fromh return $ map (topair $ cwd++"/") $ lines s where params = gitCommandLine repo [Param "check-attr", Param attr, Params "-z --stdin"] diff --git a/debian/changelog b/debian/changelog index 991c9e0bb9..a69f03f5ef 100644 --- a/debian/changelog +++ b/debian/changelog @@ -4,6 +4,8 @@ git-annex (0.20110426) UNRELEASED; urgency=low Debian stable. * Added rsync special remotes. This could be used, for example, to store annexed content on rsync.net (encrypted naturally). Or anywhere else. + * Bugfix: Avoid pipeline stall when running git annex drop or fsck on a + lot of files. Possibly only occured with ghc 7. -- Joey Hess Tue, 26 Apr 2011 11:23:54 -0400