much flailing
parent 58be3af084
commit 0096db7b42
3 changed files with 120 additions and 0 deletions
@@ -0,0 +1,20 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2019-11-13T16:34:34Z"
content="""
Reproduced.

After building git-annex with the DebugLocks flag, I got this:

    debugLocks, called at ./Annex/Transfer.hs:248:18 in main:Annex.Transfer
    debugLocks, called at ./CmdLine/Action.hs:263:26 in main:CmdLine.Action

Which points to pickRemote and ensureOnlyActionOn. But pickRemote
does no STM actions when there's only 1 remote, so it must really be
the latter.

Also, I notice that when 5 files to get are provided, it crashes, but with
less than 5, it succeeds.
Even this trivial case crashes: `git annex get -J1 1 2`
"""]]
@@ -0,0 +1,83 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2019-11-13T17:07:29Z"
content="""
Ok, I see the bug. ensureOnlyActionOn does an STM
retry if it finds in the activekeys map that some other thread
is operating on the same key.
But, there is no running STM transaction that will update
the map. So, STM detects that the retry would deadlock.

It's not really a deadlock, because once the other thread finishes,
it will update the map to remove itself. But STM can't know that.
The solution will be to not use STM for waiting on the other thread.

Hmm, I tried the obvious approach, using an MVar semaphore to wait for the
thread, but that just resulted in more STM and MVar deadlocks.

I don't understand why after puzzling over it for two hours. I did
instrument all calls to atomically, and it looks, unfortunately, like
the one in finishCommandActions is deadlocking. If the problem extends
beyond ensureOnlyActionOn it may be much more complicated.

Patch that does not work and I don't know why.

    diff --git a/CmdLine/Action.hs b/CmdLine/Action.hs
    index 87298a95f..bf4bdd589 100644
    --- a/CmdLine/Action.hs
    +++ b/CmdLine/Action.hs
    @@ -268,16 +268,30 @@ ensureOnlyActionOn k a = debugLocks $
         go ConcurrentPerCpu = goconcurrent
         goconcurrent = do
             tv <- Annex.getState Annex.activekeys
    -        bracket (setup tv) id (const a)
    -    setup tv = liftIO $ do
    +        bracketIO (setup tv) id (const a)
    +    setup tv = do
    +        mysem <- newEmptyMVar
             mytid <- myThreadId
    -        atomically $ do
    +        finishsetup <- atomically $ do
                 m <- readTVar tv
                 case M.lookup k m of
    -                Just tid
    -                    | tid /= mytid -> retry
    -                    | otherwise -> return $ return ()
    +                Just (tid, theirsem)
    +                    | tid /= mytid -> return $ do
    +                        -- wait for the other
    +                        -- thread to finish, and
    +                        -- retry (STM retry would
    +                        -- deadlock)
    +                        readMVar theirsem
    +                        setup tv
    +                    | otherwise -> return $
    +                        -- same thread, so no
    +                        -- blocking
    +                        return $ return ()
                     Nothing -> do
    -                    writeTVar tv $! M.insert k mytid m
    -                    return $ liftIO $ atomically $
    -                        modifyTVar tv $ M.delete k
    +                    writeTVar tv $! M.insert k (mytid, mysem) m
    +                    return $ return $ do
    +                        atomically $ modifyTVar tv $
    +                            M.delete k
    +                        -- indicate finished
    +                        putMVar mysem ()
    +        finishsetup
    diff --git a/Annex.hs b/Annex.hs
    index 9eb4c5f39..936399ae7 100644
    --- a/Annex.hs
    +++ b/Annex.hs
    @@ -143,7 +143,7 @@ data AnnexState = AnnexState
         , existinghooks :: M.Map Git.Hook.Hook Bool
         , desktopnotify :: DesktopNotify
         , workers :: Maybe (TMVar (WorkerPool AnnexState))
    -    , activekeys :: TVar (M.Map Key ThreadId)
    +    , activekeys :: TVar (M.Map Key (ThreadId, MVar ()))
         , activeremotes :: MVar (M.Map (Types.Remote.RemoteA Annex) Integer)
         , keysdbhandle :: Maybe Keys.DbHandle
         , cachedcurrentbranch :: (Maybe (Maybe Git.Branch, Maybe Adjustment))
"""]]
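The general STM behaviour that comment 2 describes can be seen in isolation with a small sketch. This is not git-annex code and does not reproduce the bug itself. `waitUntilFree`, `demoBlocked` and `demoWoken` are hypothetical names, and the `String`/`Int` map is only a stand-in for the `activekeys` map. It assumes GHC with the `stm` and `containers` packages. The point is that a `retry` is woken only when some other thread writes the `TVar`. If the runtime finds that no live thread can reach the `TVar`, it throws `BlockedIndefinitelyOnSTM` to the waiting thread.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM
import Control.Exception (BlockedIndefinitelyOnSTM (..), try)
import qualified Data.Map.Strict as M

-- Hypothetical stand-in for the waiting half of ensureOnlyActionOn:
-- block (via retry) for as long as the key is present in the map.
waitUntilFree :: TVar (M.Map String Int) -> String -> IO ()
waitUntilFree tv k = atomically $ do
    m <- readTVar tv
    case M.lookup k m of
        Just _ -> retry
        Nothing -> return ()

-- No other thread holds the TVar, so nothing can ever wake the retry.
-- The runtime is expected to notice this and throw
-- BlockedIndefinitelyOnSTM, which is caught here and reported as True.
demoBlocked :: IO Bool
demoBlocked = do
    tv <- newTVarIO (M.singleton "key" 1)
    r <- try (waitUntilFree tv "key")
    case r of
        Left BlockedIndefinitelyOnSTM -> return True
        Right () -> return False

-- A second thread keeps a reference to the TVar and later deletes the
-- key, so the retry is woken normally and no exception is expected.
demoWoken :: IO Bool
demoWoken = do
    tv <- newTVarIO (M.singleton "key" 1)
    _ <- forkIO $ do
        threadDelay 100000
        atomically $ modifyTVar' tv (M.delete "key")
    r <- try (waitUntilFree tv "key") :: IO (Either BlockedIndefinitelyOnSTM ())
    return (either (const False) (const True) r)
```

Calling both from a `main` (for example `demoBlocked >>= print` followed by `demoWoken >>= print`) should show the difference between the two cases. The detection is based on what the garbage collector can reach, so it is a property of the whole program rather than of the single `atomically` call, which may be why the deadlock reports in the comments are hard to attribute to one call site.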
@@ -0,0 +1,17 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2019-11-13T19:07:49Z"
content="""
Tried going back to c04b2af3e1a8316e7cf640046ad0aa68826650ed,
which is before the separation of perform and cleanup stages.
The same code was in onlyActionOn back then. And the test case does not
crash.

So, that gives a good commit to start a bisection. Which will probably
find the bug was introduced in the separation of perform and cleanup stages,
because that added a lot of STM complexity.

(Have to cherry-pick 018b5b81736a321f3eb9762a2afb7124e19dbdf9
onto those old commits to make them build with current libraries.)
"""]]