much flailing
This commit is contained in:
parent
58be3af084
commit
0096db7b42
3 changed files with 120 additions and 0 deletions
|
@ -0,0 +1,20 @@
|
||||||
|
[[!comment format=mdwn
|
||||||
|
username="joey"
|
||||||
|
subject="""comment 1"""
|
||||||
|
date="2019-11-13T16:34:34Z"
|
||||||
|
content="""
|
||||||
|
Reproduced.
|
||||||
|
|
||||||
|
After building git-annex with the DebugLocks flag, I got this:
|
||||||
|
|
||||||
|
debugLocks, called at ./Annex/Transfer.hs:248:18 in main:Annex.Transfer
|
||||||
|
debugLocks, called at ./CmdLine/Action.hs:263:26 in main:CmdLine.Action
|
||||||
|
|
||||||
|
Which points to pickRemote and ensureOnlyActionOn. But pickRemote
|
||||||
|
does no STM actions when there's only 1 remote, so it must really be
|
||||||
|
the latter.
|
||||||
|
|
||||||
|
Also, I notice that when 5 files to get are provided, it crashes, but with
|
||||||
|
less than 5, it succeeds.
|
||||||
|
Even this trivial case crashes: `git annex get -J1 1 2`
|
||||||
|
"""]]
|
|
@ -0,0 +1,83 @@
|
||||||
|
[[!comment format=mdwn
|
||||||
|
username="joey"
|
||||||
|
subject="""comment 2"""
|
||||||
|
date="2019-11-13T17:07:29Z"
|
||||||
|
content="""
|
||||||
|
Ok, I see the bug. ensureOnlyActionOn does a STM
|
||||||
|
retry if it finds in the activekeys map some other thread
|
||||||
|
is operating on the same key.
|
||||||
|
But, there is no running STM transaction what will update
|
||||||
|
the map. So, STM detects that the retry would deadlock.
|
||||||
|
|
||||||
|
It's not really a deadlock, because once the other thread finishes,
|
||||||
|
it will update the map to remove itself. But STM can't know that.
|
||||||
|
The solution will be to not use STM for waiting on the other thread.
|
||||||
|
|
||||||
|
Hmm, I tried the obvious approach, using a MVar semaphore to wait for the
|
||||||
|
thread, but that just resulted in more STM and MVar deadlocks.
|
||||||
|
|
||||||
|
I don't understand why after puzzling over it for two hours. I did
|
||||||
|
instrument all calls to atomically, and it looks, unfortunately, like
|
||||||
|
the one in finishCommandActions is deadlocking. If the problem extends
|
||||||
|
beyond ensureOnlyActionOn it may be much more complicated.
|
||||||
|
|
||||||
|
Patch that does not work and I don't know why.
|
||||||
|
|
||||||
|
diff --git a/CmdLine/Action.hs b/CmdLine/Action.hs
|
||||||
|
index 87298a95f..bf4bdd589 100644
|
||||||
|
--- a/CmdLine/Action.hs
|
||||||
|
+++ b/CmdLine/Action.hs
|
||||||
|
@@ -268,16 +268,30 @@ ensureOnlyActionOn k a = debugLocks $
|
||||||
|
go ConcurrentPerCpu = goconcurrent
|
||||||
|
goconcurrent = do
|
||||||
|
tv <- Annex.getState Annex.activekeys
|
||||||
|
- bracket (setup tv) id (const a)
|
||||||
|
- setup tv = liftIO $ do
|
||||||
|
+ bracketIO (setup tv) id (const a)
|
||||||
|
+ setup tv = do
|
||||||
|
+ mysem <- newEmptyMVar
|
||||||
|
mytid <- myThreadId
|
||||||
|
- atomically $ do
|
||||||
|
+ finishsetup <- atomically $ do
|
||||||
|
m <- readTVar tv
|
||||||
|
case M.lookup k m of
|
||||||
|
- Just tid
|
||||||
|
- | tid /= mytid -> retry
|
||||||
|
- | otherwise -> return $ return ()
|
||||||
|
+ Just (tid, theirsem)
|
||||||
|
+ | tid /= mytid -> return $ do
|
||||||
|
+ -- wait for the other
|
||||||
|
+ -- thread to finish, and
|
||||||
|
+ -- retry (STM retry would
|
||||||
|
+ -- deadlock)
|
||||||
|
+ readMVar theirsem
|
||||||
|
+ setup tv
|
||||||
|
+ | otherwise -> return $
|
||||||
|
+ -- same thread, so no
|
||||||
|
+ -- blocking
|
||||||
|
+ return $ return ()
|
||||||
|
Nothing -> do
|
||||||
|
- writeTVar tv $! M.insert k mytid m
|
||||||
|
- return $ liftIO $ atomically $
|
||||||
|
- modifyTVar tv $ M.delete k
|
||||||
|
+ writeTVar tv $! M.insert k (mytid, mysem) m
|
||||||
|
+ return $ return $ do
|
||||||
|
+ atomically $ modifyTVar tv $
|
||||||
|
+ M.delete k
|
||||||
|
+ -- indicate finished
|
||||||
|
+ putMVar mysem ()
|
||||||
|
+ finishsetup
|
||||||
|
diff --git a/Annex.hs b/Annex.hs
|
||||||
|
index 9eb4c5f39..936399ae7 100644
|
||||||
|
--- a/Annex.hs
|
||||||
|
+++ b/Annex.hs
|
||||||
|
@@ -143,7 +143,7 @@ data AnnexState = AnnexState
|
||||||
|
, existinghooks :: M.Map Git.Hook.Hook Bool
|
||||||
|
, desktopnotify :: DesktopNotify
|
||||||
|
, workers :: Maybe (TMVar (WorkerPool AnnexState))
|
||||||
|
- , activekeys :: TVar (M.Map Key ThreadId)
|
||||||
|
+ , activekeys :: TVar (M.Map Key (ThreadId, MVar ()))
|
||||||
|
, activeremotes :: MVar (M.Map (Types.Remote.RemoteA Annex) Integer)
|
||||||
|
, keysdbhandle :: Maybe Keys.DbHandle
|
||||||
|
, cachedcurrentbranch :: (Maybe (Maybe Git.Branch, Maybe Adjustment))
|
||||||
|
"""]]
|
|
@ -0,0 +1,17 @@
|
||||||
|
[[!comment format=mdwn
|
||||||
|
username="joey"
|
||||||
|
subject="""comment 3"""
|
||||||
|
date="2019-11-13T19:07:49Z"
|
||||||
|
content="""
|
||||||
|
Tried going back to c04b2af3e1a8316e7cf640046ad0aa68826650ed,
|
||||||
|
which is before the separation of perform and cleanup stages.
|
||||||
|
The same code was in onlyActionOn back then. And the test case does not
|
||||||
|
crash.
|
||||||
|
|
||||||
|
So, that gives a good commit to start a bisection. Which will probably
|
||||||
|
find the bug was introduced in the separation of perform and cleanup stages,
|
||||||
|
because that added a lot of STM complexity.
|
||||||
|
|
||||||
|
(Have to cherry-pick 018b5b81736a321f3eb9762a2afb7124e19dbdf9
|
||||||
|
onto those old commits to make them build with current libraries.)
|
||||||
|
"""]]
|
Loading…
Add table
Reference in a new issue