fix MVar deadlock when sqlite commit fails

When commitDb threw an exception, flushDbQueue never refilled the
queue's MVar, which caused subsequent calls to flushDbQueue to deadlock.

Sponsored-by: Dartmouth College's Datalad project
Joey Hess 2022-06-06 12:16:55 -04:00
parent 7851d8fb42
commit 331c97df88
2 changed files with 29 additions and 4 deletions

@@ -1,6 +1,6 @@
 {- Persistent sqlite database queues
  -
- - Copyright 2015 Joey Hess <id@joeyh.name>
+ - Copyright 2015-2022 Joey Hess <id@joeyh.name>
  -
  - Licensed under the GNU AGPL version 3 or higher.
  -}
@@ -20,6 +20,7 @@ module Database.Queue (
 import Utility.Monad
 import Utility.RawFilePath
 import Utility.DebugLocks
+import Utility.Exception
 import Database.Handle
 import Database.Persist.Sqlite
@@ -54,9 +55,11 @@ flushDbQueue :: DbQueue -> IO ()
 flushDbQueue (DQ hdl qvar) = do
     q@(Queue sz _ qa) <- debugLocks $ takeMVar qvar
     if sz > 0
-        then do
-            commitDb hdl qa
-            debugLocks $ putMVar qvar =<< emptyQueue
+        then tryNonAsync (commitDb hdl qa) >>= \case
+            Right () -> debugLocks $ putMVar qvar =<< emptyQueue
+            Left e -> do
+                debugLocks $ putMVar qvar q
+                throwM e
         else debugLocks $ putMVar qvar q
 {- Makes a query using the DbQueue's database connection.

@@ -0,0 +1,22 @@
[[!comment format=mdwn
username="joey"
subject="""comment 19"""
date="2022-06-06T15:55:15Z"
content="""
So the deadlock is in flushDbQueue, which waits for all queued writes to
complete. When a write failed with an exception, a previous flushDbQueue
would have left the queue's MVar empty.

So, now I understand how the original problem can lead to this MVar
problem. And I've fixed that part of it. Now a write failing this way will
refill the queue with what it failed to write. So it will try to write it
again later.

This flushDbQueue problem probably also explains the hang at
"recording state in git", since there is also a final flushDbQueue at that
point, and I guess it fails to detect a deadlock at that point so just
hangs forever. So my fix should also avoid that.

None of which means this is fixed, really, just the fallout from the
write timeout problem will be less severe now.
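
For illustration, the refill-on-failure pattern can be sketched in isolation using only base. This is a minimal sketch, not the code from Database.Queue: `flushSketch`, the list-based queue, and the `commit` callback are hypothetical stand-ins, and `try` at `SomeException` stands in for `tryNonAsync`, so unlike the real change it also catches asynchronous exceptions.

```haskell
{-# LANGUAGE LambdaCase #-}

import Control.Concurrent.MVar
import Control.Exception (SomeException, throwIO, try)

-- Hypothetical stand-in for the real queue: a plain list of pending items.
type Pending a = [a]

-- Take the pending items out of the MVar, try to commit them, and put
-- something back into the MVar on every path, so a later flush cannot
-- block forever on an empty MVar.
flushSketch :: MVar (Pending a) -> (Pending a -> IO ()) -> IO ()
flushSketch qvar commit = do
    q <- takeMVar qvar
    if null q
        then putMVar qvar q
        else try (commit q) >>= \case
            -- Commit succeeded: start over with an empty queue.
            Right () -> putMVar qvar []
            -- Commit failed: restore what was not written, then rethrow.
            Left e -> do
                putMVar qvar q
                throwIO (e :: SomeException)
```

With the old shape (commit, then `putMVar`), an exception from `commit` would skip the `putMVar` and the next `takeMVar` would block. In plain base code, `modifyMVar_` gives similar restore-on-exception behaviour and also masks asynchronous exceptions while the MVar is held.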
"""]]