retry when sqlite throws ErrorIO
I suspect this may be due to SQLITE_IOERR_SHORT_READ, but have not
verified.
I was able to reproduce it on Linux after running the test suite in a loop
for 1-3 hours until it failed.
The WAL mode entry change in 3963c5fcf5
may have hidden the problem I was seeing; I have not seen an ErrorIO
since then.
This commit is contained in:
parent
3963c5fcf5
commit
1428568554
3 changed files with 82 additions and 25 deletions
|
@ -31,7 +31,6 @@ import Control.Exception (throwIO, BlockedIndefinitelyOnMVar(..))
|
||||||
import qualified Data.Text as T
|
import qualified Data.Text as T
|
||||||
import Control.Monad.Trans.Resource (runResourceT)
|
import Control.Monad.Trans.Resource (runResourceT)
|
||||||
import Control.Monad.Logger (runNoLoggingT)
|
import Control.Monad.Logger (runNoLoggingT)
|
||||||
import Data.List
|
|
||||||
import System.IO
|
import System.IO
|
||||||
|
|
||||||
{- A DbHandle is a reference to a worker thread that communicates with
|
{- A DbHandle is a reference to a worker thread that communicates with
|
||||||
|
@ -175,36 +174,68 @@ workerThread db tablename jobs = go
|
||||||
liftIO (a (runSqliteRobustly tablename db))
|
liftIO (a (runSqliteRobustly tablename db))
|
||||||
loop
|
loop
|
||||||
|
|
||||||
-- like runSqlite, but calls settle on the raw sql Connection.
|
-- Like runSqlite, but more robust.
|
||||||
|
--
|
||||||
|
-- New database connections can sometimes take a while to become usable.
|
||||||
|
-- This may be due to WAL mode recovering after a crash, or perhaps a bug
|
||||||
|
-- like described in blob 500f777a6ab6c45ca5f9790e0a63575f8e3cb88f.
|
||||||
|
-- So, loop until a select succeeds; once one succeeds the connection will
|
||||||
|
-- stay usable.
|
||||||
|
--
|
||||||
|
-- And sqlite sometimes throws ErrorIO when there's not really an IO problem,
|
||||||
|
-- but perhaps just a short read(). That's caught and retried several times.
|
||||||
runSqliteRobustly :: TableName -> T.Text -> (SqlPersistM a) -> IO a
|
runSqliteRobustly :: TableName -> T.Text -> (SqlPersistM a) -> IO a
|
||||||
runSqliteRobustly tablename db a = do
|
runSqliteRobustly tablename db a = do
|
||||||
conn <- Sqlite.open db
|
conn <- opensettle maxretries
|
||||||
settle conn
|
go conn maxretries
|
||||||
runResourceT $ runNoLoggingT $
|
|
||||||
withSqlConn (wrapConnection conn) $
|
|
||||||
runSqlConn a
|
|
||||||
where
|
where
|
||||||
-- Work around a bug in sqlite: New database connections can
|
maxretries = 100 :: Int
|
||||||
-- sometimes take a while to become usable; select statements will
|
|
||||||
-- fail with ErrorBusy for some time. So, loop until a select
|
rethrow msg e = throwIO $ userError $ show e ++ "(" ++ msg ++ ")"
|
||||||
-- succeeds; once one succeeds the connection will stay usable.
|
|
||||||
--
|
go conn retries = do
|
||||||
-- I reported this bug, but the sqlite developers did not respond.
|
r <- try $ runResourceT $ runNoLoggingT $
|
||||||
-- Bug report is archived in blob 500f777a6ab6c45ca5f9790e0a63575f8e3cb88f
|
withSqlConn (wrapConnection conn) $
|
||||||
-- in git-annex git repo.
|
runSqlConn a
|
||||||
settle conn = do
|
case r of
|
||||||
r <- tryNonAsync $ do
|
Right v -> return v
|
||||||
|
Left ex@(Sqlite.SqliteException { Sqlite.seError = e })
|
||||||
|
| e == Sqlite.ErrorIO ->
|
||||||
|
let retries' = retries - 1
|
||||||
|
in if retries' < 1
|
||||||
|
then rethrow "after successful open" ex
|
||||||
|
else go conn retries'
|
||||||
|
| otherwise -> rethrow "after successful open" ex
|
||||||
|
|
||||||
|
opensettle retries = do
|
||||||
|
conn <- Sqlite.open db
|
||||||
|
settle conn retries
|
||||||
|
|
||||||
|
settle conn retries = do
|
||||||
|
r <- try $ do
|
||||||
stmt <- Sqlite.prepare conn nullselect
|
stmt <- Sqlite.prepare conn nullselect
|
||||||
void $ Sqlite.step stmt
|
void $ Sqlite.step stmt
|
||||||
void $ Sqlite.finalize stmt
|
void $ Sqlite.finalize stmt
|
||||||
case r of
|
case r of
|
||||||
Right _ -> return ()
|
Right _ -> return conn
|
||||||
Left e -> do
|
Left ex@(Sqlite.SqliteException { Sqlite.seError = e })
|
||||||
if "ErrorBusy" `isInfixOf` show e
|
| e == Sqlite.ErrorBusy -> do
|
||||||
then do
|
-- Wait and retry any number of times; it
|
||||||
threadDelay 1000 -- 1/1000th second
|
-- will stop being busy eventually.
|
||||||
settle conn
|
briefdelay
|
||||||
else throwIO e
|
settle conn retries
|
||||||
|
| e == Sqlite.ErrorIO -> do
|
||||||
|
-- Could be a real IO error,
|
||||||
|
-- so don't retry indefinitely.
|
||||||
|
Sqlite.close conn
|
||||||
|
briefdelay
|
||||||
|
let retries' = retries - 1
|
||||||
|
if retries' < 1
|
||||||
|
then rethrow "while opening database connection" ex
|
||||||
|
else opensettle retries'
|
||||||
|
| otherwise -> rethrow "while opening database connection" ex
|
||||||
|
|
||||||
-- This should succeed for any table.
|
-- This should succeed for any table.
|
||||||
nullselect = T.pack $ "SELECT null from " ++ tablename ++ " limit 1"
|
nullselect = T.pack $ "SELECT null from " ++ tablename ++ " limit 1"
|
||||||
|
|
||||||
|
briefdelay = threadDelay 1000 -- 1/1000th second
|
||||||
|
|
|
@ -59,4 +59,4 @@ Oh yeah, I am still discovering this powerfull git annex tool.
|
||||||
|
|
||||||
In fact, collegues and I are forming a group during the process to exchange about different use cases, encountered problems and help each other.
|
In fact, collegues and I are forming a group during the process to exchange about different use cases, encountered problems and help each other.
|
||||||
|
|
||||||
[[!meta title="sqlite crash on Windows running linux binary via WSL"]]
|
[[!meta title="v7: intermittent sqlite ErrorIO crash (especially in WSL)"]]
|
||||||
|
|
|
@ -0,0 +1,26 @@
|
||||||
|
[[!comment format=mdwn
|
||||||
|
username="joey"
|
||||||
|
subject="""comment 6"""
|
||||||
|
date="2018-10-30T16:23:21Z"
|
||||||
|
content="""
|
||||||
|
I saw a different "SQLite3 returned ErrorIO while attempting to perform step",
|
||||||
|
which seems likely to be closely related to this problem.
|
||||||
|
(The "step" action happens right after the "prepare" action.)
|
||||||
|
|
||||||
|
failed to commit changes to sqlite database: Just SQLite3 returned ErrorIO while attempting to perform step.
|
||||||
|
CallStack (from HasCallStack):
|
||||||
|
error, called at ./Database/Handle.hs:116:26 in main:Database.Handle
|
||||||
|
|
||||||
|
The test suite occasionally fails due to this, and it seems always in
|
||||||
|
`test_lock_v7_force`. Which notably deletes the sqlite database just before
|
||||||
|
the failure, and so causes it to be re-created.
|
||||||
|
|
||||||
|
Dumping the keys database after such a failure, it is freshly created,
|
||||||
|
contains the tables but no data has been written to it.
|
||||||
|
|
||||||
|
I've made git-annex catch ErrorIO and retry. Not sure if that fixes the bug,
|
||||||
|
but it may.
|
||||||
|
|
||||||
|
Please try, if you can, the new git-annex version 7, and see if this bug
|
||||||
|
still happens.
|
||||||
|
"""]]
|
Loading…
Add table
Add a link
Reference in a new issue