fix git command queue to be concurrency safe
Probably not noticed until now because the queue is large enough that two threads each filling theirs at the same time and flushing is unlikely to happen. Also made explicit that each worker thread gets its own queue. I think that was the case before, but if something was put in the queue before worker threads were forked off, they could have each inherited the same queue. Could have gone with a single shared queue, but per-worker queues is more efficient, because a worker can add lots of stuff to its own queue without any locking. This commit was sponsored by Ole-Morten Duesund on Patreon.
This commit is contained in:
parent
3a71f3a4a9
commit
759a87ad70
6 changed files with 31 additions and 13 deletions
4
Annex.hs
4
Annex.hs
|
@ -73,6 +73,7 @@ import "mtl" Control.Monad.Reader
|
|||
import Control.Concurrent
|
||||
import Control.Concurrent.Async
|
||||
import Control.Concurrent.STM
|
||||
import qualified Control.Concurrent.SSem as SSem
|
||||
import qualified Data.Map.Strict as M
|
||||
import qualified Data.Set as S
|
||||
|
||||
|
@ -111,6 +112,7 @@ data AnnexState = AnnexState
|
|||
, daemon :: Bool
|
||||
, branchstate :: BranchState
|
||||
, repoqueue :: Maybe Git.Queue.Queue
|
||||
, repoqueuesem :: SSem.SSem
|
||||
, catfilehandles :: M.Map FilePath CatFileHandle
|
||||
, hashobjecthandle :: Maybe HashObjectHandle
|
||||
, checkattrhandle :: Maybe CheckAttrHandle
|
||||
|
@ -153,6 +155,7 @@ newState c r = do
|
|||
emptyactivekeys <- newTVarIO M.empty
|
||||
o <- newMessageState
|
||||
sc <- newTMVarIO False
|
||||
qsem <- SSem.new 1
|
||||
return $ AnnexState
|
||||
{ repo = r
|
||||
, repoadjustment = return
|
||||
|
@ -168,6 +171,7 @@ newState c r = do
|
|||
, daemon = False
|
||||
, branchstate = startBranchState
|
||||
, repoqueue = Nothing
|
||||
, repoqueuesem = qsem
|
||||
, catfilehandles = M.empty
|
||||
, hashobjecthandle = Nothing
|
||||
, checkattrhandle = Nothing
|
||||
|
|
|
@ -41,9 +41,13 @@ forkState a = do
|
|||
dupState :: Annex AnnexState
|
||||
dupState = do
|
||||
st <- Annex.getState id
|
||||
-- avoid sharing eg, open file handles
|
||||
return $ st
|
||||
{ Annex.workers = []
|
||||
-- each thread has its own repoqueue, but the repoqueuesem
|
||||
-- is shared to prevent more than one thread flushing its
|
||||
-- queue at the same time
|
||||
, Annex.repoqueue = Nothing
|
||||
-- avoid sharing eg, open file handles
|
||||
, Annex.catfilehandles = M.empty
|
||||
, Annex.checkattrhandle = Nothing
|
||||
, Annex.checkignorehandle = Nothing
|
||||
|
|
|
@ -23,6 +23,8 @@ import Annex hiding (new)
|
|||
import qualified Git.Queue
|
||||
import qualified Git.UpdateIndex
|
||||
|
||||
import qualified Control.Concurrent.SSem as SSem
|
||||
|
||||
{- Adds a git command to the queue. -}
|
||||
addCommand :: String -> [CommandParam] -> [FilePath] -> Annex ()
|
||||
addCommand command params files = do
|
||||
|
@ -56,10 +58,23 @@ flush = do
|
|||
unless (0 == Git.Queue.size q) $ do
|
||||
store =<< flush' q
|
||||
|
||||
{- When there are multiple worker threads, each has its own queue.
|
||||
-
|
||||
- But, flushing two queues at the same time could lead to failures due to
|
||||
- git locking files. So, only one queue is allowed to flush at a time.
|
||||
- The repoqueuesem is shared between threads.
|
||||
-}
|
||||
flush' :: Git.Queue.Queue -> Annex Git.Queue.Queue
|
||||
flush' q = do
|
||||
showStoringStateAction
|
||||
inRepo $ Git.Queue.flush q
|
||||
flush' q = bracket lock unlock go
|
||||
where
|
||||
lock = do
|
||||
s <- getState repoqueuesem
|
||||
liftIO $ SSem.wait s
|
||||
return s
|
||||
unlock = liftIO . SSem.signal
|
||||
go _ = do
|
||||
showStoringStateAction
|
||||
inRepo $ Git.Queue.flush q
|
||||
|
||||
{- Gets the size of the queue. -}
|
||||
size :: Annex Int
|
||||
|
|
|
@ -19,6 +19,7 @@ git-annex (6.20180808) UNRELEASED; urgency=medium
|
|||
* v6: Update associated files database when git has staged changes
|
||||
to pointer files.
|
||||
* v6: Fix some race conditions.
|
||||
* Fix git command queue to be concurrency safe.
|
||||
* linux standalone: When LOCPATH is already set, use it instead of the
|
||||
bundled locales. It can be set to an empty string to use the system
|
||||
locales too.
|
||||
|
|
|
@ -79,3 +79,5 @@ operating system: linux x86_64
|
|||
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
|
||||
|
||||
[[!meta title="git annex add -J crash due to index somehow getting locked"]]
|
||||
|
||||
> [[fixed|done]] --[[Joey]]
|
||||
|
|
|
@ -13,13 +13,5 @@ without it all 10000 files get added ok here.
|
|||
|
||||
It's not specific to v6 at all.
|
||||
|
||||
Hmm, worker threads don't actually get separate
|
||||
Annex.repoqueue; a single queue is inherited
|
||||
from the parent. Two worker threads could try
|
||||
to flush the shared queue at the same time.
|
||||
|
||||
Since the queue updates are also done non-atomically,
|
||||
which looks susceptable to races too,
|
||||
I think each thread needs to get its own queue, but
|
||||
only one thread ought to be allowed to flush its queue at a time.
|
||||
Two worker threads are flushing their queues at the same time.
|
||||
"""]]
|
||||
|
|
Loading…
Reference in a new issue