disable buggy concurrency in Command.Export
Fix a crash, or potentially not all files being exported, when sync -J --content is used with an export remote.

The crash is as described in the fixed bug report.

waitForAllRunningCommandActions is inserted at several points where all the commandActions started earlier need to have finished before moving on to the next stage of the export. A race across those points could have resulted in not all files being exported, or in a wrong tree being exported. For example, changeExport could start an action like a rename of A to B. Then, with that action still running, fillExport could upload a new A *before* the rename occurred. That race seems unlikely to have happened. This also fixes some other races of the same kind.
parent 51432a6704
commit 864ba4ecaa
4 changed files with 52 additions and 8 deletions
@@ -4,6 +4,8 @@ git-annex (8.20200523) UNRELEASED; urgency=medium
   * import: Added --json-progress.
   * addurl: Make --preserve-filename also apply when eg a torrent contains
     multiple files.
+  * Fix a crash or potentially not all files being exported when
+    sync -J --content is used with an export remote.
   * export: Let concurrent transfers be done with -J or annex.jobs.
   * move --to, copy --to, mirror --to: When concurrency is enabled, run
     cleanup actions in separate job pool from uploads.

@@ -1,6 +1,6 @@
 {- git-annex command-line actions and concurrency
  -
- - Copyright 2010-2019 Joey Hess <id@joeyh.name>
+ - Copyright 2010-2020 Joey Hess <id@joeyh.name>
  -
  - Licensed under the GNU AGPL version 3 or higher.
  -}
@@ -136,6 +136,15 @@ finishCommandActions = Annex.getState Annex.workers >>= \case
 else retry
 mapM_ mergeState sts

+{- Waits for all worker threads that have been started so far to finish. -}
+waitForAllRunningCommandActions :: Annex ()
+waitForAllRunningCommandActions = Annex.getState Annex.workers >>= \case
+	Nothing -> noop
+	Just tv -> liftIO $ atomically $ do
+		pool <- readTMVar tv
+		unless (allIdle pool)
+			retry
+
 {- Like commandAction, but without the concurrency. -}
 includeCommandAction :: CommandStart -> CommandCleanup
 includeCommandAction start =

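The new waitForAllRunningCommandActions blocks in STM until the worker pool reports that every worker is idle. A minimal standalone sketch of that idea follows. It is not git-annex's code: `Pool`, `startAction` and `waitForAllRunning` are hypothetical names, and the pool is reduced to a counter of running actions rather than the real WorkerPool type.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM
import Control.Exception (finally)
import Control.Monad (forM_, unless)

-- Hypothetical stand-in for a worker pool: a count of running actions.
newtype Pool = Pool (TVar Int)

newPool :: IO Pool
newPool = Pool <$> newTVarIO 0

-- Start an action in its own thread. The counter is incremented before
-- forking, so a later wait cannot miss an action that was already started.
startAction :: Pool -> IO () -> IO ()
startAction (Pool tv) act = do
    atomically $ modifyTVar' tv (+ 1)
    _ <- forkIO $ act `finally` atomically (modifyTVar' tv (subtract 1))
    return ()

-- Block until every action started so far has finished. `retry` suspends
-- the transaction until the TVar changes, then runs it again.
waitForAllRunning :: Pool -> IO ()
waitForAllRunning (Pool tv) = atomically $ do
    n <- readTVar tv
    unless (n == 0) retry

-- Start five short actions, wait for all of them, then count completions.
demo :: IO Int
demo = do
    pool <- newPool
    done <- newTVarIO (0 :: Int)
    forM_ [1 .. 5 :: Int] $ \i -> startAction pool $ do
        threadDelay (i * 1000)
        atomically $ modifyTVar' done (+ 1)
    waitForAllRunning pool
    readTVarIO done

main :: IO ()
main = demo >>= print -- prints 5
```

Unlike finishCommandActions, a wait of this kind leaves the pool in place, so more actions can be started afterwards.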
@@ -136,6 +136,7 @@ changeExport r db (PreferredFiltered new) = do
 (Git.DiffTree.file diff)
 forM_ (incompleteExportedTreeishes old) $ \incomplete ->
 mapdiff recover incomplete new
+waitForAllRunningCommandActions

 -- Diff the old and new trees, and delete or rename to new name all
 -- changed files in the export. After this, every file that remains
@@ -158,12 +159,14 @@ changeExport r db (PreferredFiltered new) = do
 (Just oldf, Nothing) ->
 startUnexport' r db oldf ek
 _ -> stop
+waitForAllRunningCommandActions
 -- Rename from temp to new files.
 seekdiffmap $ \(ek, (moldf, mnewf)) ->
 case (moldf, mnewf) of
 (Just _oldf, Just newf) ->
 startMoveFromTempName r db ek newf
 _ -> stop
+waitForAllRunningCommandActions
 ts -> do
 warning "Resolving export conflict.."
 forM_ ts $ \oldtreesha -> do
@@ -181,6 +184,7 @@ changeExport r db (PreferredFiltered new) = do
 (\diff -> commandAction $ startUnexport r db (Git.DiffTree.file diff) (unexportboth diff))
 oldtreesha new
 updateExportTree db emptyTree new
+waitForAllRunningCommandActions
 liftIO $ recordExportTreeCurrent db new

 -- Waiting until now to record the export guarantees that,
@@ -239,6 +243,7 @@ fillExport r db (PreferredFiltered newtree) mtbcommitsha = do
 allfilledvar <- liftIO $ newMVar (AllFilled True)
 commandActions $ map (startExport r db cvar allfilledvar) l
 void $ liftIO $ cleanup
+waitForAllRunningCommandActions

 case mtbcommitsha of
 Nothing -> noop
@@ -484,3 +489,4 @@ filterPreferredContent r tree = logExportExcluded (uuid r) $ \logwriter -> do
 )
 -- Always export non-annexed files.
 Nothing -> return (Just ti)
+

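The rename race described in the commit message can be illustrated with a small sketch. Everything here is hypothetical: the "remote" is an in-memory list, and `rename`, `upload` and `runStage` are stand-ins, not git-annex functions. The point is only that each stage returns after all of its actions have finished, which is the ordering the added waitForAllRunningCommandActions calls are meant to give.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar
import Control.Exception (finally)
import Data.IORef

-- Hypothetical in-memory "remote": a list of (filename, content) pairs.
type Remote = IORef [(String, String)]

-- Rename a file on the remote, keeping its content.
rename :: Remote -> String -> String -> IO ()
rename r old new = atomicModifyIORef' r $ \fs ->
    ([(if f == old then new else f, c) | (f, c) <- fs], ())

-- Upload a file, replacing any existing file with the same name.
upload :: Remote -> String -> String -> IO ()
upload r f c = atomicModifyIORef' r $ \fs ->
    ((f, c) : filter ((/= f) . fst) fs, ())

-- Run a stage's actions concurrently; return only when all have finished.
runStage :: [IO ()] -> IO ()
runStage acts = do
    dones <- mapM start acts
    mapM_ takeMVar dones
  where
    start act = do
        done <- newEmptyMVar
        _ <- forkIO (act `finally` putMVar done ())
        return done

exportDemo :: IO [(String, String)]
exportDemo = do
    r <- newIORef [("A", "old")]
    -- Stage 1: a deliberately slow rename of A to B.
    runStage [threadDelay 2000 >> rename r "A" "B"]
    -- Stage 2: upload the new A, only after stage 1 has completed.
    runStage [upload r "A" "new"]
    readIORef r

main :: IO ()
main = exportDemo >>= print -- prints [("A","new"),("B","old")]
```

In this toy model, if the upload ran before the rename had happened, the new content would end up under the name B and A would be missing, which corresponds to the "wrong tree being exported" outcome the commit message mentions.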
@@ -1,13 +1,40 @@
-git annex sync exportremote -J2 --content
-
-git-annex: thread blocked indefinitely in an MVar operation
-failed
-git-annex: thread blocked indefinitely in an STM transaction
+git annex sync exportremote -J2 --content
+...
+git-annex: thread blocked indefinitely in an MVar operation
+failed
+git-annex: thread blocked indefinitely in an STM transaction

 Also, git-annex export -J2 crashes the same way. I discovered this bug
 when adding -J to export, but then found sync had the same bug.

 To reproduce this, there may need there to be a tree of several annexed
 files whose content is not locally available. In my case,
-there were 338 of them. It seems to export all, or all but 1
-before crashing. --[[Joey]]
+there were 338 of them. It seems to act on almost all before
+crashing. --[[Joey]]
+
+----
+
+It's crashing in finishCommandActions. DebugLocks does not show a backtrace.
+
+Dumping the worker pool inside the crashing STM action, it looks like this:
+
+WorkerPool UsedStages {initialStage = PerformStage, stageSet = fromList [PerformStage,CleanupStage]} [IdleWorker StartStage,ActiveWorker PerformStage,IdleWorker PerformStage,IdleWorker StartStage,IdleWorker PerformStage,IdleWorker StartStage,IdleWorker CleanupStage,IdleWorker CleanupStage,IdleWorker CleanupStage] 8
+
+Always ends with an ActiveWorker PerformStage. So a worker thread is
+apparently still running, but the retry blocks indefinitely, so
+somehow the worker thread never transitions back to idle.
+
+Also, the MVar crash is not from this code, so maybe the MVar crash is
+the real culprit and it just also leads to the STM crash.
+
+---
+
+Added debugLocks to the MVar uses in Command.Export, and it's
+the one in failedsend that is causing the MVar deadlock. So that must be
+the root cause. Looks like fillExport is starting threads with
+commandActions, but then assumes they'll all be done, so takes a MVar,
+before all the threads *are* done, so a thread tries to modify the MVar and
+deadlocks.
+
+Ok, [[fixed|done]] by using includeCommandAction instead, although that
+does reduce the actual concurrency. --[[Joey]]
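The root cause described in the bug report (the main thread takes an MVar while workers may still try to modify it) can be sketched as follows. This is an illustration under assumptions, not the code in Command.Export: `worker` and `fillAll` are hypothetical names, and a plain `MVar Bool` stands in for the AllFilled flag.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Exception (finally)
import Control.Monad (forM, unless)

-- Hypothetical worker: records a failed send by clearing the shared flag.
worker :: MVar Bool -> Bool -> IO ()
worker allFilled ok =
    unless ok $ modifyMVar_ allFilled (const (return False))

-- Start one worker per result, wait for all of them, then read the flag.
fillAll :: [Bool] -> IO Bool
fillAll results = do
    allFilled <- newMVar True
    dones <- forM results $ \ok -> do
        done <- newEmptyMVar
        _ <- forkIO (worker allFilled ok `finally` putMVar done ())
        return done
    -- Barrier: every worker must have finished before the flag is taken.
    mapM_ takeMVar dones
    -- Taking the MVar leaves it empty. Without the barrier above, a worker
    -- calling modifyMVar_ after this point would block forever, and the
    -- runtime may then report "thread blocked indefinitely in an MVar
    -- operation".
    takeMVar allFilled

main :: IO ()
main = do
    fillAll [True, True] >>= print        -- prints True
    fillAll [True, False, True] >>= print -- prints False
```

Running the actions one at a time, as includeCommandAction does, avoids the problem by removing the concurrency. A barrier keeps the concurrency and only delays the final read.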