disable buggy concurrency in Command.Export

Fix a crash or potentially not all files being exported when sync -J
--content is used with an export remote.

Crash as described in fixed bug report.

waitForAllRunningCommandActions inserted in several points where all the
commandActions started before need to have finished before moving on to
the next stage of the export. A race across those points could have
maybe resulted in not all files being exported, or a wrong tree being
export.

For example, changeExport starting up an action like
a rename of A to B. Then, with that action still running, fillExport
uploading a new A, *before* the rename occurred. That race seems
unlikely to have happened. There are some other ones that this also
fixes.
This commit is contained in:
Joey Hess 2020-05-26 13:44:43 -04:00
parent 51432a6704
commit 864ba4ecaa
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
4 changed files with 52 additions and 8 deletions

View file

@ -4,6 +4,8 @@ git-annex (8.20200523) UNRELEASED; urgency=medium
* import: Added --json-progress.
* addurl: Make --preserve-filename also apply when eg a torrent contains
multiple files.
* Fix a crash or potentially not all files being exported when
sync -J --content is used with an export remote.
* export: Let concurrent transfers be done with -J or annex.jobs.
* move --to, copy --to, mirror --to: When concurrency is enabled, run
cleanup actions in separate job pool from uploads.

View file

@ -1,6 +1,6 @@
{- git-annex command-line actions and concurrency
-
- Copyright 2010-2019 Joey Hess <id@joeyh.name>
- Copyright 2010-2020 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@ -136,6 +136,15 @@ finishCommandActions = Annex.getState Annex.workers >>= \case
else retry
mapM_ mergeState sts
{- Waits for all worker threads that have been started so far to finish. -}
waitForAllRunningCommandActions :: Annex ()
waitForAllRunningCommandActions = Annex.getState Annex.workers >>= \case
Nothing -> noop
Just tv -> liftIO $ atomically $ do
pool <- readTMVar tv
unless (allIdle pool)
retry
{- Like commandAction, but without the concurrency. -}
includeCommandAction :: CommandStart -> CommandCleanup
includeCommandAction start =

View file

@ -136,6 +136,7 @@ changeExport r db (PreferredFiltered new) = do
(Git.DiffTree.file diff)
forM_ (incompleteExportedTreeishes old) $ \incomplete ->
mapdiff recover incomplete new
waitForAllRunningCommandActions
-- Diff the old and new trees, and delete or rename to new name all
-- changed files in the export. After this, every file that remains
@ -158,12 +159,14 @@ changeExport r db (PreferredFiltered new) = do
(Just oldf, Nothing) ->
startUnexport' r db oldf ek
_ -> stop
waitForAllRunningCommandActions
-- Rename from temp to new files.
seekdiffmap $ \(ek, (moldf, mnewf)) ->
case (moldf, mnewf) of
(Just _oldf, Just newf) ->
startMoveFromTempName r db ek newf
_ -> stop
waitForAllRunningCommandActions
ts -> do
warning "Resolving export conflict.."
forM_ ts $ \oldtreesha -> do
@ -181,6 +184,7 @@ changeExport r db (PreferredFiltered new) = do
(\diff -> commandAction $ startUnexport r db (Git.DiffTree.file diff) (unexportboth diff))
oldtreesha new
updateExportTree db emptyTree new
waitForAllRunningCommandActions
liftIO $ recordExportTreeCurrent db new
-- Waiting until now to record the export guarantees that,
@ -239,6 +243,7 @@ fillExport r db (PreferredFiltered newtree) mtbcommitsha = do
allfilledvar <- liftIO $ newMVar (AllFilled True)
commandActions $ map (startExport r db cvar allfilledvar) l
void $ liftIO $ cleanup
waitForAllRunningCommandActions
case mtbcommitsha of
Nothing -> noop
@ -484,3 +489,4 @@ filterPreferredContent r tree = logExportExcluded (uuid r) $ \logwriter -> do
)
-- Always export non-annexed files.
Nothing -> return (Just ti)

View file

@ -1,13 +1,40 @@
git annex sync exportremote -J2 --content
git-annex: thread blocked indefinitely in an MVar operation
failed
git-annex: thread blocked indefinitely in an STM transaction
git annex sync exportremote -J2 --content
...
git-annex: thread blocked indefinitely in an MVar operation
failed
git-annex: thread blocked indefinitely in an STM transaction
Also, git-annex export -J2 crashes the same way. I discovered this bug
when adding -J to export, but then found sync had the same bug.
To reproduce this, there may need there to be a tree of several annexed
files whose content is not locally available. In my case,
there were 338 of them. It seems to export all, or all but 1
before crashing. --[[Joey]]
there were 338 of them. It seems to act on almost all before
crashing. --[[Joey]]
----
It's crashing in finishCommandActions. DebugLocks does not show a backtrace.
Dumping the worker pool inside the crashing STM action, it looks like this:
WorkerPool UsedStages {initialStage = PerformStage, stageSet = fromList [PerformStage,CleanupStage]} [IdleWorker StartStage,ActiveWorker PerformStage,IdleWorker PerformStage,IdleWorker StartStage,IdleWorker PerformStage,IdleWorker StartStage,IdleWorker CleanupStage,IdleWorker CleanupStage,IdleWorker CleanupStage] 8
Always ends with an ActiveWorker PerformStage. So a worker thread is
apparently still running, but the retry blocks indefinitely, so
somehow the worker thread never transitions back to idle.
Also, the MVar crash is not from this code, so maybe the MVar crash is
the real culprit and it just also leads to the STM crash.
---
Added debugLocks to the MVar uses in Command.Export, and it's
the one in failedsend that is causing the MVar deadlock. So that must be
the root cause. Looks like fillExport is starting threads with
commandActions, but then assumes they'll all be done, so takes a MVar,
before all the threads *are* done, so a thread tries to modify the MVar and
deadlocks.
Ok, [[fixed|done]] by using includeCommandAction instead, although that
does reduce the actual concurrency. --[[Joey]]