finish CommandStart transition

The hoped for optimisation of CommandStart with -J did not materialize.
In fact, not runnign CommandStart in parallel is slower than -J3.
So, CommandStart are still run in parallel.

(The actual bad performance I've been seeing with -J in my big repo
has to do with building the remoteList.)

But, this is still progress toward making -J faster, because it gets rid
of the onlyActionOn roadblock in the way of making CommandCleanup jobs
run separate from CommandPerform jobs.

Added OnlyActionOn constructor for ActionItem which fixes the
onlyActionOn breakage in the last commit.

Made CustomOutput include an ActionItem, so even things using it can
specify OnlyActionOn.

In Command.Move and Command.Sync, there were CommandStarts that used
includeCommandAction, so output messages, which is no longer allowed.
Fixed by using startingCustomOutput, but that's still not quite right,
since it prevents message display for the includeCommandAction run
inside it too.
This commit is contained in:
Joey Hess 2019-06-12 09:23:26 -04:00
parent 436f107715
commit 8e5ea28c26
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
26 changed files with 142 additions and 97 deletions

View file

@ -47,8 +47,8 @@ type Reason = String
- In direct mode, all associated files are checked, and only if all
- of them are unwanted are they dropped.
-
- The runner is used to run commands, and so can be either callCommand
- or commandAction.
- The runner is used to run CommandStart sequentially, it's typically
- callCommandAction.
-}
handleDropsFrom :: [UUID] -> [Remote] -> Reason -> Bool -> Key -> AssociatedFile -> [VerifiedCopy] -> (CommandStart -> CommandCleanup) -> Annex ()
handleDropsFrom locs rs reason fromhere key afile preverified runner = do

View file

@ -34,6 +34,10 @@ git-annex (7.20190508) UNRELEASED; urgency=medium
in a separate queue than the main action queue. This can make some
commands faster, because less time is spent on bookkeeping in
between each file transfer.
* When a command like git-annex get skips over a lot of files
that it does not need to do anything with, the -J switch used to slow
it down significantly due to unncessary concurrenty overhead.
That slowdown has been fixed.
-- Joey Hess <id@joeyh.name> Mon, 06 May 2019 13:52:02 -0400

View file

@ -1,4 +1,4 @@
{- git-annex command-line actions
{- git-annex command-line actions and concurrency
-
- Copyright 2010-2019 Joey Hess <id@joeyh.name>
-
@ -54,33 +54,70 @@ commandActions = mapM_ commandAction
- This should only be run in the seek stage.
-}
commandAction :: CommandStart -> Annex ()
commandAction a = Annex.getState Annex.concurrency >>= \case
NonConcurrent -> run
commandAction start = Annex.getState Annex.concurrency >>= \case
NonConcurrent -> void $ includeCommandAction start
Concurrent n -> runconcurrent n
ConcurrentPerCpu -> runconcurrent =<< liftIO getNumProcessors
where
run = void $ includeCommandAction a
runconcurrent n = do
tv <- Annex.getState Annex.workers
workerst <- waitWorkerSlot n (== PerformStage) tv
aid <- liftIO $ async $ snd <$> Annex.run workerst
(concurrentjob workerst)
liftIO $ atomically $ do
pool <- takeTMVar tv
let !pool' = addWorkerPool (ActiveWorker aid PerformStage) pool
putTMVar tv pool'
void $ liftIO $ forkIO $ do
aid <- async $ snd <$> Annex.run workerst
(inOwnConsoleRegion (Annex.output workerst) run)
atomically $ do
pool <- takeTMVar tv
let !pool' = addWorkerPool (ActiveWorker aid PerformStage) pool
putTMVar tv pool'
-- There won't usually be exceptions because the
-- async is running includeCommandAction, which
-- catches exceptions. Just in case, avoid
-- stalling by using the original workerst.
-- accountCommandAction will usually catch
-- exceptions. Just in case, fall back to the
-- original workerst.
workerst' <- either (const workerst) id
<$> waitCatch aid
atomically $ do
pool <- takeTMVar tv
let !pool' = deactivateWorker pool aid workerst'
putTMVar tv pool'
concurrentjob workerst = start >>= \case
Nothing -> noop
Just (startmsg, perform) ->
concurrentjob' workerst startmsg perform
concurrentjob' workerst startmsg perform = case mkActionItem startmsg of
OnlyActionOn k _ -> ensureOnlyActionOn k $
-- If another job performed the same action while we
-- waited, there may be nothing left to do, so re-run
-- the start stage to see if it still wants to do
-- something.
start >>= \case
Just (startmsg', perform') ->
case mkActionItem startmsg' of
OnlyActionOn k' _ | k' /= k ->
concurrentjob' workerst startmsg' perform'
_ -> mkjob workerst startmsg' perform'
Nothing -> noop
_ -> mkjob workerst startmsg perform
mkjob workerst startmsg perform =
inOwnConsoleRegion (Annex.output workerst) $
void $ accountCommandAction $
performconcurrent startmsg perform
-- Like callCommandAction, but the start stage has already run,
-- and the worker thread's stage is changed before starting the
-- cleanup action.
performconcurrent startmsg perform = do
showStartMessage startmsg
perform >>= \case
Just cleanup -> do
changeStageTo CleanupStage
r <- cleanup
implicitMessage (showEndResult r)
return r
Nothing -> do
implicitMessage (showEndResult False)
return False
-- | Wait until there's an idle worker in the pool, remove it from the
-- pool, and return its state.
@ -138,16 +175,17 @@ finishCommandActions = do
swapTMVar tv UnallocatedWorkerPool
case pool of
UnallocatedWorkerPool -> noop
WorkerPool l -> forM_ (mapMaybe workerAsync l) $ \aid ->
WorkerPool l -> forM_ (mapMaybe workerAsync l) $ \aid ->
liftIO (waitCatch aid) >>= \case
Left _ -> noop
Right st -> mergeState st
{- Changes the current thread's stage in the worker pool.
-
- An idle worker with the desired stage is found in the pool
- (waiting if necessary for one to become idle)
- and the stages of it and the current thread are swapped.
- The pool needs to continue to contain the same number of worker threads
- for each stage. So, an idle worker with the desired stage is found in
- the pool (waiting if necessary for one to become idle), and the stages
- of it and the current thread are swapped.
-}
changeStageTo :: WorkerStage -> Annex ()
changeStageTo newstage = do
@ -168,23 +206,26 @@ changeStageTo newstage = do
{- Like commandAction, but without the concurrency. -}
includeCommandAction :: CommandStart -> CommandCleanup
includeCommandAction a = account =<< tryNonAsync (callCommandAction a)
where
account (Right True) = return True
account (Right False) = incerr
account (Left err) = case fromException err of
includeCommandAction = accountCommandAction . callCommandAction
accountCommandAction :: CommandCleanup -> CommandCleanup
accountCommandAction a = tryNonAsync a >>= \case
Right True -> return True
Right False -> incerr
Left err -> case fromException err of
Just exitcode -> liftIO $ exitWith exitcode
Nothing -> do
toplevelWarning True (show err)
implicitMessage showEndFail
incerr
where
incerr = do
Annex.incError
return False
{- Runs a single command action through the start, perform and cleanup
- stages, without catching errors. Useful if one command wants to run
- part of another command. -}
- stages, without catching errors and without incrementing error counter.
- Useful if one command wants to run part of another command. -}
callCommandAction :: CommandStart -> CommandCleanup
callCommandAction = fromMaybe True <$$> callCommandAction'
@ -203,9 +244,7 @@ callCommandActionQuiet start =
showStartMessage startmsg
perform >>= \case
Nothing -> return (Just False)
Just cleanup -> do
changeStageTo CleanupStage
Just <$> cleanup
Just cleanup -> Just <$> cleanup
{- Do concurrent output when that has been requested. -}
allowConcurrentOutput :: Annex a -> Annex a
@ -253,22 +292,12 @@ allowConcurrentOutput a = do
liftIO $ setNumCapabilities n
{- Ensures that only one thread processes a key at a time.
- Other threads will block until it's done. -}
{-
onlyActionOn :: Key -> CommandStart -> CommandStart
onlyActionOn k a = onlyActionOn' k run
where
-- Run whole action, not just start stage, so other threads
-- block until it's done.
run = callCommandActionQuiet a >>= \case
Nothing -> return Nothing
Just r' -> return $ Just $ return $ Just $ return r'
-}
{- Ensures that only one thread processes a key at a time.
- Other threads will block until it's done. -}
onlyActionOn' :: Key -> Annex a -> Annex a
onlyActionOn' k a = go =<< Annex.getState Annex.concurrency
- Other threads will block until it's done.
-
- May be called repeatedly by the same thread without blocking. -}
ensureOnlyActionOn :: Key -> Annex a -> Annex a
ensureOnlyActionOn k a =
go =<< Annex.getState Annex.concurrency
where
go NonConcurrent = a
go (Concurrent _) = goconcurrent
@ -283,7 +312,7 @@ onlyActionOn' k a = go =<< Annex.getState Annex.concurrency
case M.lookup k m of
Just tid
| tid /= mytid -> retry
| otherwise -> return (return ())
| otherwise -> return $ return ()
Nothing -> do
writeTVar tv $! M.insert k mytid m
return $ liftIO $ atomically $

View file

@ -84,8 +84,8 @@ startingUsualMessages msg t a = next (StartUsualMessages msg (mkActionItem t), a
{- For commands that do not display usual start or end messages,
- but have some other custom output. -}
startingCustomOutput :: CommandPerform -> CommandStart
startingCustomOutput a = next (CustomOutput, a)
startingCustomOutput :: MkActionItem t => t -> CommandPerform -> CommandStart
startingCustomOutput t a = next (CustomOutput (mkActionItem t), a)
{- For perform stage to indicate what step to run next. -}
next :: a -> Annex (Maybe a)

View file

@ -59,7 +59,7 @@ seek (UnsetConfig name) = commandAction $
unsetConfig (ConfigKey name)
next $ return True
seek (GetConfig name) = commandAction $
startingCustomOutput $ do
startingCustomOutput (ActionItemOther Nothing) $ do
getGlobalConfig name >>= \case
Nothing -> return ()
Just v -> liftIO $ putStrLn v

View file

@ -91,12 +91,12 @@ startKeys o (key, ai) = start' o key (AssociatedFile Nothing) ai
startLocal :: AssociatedFile -> ActionItem -> NumCopies -> Key -> [VerifiedCopy] -> CommandStart
startLocal afile ai numcopies key preverified =
stopUnless (inAnnex key) $
starting "drop" ai $
starting "drop" (OnlyActionOn key ai) $
performLocal key afile numcopies preverified
startRemote :: AssociatedFile -> ActionItem -> NumCopies -> Key -> Remote -> CommandStart
startRemote afile ai numcopies key remote =
starting ("drop " ++ Remote.name remote) ai $
starting ("drop " ++ Remote.name remote) (OnlyActionOn key ai) $
performRemote key afile numcopies remote
performLocal :: Key -> AssociatedFile -> NumCopies -> [VerifiedCopy] -> CommandPerform

View file

@ -66,7 +66,7 @@ seek o = case batchOption o of
start :: FindOptions -> FilePath -> Key -> CommandStart
start o file key =
stopUnless (limited <||> inAnnex key) $
startingCustomOutput $ do
startingCustomOutput key $ do
showFormatted (formatOption o) file $ ("file", file) : keyVars key
next $ return True

View file

@ -71,7 +71,7 @@ start' expensivecheck from key afile ai =
stopUnless (Command.Move.fromOk src key) $
go $ Command.Move.fromPerform src Command.Move.RemoveNever key afile
where
go = starting "get" ai
go = starting "get" (OnlyActionOn key ai)
perform :: Key -> AssociatedFile -> CommandPerform
perform key afile = stopUnless (getKey key afile) $

View file

@ -28,7 +28,7 @@ start (name:g:[]) = do
setGroup u (toGroup g)
start (name:[]) = do
u <- Remote.nameToUUID name
startingCustomOutput $ do
startingCustomOutput (ActionItemOther Nothing) $ do
liftIO . putStrLn . unwords . map fmt . S.toList
=<< lookupGroups u
next $ return True

View file

@ -22,7 +22,7 @@ seek :: CmdParams -> CommandSeek
seek = withWords (commandAction . start)
start :: [String] -> CommandStart
start (g:[]) = startingCustomOutput $
start (g:[]) = startingCustomOutput (ActionItemOther Nothing) $
performGet groupPreferredContentMapRaw (toGroup g)
start (g:expr:[]) = startingUsualMessages "groupwanted" (ActionItemOther (Just g)) $
performSet groupPreferredContentSet expr (toGroup g)

View file

@ -279,7 +279,8 @@ seekRemote remote branch msubdir = do
, ". Re-run command to resume import."
]
Just imported -> void $
includeCommandAction $ commitimport imported
includeCommandAction $
commitimport imported
where
importmessage = "import from " ++ Remote.name remote

View file

@ -48,7 +48,7 @@ start s _file k
| otherwise = stop
start' :: Key -> CommandStart
start' k = startingCustomOutput $ do
start' k = startingCustomOutput k $ do
tmpf <- fromRepo $ gitAnnexTmpObjectLocation k
whenM (liftIO $ doesFileExist tmpf) $
liftIO $ putStrLn tmpf

View file

@ -99,7 +99,7 @@ start c o file k = startKeys c o (k, mkActionItem (k, afile))
startKeys :: VectorClock -> MetaDataOptions -> (Key, ActionItem) -> CommandStart
startKeys c o (k, ai) = case getSet o of
Get f -> startingCustomOutput $ do
Get f -> startingCustomOutput k $ do
l <- S.toList . currentMetaDataValues f <$> getCurrentMetaData k
liftIO $ forM_ l $
B8.putStrLn . fromMetaValue

View file

@ -108,8 +108,9 @@ toStart' dest removewhen afile key ai = do
)
else go False (Remote.hasKey dest key)
where
go fastcheck isthere = starting (describeMoveAction removewhen) ai $
toPerform dest removewhen key afile fastcheck =<< isthere
go fastcheck isthere =
starting (describeMoveAction removewhen) (OnlyActionOn key ai) $
toPerform dest removewhen key afile fastcheck =<< isthere
expectedPresent :: Remote -> Key -> Annex Bool
expectedPresent dest key = do
@ -182,7 +183,7 @@ fromStart removewhen afile key ai src = case removewhen of
RemoveSafe -> go
where
go = stopUnless (fromOk src key) $
starting (describeMoveAction removewhen) ai $
starting (describeMoveAction removewhen) (OnlyActionOn key ai) $
fromPerform src removewhen key afile
fromOk :: Remote -> Key -> Annex Bool
@ -246,13 +247,13 @@ toHereStart removewhen afile key ai = case removewhen of
RemoveNever -> stopUnless (not <$> inAnnex key) go
RemoveSafe -> go
where
go = do
go = startingCustomOutput (OnlyActionOn key ai) $ do
rs <- Remote.keyPossibilities key
forM_ rs $ \r ->
includeCommandAction $
starting (describeMoveAction removewhen) ai $
fromPerform r removewhen key afile
stop
next $ return True
{- The goal of this command is to allow the user maximum freedom to move
- files as they like, while avoiding making bad situations any worse

View file

@ -33,7 +33,7 @@ start [s] = case readish s of
start _ = giveup "Specify a single number."
startGet :: CommandStart
startGet = startingCustomOutput $ next $ do
startGet = startingCustomOutput (ActionItemOther Nothing) $ next $ do
v <- getGlobalNumCopies
case v of
Just n -> liftIO $ putStrLn $ show $ fromNumCopies n

View file

@ -27,7 +27,7 @@ seek [u] = commandAction $ start $ toUUID u
seek _ = giveup "missing UUID parameter"
start :: UUID -> CommandStart
start theiruuid = startingCustomOutput $ do
start theiruuid = startingCustomOutput (ActionItemOther Nothing) $ do
servermode <- liftIO $ do
ro <- Checks.checkEnvSet Checks.readOnlyEnv
ao <- Checks.checkEnvSet Checks.appendOnlyEnv

View file

@ -84,13 +84,14 @@ seek ps = lockPreCommitHook $ ifM isDirect
startInjectUnlocked :: FilePath -> CommandStart
startInjectUnlocked f = startingCustomOutput $ do
startInjectUnlocked f = startingCustomOutput (ActionItemOther Nothing) $ do
unlessM (callCommandAction $ Command.Add.start f) $
error $ "failed to add " ++ f ++ "; canceling commit"
next $ return True
startDirect :: [String] -> CommandStart
startDirect _ = startingCustomOutput $ next preCommitDirect
startDirect _ = startingCustomOutput (ActionItemOther Nothing) $
next preCommitDirect
addViewMetaData :: View -> ViewedFile -> Key -> CommandStart
addViewMetaData v f k = starting "metadata" (mkActionItem (k, f)) $

View file

@ -27,7 +27,7 @@ start = parse
where
parse (name:[]) = do
u <- Remote.nameToUUID name
startingCustomOutput $
startingCustomOutput (ActionItemOther Nothing) $
performGet u
parse (name:expr:[]) = do
u <- Remote.nameToUUID name

View file

@ -626,10 +626,14 @@ seekSyncContent o rs currbranch = do
gokey mvar bloom (k, _) = go (Left bloom) mvar (AssociatedFile Nothing) k
go ebloom mvar af k = commandAction $ do
whenM (syncFile ebloom rs af k) $
void $ liftIO $ tryPutMVar mvar ()
return Nothing
go ebloom mvar af k = do
-- Run syncFile as a command action so file transfers run
-- concurrently.
let ai = OnlyActionOn k (ActionItemKey k)
commandAction $ startingCustomOutput ai $ do
whenM (syncFile ebloom rs af k) $
void $ liftIO $ tryPutMVar mvar ()
next $ return True
{- If it's preferred content, and we don't have it, get it from one of the
- listed remotes (preferring the cheaper earlier ones).
@ -645,7 +649,7 @@ seekSyncContent o rs currbranch = do
- Returns True if any file transfers were made.
-}
syncFile :: Either (Maybe (Bloom Key)) (Key -> Annex ()) -> [Remote] -> AssociatedFile -> Key -> Annex Bool
syncFile ebloom rs af k = onlyActionOn' k $ do
syncFile ebloom rs af k = do
inhere <- inAnnex k
locs <- map Remote.uuid <$> Remote.keyPossibilities k
let (have, lack) = partition (\r -> Remote.uuid r `elem` locs) rs

View file

@ -45,7 +45,7 @@ seek :: TransferKeyOptions -> CommandSeek
seek o = withKeys (commandAction . start o) (keyOptions o)
start :: TransferKeyOptions -> Key -> CommandStart
start o key = startingCustomOutput $ case fromToOptions o of
start o key = startingCustomOutput key $ case fromToOptions o of
ToRemote dest -> toPerform key (fileOption o) =<< getParsed dest
FromRemote src -> fromPerform key (fileOption o) =<< getParsed src

View file

@ -34,7 +34,7 @@ cmd' name desc getter setter = noMessages $
start (rname:[]) = do
u <- Remote.nameToUUID rname
startingCustomOutput $
startingCustomOutput (ActionItemOther Nothing) $
performGet getter u
start (rname:expr:[]) = do
u <- Remote.nameToUUID rname

View file

@ -37,6 +37,7 @@ newtype TopFilePath = TopFilePath { getTopFilePath :: FilePath }
{- A file in a branch or other treeish. -}
data BranchFilePath = BranchFilePath Ref TopFilePath
deriving (Show, Eq, Ord)
{- Git uses the branch:file form to refer to a BranchFilePath -}
descBranchFilePath :: BranchFilePath -> String

View file

@ -92,6 +92,7 @@ showStartMessage (StartMessage command ai) = case ai of
ActionItemFailedTransfer t _ -> showStartKey command (transferKey t) ai
ActionItemWorkTreeFile file -> showStart command file
ActionItemOther msg -> showStart' command msg
OnlyActionOn _ ai' -> showStartMessage (StartMessage command ai')
showStartMessage (StartUsualMessages command ai) = do
outputType <$> Annex.getState Annex.output >>= \case
QuietOutput -> Annex.setOutput NormalOutput
@ -99,7 +100,7 @@ showStartMessage (StartUsualMessages command ai) = do
Annex.changeState $ \s -> s
{ Annex.output = (Annex.output s) { implicitMessages = True } }
showStartMessage (StartMessage command ai)
showStartMessage CustomOutput = do
showStartMessage (CustomOutput _) = do
Annex.setOutput QuietOutput
Annex.changeState $ \s -> s
{ Annex.output = (Annex.output s) { implicitMessages = False } }

View file

@ -22,6 +22,10 @@ data ActionItem
| ActionItemFailedTransfer Transfer TransferInfo
| ActionItemWorkTreeFile FilePath
| ActionItemOther (Maybe String)
-- Use to avoid more than one thread concurrently processing the
-- same Key.
| OnlyActionOn Key ActionItem
deriving (Show, Eq)
class MkActionItem t where
mkActionItem :: t -> ActionItem
@ -60,6 +64,7 @@ actionItemDesc (ActionItemFailedTransfer t i) = actionItemDesc $
ActionItemAssociatedFile (associatedFile i) (transferKey t)
actionItemDesc (ActionItemWorkTreeFile f) = f
actionItemDesc (ActionItemOther s) = fromMaybe "" s
actionItemDesc (OnlyActionOn _ ai) = actionItemDesc ai
actionItemKey :: ActionItem -> Maybe Key
actionItemKey (ActionItemAssociatedFile _ k) = Just k
@ -68,13 +73,16 @@ actionItemKey (ActionItemBranchFilePath _ k) = Just k
actionItemKey (ActionItemFailedTransfer t _) = Just (transferKey t)
actionItemKey (ActionItemWorkTreeFile _) = Nothing
actionItemKey (ActionItemOther _) = Nothing
actionItemKey (OnlyActionOn _ ai) = actionItemKey ai
actionItemWorkTreeFile :: ActionItem -> Maybe FilePath
actionItemWorkTreeFile (ActionItemAssociatedFile (AssociatedFile af) _) = af
actionItemWorkTreeFile (ActionItemWorkTreeFile f) = Just f
actionItemWorkTreeFile (OnlyActionOn _ ai) = actionItemWorkTreeFile ai
actionItemWorkTreeFile _ = Nothing
actionItemTransferDirection :: ActionItem -> Maybe Direction
actionItemTransferDirection (ActionItemFailedTransfer t _) = Just $
transferDirection t
actionItemTransferDirection (OnlyActionOn _ ai) = actionItemTransferDirection ai
actionItemTransferDirection _ = Nothing

View file

@ -41,14 +41,21 @@ type CommandCleanup = Annex Bool
{- Message that is displayed when starting to perform an action on
- something. The String is typically the name of the command or action
- being performed.
-
- CustomOutput prevents any start, end, or other implicit messages from
- being displayed, letting a command output its own custom format.
-}
data StartMessage
= StartMessage String ActionItem
| StartUsualMessages String ActionItem
| CustomOutput
-- ^ Like StartMessage, but makes sure to enable usual message
-- display in case it was disabled by cmdnomessages.
| CustomOutput ActionItem
-- ^ Prevents any start, end, or other implicit messages from
-- being displayed, letting a command output its own custom format.
deriving (Show)
instance MkActionItem StartMessage where
mkActionItem (StartMessage _ ai) = ai
mkActionItem (StartUsualMessages _ ai) = ai
mkActionItem (CustomOutput ai) = ai
{- A command is defined by specifying these things. -}
data Command = Command

View file

@ -22,18 +22,6 @@ are still some things that could be improved, tracked here:
cleanup action. Currently, it's bundled into the same action that
transfers content.
* onlyActionOn collapses the cleanup action into the start action,
and so prevents use of the separate cleanup queue.
* Don't parallelize start stage actions. They are supposed to run fast,
and often a huge number of them don't print out anything. The overhead of
bookkeeping for parallizing those swamps the benefit of parallelizing by
what seems to be a large degree. Compare `git annex get` in a directory
where the first several thousand files are already present with and
without -J.
Only once the start stage has decided
something needs to be done should a job be started up.
This probably needs display of any output to be moved out of the start
stage, because no console region will be allocated for it.
* Using -J can sometimes lead to a slowdown while a rsync special remote
runs Remote.Rsync.rsyncTransport, which sets up a ssh connection to the
remote. This is done even when the remote is not used.