c2e46f4707
So that eg, addurl of several large files that take time to download will update the index for each file, rather than deferring the index updates to the end. In cases like an add of many smallish files, where a new file is being added every few seconds. In that case, the queue will still build up a lot of changes which are flushed at once, for best performance. Since the default queue size is 10240, often it only gets flushed once at the end, same as before. (Notice that updateQueue updated _lastchanged when adding a new item to the queue without flushing it; that is necessary to avoid it flushing the queue every 5 minutes in this case.) But, when it takes more than a 5 minutes to add a file, the overhead of updating the index immediately is probably small, so do it after each file. This avoids git-annex potentially taking a very very long time indeed to stage newly added files, which can be annoying to the user who would like to get on with doing something with the files it's already added, eg using git mv to rename them to a better name. This is only likely to cause a problem if it takes say, 30 seconds to update the index; doing an extra 30 seconds of work after every 5 minute file add would be less optimal. Normally, updating the index takes significantly less time than that. On a SSD with 100k files it takes less than 1 second, and the index write time is bound by disk read and write so is not too much worse on a hard drive. So I hope this will not impact users, although if it does turn out to, the time limit could be made configurable. A perhaps better way to do it would be to have a background worker thread that wakes up every 60 seconds or so and flushes the queue. That is made somewhat difficult because the queue can contain Annex actions and so this would add a new source of concurrency issues. So I'm trying to avoid that approach if possible. Sponsored-by: Erik Bjäreholt on Patreon
95 lines
2.5 KiB
Haskell
95 lines
2.5 KiB
Haskell
{- git-annex command queue
|
|
-
|
|
- Copyright 2011-2021 Joey Hess <id@joeyh.name>
|
|
-
|
|
- Licensed under the GNU AGPL version 3 or higher.
|
|
-}
|
|
|
|
{-# LANGUAGE BangPatterns #-}
|
|
|
|
module Annex.Queue (
|
|
addCommand,
|
|
addInternalAction,
|
|
addUpdateIndex,
|
|
flush,
|
|
flushWhenFull,
|
|
size,
|
|
get,
|
|
mergeFrom,
|
|
) where
|
|
|
|
import Annex.Common
|
|
import Annex hiding (new)
|
|
import Annex.LockFile
|
|
import qualified Git.Queue
|
|
import qualified Git.UpdateIndex
|
|
|
|
{- Adds a git command to the queue. -}
|
|
addCommand :: [CommandParam] -> String -> [CommandParam] -> [FilePath] -> Annex ()
|
|
addCommand commonparams command params files = do
|
|
q <- get
|
|
store =<< flushWhenFull =<<
|
|
(Git.Queue.addCommand commonparams command params files q =<< gitRepo)
|
|
|
|
addInternalAction :: Git.Queue.InternalActionRunner Annex -> [(RawFilePath, IO Bool, FileSize)] -> Annex ()
|
|
addInternalAction runner files = do
|
|
q <- get
|
|
store =<< flushWhenFull =<<
|
|
(Git.Queue.addInternalAction runner files q =<< gitRepo)
|
|
|
|
{- Adds an update-index stream to the queue. -}
|
|
addUpdateIndex :: Git.UpdateIndex.Streamer -> Annex ()
|
|
addUpdateIndex streamer = do
|
|
q <- get
|
|
store =<< flushWhenFull =<<
|
|
(Git.Queue.addUpdateIndex streamer q =<< gitRepo)
|
|
|
|
{- Runs the queue if it is full. -}
|
|
flushWhenFull :: Git.Queue.Queue Annex -> Annex (Git.Queue.Queue Annex)
|
|
flushWhenFull q
|
|
| Git.Queue.full q = flush' q
|
|
| otherwise = return q
|
|
|
|
{- Runs (and empties) the queue. -}
|
|
flush :: Annex ()
|
|
flush = do
|
|
q <- get
|
|
unless (0 == Git.Queue.size q) $ do
|
|
store =<< flush' q
|
|
|
|
{- When there are multiple worker threads, each has its own queue.
|
|
- And of course multiple git-annex processes may be running each with its
|
|
- own queue.
|
|
-
|
|
- But, flushing two queues at the same time could lead to failures due to
|
|
- git locking files. So, only one queue is allowed to flush at a time.
|
|
-}
|
|
flush' :: Git.Queue.Queue Annex -> Annex (Git.Queue.Queue Annex)
|
|
flush' q = withExclusiveLock gitAnnexGitQueueLock $ do
|
|
showStoringStateAction
|
|
Git.Queue.flush q =<< gitRepo
|
|
|
|
{- Gets the size of the queue. -}
|
|
size :: Annex Int
|
|
size = Git.Queue.size <$> get
|
|
|
|
get :: Annex (Git.Queue.Queue Annex)
|
|
get = maybe new return =<< getState repoqueue
|
|
|
|
new :: Annex (Git.Queue.Queue Annex)
|
|
new = do
|
|
sz <- annexQueueSize <$> getGitConfig
|
|
q <- liftIO $ Git.Queue.new sz Nothing
|
|
store q
|
|
return q
|
|
|
|
store :: Git.Queue.Queue Annex -> Annex ()
|
|
store q = changeState $ \s -> s { repoqueue = Just q }
|
|
|
|
mergeFrom :: AnnexState -> Annex ()
|
|
mergeFrom st = case repoqueue st of
|
|
Nothing -> noop
|
|
Just newq -> do
|
|
q <- get
|
|
let !q' = Git.Queue.merge q newq
|
|
store =<< flushWhenFull q'
|