{- git check-attr interface
 -
 - check-attr resource pool
 -
 - Limited to min of -JN or number of CPU cores, because it will often be
 - CPU bound, once it's read the gitattributes file for a directory.
 -
 - In some situations it's more disk bound, but in any case it's unlikely
 - to be the main bottleneck that -J is used to avoid. Eg, when dropping,
 - this is used for numcopies checks, but the main bottleneck will be
 - accessing the remotes to verify presence. So the user might decide to
 - -J32 that, but having 32 check-attr processes would just waste however
 - many filehandles they open, and probably worsen their performance due to
 - CPU contention.
 -
 - Note that I first tried just letting up to the -JN be started. However,
 - even when it's no bottleneck at all, that still results in all of them
 - being started. Why? Well, all the worker threads start up nearly
 - simultaneously, so there's a thundering herd.
 -
 - Copyright 2012-2020 Joey Hess <id@joeyh.name>
 -
 - Licensed under the GNU AGPL version 3 or higher.
 -}

module Annex.CheckAttr (
	checkAttr,
	checkAttrs,
	checkAttrStop,
	mkConcurrentCheckAttrHandle,
) where

import Annex.Common
import qualified Git.CheckAttr as Git
import qualified Annex
import Utility.ResourcePool
import Types.Concurrency
import Annex.Concurrent.Utility

{- All gitattributes used by git-annex. -}
annexAttrs :: [Git.Attr]
annexAttrs =
	[ "annex.backend"
	, "annex.largefiles"
	, "annex.numcopies"
	, "annex.mincopies"
	]

checkAttr :: Git.Attr -> RawFilePath -> Annex String
checkAttr attr file = withCheckAttrHandle $ \h ->
	liftIO $ Git.checkAttr h attr file

checkAttrs :: [Git.Attr] -> RawFilePath -> Annex [String]
checkAttrs attrs file = withCheckAttrHandle $ \h ->
	liftIO $ Git.checkAttrs h attrs file

{- Runs an action with a check-attr handle taken from the resource pool,
 - creating the pool on first use. -}
withCheckAttrHandle :: (Git.CheckAttrHandle -> Annex a) -> Annex a
withCheckAttrHandle a =
	maybe mkpool go =<< Annex.getState Annex.checkattrhandle
  where
	go p = withResourcePool p start a
	start = inRepo $ Git.checkAttrStart annexAttrs
	mkpool = do
		-- This only runs in non-concurrent code paths;
		-- a concurrent pool is set up earlier when needed.
		p <- mkResourcePoolNonConcurrent start
		Annex.changeState $ \s -> s { Annex.checkattrhandle = Just p }
		go p
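{- Illustration only, not part of git-annex. A minimal sketch of the
get-or-create pattern that withCheckAttrHandle follows, written against
plain IO so it can be tried on its own. The names Cache, withCached and
demoCached are hypothetical, and an IORef stands in for the Annex state.
Like mkpool above, it assumes a non-concurrent caller.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef, modifyIORef')

-- Hypothetical stand-in for the state field that caches the handle.
type Cache h = IORef (Maybe h)

-- Run an action with the cached handle, creating and caching it on first use.
-- Not thread-safe: two concurrent callers could both run `start`.
withCached :: Cache h -> IO h -> (h -> IO a) -> IO a
withCached cache start action = readIORef cache >>= maybe create action
  where
    create = do
      h <- start
      writeIORef cache (Just h)
      action h

-- Uses the cache twice and reports how often the fake start action ran.
demoCached :: IO Int
demoCached = do
  starts <- newIORef (0 :: Int)
  cache <- newIORef Nothing
  let start = modifyIORef' starts (+ 1) >> return "handle"
  _ <- withCached cache start (return . length)
  _ <- withCached cache start (return . length)
  readIORef starts
```

In this sketch the second call finds the cached value, so the start
action runs only once.
-}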

mkConcurrentCheckAttrHandle :: Concurrency -> Annex (ResourcePool Git.CheckAttrHandle)
mkConcurrentCheckAttrHandle c =
	Annex.getState Annex.checkattrhandle >>= \case
		Just p@(ResourcePool {}) -> return p
		_ -> mkResourcePool =<< liftIO (maxCheckAttrs c)
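{- Illustration only, not part of git-annex. A rough sketch of how a
size-limited pool can reuse idle resources rather than starting one per
worker thread. Pool, newPool, withPooled and demoPool are hypothetical
names built on base's MVar and QSem; Utility.ResourcePool may well be
implemented differently.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, modifyMVar, modifyMVar_, readMVar)
import Control.Concurrent.QSem (QSem, newQSem, waitQSem, signalQSem)
import Control.Exception (bracket, bracket_)

-- Hypothetical pool: the semaphore caps concurrent users,
-- the MVar holds resources that are currently idle.
data Pool r = Pool QSem (MVar [r])

newPool :: Int -> IO (Pool r)
newPool limit = Pool <$> newQSem limit <*> newMVar []

-- Borrow an idle resource, or start a new one when none is idle.
-- A resource is only started while holding a semaphore unit, so no more
-- than `limit` resources are ever created.
withPooled :: Pool r -> IO r -> (r -> IO a) -> IO a
withPooled (Pool sem idle) start action =
  bracket_ (waitQSem sem) (signalQSem sem) $
    bracket acquire release action
  where
    acquire = do
      reused <- modifyMVar idle $ \rs -> return $ case rs of
        (r : rest) -> (rest, Just r)
        [] -> ([], Nothing)
      maybe start return reused
    release r = modifyMVar_ idle (return . (r :))

-- Counts how many resources get started for three uses of a pool of size 2.
demoPool :: IO Int
demoPool = do
  starts <- newMVar (0 :: Int)
  pool <- newPool 2
  let start = modifyMVar starts (\n -> return (n + 1, n + 1))
      use = withPooled pool start
  _ <- use return              -- pool is empty, starts resource 1
  _ <- use return              -- resource 1 is idle again, reuses it
  _ <- use (\_ -> use return)  -- resource 1 is busy, starts resource 2
  readMVar starts
```

Sequential uses share one resource; a second is started only when the
first is busy, which is the behaviour the pool limit relies on.
-}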

{- git check-attr is typically CPU bound, and is not likely to be the main
 - bottleneck for any command. So limit to the number of CPU cores, maximum,
 - while respecting the -Jn value.
 -}
maxCheckAttrs :: Concurrency -> IO Int
maxCheckAttrs = concurrencyUpToCpus
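{- Illustration only, not part of git-annex. A minimal sketch of the
"min of -JN and CPU cores" rule described above. Jobs, capToCpus and
maxWorkers are hypothetical names; Jobs is a simplified stand-in for the
Concurrency type, and concurrencyUpToCpus itself is not shown here and
may differ in detail.

```haskell
import GHC.Conc (getNumProcessors)

-- Hypothetical, simplified stand-in for the Concurrency type.
data Jobs = NonConcurrent | Concurrent Int

-- Pure core of the rule: at least 1, at most the number of CPUs,
-- and otherwise whatever -JN asked for.
capToCpus :: Int -> Jobs -> Int
capToCpus cpus jobs = max 1 (min requested cpus)
  where
    requested = case jobs of
      NonConcurrent -> 1
      Concurrent n -> n

-- Looks up the processor count and applies the cap.
maxWorkers :: Jobs -> IO Int
maxWorkers jobs = do
  cpus <- getNumProcessors
  return (capToCpus cpus jobs)
```

With 8 CPUs, a request for 32 jobs is capped to 8, while a request for
2 jobs stays at 2.
-}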

{- Stops any running git check-attr processes and forgets the pool. -}
checkAttrStop :: Annex ()
checkAttrStop = maybe noop stop =<< Annex.getState Annex.checkattrhandle
  where
	stop p = do
		liftIO $ freeResourcePool p Git.checkAttrStop
		Annex.changeState $ \s -> s { Annex.checkattrhandle = Nothing }