check-attr resource pool

Limited to min of -JN or number of CPU cores, because it will often be
CPU bound, once it's read the gitignore file for a directory.

In some situations it's more disk bound, but in any case it's unlikely
to be the main bottleneck that -J is used to avoid. Eg, when dropping,
this is used for numcopies checks, but the main bottleneck will be
accessing the remotes to verify presence. So the user might decide to
-J32 that, but having 32 check-attr processes would just waste however
many filehandles they open, and probably worsen their performance due to
CPU contention.

Note that, I first tried just letting up to the -JN be started. However,
even when it's no bottleneck at all, that still results in all of them
being started. Why? Well, all the worker threads start up nearly
simulantaneously, so there's a thundering herd..
This commit is contained in:
Joey Hess 2020-04-21 10:38:44 -04:00
parent cee6b344b4
commit 45fb7af21c
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
5 changed files with 47 additions and 16 deletions

View file

@ -71,6 +71,7 @@ import Types.CatFileHandles
import qualified Database.Keys.Handle as Keys
import Utility.InodeCache
import Utility.Url
import Utility.ResourcePool
import "mtl" Control.Monad.Reader
import Control.Concurrent
@ -118,7 +119,7 @@ data AnnexState = AnnexState
, repoqueue :: Maybe (Git.Queue.Queue Annex)
, catfilehandles :: CatFileHandles
, hashobjecthandle :: Maybe HashObjectHandle
, checkattrhandle :: Maybe CheckAttrHandle
, checkattrhandle :: Maybe (ResourcePool CheckAttrHandle)
, checkignorehandle :: Maybe CheckIgnoreHandle
, forcebackend :: Maybe String
, globalnumcopies :: Maybe NumCopies