assistant: Add 1/200th second delay between checking each file in the full transfer scan, to avoid using too much CPU.

The slowdown is not going to be large in typical small-ish repos.
And it does not seem to matter if the assistant reacts a little bit slower
in situations involving the expensive scan, since:

a) Those situations typically involve getting back in sync after something
   has changed on a remote, often after a disconnect of some duration.
   So taking a few seconds more is not noticeable.
b) If the scan finds things that it needs to do, it will start
   blocking anyway after 10 transfers are queued (due to use of
   queueTransferWhenSmall). So, only the speed of finding the first 10
   transfers will be impacted by this change.
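
The blocking behaviour described in (b) can be pictured with a bounded queue. This is only a sketch under assumptions: it uses `TBQueue` from the `stm` library as a stand-in, and `maxQueued`, `newTransferQueue`, and `queueWhenSmall` are hypothetical names for illustration, not the actual implementation of queueTransferWhenSmall.

```haskell
import Control.Concurrent.STM

-- | Hypothetical cap on queued transfers, matching the "10 transfers"
-- mentioned in the commit message.
maxQueued :: Int
maxQueued = 10

-- | Create an empty bounded queue of transfer descriptions.
-- Strings stand in for real transfer records in this sketch.
newTransferQueue :: IO (TBQueue String)
newTransferQueue = newTBQueueIO (fromIntegral maxQueued)

-- | Add a transfer to the queue. When the queue already holds maxQueued
-- items, this call blocks until a consumer removes one, which is what
-- would pause a scan that keeps finding work to do.
queueWhenSmall :: TBQueue String -> String -> IO ()
queueWhenSmall q t = atomically (writeTBQueue q t)
```

With a queue like this, a scan that keeps finding transfers spends most of its time blocked once the queue is full, so a per-file delay would mainly affect how quickly the first `maxQueued` transfers are found.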

This commit was sponsored by Jochen Bartl on Patreon.
Joey Hess 2017-03-06 13:32:47 -04:00
parent 113b48ba19
commit af2a6d578e
GPG key ID: C910D9222512E3C7
3 changed files with 44 additions and 0 deletions

@ -25,6 +25,7 @@ import qualified Types.Remote as Remote
import Utility.ThreadScheduler
import Utility.NotificationBroadcaster
import Utility.Batch
import Utility.ThreadScheduler
import qualified Git.LsFiles as LsFiles
import Annex.WorkTree
import Annex.Content
@ -32,6 +33,7 @@ import Annex.Wanted
import CmdLine.Action
import qualified Data.Set as S
import Control.Concurrent
{- This thread waits until a remote needs to be scanned, to find transfers
- that need to be made, to keep data in sync.
@ -145,6 +147,10 @@ expensiveScan urlrenderer rs = batch <~> do
(findtransfers f unwanted)
=<< liftAnnex (lookupFile f)
mapM_ (enqueue f) ts
{- Delay for a short time to avoid using too much CPU. -}
liftIO $ threadDelay $ fromIntegral $ oneSecond `div` 200
scan unwanted' fs
enqueue f (r, t) =
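
The throttling added in the hunk above can be tried in isolation with a small standalone sketch. The names `scanDelay` and `throttledMapM_` are hypothetical, and the local `oneSecond` is only a stand-in for the value git-annex takes from Utility.ThreadScheduler; this is not the code from the commit.

```haskell
import Control.Concurrent (threadDelay)

-- | Microseconds in one second. A local stand-in for this sketch;
-- threadDelay takes its argument in microseconds.
oneSecond :: Int
oneSecond = 1000000

-- | Pause between items: 1/200th of a second, in microseconds.
scanDelay :: Int
scanDelay = oneSecond `div` 200

-- | Run an action on each item, sleeping briefly after each one so the
-- loop does not keep a CPU core busy the whole time.
throttledMapM_ :: (a -> IO ()) -> [a] -> IO ()
throttledMapM_ _ [] = return ()
throttledMapM_ act (x:xs) = do
    act x
    threadDelay scanDelay
    throttledMapM_ act xs
```

For example, `throttledMapM_ putStrLn ["a", "b", "c"]` prints the three strings with a short sleep after each one.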

@ -5,6 +5,8 @@ git-annex (6.20170301.2) UNRELEASED; urgency=medium
* status: Propagate nonzero exit code from git status.
* Linux standalone builds put the bundled ssh last in PATH,
so any system ssh will be preferred over it.
* assistant: Add 1/200th second delay between checking each file
in the full transfer scan, to avoid using too much CPU.
-- Joey Hess <id@joeyh.name> Thu, 02 Mar 2017 12:51:40 -0400

@ -0,0 +1,36 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2017-03-06T17:03:12Z"
content="""
The scan that annex.startupscan skips is the scan of the files on disk,
which finds changes that were made while the assistant was not running.
What you are seeing is the full transfer scan. While annex.startupscan
could be made to also skip that scan, a full transfer scan is not only run
at startup, but after merging git-annex branch changes from a remote. So
disabling it only at startup does not seem very useful.
There could be an option to disable the full transfer scan ever running.
However, this would make the assistant not notice certain transfers/drops
that you would normally want it to do. For example, if a remote got a bunch
of files in an archive/ directory from somewhere else, and the local
repository contains those files, the full transfer scan is needed to notice
that the archived files can now be removed from the local repository.
In other situations, the local repository would not get files that it
ought to contain.
So, I think it might be better to make the expensive transfer scan run a
little bit slower so it doesn't peg your CPU. I've added a 1/200th second
delay after each file it checks.
That will make it use something like
5-10% of the CPU, instead of 100%. At the same time it doesn't slow down the
total scan very much. In a repository with 5k files, it makes the scan 25
seconds slower, which makes the assistant react that much slower -- but
the expensive scan is only needed to make sure things turn out consistent,
so its overall speed is not super important.
Check it out, let me know if it's still using too much CPU. We could always
make that 1/200th second tunable, or find a better value for it.
"""]]
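
The 25 second figure in the comment above follows directly from the per-file delay. A minimal sketch of the arithmetic, with hypothetical names `delayPerFileMicros` and `addedScanSeconds`, and counting only the added sleeps, not the time spent checking each file:

```haskell
-- | Delay added per file, in microseconds (1/200th of a second).
delayPerFileMicros :: Integer
delayPerFileMicros = 1000000 `div` 200

-- | Extra scan time, in whole seconds, contributed by the sleeps alone
-- for a repository with the given number of files.
addedScanSeconds :: Integer -> Integer
addedScanSeconds files = (files * delayPerFileMicros) `div` 1000000
```

Here `addedScanSeconds 5000` evaluates to 25, matching the estimate for a repository with 5k files. The added time grows linearly with the file count, so `addedScanSeconds 100000` evaluates to 500.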