rcu: Reduce cache-miss initialization latencies for large systems
Commit #0209f649
(rcu: limit rcu_node leaf-level fanout) set an upper
limit of 16 on the leaf-level fanout for the rcu_node tree. This was
needed to reduce lock contention that was induced by the synchronization
of scheduling-clock interrupts, which was in turn needed to improve
energy efficiency for moderate-sized lightly loaded servers.
However, reducing the leaf-level fanout means that there are more
leaf-level rcu_node structures in the tree, which in turn means that
RCU's grace-period initialization incurs more cache misses. This is
not a problem on moderate-sized servers with only a few tens of CPUs,
but becomes a major source of real-time latency spikes on systems with
many hundreds of CPUs. In addition, the workloads running on these large
systems tend to be CPU-bound, which eliminates the energy-efficiency
advantages of synchronizing scheduling-clock interrupts. Therefore,
these systems need maximal values for the rcu_node leaf-level fanout.
This commit addresses this problem by introducing a new kernel parameter
named RCU_FANOUT_LEAF that directly controls the leaf-level fanout.
This parameter defaults to 16 to handle the common case of a moderate
sized lightly loaded servers, but may be set higher on larger systems.
Reported-by: Mike Galbraith <efault@gmx.de>
Reported-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit is contained in:
parent
d8169d4c36
commit
8932a63d5e
3 changed files with 31 additions and 8 deletions
27
init/Kconfig
27
init/Kconfig
|
@ -458,6 +458,33 @@ config RCU_FANOUT
|
|||
Select a specific number if testing RCU itself.
|
||||
Take the default if unsure.
|
||||
|
||||
config RCU_FANOUT_LEAF
|
||||
int "Tree-based hierarchical RCU leaf-level fanout value"
|
||||
range 2 RCU_FANOUT if 64BIT
|
||||
range 2 RCU_FANOUT if !64BIT
|
||||
depends on TREE_RCU || TREE_PREEMPT_RCU
|
||||
default 16
|
||||
help
|
||||
This option controls the leaf-level fanout of hierarchical
|
||||
implementations of RCU, and allows trading off cache misses
|
||||
against lock contention. Systems that synchronize their
|
||||
scheduling-clock interrupts for energy-efficiency reasons will
|
||||
want the default because the smaller leaf-level fanout keeps
|
||||
lock contention levels acceptably low. Very large systems
|
||||
(hundreds or thousands of CPUs) will instead want to set this
|
||||
value to the maximum value possible in order to reduce the
|
||||
number of cache misses incurred during RCU's grace-period
|
||||
initialization. These systems tend to run CPU-bound, and thus
|
||||
are not helped by synchronized interrupts, and thus tend to
|
||||
skew them, which reduces lock contention enough that large
|
||||
leaf-level fanouts work well.
|
||||
|
||||
Select a specific number if testing RCU itself.
|
||||
|
||||
Select the maximum permissible value for large systems.
|
||||
|
||||
Take the default if unsure.
|
||||
|
||||
config RCU_FANOUT_EXACT
|
||||
bool "Disable tree-based hierarchical RCU auto-balancing"
|
||||
depends on TREE_RCU || TREE_PREEMPT_RCU
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue