Commit graph

134 commits

Author SHA1 Message Date
Paul E. McKenney
0d3c55bc9f Merge branches 'doc.2013.12.03a', 'fixes.2013.12.12a', 'rcutorture.2013.12.03a' and 'sparse.2013.12.12a' into HEAD
doc.2013.12.03a: Topic branch for documentation changes.
fixes.2013.12.12a: Topic branch for miscellaneous fixes.
rcutorture.2013.12.03a: Topic branch for new rcutorture/KVM scripting.
sparse.2013.12.12a: Topic branch for sparse-RCU changes.
2013-12-12 12:35:38 -08:00
Paul E. McKenney
96d3fd0d31 rcu: Break call_rcu() deadlock involving scheduler and perf
Dave Jones got the following lockdep splat:

>  ======================================================
>  [ INFO: possible circular locking dependency detected ]
>  3.12.0-rc3+ #92 Not tainted
>  -------------------------------------------------------
>  trinity-child2/15191 is trying to acquire lock:
>   (&rdp->nocb_wq){......}, at: [<ffffffff8108ff43>] __wake_up+0x23/0x50
>
> but task is already holding lock:
>   (&ctx->lock){-.-...}, at: [<ffffffff81154c19>] perf_event_exit_task+0x109/0x230
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #3 (&ctx->lock){-.-...}:
>         [<ffffffff810cc243>] lock_acquire+0x93/0x200
>         [<ffffffff81733f90>] _raw_spin_lock+0x40/0x80
>         [<ffffffff811500ff>] __perf_event_task_sched_out+0x2df/0x5e0
>         [<ffffffff81091b83>] perf_event_task_sched_out+0x93/0xa0
>         [<ffffffff81732052>] __schedule+0x1d2/0xa20
>         [<ffffffff81732f30>] preempt_schedule_irq+0x50/0xb0
>         [<ffffffff817352b6>] retint_kernel+0x26/0x30
>         [<ffffffff813eed04>] tty_flip_buffer_push+0x34/0x50
>         [<ffffffff813f0504>] pty_write+0x54/0x60
>         [<ffffffff813e900d>] n_tty_write+0x32d/0x4e0
>         [<ffffffff813e5838>] tty_write+0x158/0x2d0
>         [<ffffffff811c4850>] vfs_write+0xc0/0x1f0
>         [<ffffffff811c52cc>] SyS_write+0x4c/0xa0
>         [<ffffffff8173d4e4>] tracesys+0xdd/0xe2
>
> -> #2 (&rq->lock){-.-.-.}:
>         [<ffffffff810cc243>] lock_acquire+0x93/0x200
>         [<ffffffff81733f90>] _raw_spin_lock+0x40/0x80
>         [<ffffffff810980b2>] wake_up_new_task+0xc2/0x2e0
>         [<ffffffff81054336>] do_fork+0x126/0x460
>         [<ffffffff81054696>] kernel_thread+0x26/0x30
>         [<ffffffff8171ff93>] rest_init+0x23/0x140
>         [<ffffffff81ee1e4b>] start_kernel+0x3f6/0x403
>         [<ffffffff81ee1571>] x86_64_start_reservations+0x2a/0x2c
>         [<ffffffff81ee1664>] x86_64_start_kernel+0xf1/0xf4
>
> -> #1 (&p->pi_lock){-.-.-.}:
>         [<ffffffff810cc243>] lock_acquire+0x93/0x200
>         [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90
>         [<ffffffff810979d1>] try_to_wake_up+0x31/0x350
>         [<ffffffff81097d62>] default_wake_function+0x12/0x20
>         [<ffffffff81084af8>] autoremove_wake_function+0x18/0x40
>         [<ffffffff8108ea38>] __wake_up_common+0x58/0x90
>         [<ffffffff8108ff59>] __wake_up+0x39/0x50
>         [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0
>         [<ffffffff81111450>] __call_rcu+0x140/0x820
>         [<ffffffff81111b8d>] call_rcu+0x1d/0x20
>         [<ffffffff81093697>] cpu_attach_domain+0x287/0x360
>         [<ffffffff81099d7e>] build_sched_domains+0xe5e/0x10a0
>         [<ffffffff81efa7fc>] sched_init_smp+0x3b7/0x47a
>         [<ffffffff81ee1f4e>] kernel_init_freeable+0xf6/0x202
>         [<ffffffff817200be>] kernel_init+0xe/0x190
>         [<ffffffff8173d22c>] ret_from_fork+0x7c/0xb0
>
> -> #0 (&rdp->nocb_wq){......}:
>         [<ffffffff810cb7ca>] __lock_acquire+0x191a/0x1be0
>         [<ffffffff810cc243>] lock_acquire+0x93/0x200
>         [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90
>         [<ffffffff8108ff43>] __wake_up+0x23/0x50
>         [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0
>         [<ffffffff81111450>] __call_rcu+0x140/0x820
>         [<ffffffff81111bb0>] kfree_call_rcu+0x20/0x30
>         [<ffffffff81149abf>] put_ctx+0x4f/0x70
>         [<ffffffff81154c3e>] perf_event_exit_task+0x12e/0x230
>         [<ffffffff81056b8d>] do_exit+0x30d/0xcc0
>         [<ffffffff8105893c>] do_group_exit+0x4c/0xc0
>         [<ffffffff810589c4>] SyS_exit_group+0x14/0x20
>         [<ffffffff8173d4e4>] tracesys+0xdd/0xe2
>
> other info that might help us debug this:
>
> Chain exists of:
>   &rdp->nocb_wq --> &rq->lock --> &ctx->lock
>
>   Possible unsafe locking scenario:
>
>         CPU0                    CPU1
>         ----                    ----
>    lock(&ctx->lock);
>                                 lock(&rq->lock);
>                                 lock(&ctx->lock);
>    lock(&rdp->nocb_wq);
>
>  *** DEADLOCK ***
>
> 1 lock held by trinity-child2/15191:
>  #0:  (&ctx->lock){-.-...}, at: [<ffffffff81154c19>] perf_event_exit_task+0x109/0x230
>
> stack backtrace:
> CPU: 2 PID: 15191 Comm: trinity-child2 Not tainted 3.12.0-rc3+ #92
>  ffffffff82565b70 ffff880070c2dbf8 ffffffff8172a363 ffffffff824edf40
>  ffff880070c2dc38 ffffffff81726741 ffff880070c2dc90 ffff88022383b1c0
>  ffff88022383aac0 0000000000000000 ffff88022383b188 ffff88022383b1c0
> Call Trace:
>  [<ffffffff8172a363>] dump_stack+0x4e/0x82
>  [<ffffffff81726741>] print_circular_bug+0x200/0x20f
>  [<ffffffff810cb7ca>] __lock_acquire+0x191a/0x1be0
>  [<ffffffff810c6439>] ? get_lock_stats+0x19/0x60
>  [<ffffffff8100b2f4>] ? native_sched_clock+0x24/0x80
>  [<ffffffff810cc243>] lock_acquire+0x93/0x200
>  [<ffffffff8108ff43>] ? __wake_up+0x23/0x50
>  [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90
>  [<ffffffff8108ff43>] ? __wake_up+0x23/0x50
>  [<ffffffff8108ff43>] __wake_up+0x23/0x50
>  [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0
>  [<ffffffff81111450>] __call_rcu+0x140/0x820
>  [<ffffffff8109bc8f>] ? local_clock+0x3f/0x50
>  [<ffffffff81111bb0>] kfree_call_rcu+0x20/0x30
>  [<ffffffff81149abf>] put_ctx+0x4f/0x70
>  [<ffffffff81154c3e>] perf_event_exit_task+0x12e/0x230
>  [<ffffffff81056b8d>] do_exit+0x30d/0xcc0
>  [<ffffffff810c9af5>] ? trace_hardirqs_on_caller+0x115/0x1e0
>  [<ffffffff810c9bcd>] ? trace_hardirqs_on+0xd/0x10
>  [<ffffffff8105893c>] do_group_exit+0x4c/0xc0
>  [<ffffffff810589c4>] SyS_exit_group+0x14/0x20
>  [<ffffffff8173d4e4>] tracesys+0xdd/0xe2

The underlying problem is that perf is invoking call_rcu() with the
scheduler locks held, but in NOCB mode, call_rcu() will with high
probability invoke the scheduler -- which just might want to use its
locks.  The reason that call_rcu() needs to invoke the scheduler is
to wake up the corresponding rcuo callback-offload kthread, which
does the job of starting up a grace period and invoking the callbacks
afterwards.

One solution (championed on a related problem by Lai Jiangshan) is to
simply defer the wakeup to some point where scheduler locks are no longer
held.  Since we don't want to unnecessarily incur the cost of such
deferral, the task before us is threefold:

1.	Determine when it is likely that a relevant scheduler lock is held.

2.	Defer the wakeup in such cases.

3.	Ensure that all deferred wakeups eventually happen, preferably
	sooner rather than later.

We use irqs_disabled_flags() as a proxy for relevant scheduler locks
being held.  This works because the relevant locks are always acquired
with interrupts disabled.  We may defer more often than needed, but that
is at least safe.

The wakeup deferral is tracked via a new field in the per-CPU and
per-RCU-flavor rcu_data structure, namely ->nocb_defer_wakeup.

This flag is checked by the RCU core processing.  The __rcu_pending()
function now checks this flag, which causes rcu_check_callbacks()
to initiate RCU core processing at each scheduling-clock interrupt
where this flag is set.  Of course this is not sufficient because
scheduling-clock interrupts are often turned off (the things we used to
be able to count on!).  So the flags are also checked on entry to any
state that RCU considers to be idle, which includes both NO_HZ_IDLE idle
state and NO_HZ_FULL user-mode-execution state.

This approach should allow call_rcu() to be invoked regardless of what
locks you might be holding, the key word being "should".

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
2013-12-03 10:10:18 -08:00
Paul E. McKenney
c9592ecb98 rcu: Fix typo in Documentation/RCU/trace.txt
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-12-03 10:08:56 -08:00
Michael Opdenacker
4b0d3f0fde rcu: Fix occurrence of "the the" in checklist.txt
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Add "then" as suggested by Josh Triplett. ]
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-09-25 10:07:02 -07:00
Paul E. McKenney
64d3b7a1d5 rcu: Update stall-warning documentation
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-09-25 06:49:37 -07:00
Paul E. McKenney
25f27ce4a6 Merge branches 'doc.2013.08.19a', 'fixes.2013.08.20a', 'sysidle.2013.08.31a' and 'torture.2013.08.20a' into HEAD
doc.2013.08.19a: Documentation updates
fixes.2013.08.20a: Miscellaneous fixes
sysidle.2013.08.31a: Detect system-wide idle state.
torture.2013.08.20a: rcutorture updates.
2013-08-31 14:44:45 -07:00
Paul E. McKenney
2ec1f2d987 rcu: Increase rcutorture test coverage
Currently, rcutorture has separate torture_types to test synchronous,
asynchronous, and expedited grace-period primitives.  This has
two disadvantages: (1) Three times the number of runs to cover the
combinations and (2) Little testing of concurrent combinations of the
three options.  This commit therefore adds a pair of module parameters
that control normal and expedited state, with the default being both
types, randomly selected, by the fakewriter processes, thus reducing
source-code size and increasing test coverage.  In addtion, the writer
task switches between asynchronous-normal and expedited grace-period
primitives driven by the same pair of module parameters.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-08-20 11:38:41 -07:00
Paul E. McKenney
6ae3771850 rcu: Update RTFP documentation
Note that this commit also updates the formatting of serveral of the
bibtex entries to conform to that of my .bib files.  I started
accumulating entries back in the 1980s, back when bibtex insisted
that comma (",") was a separator, not a terminator.  This rule forced
commas to the fronts of lines.  25 years later, bibtex allows commas
to be terminators, but I am too lazy to rework all my .bib files.
Keeping the same format as my .bib files allows my to simply
incorporate my RCU.bib file into Documentation/RCU/RTFP.txt, which
is much easier than my earlier practice of keeping track of what
had changed and adding individual entries.  (I sometimes find relevant
papers that were published some years back, for example.)

In addition, this change adds entries for papers published in the
last year or so.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-08-19 21:39:27 -07:00
Paul E. McKenney
d84297c99b rcu: Fix rcu_barrier() documentation
There was a time when rcu_barrier() was guaranteed to wait for at least
a grace period, but that time ended due to energy-efficiency concerns.
So now rcu_barrier() is a no-op if there are no RCU callbacks queued in
the system.  This commit updates the documentation to reflect this change.

Now, rcu_barrier() often does wait for a grace period, so, one could
imagine some modification to rcu_barrier() to more efficiently handle
cases where both rcu_barrier() and a grace period are needed.  But this
must wait until someone shows a real-world need for a change.

Reported-by: Bob Copeland <bob@cozybit.com>
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-08-18 17:05:32 -07:00
Paul E. McKenney
be77f87c00 Merge branches 'cbnum.2013.06.10a', 'doc.2013.06.10a', 'fixes.2013.06.10a', 'srcu.2013.06.10a' and 'tiny.2013.06.10a' into HEAD
cbnum.2013.06.10a: Apply simplifications stemming from the new callback
	numbering.

doc.2013.06.10a: Documentation updates.

fixes.2013.06.10a: Miscellaneous fixes.

srcu.2013.06.10a: Updates to SRCU.

tiny.2013.06.10a: Eliminate TINY_PREEMPT_RCU.
2013-06-10 13:46:44 -07:00
Paul E. McKenney
7807acdb6b rcu: Remove TINY_PREEMPT_RCU tracing documentation
Because TINY_PREEMPT_RCU is no more, this commit removes its tracing
formats from the documentation.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10 13:45:52 -07:00
Paul E. McKenney
99f88919f8 rcu: Remove srcu_read_lock_raw() and srcu_read_unlock_raw().
These interfaces never did get used, so this commit removes them,
their rcutorture tests, and documentation referencing them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10 13:45:25 -07:00
Frederic Weisbecker
c032862fba Merge commit '8700c95adb' into timers/nohz
The full dynticks tree needs the latest RCU and sched
upstream updates in order to fix some dependencies.

Merge a common upstream merge point that has these
updates.

Conflicts:
	include/linux/perf_event.h
	kernel/rcutree.h
	kernel/rcutree_plugin.h

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2013-05-02 17:54:19 +02:00
Frederic Weisbecker
3451d0243c nohz: Rename CONFIG_NO_HZ to CONFIG_NO_HZ_COMMON
We are planning to convert the dynticks Kconfig options layout
into a choice menu. The user must be able to easily pick
any of the following implementations: constant periodic tick,
idle dynticks, full dynticks.

As this implies a mutual exclusion, the two dynticks implementions
need to converge on the selection of a common Kconfig option in order
to ease the sharing of a common infrastructure.

It would thus seem pretty natural to reuse CONFIG_NO_HZ to
that end. It already implements all the idle dynticks code
and the full dynticks depends on all that code for now.
So ideally the choice menu would propose CONFIG_NO_HZ_IDLE and
CONFIG_NO_HZ_EXTENDED then both would select CONFIG_NO_HZ.

On the other hand we want to stay backward compatible: if
CONFIG_NO_HZ is set in an older config file, we want to
enable CONFIG_NO_HZ_IDLE by default.

But we can't afford both at the same time or we run into
a circular dependency:

1) CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED both select
   CONFIG_NO_HZ
2) If CONFIG_NO_HZ is set, we default to CONFIG_NO_HZ_IDLE

We might be able to support that from Kconfig/Kbuild but it
may not be wise to introduce such a confusing behaviour.

So to solve this, create a new CONFIG_NO_HZ_COMMON option
which gathers the common code between idle and full dynticks
(that common code for now is simply the idle dynticks code)
and select it from their referring Kconfig.

Then we'll later create CONFIG_NO_HZ_IDLE and map CONFIG_NO_HZ
to it for backward compatibility.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
2013-04-03 13:56:03 +02:00
Paul E. McKenney
6d87669357 Merge branches 'doc.2013.03.12a', 'fixes.2013.03.13a' and 'idlenocb.2013.03.26b' into HEAD
doc.2013.03.12a: Documentation changes.

fixes.2013.03.13a: Miscellaneous fixes.

idlenocb.2013.03.26b: Remove restrictions on no-CBs CPUs, make
	RCU_FAST_NO_HZ take advantage of numbered callbacks, add
	callback acceleration based on numbered callbacks.
2013-03-26 08:07:38 -07:00
Paul E. McKenney
6231069bda rcu: Add softirq-stall indications to stall-warning messages
If RCU's softirq handler is prevented from executing, an RCU CPU stall
warning can result.  Ways to prevent RCU's softirq handler from executing
include: (1) CPU spinning with interrupts disabled, (2) infinite loop
in some softirq handler, and (3) in -rt kernels, an infinite loop in a
set of real-time threads running at priorities higher than that of RCU's
softirq handler.

Because this situation can be difficult to track down, this commit causes
the count of RCU softirq handler invocations to be printed with RCU
CPU stall warnings.  This information does require some interpretation,
as now documented in Documentation/RCU/stallwarn.txt.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-03-13 14:43:56 -07:00
Paul E. McKenney
3f944adb9d rcu: Documentation update
This commit applies a few updates based on a quick review of the RCU
documentations.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-03-12 14:09:02 -07:00
Paul E. McKenney
4357fb570b rcu: Make bugginess of code sample more evident
One of the code samples in whatisRCU.txt shows a bug, but someone scanning
the document quickly might mistake it for a valid use of RCU.  Add some
screaming comments to help keep speed-readers on track.

Reported-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-03-12 14:09:01 -07:00
Paul E. McKenney
aac1cda34b Merge branches 'urgent.2012.10.27a', 'doc.2012.11.16a', 'fixes.2012.11.13a', 'srcu.2012.10.27a', 'stall.2012.11.13a', 'tracing.2012.11.08a' and 'idle.2012.10.24a' into HEAD
urgent.2012.10.27a: Fix for RCU user-mode transition (already in -tip).

doc.2012.11.08a: Documentation updates, most notably codifying the
	memory-barrier guarantees inherent to grace periods.

fixes.2012.11.13a: Miscellaneous fixes.

srcu.2012.10.27a: Allow statically allocated and initialized srcu_struct
	structures (courtesy of Lai Jiangshan).

stall.2012.11.13a: Add more diagnostic information to RCU CPU stall
	warnings, also decrease from 60 seconds to 21 seconds.

hotplug.2012.11.08a: Minor updates to CPU hotplug handling.

tracing.2012.11.08a: Improved debugfs tracing, courtesy of Michael Wang.

idle.2012.10.24a: Updates to RCU idle/adaptive-idle handling, including
	a boot parameter that maps normal grace periods to expedited.

Resolved conflict in kernel/rcutree.c due to side-by-side change.
2012-11-16 09:59:58 -08:00
Paul E. McKenney
d484a21513 rcu: Add documentation for the new rcuexp debugfs trace file
This commit adds the documentation of the rcuexp debugfs trace file
that records statistics for expedited grace periods.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-11-16 09:55:31 -08:00
Paul E. McKenney
40e80c469f rcu: Update documentation for TREE_RCU debugfs tracing
This commit updates the tracing documentation to reflect the new
format that has per-RCU-flavor directories.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-11-16 09:54:02 -08:00
Paul E. McKenney
bb08f76d84 rcu: Remove list_for_each_continue_rcu()
The list_for_each_continue_rcu() macro is no longer used, so this commit
removes it.  The list_for_each_entry_continue_rcu() macro should be
used instead.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-11-13 14:08:21 -08:00
Paul E. McKenney
a4d611fdca rcu: Document alternative RCU/reference-count algorithms
The approach for mixing RCU and reference counting listed in the RCU
documentation only describes one possible approach.  This approach can
result in failure on the read side, which is nice if you want fresh data,
but not so good if you want simple code.  This commit therefore adds
two additional approaches that feature unconditional reference-count
acquisition by RCU readers.  These approaches are very similar to that
used in the security code.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-11-08 11:44:38 -08:00
Kees Cook
57d34a6cee rcu: Update docs to include kfree_rcu()
Mention kfree_rcu() in the call_rcu() section.  Additionally fix the
example code for list replacement that used the wrong structure element.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-11-08 11:44:25 -08:00
Dhaval Giani
0f9574d832 rcu: Correct the name of a reference in list of RCU papers
Trying to go through the history of RCU (not for the weak
minded) led me to search for a non-existent paper.

Correct it to the actual reference

Signed-off-by: Dhaval Giani <dhaval.giani@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-10-23 14:44:47 -07:00
Paul E. McKenney
bda4ec9f6a Merge branches 'bigrt.2012.09.23a', 'doctorture.2012.09.23a', 'fixes.2012.09.23a', 'hotplug.2012.09.23a' and 'idlechop.2012.09.23a' into HEAD
bigrt.2012.09.23a contains additional commits to reduce scheduling latency
	from RCU on huge systems (many hundrends or thousands of CPUs).

doctorture.2012.09.23a contains documentation changes and rcutorture fixes.

fixes.2012.09.23a contains miscellaneous fixes.

hotplug.2012.09.23a contains CPU-hotplug-related changes.

idle.2012.09.23a fixes architectures for which RCU no longer considered
	the idle loop to be a quiescent state due to earlier
	adaptive-dynticks changes.  Affected architectures are alpha,
	cris, frv, h8300, m32r, m68k, mn10300, parisc, score, xtensa,
	and ia64.
2012-09-24 20:02:22 -07:00
Paul E. McKenney
86f343b50b rcu: Fix CONFIG_RCU_FAST_NO_HZ stall warning message
The print_cpu_stall_fast_no_hz() function attempts to print -1 when
the ->idle_gp_timer is not pending, but unsigned arithmetic causes it
to instead print ULONG_MAX, which is 4294967295 on 32-bit systems and
18446744073709551615 on 64-bit systems.  Neither of these are the most
reader-friendly values, so this commit instead causes "timer not pending"
to be printed when ->idle_gp_timer is not pending.

Reported-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-09-23 07:42:52 -07:00
Paul E. McKenney
2aef619c75 rcu: Document SRCU dead-CPU capabilities, emphasize read-side limits
The current documentation did not help someone grepping for SRCU to
learn that disabling preemption is not a replacement for srcu_read_lock(),
so upgrade the documentation to bring this out, not just for SRCU,
but also for RCU-bh.  Also document the fact that SRCU readers are
respected on CPUs executing in user mode, idle CPUs, and even on
offline CPUs.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2012-09-23 07:42:23 -07:00
Paul E. McKenney
4605c0143c rcu: Adjust debugfs tracing for kthread-based quiescent-state forcing
Moving quiescent-state forcing into a kthread dispenses with the need
for the ->n_rp_need_fqs field, so this commit removes it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-09-23 07:41:54 -07:00
Paul E. McKenney
74d874e7bd rcu: Update documentation to cover call_srcu() and srcu_barrier().
The advent of call_srcu() and srcu_barrier() obsoleted some of the
documentation, so this commit brings that up to date.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-07-02 12:34:03 -07:00
Paul E. McKenney
fae4b54f28 rcu: Introduce rcutorture testing for rcu_barrier()
Although rcutorture does invoke rcu_barrier() and friends, it cannot
really be called a torture test given that it invokes them only once
at the end of the test.  This commit therefore introduces heavy-duty
rcutorture testing for rcu_barrier(), which may be carried out
concurrently with normal rcutorture testing.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-04-30 10:48:18 -07:00
Paul E. McKenney
236fefafe5 rcu: Call out dangers of expedited RCU primitives
The expedited RCU primitives can be quite useful, but they have some
high costs as well.  This commit updates and creates docbook comments
calling out the costs, and updates the RCU documentation as well.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21 09:06:08 -08:00
Paul E. McKenney
2036d94a7b rcu: Rework detection of use of RCU by offline CPUs
Because newly offlined CPUs continue executing after completing the
CPU_DYING notifiers, they legitimately enter the scheduler and use
RCU while appearing to be offline.  This calls for a more sophisticated
approach as follows:

1.	RCU marks the CPU online during the CPU_UP_PREPARE phase.

2.	RCU marks the CPU offline during the CPU_DEAD phase.

3.	Diagnostics regarding use of read-side RCU by offline CPUs use
	RCU's accounting rather than the cpu_online_map.  (Note that
	__call_rcu() still uses cpu_online_map to detect illegal
	invocations within CPU_DYING notifiers.)

4.	Offline CPUs are prevented from hanging the system by
	force_quiescent_state(), which pays attention to cpu_online_map.
	Some additional work (in a later commit) will be needed to
	guarantee that force_quiescent_state() waits a full jiffy before
	assuming that a CPU is offline, for example, when called from
	idle entry.  (This commit also makes the one-jiffy wait
	explicit, since the old-style implicit wait can now be defeated
	by RCU_FAST_NO_HZ and by rcutorture.)

This approach avoids the false positives encountered when attempting to
use more exact classification of CPU online/offline state.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21 09:06:07 -08:00
Paul E. McKenney
24cd7fd0ea rcu: Update stall-warning documentation
Add documentation of CONFIG_RCU_CPU_STALL_VERBOSE, CONFIG_RCU_CPU_STALL_INFO,
and RCU_STALL_DELAY_DELTA.  Describe multiple stall-warning messages from
a single stall, and the timing of the subsequent messages.  Add headings.
Remove RCU_SECONDS_TILL_STALL_RECHECK because this value is now computed
at runtime from RCU_CPU_STALL_TIMEOUT, so that sysfs changes to the timeout
value now directly affect the RCU_SECONDS_TILL_STALL_RECHECK value.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21 09:03:52 -08:00
Paul E. McKenney
c13f3757d0 rcu: Add CPU-stall capability to rcutorture
Add module parameters to rcutorture that induce a CPU stall.
The stall_cpu parameter specifies how long to stall in seconds,
defaulting to zero, which indicates no stalling is to be undertaken.
The stall_cpu_holdoff parameter specifies how many seconds after
insmod (or boot, if rcutorture is built into the kernel) that this
stall is to start.  The default value for stall_cpu_holdoff is ten
seconds.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21 09:03:52 -08:00
Paul E. McKenney
105617da8d rcu: Make documentation give more realistic rcutorture duration
The torture.txt documentation gives an example rcutorture run with a
100-second duration.  This is ridiculously short, unless maybe testing
a fix for a egregious bug.  Use a more-realistic one-hour duration for
the example.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21 09:03:51 -08:00
Paul E. McKenney
9b9ec9b90e rcutorture: Permit holding off CPU-hotplug operations during boot
When rcutorture is started automatically at boot time, it might well
also start CPU-hotplug operations at that time, which might not be
desirable.  This commit therefore adds an rcutorture parameter that
allows CPU-hotplug operations to be held off for the specified number
of seconds after the start of boot.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21 09:03:50 -08:00
Paul E. McKenney
4c62abc90b rcu: Bring RTFP.txt up to date.
Add publications from 2010 and 2011 to RTFP.txt.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21 09:03:21 -08:00
Kees Cook
d493011a37 docs: Additional LWN links to RCU API
Tyler Hicks pointed me at an additional article on RCU and I figured
it should probably be mentioned with the others.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11 10:32:23 -08:00
Paul E. McKenney
b58bdccaa8 rcu: Add rcutorture CPU-hotplug capability
Running CPU-hotplug operations concurrently with rcutorture has
historically been a good way to find bugs in both RCU and CPU hotplug.
This commit therefore adds an rcutorture module parameter called
"onoff_interval" that causes a randomly selected CPU-hotplug operation to
be executed at the specified interval, in seconds.  The default value of
"onoff_interval" is zero, which disables rcutorture-instigated CPU-hotplug
operations.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11 10:31:56 -08:00
Paul E. McKenney
d5f546d834 rcu: Add rcutorture system-shutdown capability
Although it is easy to run rcutorture tests under KVM, there is currently
no nice way to run such a test for a fixed time period, collect all of
the rcutorture data, and then shut the system down cleanly.  This commit
therefore adds an rcutorture module parameter named "shutdown_secs" that
specified the run duration in seconds, after which rcutorture terminates
the test and powers the system down.  The default value for "shutdown_secs"
is zero, which disables shutdown.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11 10:31:46 -08:00
Paul E. McKenney
9ceae0e248 rcu: Add documentation for raw SRCU read-side primitives
Update various files in Documentation/RCU to reflect srcu_read_lock_raw()
and srcu_read_unlock_raw().  Credit to Peter Zijlstra for suggesting
use of the existing _raw suffix instead of the earlier bulkref names.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11 10:31:41 -08:00
Paul E. McKenney
2c01531f08 rcu: Document failing tick as cause of RCU CPU stall warning
One of lclaudio's systems was seeing RCU CPU stall warnings from idle.
These turned out to be caused by a bug that stopped scheduling-clock
tick interrupts from being sent to a given CPU for several hundred seconds.
This commit therefore updates the documentation to call this out as a
possible cause for RCU CPU stall warnings.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:26 -08:00
Paul E. McKenney
9b2e4f1880 rcu: Track idleness independent of idle tasks
Earlier versions of RCU used the scheduling-clock tick to detect idleness
by checking for the idle task, but handled idleness differently for
CONFIG_NO_HZ=y.  But there are now a number of uses of RCU read-side
critical sections in the idle task, for example, for tracing.  A more
fine-grained detection of idleness is therefore required.

This commit presses the old dyntick-idle code into full-time service,
so that rcu_idle_enter(), previously known as rcu_enter_nohz(), is
always invoked at the beginning of an idle loop iteration.  Similarly,
rcu_idle_exit(), previously known as rcu_exit_nohz(), is always invoked
at the end of an idle-loop iteration.  This allows the idle task to
use RCU everywhere except between consecutive rcu_idle_enter() and
rcu_idle_exit() calls, in turn allowing architecture maintainers to
specify exactly where in the idle loop that RCU may be used.

Because some of the userspace upcall uses can result in what looks
to RCU like half of an interrupt, it is not possible to expect that
the irq_enter() and irq_exit() hooks will give exact counts.  This
patch therefore expands the ->dynticks_nesting counter to 64 bits
and uses two separate bitfields to count process/idle transitions
and interrupt entry/exit transitions.  It is presumed that userspace
upcalls do not happen in the idle loop or from usermode execution
(though usermode might do a system call that results in an upcall).
The counter is hard-reset on each process/idle transition, which
avoids the interrupt entry/exit error from accumulating.  Overflow
is avoided by the 64-bitness of the ->dyntick_nesting counter.

This commit also adds warnings if a non-idle task asks RCU to enter
idle state (and these checks will need some adjustment before applying
Frederic's OS-jitter patches (http://lkml.org/lkml/2011/10/7/246).
In addition, validation of ->dynticks and ->dynticks_nesting is added.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:24 -08:00
Paul E. McKenney
d7bd2d68aa rcu: Document interpretation of RCU-lockdep splats
There has been quite a bit of confusion about what RCU-lockdep splats
mean, so this commit adds some documentation describing how to
interpret them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-09-28 21:38:28 -07:00
Paul E. McKenney
8cd889cbb6 rcu: Update documentation for additional RCU lockdep functions
Add documentation for rcu_dereference_bh_check(),
rcu_dereference_sched_check(), srcu_dereference_check(), and
rcu_dereference_index_check().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-09-28 21:38:25 -07:00
Michal Hocko
e5177ec77d rcu: Not necessary to pass rcu_read_lock_held() to rcu_dereference_protected()
Since ca5ecddf (rcu: define __rcu address space modifier for sparse)
rcu_dereference_check() use rcu_read_lock_held() as a part of condition
automatically.  Therefore, callers of rcu_dereference_check() no longer
need to pass rcu_read_lock_held() to rcu_dereference_check().

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-09-28 21:38:23 -07:00
Paul E. McKenney
e4cc1f22b2 rcu: Simplify quiescent-state accounting
There is often a delay between the time that a CPU passes through a
quiescent state and the time that this quiescent state is reported to the
RCU core.  It is quite possible that the grace period ended before the
quiescent state could be reported, for example, some other CPU might have
deduced that this CPU passed through dyntick-idle mode.  It is critically
important that quiescent state be counted only against the grace period
that was in effect at the time that the quiescent state was detected.

Previously, this was handled by recording the number of the last grace
period to complete when passing through a quiescent state.  The RCU
core then checks this number against the current value, and rejects
the quiescent state if there is a mismatch.  However, one additional
possibility must be accounted for, namely that the quiescent state was
recorded after the prior grace period completed but before the current
grace period started.  In this case, the RCU core must reject the
quiescent state, but the recorded number will match.  This is handled
when the CPU becomes aware of a new grace period -- at that point,
it invalidates any prior quiescent state.

This works, but is a bit indirect.  The new approach records the current
grace period, and the RCU core checks to see (1) that this is still the
current grace period and (2) that this grace period has not yet ended.
This approach simplifies reasoning about correctness, and this commit
changes over to this new approach.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-09-28 21:38:22 -07:00
Paul E. McKenney
b15a2e7d16 rcu: Fix RCU's NMI documentation
It has long been the case that the architecture must call nmi_enter()
and nmi_exit() rather than irq_enter() and irq_exit() in order to
permit RCU read-side critical sections in NMIs.  Catch the documentation
up with reality.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2011-09-28 21:36:44 -07:00
Paul E. McKenney
bdf2a43649 rcu: Catch rcutorture up to new RCU API additions
Now that the RCU API contains synchronize_rcu_bh(), synchronize_sched(),
call_rcu_sched(), and rcu_bh_expedited()...

Make rcutorture test synchronize_rcu_bh(), getting rid of the old
rcu_bh_torture_synchronize() workaround.  Similarly, make rcutorture test
synchronize_sched(), getting rid of the old sched_torture_synchronize()
workaround.  Make rcutorture test call_rcu_sched() instead of wrappering
synchronize_sched().  Also add testing of rcu_bh_expedited().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-09-28 21:36:43 -07:00