linux-uconsole/kernel
Paul E. McKenney 3f6ea7b4b5 rcu: Fix day-one dyntick-idle stall-warning bug
commit a10d206ef1 upstream.

Each grace period is supposed to have at least one callback waiting
for that grace period to complete.  However, if CONFIG_NO_HZ=n, an
extra callback-free grace period is no big problem -- it will chew up
a tiny bit of CPU time, but it will complete normally.  In contrast,
CONFIG_NO_HZ=y kernels have the potential for all the CPUs to go to
sleep indefinitely, in turn indefinitely delaying completion of the
callback-free grace period.  Given that nothing is waiting on this grace
period, this is also not a problem.

That is, unless RCU CPU stall warnings are also enabled, as they are
in recent kernels.  In this case, if a CPU wakes up after at least one
minute of inactivity, an RCU CPU stall warning will result.  The reason
that no one noticed until quite recently is that most systems have enough
OS noise that they will never remain absolutely idle for a full minute.
But there are some embedded systems with cut-down userspace configurations
that consistently get into this situation.

All this begs the question of exactly how a callback-free grace period
gets started in the first place.  This can happen due to the fact that
CPUs do not necessarily agree on which grace period is in progress.
If a CPU still believes that the grace period that just completed is
still ongoing, it will believe that it has callbacks that need to wait for
another grace period, never mind the fact that the grace period that they
were waiting for just completed.  This CPU can therefore erroneously
decide to start a new grace period.  Note that this can happen in
TREE_RCU and TREE_PREEMPT_RCU even on a single-CPU system:  Deadlock
considerations mean that the CPU that detected the end of the grace
period is not necessarily officially informed of this fact for some time.

Once this CPU notices that the earlier grace period completed, it will
invoke its callbacks.  It then won't have any callbacks left.  If no
other CPU has any callbacks, we now have a callback-free grace period.

This commit therefore makes CPUs check more carefully before starting a
new grace period.  This new check relies on an array of tail pointers
into each CPU's list of callbacks.  If the CPU is up to date on which
grace periods have completed, it checks to see if any callbacks follow
the RCU_DONE_TAIL segment, otherwise it checks to see if any callbacks
follow the RCU_WAIT_TAIL segment.  The reason that this works is that
the RCU_WAIT_TAIL segment will be promoted to the RCU_DONE_TAIL segment
as soon as the CPU is officially notified that the old grace period
has ended.

This change is to cpu_needs_another_gp(), which is called in a number
of places.  The only one that really matters is in rcu_start_gp(), where
the root rcu_node structure's ->lock is held, which prevents any
other CPU from starting or completing a grace period, so that the
comparison that determines whether the CPU is missing the completion
of a grace period is stable.

Reported-by: Becky Bruce <bgillbruce@gmail.com>
Reported-by: Subodh Nijsure <snijsure@grid-net.com>
Reported-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13 05:28:11 +09:00
..
debug kgdb,debug_core: pass the breakpoint struct instead of address and memory 2012-04-13 08:14:07 -07:00
events perf_event: Switch to internal refcount, fix race with close() 2012-10-02 09:47:24 -07:00
gcov gcov: disable CONFIG_CONSTRUCTORS when not needed by CONFIG_GCOV_KERNEL 2011-06-15 20:04:01 -07:00
irq random: remove rand_initialize_irq() 2012-08-15 12:04:28 -07:00
power ftrace: Disable function tracing during suspend/resume and hibernation, again 2012-08-09 08:27:35 -07:00
time time: Move ktime_t overflow checking into timespec_valid_strict 2012-10-02 09:47:52 -07:00
trace tracing: change CPU ring buffer state from tracing_cpumask 2012-07-16 08:47:51 -07:00
.gitignore
acct.c
async.c Fix a dead loop in async_synchronize_full() 2012-10-02 09:47:41 -07:00
audit.c netlink: kill loginuid/sessionid/sid members from struct netlink_skb_parms 2011-03-03 10:55:40 -08:00
audit.h audit: make functions static 2010-10-30 01:42:19 -04:00
audit_tree.c audit: fix refcounting in audit-tree 2012-09-14 10:00:38 -07:00
audit_watch.c kill path_lookup() 2011-03-14 09:15:23 -04:00
auditfilter.c netlink: kill loginuid/sessionid/sid members from struct netlink_skb_parms 2011-03-03 10:55:40 -08:00
auditsc.c audit: acquire creds selectively to reduce atomic op overhead 2011-04-27 15:11:03 +02:00
backtracetest.c
bounds.c memcg: remove direct page_cgroup-to-page pointer 2011-03-23 19:46:28 -07:00
capability.c Merge branch 'master' into next 2011-05-19 18:51:57 +10:00
cgroup.c cgroup: fix to allow mounting a hierarchy by name 2012-01-12 11:35:08 -08:00
cgroup_freezer.c cgroup_freezer: fix freezing groups with stopped tasks 2011-12-09 08:52:27 -08:00
compat.c compat: Fix RT signal mask corruption via sigprocmask 2012-05-21 09:40:04 -07:00
configs.c
cpu.c PM / Sleep: Fix race between CPU hotplug and freezer 2012-01-12 11:35:46 -08:00
cpuset.c cpuset: mm: reduce large amounts of memory barrier related damage v3 2012-08-01 12:27:20 -07:00
crash_dump.c crash_dump: export is_kdump_kernel to modules, consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn 2011-03-23 19:47:19 -07:00
cred.c cred: copy_process() should clear child->replacement_session_keyring 2012-04-13 08:14:08 -07:00
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c sched: Fix ancient race in do_exit() 2012-10-02 09:47:55 -07:00
extable.c extable, core_kernel_data(): Make sure all archs define _sdata 2011-05-20 08:56:56 +02:00
fork.c cpuset: mm: reduce large amounts of memory barrier related damage v3 2012-08-01 12:27:20 -07:00
freezer.c Freezer: Use SMP barriers 2011-05-17 23:19:17 +02:00
futex.c futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi() 2012-08-09 08:27:54 -07:00
futex_compat.c futex: Do not leak robust list to unprivileged process 2012-04-22 16:21:45 -07:00
groups.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
hrtimer.c hrtimer: Update hrtimer base offsets each hrtimer_interrupt 2012-07-19 08:58:46 -07:00
hung_task.c hung_task: fix false positive during vfork 2012-01-06 14:14:13 -08:00
irq_work.c irq_work: Use per cpu atomics instead of regular atomics 2010-12-18 15:54:48 +01:00
itimer.c
jump_label.c jump_label: jump_label_inc may return before the code is patched 2011-12-09 08:52:50 -08:00
kallsyms.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-03-25 17:52:22 -07:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks arch:Kconfig.locks Remove unused config option. 2011-04-10 17:01:05 +02:00
Kconfig.preempt
kexec.c PM: Remove sysdev suspend, resume and shutdown operations 2011-05-11 21:37:15 +02:00
kfifo.c
kmod.c kmod: prevent kmod_loop_msg overflow in __request_module() 2011-11-11 09:35:48 -08:00
kprobes.c kprobes: adjust "fix a memory leak in function pre_handler_kretprobe()" 2012-03-12 10:32:57 -07:00
ksysfs.c kernel/ksysfs.c: expose file_caps_enabled in sysfs 2011-04-19 16:45:51 -07:00
kthread.c cpuset: Fix cpuset_cpus_allowed_fallback(), don't update tsk->rt.nr_cpus_allowed 2011-05-28 17:02:57 +02:00
latencytop.c Fix common misspellings 2011-03-31 11:26:23 -03:00
lockdep.c lockdep: Fix lock_is_held() on recursion 2011-06-07 12:25:50 +02:00
lockdep_internals.h
lockdep_proc.c lockdep: Remove unused 'factor' variable from lockdep_stats_show() 2011-03-23 13:54:47 +01:00
lockdep_states.h
Makefile cgroup: remove the ns_cgroup 2011-05-26 17:12:34 -07:00
module.c module: Remove module size limit 2012-04-02 09:27:20 -07:00
mutex-debug.c mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
mutex-debug.h mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
mutex.c lockdep, mutex: provide mutex_lock_nest_lock 2011-05-25 08:39:17 -07:00
mutex.h mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
notifier.c
nsproxy.c cgroup: remove the ns_cgroup 2011-05-26 17:12:34 -07:00
padata.c Fix common misspellings 2011-03-31 11:26:23 -03:00
panic.c lockdep, bug: Exclude TAINT_FIRMWARE_WORKAROUND from disabling lockdep 2012-02-13 11:06:10 -08:00
params.c params.c: Use new strtobool function to process boolean inputs 2011-05-19 16:55:28 +09:30
pid.c next_pidmap: fix overflow condition 2011-04-18 10:35:30 -07:00
pid_namespace.c pidns: call pid_ns_prepare_proc() from create_pid_namespace() 2011-03-23 19:46:58 -07:00
pm_qos_params.c Merge branch 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6 2011-05-29 11:18:09 -07:00
posix-cpu-timers.c cputimer: Cure lock inversion 2011-10-25 07:10:14 +02:00
posix-timers.c posix-timers: RCU conversion 2011-05-24 12:10:51 +02:00
printk.c cap_syslog: don't use WARN_ONCE for CAP_SYS_ADMIN deprecation warning 2012-02-03 09:18:57 -08:00
profile.c kernel/profile.c: remove some duplicate code from profile_hits() 2011-05-26 17:12:37 -07:00
ptrace.c ptrace: ptrace_resume() shouldn't wake up !TASK_TRACED thread 2011-05-25 19:20:21 +02:00
range.c kernel/range.c: fix clean_sort_range() for the case of full array 2010-11-12 07:55:31 -08:00
rcupdate.c rcu: Use WARN_ON_ONCE for DEBUG_OBJECTS_RCU_HEAD warnings 2011-05-05 23:16:57 -07:00
rcutiny.c sanitize <linux/prefetch.h> usage 2011-05-20 12:50:29 -07:00
rcutiny_plugin.h rcu: Converge TINY_RCU expedited and normal boosting 2011-05-05 23:16:58 -07:00
rcutorture.c rcu: mark rcutorture boosting callback as being on-stack 2011-05-05 23:16:57 -07:00
rcutree.c rcu: Fix day-one dyntick-idle stall-warning bug 2012-10-13 05:28:11 +09:00
rcutree.h rcu: Move RCU_BOOST #ifdefs to header file 2011-06-16 16:12:05 -07:00
rcutree_plugin.h softirq,rcu: Inform RCU of irq_exit() activity 2011-07-20 10:50:12 -07:00
rcutree_trace.c rcu: use softirq instead of kthreads except when RCU_BOOST=y 2011-06-15 23:07:21 -07:00
relay.c relay: prevent integer overflow in relay_open() 2012-02-20 12:48:10 -08:00
res_counter.c memcg: res_counter_read_u64(): fix potential races on 32-bit machines 2011-03-23 19:46:22 -07:00
resource.c resource: ability to resize an allocated resource 2011-07-06 10:54:08 -07:00
rtmutex-debug.c rtmutex: Simplify PI algorithm and make highest prio task get lock 2011-01-27 21:13:51 -05:00
rtmutex-debug.h
rtmutex-tester.c rtmutex: tester: Remove the remaining BKL leftovers 2011-02-22 22:07:22 +01:00
rtmutex.c rtmutex: Simplify PI algorithm and make highest prio task get lock 2011-01-27 21:13:51 -05:00
rtmutex.h
rtmutex_common.h rtmutex: Simplify PI algorithm and make highest prio task get lock 2011-01-27 21:13:51 -05:00
rwsem.c
sched.c sched: Fix race in task_group() 2012-10-02 09:47:42 -07:00
sched_autogroup.c Fix common misspellings 2011-03-31 11:26:23 -03:00
sched_autogroup.h sched, autogroup: Stop going ahead if autogroup is disabled 2011-02-23 11:33:59 +01:00
sched_clock.c sched: Add some clock info to sched_debug 2010-11-23 10:29:08 +01:00
sched_cpupri.c
sched_cpupri.h
sched_debug.c sched: Get rid of lock_depth 2011-04-24 13:18:38 +02:00
sched_fair.c sched: Break out cpu_power from the sched_group structure 2011-07-20 18:32:40 +02:00
sched_features.h sched: Allow for overlapping sched_domain spans 2011-07-20 18:32:41 +02:00
sched_idletask.c sched: Drop the rq argument to sched_class::select_task_rq() 2011-04-14 08:52:36 +02:00
sched_rt.c sched/rt: Fix task stack corruption under __ARCH_WANT_INTERRUPTS_ON_CTXSW 2012-02-13 11:06:08 -08:00
sched_stats.h sched: More sched_domain iterations fixes 2011-05-28 17:02:54 +02:00
sched_stoptask.c sched: Drop the rq argument to sched_class::select_task_rq() 2011-04-14 08:52:36 +02:00
seccomp.c
semaphore.c
signal.c ptrace: don't clear GROUP_STOP_SIGMASK on double-stop 2011-11-11 09:36:23 -08:00
smp.c generic-ipi: Fix kexec boot crash by initializing call_single_queue before enabling interrupts 2011-06-17 10:17:12 +02:00
softirq.c softirq,rcu: Inform RCU of irq_exit() activity 2011-07-20 10:50:12 -07:00
spinlock.c
srcu.c rcu: demote SRCU_SYNCHRONIZE_DELAY from kernel-parameter status 2011-01-14 04:56:49 -08:00
stacktrace.c
stop_machine.c x86, mtrr: lock stop machine during MTRR rendezvous sequence 2011-08-29 13:29:08 -07:00
sys.c kernel/sys.c: call disable_nonboot_cpus() in kernel_restart() 2012-10-13 05:28:03 +09:00
sys_ni.c ipc: Add missing sys_ni entries for ipc/compat.c functions 2011-05-20 13:53:02 -07:00
sysctl.c sysctl: fix write access to dmesg_restrict/kptr_restrict 2012-04-13 08:14:07 -07:00
sysctl_binary.c binary_sysctl(): fix memory leak 2012-01-06 14:13:50 -08:00
sysctl_check.c sysctl_check: drop dead code 2011-03-23 19:46:51 -07:00
taskstats.c Make TASKSTATS require root access 2011-12-21 12:57:40 -08:00
test_kprobes.c
time.c time: Change jiffies_to_clock_t() argument type to unsigned long 2011-11-11 09:35:52 -08:00
timeconst.pl
timer.c timers: Consider slack value in mod_timer() 2011-06-03 15:02:32 +02:00
tracepoint.c jump label: Introduce static_branch() interface 2011-04-04 12:48:08 -04:00
tsacct.c taskstats: use real microsecond granularity for CPU times 2010-10-27 18:03:17 -07:00
uid16.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
up.c
user-return-notifier.c Fix common misspellings 2011-03-31 11:26:23 -03:00
user.c userns: add a user_namespace as creator/owner of uts_namespace 2011-03-23 19:46:59 -07:00
user_namespace.c user_ns: improve the user_ns on-the-slab packaging 2011-01-13 08:03:18 -08:00
utsname.c ns proc: Add support for the uts namespace 2011-05-10 14:35:35 -07:00
utsname_sysctl.c
wait.c Fix common misspellings 2011-03-31 11:26:23 -03:00
watchdog.c kernel/watchdog.c: Use proper ANSI C prototypes 2011-05-23 21:07:40 -07:00
workqueue.c workqueue: add missing smp_wmb() in process_one_work() 2012-10-13 05:28:03 +09:00
workqueue_sched.h