linux-uconsole/kernel
Steven Rostedt (VMware) 102d3ecfe1 UPSTREAM: sched/core: Call __schedule() from do_idle() without enabling preemption
I finally got around to creating trampolines for dynamically allocated
ftrace_ops with using synchronize_rcu_tasks(). For users of the ftrace
function hook callbacks, like perf, that allocate the ftrace_ops
descriptor via kmalloc() and friends, ftrace was not able to optimize
the functions being traced to use a trampoline because they would also
need to be allocated dynamically. The problem is that they cannot be
freed when CONFIG_PREEMPT is set, as there's no way to tell if a task
was preempted on the trampoline. That was before Paul McKenney
implemented synchronize_rcu_tasks() that would make sure all tasks
(except idle) have scheduled out or have entered user space.

While testing this, I triggered this bug:

BUG: unable to handle kernel paging request at ffffffffa0230077
 ...
RIP: 0010:0xffffffffa0230077
 ...
Call Trace:
  schedule+0x5/0xe0
  schedule_preempt_disabled+0x18/0x30
  do_idle+0x172/0x220

What happened was that the idle task was preempted on the trampoline.
As synchronize_rcu_tasks() ignores the idle thread, there's nothing
that lets ftrace know that the idle task was preempted on a trampoline.

The idle task shouldn't need to ever enable preemption. The idle task
is simply a loop that calls schedule or places the cpu into idle mode.
In fact, having preemption enabled is inefficient, because it can
happen when idle is just about to call schedule anyway, which would
cause schedule to be called twice. Once for when the interrupt came in
and was returning back to normal context, and then again in the normal
path that the idle loop is running in, which would be pointless, as it
had already scheduled.

The only reason schedule_preempt_disable() enables preemption is to be
able to call sched_submit_work(), which requires preemption enabled. As
this is a nop when the task is in the RUNNING state, and idle is always
in the running state, there's no reason that idle needs to enable
preemption. But that means it cannot use schedule_preempt_disable() as
other callers of that function require calling sched_submit_work().

Adding a new function local to kernel/sched/ that allows idle to call
the scheduler without enabling preemption, fixes the
synchronize_rcu_tasks() issue, as well as removes the pointless spurious
schedule calls caused by interrupts happening in the brief window where
preemption is enabled just before it calls schedule.

Reviewed: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170414084809.3dacde2a@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----Shawn: trace on 4.4 for RK3308 -------------------------
[  151.389904] BUG: scheduling while atomic: swapper/0/0/0x00000000
[  151.390478] Modules linked in:
[  151.390813] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.126 #1327
[  151.390830] Hardware name: Rockchip RK3308 evb digital-i2s mic board
(DT)
[  151.390844] Call trace:
[  151.390868] [<ffffff800808731c>] dump_backtrace+0x0/0x1c4
[  151.390883] [<ffffff80080874f4>] show_stack+0x14/0x1c
[  151.390900] [<ffffff80081e4274>] dump_stack+0x94/0xbc
[  151.390919] [<ffffff80080b4c6c>] __schedule_bug+0x3c/0x54
[  151.390938] [<ffffff800857e978>] __schedule+0x88/0x45c
[  151.390953] [<ffffff800857edc0>] schedule+0x74/0x94
[  151.390971] [<ffffff800857f118>] schedule_preempt_disabled+0x20/0x38
[  151.390987] [<ffffff80080c9d74>] cpu_startup_entry+0x44/0x204
[  151.391007] [<ffffff800857cda0>] rest_init+0x80/0x8c
[  151.391025] [<ffffff8008750b04>] start_kernel+0x31c/0x330
[  151.391040] [<ffffff80087501c4>] __primary_switched+0x30/0x6c
-------------------------------------------------------
Change-Id: I12971dfe9c2039920162326aabe1df0ecaf79804
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
(cherry-picked from 8663effb24)
2018-04-26 19:23:47 +08:00
..
bpf bpf: skip unnecessary capability check 2018-03-28 18:40:17 +02:00
configs UPSTREAM: config: android-base: disable CONFIG_NFSD and CONFIG_NFS_FS 2018-03-05 21:56:13 +05:30
debug Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2017-12-19 13:08:40 +08:00
events Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2017-11-20 20:53:19 +05:30
gcov Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2018-02-01 12:02:38 +08:00
irq Revert "genirq: Use irqd_get_trigger_type to compare the trigger type for shared IRQs" 2018-03-31 18:12:32 +02:00
livepatch livepatch: x86: fix relocation computation with kASLR 2015-11-11 17:36:04 +01:00
locking BACKPORT: kernel: add kcov code coverage 2018-01-22 13:15:43 +05:30
power Merge branch 'linux-linaro-lsk-v4.4-android' of git://git.linaro.org/kernel/linux-linaro-stable.git 2017-11-02 17:00:07 +08:00
printk Merge branch 'linux-linaro-lsk-v4.4-android' of git://git.linaro.org/kernel/linux-linaro-stable.git 2018-04-08 18:28:30 +08:00
rcu BACKPORT: kernel: add kcov code coverage 2018-01-22 13:15:43 +05:30
sched UPSTREAM: sched/core: Call __schedule() from do_idle() without enabling preemption 2018-04-26 19:23:47 +08:00
time Merge branch 'linux-linaro-lsk-v4.4-android' of git://git.linaro.org/kernel/linux-linaro-stable.git 2018-04-08 18:28:30 +08:00
trace Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2018-04-05 18:31:12 +01:00
.gitignore certs: add .gitignore to stop git nagging about x509_certificate_list 2015-10-21 15:18:35 +01:00
acct.c kernel/acct.c: fix the acct->needcheck check in check_free_space() 2018-01-10 09:27:08 +01:00
async.c kernel/async.c: revert "async: simplify lowest_in_progress()" 2018-02-16 20:09:45 +01:00
audit.c Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2017-12-19 13:08:40 +08:00
audit.h audit: audit_tree_match can be boolean 2015-11-04 08:23:51 -05:00
audit_fsnotify.c audit: clean simple fsnotify implementation 2015-08-06 16:14:53 -04:00
audit_tree.c audit: audit_tree_match can be boolean 2015-11-04 08:23:51 -05:00
audit_watch.c audit: Fix use after free in audit_remove_watch_rule() 2017-08-24 17:02:35 -07:00
auditfilter.c audit: fix comment block whitespace 2015-11-04 08:23:51 -05:00
auditsc.c BACKPORT: audit: consistently record PIDs with task_tgid_nr() 2016-10-12 17:34:22 +05:30
backtracetest.c
bounds.c
capability.c exec: Ensure mm->user_ns contains the execed files 2017-01-06 11:16:14 +01:00
cgroup.c Revert "cgroup: Change from CAP_SYS_NICE to CAP_SYS_RESOURCE for cgroup migration permissions" 2017-10-17 19:10:13 +08:00
cgroup_freezer.c cgroup: fix handling of multi-destination migration from subtree_control enabling 2015-12-03 10:18:21 -05:00
cgroup_pids.c cgroup_pids: don't account for the root cgroup 2015-12-03 10:18:21 -05:00
compat.c
configs.c
context_tracking.c context_tracking: avoid irq_save/irq_restore on guest entry and exit 2015-11-10 12:06:23 +01:00
cpu.c Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2017-01-13 12:01:52 +08:00
cpu_pm.c kernel/cpu_pm: fix cpu_cluster_pm_exit comment 2015-09-03 02:42:20 +02:00
cpuset.c Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2017-10-13 23:14:45 +08:00
crash_dump.c
cred.c cred: Reject inodes with invalid ids in set_create_file_as() 2016-09-15 08:27:49 +02:00
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c BACKPORT: exit_thread: accept a task parameter to be exited 2018-03-05 21:56:13 +05:30
extable.c kernel/extable.c: mark core_kernel_text notrace 2017-07-21 07:44:56 +02:00
fork.c BACKPORT: kernel: add kcov code coverage 2018-01-22 13:15:43 +05:30
freezer.c
futex.c Merge remote-tracking branch 'lts/linux-4.4.y' into linux-linaro-lsk-v4.4 2018-01-31 13:33:20 +08:00
futex_compat.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-02-25 12:01:16 -08:00
groups.c kernel: make groups_sort calling a responsibility group_info allocators 2018-01-10 09:27:10 +01:00
hung_task.c
irq_work.c treewide: Remove old email address 2015-11-23 09:44:58 +01:00
jump_label.c jump_label: Invoke jump_label_test() via early_initcall() 2017-12-16 10:33:55 +01:00
kallsyms.c
kcmp.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-02-25 12:01:16 -08:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c UPSTREAM: kcov: fix comparison callback signature 2018-01-22 13:15:43 +05:30
kexec.c s390/kexec: consolidate crash_map/unmap_reserved_pages() and arch_kexec_protect(unprotect)_crashkres() 2017-10-25 11:26:32 +08:00
kexec_core.c s390/kexec: consolidate crash_map/unmap_reserved_pages() and arch_kexec_protect(unprotect)_crashkres() 2017-10-25 11:26:32 +08:00
kexec_file.c kexec: introduce a protection mechanism for the crashkernel reserved memory 2017-10-25 11:23:52 +08:00
kexec_internal.h kexec: split kexec_file syscall code to kexec_file.c 2015-09-10 13:29:01 -07:00
kmod.c kmod: don't run async usermode helper as a child of kworker thread 2015-10-23 17:55:10 +09:00
kprobes.c tracing/kprobes: Enforce kprobes teardown after testing 2017-05-25 14:30:17 +02:00
ksysfs.c kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
kthread.c UPSTREAM: kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function 2017-08-11 19:31:04 +05:30
latencytop.c
Makefile BACKPORT: kernel: add kcov code coverage 2018-01-22 13:15:43 +05:30
membarrier.c Fix: Disable sys_membarrier when nohz_full is enabled 2017-03-12 06:37:26 +01:00
memremap.c mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done} 2017-01-19 20:17:18 +01:00
module-internal.h
module.c Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2018-03-05 20:20:17 +05:30
module_signing.c KEYS: Merge the type-specific data with the payload data 2015-10-21 15:18:36 +01:00
notifier.c Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-01 08:40:25 -07:00
nsproxy.c
padata.c padata: free correct variable 2017-05-20 14:27:02 +02:00
panic.c kernel/panic.c: add missing \n 2017-07-05 14:37:19 +02:00
params.c Nothing exciting, minor tweaks and cleanups. 2015-11-09 15:53:39 -08:00
pid.c pids: make task_tgid_nr_ns() safe 2017-08-24 17:02:36 -07:00
pid_namespace.c pid_ns: Sleep in TASK_INTERRUPTIBLE in zap_pid_ns_processes 2017-05-25 14:30:11 +02:00
profile.c profile: hide unused functions when !CONFIG_PROC_FS 2018-02-25 11:03:44 +01:00
ptrace.c ptrace: Properly initialize ptracer_cred on fork 2017-06-14 13:16:20 +02:00
range.c
reboot.c i2c: rk3x: Make sure the i2c transfer to be finished before system reboot 2017-05-26 12:03:12 +08:00
relay.c
resource.c /proc/iomem: only expose physical resource addresses to privileged users 2017-08-06 19:19:42 -07:00
seccomp.c seccomp: fix the usage of get/put_seccomp_filter() in seccomp_get_filter() 2017-10-05 09:41:46 +02:00
signal.c Merge branch 'linux-linaro-lsk-v4.4-android' of git://git.linaro.org/kernel/linux-linaro-stable.git 2018-01-26 19:26:47 +08:00
smp.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
smpboot.c stop_machine: Kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark() 2015-10-20 10:23:55 +02:00
smpboot.h
softirq.c UPSTREAM: arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections 2018-01-22 13:15:43 +05:30
stacktrace.c
stop_machine.c kernel: remove stop_machine() Kconfig dependency 2015-12-12 10:15:34 -08:00
sys.c BACKPORT: timer: convert timer_slack_ns from unsigned long to u64 2016-07-11 12:43:04 +05:30
sys_ni.c mm: mlock: add new mlock system call 2015-11-05 19:34:48 -08:00
sysctl.c cpufreq: Drop schedfreq governor 2017-11-20 21:15:59 +05:30
sysctl_binary.c fs/coredump: prevent fsuid=0 dumps into user-controlled directories 2016-04-12 09:08:58 -07:00
task_work.c task_work: remove fifo ordering guarantee 2015-09-05 13:46:58 -07:00
taskstats.c
test_kprobes.c
torture.c torture: Consolidate cond_resched_rcu_qs() into stutter_wait() 2015-10-06 11:25:01 -07:00
tracepoint.c tracepoint: Give priority to probes of tracepoints 2015-10-25 21:33:54 -04:00
tsacct.c
uid16.c kernel: make groups_sort calling a responsibility group_info allocators 2018-01-10 09:27:10 +01:00
up.c
user-return-notifier.c
user.c
user_namespace.c capabilities: ambient capabilities 2015-09-04 16:54:41 -07:00
utsname.c
utsname_sysctl.c
watchdog.c Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2017-01-09 12:01:35 +08:00
workqueue.c workqueue: Allow retrieval of current task's work struct 2018-03-18 11:17:48 +01:00
workqueue_internal.h workqueue: Fix NULL pointer dereference 2017-11-15 17:13:11 +01:00