linux-uconsole/kernel
Song Liu a7f4da875c bpf/stackmap: Fix deadlock with rq_lock in bpf_get_stack()
[ Upstream commit eac9153f2b ]

bpf stackmap with build-id lookup (BPF_F_STACK_BUILD_ID) can trigger A-A
deadlock on rq_lock():

rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[...]
Call Trace:
 try_to_wake_up+0x1ad/0x590
 wake_up_q+0x54/0x80
 rwsem_wake+0x8a/0xb0
 bpf_get_stack+0x13c/0x150
 bpf_prog_fbdaf42eded9fe46_on_event+0x5e3/0x1000
 bpf_overflow_handler+0x60/0x100
 __perf_event_overflow+0x4f/0xf0
 perf_swevent_overflow+0x99/0xc0
 ___perf_sw_event+0xe7/0x120
 __schedule+0x47d/0x620
 schedule+0x29/0x90
 futex_wait_queue_me+0xb9/0x110
 futex_wait+0x139/0x230
 do_futex+0x2ac/0xa50
 __x64_sys_futex+0x13c/0x180
 do_syscall_64+0x42/0x100
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

This can be reproduced by:
1. Start a multi-thread program that does parallel mmap() and malloc();
2. taskset the program to 2 CPUs;
3. Attach bpf program to trace_sched_switch and gather stackmap with
   build-id, e.g. with trace.py from bcc tools:
   trace.py -U -p <pid> -s <some-bin,some-lib> t:sched:sched_switch

A sample reproducer is attached at the end.

This could also trigger deadlock with other locks that are nested with
rq_lock.

Fix this by checking whether irqs are disabled. Since rq_lock and all
other nested locks are irq safe, it is safe to do up_read() when irqs are
not disable. If the irqs are disabled, postpone up_read() in irq_work.

Fixes: 615755a77b ("bpf: extend stackmap to save binary_build_id+offset instead of address")
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191014171223.357174-1-songliubraving@fb.com

Reproducer:
============================ 8< ============================

char *filename;

void *worker(void *p)
{
        void *ptr;
        int fd;
        char *pptr;

        fd = open(filename, O_RDONLY);
        if (fd < 0)
                return NULL;
        while (1) {
                struct timespec ts = {0, 1000 + rand() % 2000};

                ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0);
                usleep(1);
                if (ptr == MAP_FAILED) {
                        printf("failed to mmap\n");
                        break;
                }
                munmap(ptr, 4096 * 64);
                usleep(1);
                pptr = malloc(1);
                usleep(1);
                pptr[0] = 1;
                usleep(1);
                free(pptr);
                usleep(1);
                nanosleep(&ts, NULL);
        }
        close(fd);
        return NULL;
}

int main(int argc, char *argv[])
{
        void *ptr;
        int i;
        pthread_t threads[THREAD_COUNT];

        if (argc < 2)
                return 0;

        filename = argv[1];

        for (i = 0; i < THREAD_COUNT; i++) {
                if (pthread_create(threads + i, NULL, worker, NULL)) {
                        fprintf(stderr, "Error creating thread\n");
                        return 0;
                }
        }

        for (i = 0; i < THREAD_COUNT; i++)
                pthread_join(threads[i], NULL);
        return 0;
}
============================ 8< ============================

Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-31 16:35:20 +01:00
..
bpf bpf/stackmap: Fix deadlock with rq_lock in bpf_get_stack() 2019-12-31 16:35:20 +01:00
cgroup cgroup: pids: use atomic64_t for pids->limit 2019-12-17 20:34:56 +01:00
configs kconfig: tinyconfig: remove stale stack protector fixups 2018-06-15 07:15:28 +09:00
debug kdb: Don't back trace on a cpu that didn't round up 2019-02-12 19:47:19 +01:00
dma swiotlb: do not panic on mapping failures 2019-12-01 09:16:42 +01:00
events perf/core: Consistently fail fork on allocation failures 2019-12-13 08:51:04 +01:00
gcov gcov: remove CONFIG_GCOV_FORMAT_AUTODETECT 2018-06-08 18:56:02 +09:00
irq irq/matrix: Fix memory overallocation 2019-12-01 09:17:13 +01:00
livepatch livepatch: Nullify obj->mod in klp_module_coming()'s error path 2019-10-07 18:57:10 +02:00
locking Revert "locking/pvqspinlock: Don't wait if vCPU is preempted" 2019-10-11 18:21:06 +02:00
power x86/power: Fix 'nosmt' vs hibernation triple fault during resume 2019-06-11 12:20:52 +02:00
printk printk: fix integer overflow in setup_log_buf() 2019-12-01 09:16:14 +01:00
rcu rcuperf: Fix cleanup path for invalid perf_type strings 2019-05-31 06:46:30 -07:00
sched sched/fair: Scale bandwidth quota and period without losing quota/period ratio precision 2019-12-13 08:52:35 +01:00
time y2038: make do_gettimeofday() and get_seconds() inline 2019-11-20 18:45:24 +01:00
trace tracing: Lock event_mutex before synth_event_mutex 2019-12-05 09:19:49 +01:00
.gitignore
acct.c acct_on(): don't mess with freeze protection 2019-05-31 06:46:05 -07:00
async.c kernel/async.c: revert "async: simplify lowest_in_progress()" 2018-02-06 18:32:44 -08:00
audit.c audit: use ktime_get_coarse_real_ts64() for timestamps 2018-07-17 14:45:08 -04:00
audit.h audit: track the owner of the command mutex ourselves 2018-02-23 11:22:22 -05:00
audit_fsnotify.c fsnotify: add fsnotify_add_inode_mark() wrappers 2018-05-18 14:58:22 +02:00
audit_tree.c audit: Embed key into chunk 2019-12-13 08:51:11 +01:00
audit_watch.c audit_get_nd(): don't unlock parent too early 2019-12-13 08:51:02 +01:00
auditfilter.c audit: fix a memory leak bug 2019-05-31 06:46:17 -07:00
auditsc.c audit: print empty EXECVE args 2019-12-01 09:17:17 +01:00
backtracetest.c
bounds.c kbuild: fix kernel/bounds.c 'W=1' warning 2018-11-13 11:08:47 -08:00
capability.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
compat.c time: Enable get/put_compat_itimerspec64 always 2018-06-24 14:39:47 +02:00
configs.c
context_tracking.c
cpu.c cpu/SMT: State SMT is disabled even with nosmt and without "=force" 2019-11-24 08:20:18 +01:00
cpu_pm.c
crash_core.c kernel/crash_core.c: print timestamp using time64_t 2018-08-22 10:52:47 -07:00
crash_dump.c
cred.c access: avoid the RCU grace period for the temporary subjective credentials 2019-07-31 07:27:11 +02:00
delayacct.c delayacct: Use raw_spinlocks 2018-04-27 14:34:51 +02:00
dma.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
elfcore.c kernel/elfcore.c: include proper prototypes 2019-10-11 18:21:23 +02:00
exec_domain.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
exit.c cgroup: Call cgroup_release() before __exit_signal() 2019-08-09 17:52:34 +02:00
extable.c extable: Make init_kernel_text() global 2018-02-21 16:54:06 +01:00
fail_function.c bpf/error-inject/kprobes: Clear current_kprobe and enable preempt in kprobe 2018-06-21 12:33:19 +02:00
fork.c fork: fix some -Wmissing-prototypes warnings 2019-12-05 09:21:04 +01:00
freezer.c PM / reboot: Eliminate race between reboot and suspend 2018-08-06 12:35:20 +02:00
futex.c futex: Prevent robust futex exit race 2019-12-01 09:17:38 +01:00
groups.c kernel: make groups_sort calling a responsibility group_info allocators 2017-12-14 16:00:49 -08:00
hung_task.c kernel: hung_task.c: disable on suspend 2019-04-20 09:16:02 +02:00
iomem.c memremap: split devm_memremap_pages() and memremap() infrastructure 2018-05-15 23:08:33 -07:00
irq_work.c irq_work: Do not raise an IPI when queueing work on the local CPU 2019-05-31 06:46:19 -07:00
jump_label.c jump_label: move 'asm goto' support test to Kconfig 2019-06-04 08:02:34 +02:00
kallsyms.c kallsyms: Don't let kallsyms_lookup_size_offset() fail on retrieving the first symbol 2019-09-21 07:17:02 +02:00
kcmp.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt kconfig: include kernel/Kconfig.preempt from init/Kconfig 2018-08-02 08:06:54 +09:00
kcov.c kernel/kcov.c: mark write_comp_data() as notrace 2019-02-12 19:47:20 +01:00
kexec.c kexec: add call to LSM hook in original kexec_load syscall 2018-07-16 12:31:57 -07:00
kexec_core.c kexec: Allocate decrypted control pages for kdump if SME is enabled 2019-11-24 08:20:29 +01:00
kexec_file.c treewide: Use array_size() in vzalloc() 2018-06-12 16:19:22 -07:00
kexec_internal.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
kmod.c
kprobes.c kprobes: Blacklist symbols in arch-defined prohibited area 2019-12-05 09:20:26 +01:00
ksysfs.c
kthread.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-08-13 11:25:07 -07:00
latencytop.c
Makefile y2038: futex: Move compat implementation into futex.c 2019-12-01 09:17:38 +01:00
memremap.c mm, devm_memremap_pages: add MEMORY_DEVICE_PRIVATE support 2019-01-13 09:51:04 +01:00
module-internal.h modsign: log module name in the event of an error 2018-07-02 11:36:17 +02:00
module.c kernel/module.c: wakeup processes in module_wq on module unload 2019-12-17 20:35:57 +01:00
module_signing.c modsign: log module name in the event of an error 2018-07-02 11:36:17 +02:00
notifier.c
nsproxy.c
padata.c padata: use smp_mb in padata_reorder to avoid orphaned padata jobs 2019-07-26 09:14:25 +02:00
panic.c kernel/panic.c: do not append newline to the stack protector panic string 2019-12-01 09:17:10 +01:00
params.c kernel/params.c: downgrade warning for unsafe parameters 2018-04-11 10:28:37 -07:00
pid.c Fix failure path in alloc_pid() 2019-01-13 09:51:06 +01:00
pid_namespace.c signal/pid_namespace: Fix reboot_pid_ns to use send_sig not force_sig 2019-07-26 09:14:01 +02:00
profile.c
ptrace.c ptrace: Fix ->ptracer_cred handling for PTRACE_TRACEME 2019-07-10 09:53:41 +02:00
range.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
reboot.c PM / reboot: Eliminate race between reboot and suspend 2018-08-06 12:35:20 +02:00
relay.c relay: check return of create_buf_file() properly 2019-03-13 14:02:35 -07:00
resource.c resource: fix locking in find_next_iomem_res() 2019-09-16 08:22:20 +02:00
rseq.c rseq: uapi: Declare rseq_cs field as union, update includes 2018-07-10 22:18:52 +02:00
seccomp.c audit/stable-4.18 PR 20180605 2018-06-06 16:34:00 -07:00
signal.c signal: Always ignore SIGKILL and SIGSTOP sent to the global init 2019-11-20 18:46:21 +01:00
smp.c cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM 2019-02-12 19:47:25 +01:00
smpboot.c smpboot: Remove cpumask from the API 2018-07-03 09:20:44 +02:00
smpboot.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
softirq.c nohz: Fix missing tick reprogram when interrupting an inline softirq 2018-08-03 15:52:10 +02:00
stacktrace.c
stop_machine.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-08-13 11:25:07 -07:00
sys.c kernel/sys.c: prctl: fix false positive in validate_prctl_map() 2019-06-15 11:54:01 +02:00
sys_ni.c Merge branch 'core-rseq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-06-10 10:17:09 -07:00
sysctl.c sysctl: return -EINVAL if val violates minmax 2019-06-15 11:53:59 +02:00
sysctl_binary.c staging: irda: remove remaining remants of irda code removal 2018-04-16 11:26:49 +02:00
task_work.c locking/barriers: Convert users of lockless_dereference() to READ_ONCE() 2017-12-17 13:57:15 +01:00
taskstats.c pids: introduce find_get_task_by_vpid() helper 2018-02-06 18:32:46 -08:00
test_kprobes.c kprobes: Remove jprobe API implementation 2018-06-21 12:33:05 +02:00
torture.c torture: Keep old-school dmesg format 2018-06-25 11:30:10 -07:00
tracepoint.c tracepoint: Fix tracepoint array element size mismatch 2018-10-17 15:35:29 -04:00
tsacct.c
ucount.c headers: untangle kmemleak.h from mm.h 2018-04-05 21:36:27 -07:00
uid16.c fs: add do_fchownat(), ksys_fchown() helpers and ksys_{,l}chown() wrappers 2018-04-02 20:15:59 +02:00
uid16.h kernel: provide ksys_*() wrappers for syscalls called by kernel/uid16.c 2018-04-02 20:15:30 +02:00
umh.c umh: fix race condition 2018-06-07 16:56:28 -04:00
up.c
user-return-notifier.c
user.c userns: use irqsave variant of refcount_dec_and_lock() 2018-08-22 10:52:47 -07:00
user_namespace.c userns: also map extents in the reverse map to kernel IDs 2018-11-13 11:09:00 -08:00
utsname.c uts: create "struct uts_namespace" from kmem_cache 2018-04-11 10:28:35 -07:00
utsname_sysctl.c sys: don't hold uts_sem while accessing userspace memory 2018-08-11 02:05:53 -05:00
watchdog.c watchdog: Respect watchdog cpumask on CPU hotplug 2019-04-03 06:26:29 +02:00
watchdog_hld.c watchdog: Mark watchdog touch functions as notrace 2018-08-30 12:56:40 +02:00
workqueue.c workqueue: Fix missing kfree(rescuer) in destroy_workqueue() 2019-12-17 20:35:50 +01:00
workqueue_internal.h workqueue: Set worker->desc to workqueue name by default 2018-05-18 08:47:13 -07:00