linux-uconsole/kernel
Petr Mladek 2b35a4eaaa kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()
commit 5fa54346ca upstream.

The system might hang with the following backtrace:

	schedule+0x80/0x100
	schedule_timeout+0x48/0x138
	wait_for_common+0xa4/0x134
	wait_for_completion+0x1c/0x2c
	kthread_flush_work+0x114/0x1cc
	kthread_cancel_work_sync.llvm.16514401384283632983+0xe8/0x144
	kthread_cancel_delayed_work_sync+0x18/0x2c
	xxxx_pm_notify+0xb0/0xd8
	blocking_notifier_call_chain_robust+0x80/0x194
	pm_notifier_call_chain_robust+0x28/0x4c
	suspend_prepare+0x40/0x260
	enter_state+0x80/0x3f4
	pm_suspend+0x60/0xdc
	state_store+0x108/0x144
	kobj_attr_store+0x38/0x88
	sysfs_kf_write+0x64/0xc0
	kernfs_fop_write_iter+0x108/0x1d0
	vfs_write+0x2f4/0x368
	ksys_write+0x7c/0xec

It is caused by the following race between kthread_mod_delayed_work()
and kthread_cancel_delayed_work_sync():

CPU0				CPU1

Context: Thread A		Context: Thread B

kthread_mod_delayed_work()
  spin_lock()
  __kthread_cancel_work()
     spin_unlock()
     del_timer_sync()
				kthread_cancel_delayed_work_sync()
				  spin_lock()
				  __kthread_cancel_work()
				    spin_unlock()
				    del_timer_sync()
				    spin_lock()

				  work->canceling++
				  spin_unlock
     spin_lock()
   queue_delayed_work()
     // dwork is put into the worker->delayed_work_list

   spin_unlock()

				  kthread_flush_work()
     // flush_work is put at the tail of the dwork

				    wait_for_completion()

Context: IRQ

  kthread_delayed_work_timer_fn()
    spin_lock()
    list_del_init(&work->node);
    spin_unlock()

BANG: flush_work is not longer linked and will never get proceed.

The problem is that kthread_mod_delayed_work() checks work->canceling
flag before canceling the timer.

A simple solution is to (re)check work->canceling after
__kthread_cancel_work().  But then it is not clear what should be
returned when __kthread_cancel_work() removed the work from the queue
(list) and it can't queue it again with the new @delay.

The return value might be used for reference counting.  The caller has
to know whether a new work has been queued or an existing one was
replaced.

The proper solution is that kthread_mod_delayed_work() will remove the
work from the queue (list) _only_ when work->canceling is not set.  The
flag must be checked after the timer is stopped and the remaining
operations can be done under worker->lock.

Note that kthread_mod_delayed_work() could remove the timer and then
bail out.  It is fine.  The other canceling caller needs to cancel the
timer as well.  The important thing is that the queue (list)
manipulation is done atomically under worker->lock.

Link: https://lkml.kernel.org/r/20210610133051.15337-3-pmladek@suse.com
Fixes: 9a6b06c8d9 ("kthread: allow to modify delayed kthread work")
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reported-by: Martin Liu <liumartin@google.com>
Cc: <jenhaochen@google.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:25 -04:00
..
bpf bpf: Do not mark insn as seen under speculative path verification 2021-06-23 14:42:49 +02:00
cgroup cgroup1: don't allow '\n' in renaming 2021-06-16 12:01:40 +02:00
configs
debug kgdb: fix to kill breakpoints on initmem after boot 2021-03-04 11:38:46 +01:00
dma swiotlb: Fix the type of index 2021-05-19 10:13:04 +02:00
entry x86/entry: Move nmi entry/exit into common code 2021-03-17 17:06:36 +01:00
events perf: Fix data race between pin_count increment/decrement 2021-06-16 12:01:45 +02:00
gcov gcov: re-fix clang-11+ support 2021-04-14 08:41:58 +02:00
irq genirq/matrix: Prevent allocation counter corruption 2021-05-11 14:47:17 +02:00
kcsan kcsan: Fix debugfs initcall return type 2021-05-26 12:06:54 +02:00
livepatch kernel/: fix repeated words in comments 2020-10-16 11:11:19 -07:00
locking locking/lockdep: Improve noinstr vs errors 2021-06-30 08:47:18 -04:00
power PM: EM: postpone creating the debugfs dir till fs_initcall 2021-03-30 14:32:04 +02:00
printk printk: fix deadlock when kernel panic 2021-03-04 11:38:41 +01:00
rcu rcu: Remove spurious instrumentation_end() in rcu_nmi_enter() 2021-05-14 09:50:22 +02:00
sched sched/pelt: Ensure that *_sum is always synced with *_avg 2021-06-23 14:42:48 +02:00
time posix-timers: Preserve return value in clock_adjtime32() 2021-05-11 14:47:16 +02:00
trace tracing: Do no increment trace_clock_global() by one 2021-06-23 14:42:50 +02:00
.gitignore kbuild: update config_data.gz only when the content of .config is changed 2021-05-11 14:47:37 +02:00
acct.c kernel: acct.c: fix some kernel-doc nits 2020-10-16 11:11:19 -07:00
async.c
audit.c audit: Remove redundant null check 2020-08-26 09:10:39 -04:00
audit.h audit: change unnecessary globals into statics 2020-08-17 20:26:58 -04:00
audit_fsnotify.c fsnotify: generalize handle_inode_event() 2020-12-30 11:54:18 +01:00
audit_tree.c fsnotify: generalize handle_inode_event() 2020-12-30 11:54:18 +01:00
audit_watch.c fsnotify: generalize handle_inode_event() 2020-12-30 11:54:18 +01:00
auditfilter.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
auditsc.c audit/stable-5.9 PR 20200803 2020-08-04 14:20:26 -07:00
backtracetest.c
bounds.c
capability.c LSM: Signal to SafeSetID when setting group IDs 2020-10-13 09:17:34 -07:00
compat.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
configs.c
context_tracking.c
cpu.c kernel/cpu: add arch override for clear_tasks_mm_cpumask() mm handling 2020-11-27 00:10:39 +11:00
cpu_pm.c notifier: Fix broken error handling pattern 2020-09-01 09:58:03 +02:00
crash_core.c crash_core, vmcoreinfo: append 'SECTION_SIZE_BITS' to vmcoreinfo 2021-06-23 14:42:52 +02:00
crash_dump.c
cred.c
delayacct.c
dma.c
exec_domain.c
exit.c kernel/io_uring: cancel io_uring before task works 2021-01-30 13:55:18 +01:00
extable.c
fail_function.c fail_function: Remove a redundant mutex unlock 2020-11-19 11:58:16 -08:00
fork.c mm/fork: clear PASID for new mm 2021-03-30 14:31:52 +02:00
freezer.c Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing" 2021-04-07 15:00:14 +02:00
futex.c futex: Do not apply time namespace adjustment on FUTEX_LOCK_PI 2021-05-11 14:47:37 +02:00
gen_kheaders.sh
groups.c LSM: Signal to SafeSetID when setting group IDs 2020-10-13 09:17:34 -07:00
hung_task.c kernel/hung_task.c: make type annotations consistent 2020-11-02 12:14:19 -08:00
iomem.c
irq_work.c
jump_label.c static_call: Fix static_call_update() sanity check 2021-03-25 09:04:18 +01:00
kallsyms.c treewide: Convert macro and uses of __section(foo) to __section("foo") 2020-10-25 14:51:49 -07:00
kcmp.c exec: Transform exec_update_mutex into a rw_semaphore 2021-01-09 13:46:24 +01:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: make some symbols static 2020-08-12 10:58:02 -07:00
kexec.c LSM: Introduce kernel_post_load_data() hook 2020-10-05 13:37:03 +02:00
kexec_core.c kernel: kexec: remove the lock operation of system_transition_mutex 2021-02-03 23:28:37 +01:00
kexec_elf.c
kexec_file.c kernel: kexec_file: fix error return code of kexec_calculate_store_digests() 2021-05-19 10:13:09 +02:00
kexec_internal.h
kheaders.c
kmod.c kmod: remove redundant "be an" in the comment 2020-08-12 10:58:01 -07:00
kprobes.c kprobes: Fix to delay the kprobes jump optimization 2021-03-04 11:38:35 +01:00
ksysfs.c
kthread.c kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync() 2021-06-30 08:47:25 -04:00
latencytop.c
Makefile kbuild: update config_data.gz only when the content of .config is changed 2021-05-11 14:47:37 +02:00
module-internal.h
module.c module: limit enabling module.sig_enforce 2021-06-30 08:47:15 -04:00
module_signature.c module: harden ELF info handling 2021-03-25 09:04:11 +01:00
module_signing.c module: harden ELF info handling 2021-03-25 09:04:11 +01:00
notifier.c notifier: Fix broken error handling pattern 2020-09-01 09:58:03 +02:00
nsproxy.c
padata.c padata: fix possible padata_works_lock deadlock 2020-09-04 17:51:55 +10:00
panic.c panic: don't dump stack twice on warn 2020-11-14 11:26:04 -08:00
params.c params: Replace zero-length array with flexible-array member 2020-10-29 17:22:59 -05:00
pid.c exec: Transform exec_update_mutex into a rw_semaphore 2021-01-09 13:46:24 +01:00
pid_namespace.c kernel/: fix repeated words in comments 2020-10-16 11:11:19 -07:00
profile.c
ptrace.c ptrace: make ptrace() fail if the tracee changed its pid unexpectedly 2021-05-26 12:06:49 +02:00
range.c kernel.h: split out min()/max() et al. helpers 2020-10-16 11:11:19 -07:00
reboot.c reboot: fix overflow parsing reboot cpu number 2020-11-14 11:26:03 -08:00
regset.c
relay.c kernel/relay.c: drop unneeded initialization 2020-10-16 11:11:22 -07:00
resource.c kernel/resource: make walk_mem_res() find all busy IORESOURCE_MEM resources 2021-05-19 10:13:09 +02:00
rseq.c
scftorture.c scftorture: Add cond_resched() to test loop 2020-08-24 18:38:38 -07:00
scs.c mm: memcontrol: account kernel stack per node 2020-08-07 11:33:25 -07:00
seccomp.c seccomp: Refactor notification handler to prepare for new semantics 2021-06-03 09:00:31 +02:00
signal.c ptrace: fix task_join_group_stop() for the case when current is traced 2020-11-02 12:14:19 -08:00
smp.c smp: Fix smp_call_function_single_async prototype 2021-05-14 09:50:46 +02:00
smpboot.c kthread: Extract KTHREAD_IS_PER_CPU 2021-02-07 15:37:17 +01:00
smpboot.h
softirq.c softirq: Add debug check to __raise_softirq_irqoff() 2020-09-16 15:18:56 +02:00
stackleak.c stackleak: let stack_erasing_sysctl take a kernel pointer buffer 2020-09-19 13:13:39 -07:00
stacktrace.c stacktrace: Remove reliable argument from arch_stack_walk() callback 2020-09-18 14:24:16 +01:00
static_call.c static_call: Align static_call_is_init() patching condition 2021-04-07 15:00:06 +02:00
stop_machine.c stop_machine, rcu: Mark functions as notrace 2020-10-26 12:12:27 +01:00
sys.c kernel/sys.c: fix prototype of prctl_get_tid_address() 2020-10-25 11:44:16 -07:00
sys_ni.c mm/madvise: introduce process_madvise() syscall: an external memory hinting API 2020-10-18 09:27:10 -07:00
sysctl-test.c
sysctl.c sysctl.c: fix underflow value setting risk in vm_table 2021-03-17 17:06:25 +01:00
task_work.c task_work: cleanup notification modes 2020-10-17 15:05:30 -06:00
taskstats.c taskstats: move specifying netlink policy back to ops 2020-10-02 19:11:12 -07:00
test_kprobes.c
torture.c
tracepoint.c tracepoint: Do not fail unregistering a probe due to memory failure 2021-03-04 11:38:03 +01:00
tsacct.c
ucount.c
uid16.c
uid16.h
umh.c usermodehelper: reset umask to default before executing user process 2020-10-06 10:31:52 -07:00
up.c smp: Fix smp_call_function_single_async prototype 2021-05-14 09:50:46 +02:00
user-return-notifier.c
user.c
user_namespace.c capabilities: require CAP_SETFCAP to map uid 0 2021-05-07 11:04:31 +02:00
usermode_driver.c bpf: Fix umd memory leak in copy_process() 2021-03-30 14:32:03 +02:00
utsname.c
utsname_sysctl.c
watch_queue.c watch_queue: Limit the number of watches a user can hold 2020-08-17 09:39:18 -07:00
watchdog.c watchdog: fix barriers when printing backtraces from all CPUs 2021-05-19 10:13:00 +02:00
watchdog_hld.c
workqueue.c wq: handle VM suspension in stall detection 2021-06-16 12:01:36 +02:00
workqueue_internal.h