linux-uconsole/kernel
Oleg Nesterov 641bc58deb introduce for_each_thread() to replace the buggy while_each_thread()
commit 0c740d0afc upstream.

while_each_thread() and next_thread() should die, almost every lockless
usage is wrong.

1. Unless g == current, the lockless while_each_thread() is not safe.

   while_each_thread(g, t) can loop forever if g exits, next_thread()
   can't reach the unhashed thread in this case. Note that this can
   happen even if g is the group leader, it can exec.

2. Even if while_each_thread() itself was correct, people often use
   it wrongly.

   It was never safe to just take rcu_read_lock() and loop unless
   you verify that pid_alive(g) == T, even the first next_thread()
   can point to the already freed/reused memory.

This patch adds signal_struct->thread_head and task->thread_node to
create the normal rcu-safe list with the stable head.  The new
for_each_thread(g, t) helper is always safe under rcu_read_lock() as
long as this task_struct can't go away.

Note: of course it is ugly to have both task_struct->thread_node and the
old task_struct->thread_group, we will kill it later, after we change
the users of while_each_thread() to use for_each_thread().

Perhaps we can kill it even before we convert all users, we can
reimplement next_thread(t) using the new thread_head/thread_node.  But
we can't do this right now because this will lead to subtle behavioural
changes.  For example, do/while_each_thread() always sees at least one
task, while for_each_thread() can do nothing if the whole thread group
has died.  Or thread_group_empty(), currently its semantics is not clear
unless thread_group_leader(p) and we need to audit the callers before we
can change it.

So this patch adds the new interface which has to coexist with the old
one for some time, hopefully the next changes will be more or less
straightforward and the old one will go away soon.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Sergey Dyasly <dserrg@gmail.com>
Tested-by: Sergey Dyasly <dserrg@gmail.com>
Reviewed-by: Sameer Nanda <snanda@chromium.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mandeep Singh Baines <msb@chromium.org>
Cc: "Ma, Xindong" <xindong.ma@intel.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: "Tu, Xiaobing" <xiaobing.tu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05 14:54:15 -07:00
..
cpu sched, idle: Fix the idle polling state logic 2013-11-29 11:11:42 -08:00
debug kgdb/sysrq: fix inconstistent help message of sysrq key 2013-04-30 17:04:10 -07:00
events perf: Fix a race condition in perf_remove_from_context() 2014-10-05 14:54:14 -07:00
gcov
irq genirq: Sanitize spurious interrupt detection of threaded irqs 2014-06-30 20:09:45 -07:00
power PM / sleep: Use valid_state() for platform-dependent sleep states only 2014-10-05 14:54:15 -07:00
sched printk: rename printk_sched to printk_deferred 2014-08-07 14:30:26 -07:00
time alarmtimer: Lock k_itimer during timer callback 2014-10-05 14:54:14 -07:00
trace trace: Fix epoll hang when we race with new entries 2014-10-05 14:54:07 -07:00
.gitignore kernel/hz.bc: ignore. 2013-04-22 07:09:06 -07:00
acct.c fs: Fix hang with BSD accounting on frozen filesystem 2013-05-04 14:57:58 -04:00
async.c async: rename and redefine async_func_ptr 2013-03-12 13:59:14 -07:00
audit.c CAPABILITIES: remove undefined caps from all processes 2014-09-17 09:03:57 -07:00
audit.h audit: fix mq_open and mq_unlink to add the MQ root as a hidden parent audit_names record 2013-12-04 10:57:03 -08:00
audit_tree.c kernel/audit_tree.c:audit_add_tree_rule(): protect `rule' from kill_rules() 2013-06-12 16:29:46 -07:00
audit_watch.c
auditfilter.c auditfilter.c: fix kernel-doc warnings 2013-05-24 16:22:52 -07:00
auditsc.c auditsc: audit_krule mask accesses need bounds checking 2014-06-16 13:42:53 -07:00
backtracetest.c
bounds.c
capability.c CAPABILITIES: remove undefined caps from all processes 2014-09-17 09:03:57 -07:00
cgroup.c cgroup: use a dedicated workqueue for cgroup destruction 2013-12-04 10:57:20 -08:00
cgroup_freezer.c
compat.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2013-05-01 07:21:43 -07:00
configs.c proc: Supply PDE attribute setting accessor functions 2013-05-01 17:29:18 -04:00
context_tracking.c Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-06-20 08:18:35 -10:00
cpu.c sched: Fix hotplug vs. set_cpus_allowed_ptr() 2014-06-11 12:03:24 -07:00
cpu_pm.c
cpuset.c cpuset,mempolicy: fix sleeping function called from invalid context 2014-07-17 15:58:00 -07:00
crash_dump.c
cred.c
delayacct.c cputime: Use accessors to read task cputime stats 2013-01-27 19:23:31 +01:00
dma.c
elfcore.c
exec_domain.c
exit.c introduce for_each_thread() to replace the buggy while_each_thread() 2014-10-05 14:54:15 -07:00
extable.c extable: Flip the sorting message 2013-04-15 13:25:16 +02:00
fork.c introduce for_each_thread() to replace the buggy while_each_thread() 2014-10-05 14:54:15 -07:00
freezer.c libata, freezer: avoid block device removal while system is frozen 2014-01-09 12:24:23 -08:00
futex.c futex: Make lookup_pi_state more robust 2014-06-07 13:25:41 -07:00
futex_compat.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2013-02-23 18:50:11 -08:00
groups.c
hrtimer.c hrtimer: Set expiry time before switch_hrtimer_base() 2014-06-07 13:25:31 -07:00
hung_task.c
irq_work.c Merge branch 'nohz/printk-v8' into irq/core 2013-02-05 00:48:46 +01:00
itimer.c
jump_label.c
kallsyms.c kernel: kallsyms: memory override issue, need check destination buffer length 2013-04-15 15:17:26 +09:30
kcmp.c kcmp: fix standard comparison bug 2014-10-05 14:54:13 -07:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking/mutex: Disable optimistic spinning on some architectures 2014-07-28 08:00:07 -07:00
Kconfig.preempt
kexec.c PCI: Disable Bus Master only on kexec reboot 2013-12-20 07:45:08 -08:00
kmod.c usermodehelper: check subprocess_info->path != NULL 2013-05-16 12:01:11 -07:00
kprobes.c kprobes: Fix to free gone and unused optprobes 2013-05-28 10:37:59 +02:00
ksysfs.c
kthread.c kthread: implement probe_kthread_data() 2013-04-30 17:04:02 -07:00
latencytop.c
lglock.c
lockdep.c Merge branch 'for-3.10/drivers' of git://git.kernel.dk/linux-block 2013-05-08 11:51:05 -07:00
lockdep_internals.h
lockdep_proc.c
lockdep_states.h
Makefile We get rid of the general module prefix confusion with a binary config option, 2013-05-05 10:58:06 -07:00
modsign_certificate.S CONFIG_SYMBOL_PREFIX: cleanup. 2013-03-15 15:09:43 +10:30
modsign_pubkey.c
module-internal.h
module.c ftrace/module: Hardcode ftrace_module_init() call into load_module() 2014-06-07 13:25:28 -07:00
module_signing.c
mutex-debug.c
mutex-debug.h
mutex.c mutex: Back out architecture specific check for negative mutex count 2013-04-19 09:33:36 +02:00
mutex.h
notifier.c
nsproxy.c proc: Split the namespace stuff out into linux/proc_ns.h 2013-05-01 17:29:39 -04:00
padata.c
panic.c dump_stack: implement arch-specific hardware description in task dumps 2013-04-30 17:04:02 -07:00
params.c params: Fix potential memory leak in add_sysfs_param() 2013-03-18 11:40:21 +00:00
pid.c pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup 2013-09-26 17:18:27 -07:00
pid_namespace.c pid_namespace: pidns_get() should check task_active_pid_ns() != NULL 2014-04-26 17:15:34 -07:00
posix-cpu-timers.c posix_timers: Fix pre-condition to stop the tick on full dynticks 2013-04-22 19:59:25 +02:00
posix-timers.c posix-timers: Remove unused variable 2013-04-18 12:51:19 +02:00
printk.c printk: rename printk_sched to printk_deferred 2014-08-07 14:30:26 -07:00
profile.c proc: Supply PDE attribute setting accessor functions 2013-05-01 17:29:18 -04:00
ptrace.c exec/ptrace: fix get_dumpable() incorrect tests 2013-11-29 11:11:44 -08:00
range.c range: Do not add new blank slot with add_range_with_merge 2013-06-18 11:32:10 -05:00
rcu.h rcu: Provide RCU CPU stall warnings for tiny RCU 2013-01-28 22:06:21 -08:00
rcupdate.c Merge branches 'doctorture.2013.01.29a', 'fixes.2013.01.26a', 'tagcb.2013.01.24a' and 'tiny.2013.01.29b' into HEAD 2013-01-28 22:25:21 -08:00
rcutiny.c Merge branches 'doctorture.2013.01.29a', 'fixes.2013.01.26a', 'tagcb.2013.01.24a' and 'tiny.2013.01.29b' into HEAD 2013-01-28 22:25:21 -08:00
rcutiny_plugin.h rcu: Provide RCU CPU stall warnings for tiny RCU 2013-01-28 22:06:21 -08:00
rcutorture.c rcu: Allow rcutorture to be built at low optimization levels 2013-02-04 12:18:20 -08:00
rcutree.c rcu: Fix deadlock with CPU hotplug, RCU GP init, and timer migration 2013-06-10 13:37:12 -07:00
rcutree.h rcu: Don't call wakeup() with rcu_node structure ->lock held 2013-06-10 13:37:11 -07:00
rcutree_plugin.h rcu: Don't allocate bootmem from rcu_init() 2013-05-15 10:41:12 -07:00
rcutree_trace.c rcutrace: single_open() leaks 2013-05-05 00:16:35 -04:00
relay.c Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-block 2013-05-08 10:13:35 -07:00
res_counter.c
resource.c mem hotunplug: fix kfree() of bootmem memory 2013-04-29 15:54:40 -07:00
rtmutex-debug.c sched/rt: Move rt specific bits into new header file 2013-02-07 20:51:08 +01:00
rtmutex-debug.h rtmutex: Handle deadlock detection smarter 2014-07-17 15:58:04 -07:00
rtmutex-tester.c locking/rtmutex/tester: Set correct permissions on sysfs files 2013-04-10 14:48:37 +02:00
rtmutex.c rtmutex: Plug slow unlock race 2014-07-17 15:58:04 -07:00
rtmutex.h rtmutex: Handle deadlock detection smarter 2014-07-17 15:58:04 -07:00
rtmutex_common.h
rwsem.c Revert "rw_semaphore: remove up/down_read_non_owner" 2013-03-23 15:53:52 -07:00
seccomp.c seccomp: allow BPF_XOR based ALU instructions. 2013-03-26 11:07:19 +11:00
semaphore.c semaphore: use `bool' type for semaphore_waiter's up 2013-04-30 17:04:08 -07:00
signal.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2013-05-01 14:08:52 -07:00
smp.c kernel/smp.c:on_each_cpu_cond(): fix warning in fallback path 2014-09-17 09:03:57 -07:00
smpboot.c kthread: Prevent unpark race which puts threads on the wrong cpu 2013-04-12 14:18:43 +02:00
smpboot.h
softirq.c irq: Force hardirq exit's softirq processing on its own stack 2013-10-13 16:08:34 -07:00
spinlock.c
srcu.c srcu: use ACCESS_ONCE() to access sp->completed in srcu_read_lock() 2013-02-07 15:19:36 -08:00
stacktrace.c
stop_machine.c stop_machine: Mark per cpu stopper enabled early 2013-02-26 22:25:17 +01:00
sys.c reboot: rigrate shutdown/reboot to boot cpu 2013-06-12 16:29:44 -07:00
sys_ni.c unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE 2013-05-09 13:46:38 -04:00
sysctl.c perf: Enforce 1 as lower limit for perf_event_max_sample_rate 2014-06-11 12:03:27 -07:00
sysctl_binary.c switch compat_sys_sysctl to COMPAT_SYSCALL_DEFINE 2013-05-09 14:53:20 -04:00
task_work.c
taskstats.c
test_kprobes.c kernel/: rename random32() to prandom_u32() 2013-04-29 18:28:42 -07:00
time.c timekeeping: Use inject_offset in warp_clock 2013-03-15 16:50:20 -07:00
timeconst.bc kernel: Replace timeconst.pl with a bc script 2013-02-16 23:17:25 +01:00
timer.c timer: Prevent overflow in apply_slack 2014-06-07 13:25:30 -07:00
tracepoint.c tracepoint: Do not waste memory on mods with no tracepoints 2014-05-30 21:52:11 -07:00
tsacct.c cputime: Use accessors to read task cputime stats 2013-01-27 19:23:31 +01:00
uid16.c make SYSCALL_DEFINE<n>-generated wrappers do asmlinkage_protect 2013-03-03 22:58:33 -05:00
up.c
user-return-notifier.c hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
user.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
user_namespace.c user namespace: fix incorrect memory barriers 2014-04-26 17:15:34 -07:00
utsname.c proc: Split the namespace stuff out into linux/proc_ns.h 2013-05-01 17:29:39 -04:00
utsname_sysctl.c kernel/utsname_sysctl.c: put get/get_uts() into CONFIG_PROC_SYSCTL code block 2013-02-27 19:10:22 -08:00
wait.c
watchdog.c watchdog: Add comments to explain the watchdog_disabled variable 2013-03-14 08:24:05 +01:00
workqueue.c workqueue: zero cpumask of wq_numa_possible_cpumask on init 2014-07-17 15:58:00 -07:00
workqueue_internal.h workqueue: include workqueue info when printing debug dump of a worker task 2013-04-30 17:04:02 -07:00