linux-uconsole/kernel
Linus Torvalds 797479da0a signal: avoid double atomic counter increments for user accounting
[ Upstream commit fda31c5029 ]

When queueing a signal, we increment both the users count of pending
signals (for RLIMIT_SIGPENDING tracking) and we increment the refcount
of the user struct itself (because we keep a reference to the user in
the signal structure in order to correctly account for it when freeing).

That turns out to be fairly expensive, because both of them are atomic
updates, and particularly under extreme signal handling pressure on big
machines, you can get a lot of cache contention on the user struct.
That can then cause horrid cacheline ping-pong when you do these
multiple accesses.

So change the reference counting to only pin the user for the _first_
pending signal, and to unpin it when the last pending signal is
dequeued.  That means that when a user sees a lot of concurrent signal
queuing - which is the only situation when this matters - the only
atomic access needed is generally the 'sigpending' count update.

This was noticed because of a particularly odd timing artifact on a
dual-socket 96C/192T Cascade Lake platform: when you get into bad
contention, on that machine for some reason seems to be much worse when
the contention happens in the upper 32-byte half of the cacheline.

As a result, the kernel test robot will-it-scale 'signal1' benchmark had
an odd performance regression simply due to random alignment of the
'struct user_struct' (and pointed to a completely unrelated and
apparently nonsensical commit for the regression).

Avoiding the double increments (and decrements on the dequeueing side,
of course) makes for much less contention and hugely improved
performance on that will-it-scale microbenchmark.

Quoting Feng Tang:

 "It makes a big difference, that the performance score is tripled! bump
  from original 17000 to 54000. Also the gap between 5.0-rc6 and
  5.0-rc6+Jiri's patch is reduced to around 2%"

[ The "2% gap" is the odd cacheline placement difference on that
  platform: under the extreme contention case, the effect of which half
  of the cacheline was hot was 5%, so with the reduced contention the
  odd timing artifact is reduced too ]

It does help in the non-contended case too, but is not nearly as
noticeable.

Reported-and-tested-by: Feng Tang <feng.tang@intel.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Philip Li <philip.li@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-20 11:55:53 +01:00
..
bpf bpf, offload: Replace bitwise AND by logical AND in bpf_prog_offload_info_fill 2020-02-28 16:38:59 +01:00
cgroup cgroup: Iterate tasks that did not finish do_exit() 2020-03-18 07:14:19 +01:00
configs kconfig: tinyconfig: remove stale stack protector fixups 2018-06-15 07:15:28 +09:00
debug kdb: do a sanity check on the cpu in kdb_per_cpu() 2020-01-27 14:50:48 +01:00
dma dma-debug: add a schedule point in debug_dma_dump_mappings() 2020-01-04 19:12:43 +01:00
events perf/core: Fix mlock accounting in perf_mmap() 2020-02-11 04:34:19 -08:00
gcov gcov: remove CONFIG_GCOV_FORMAT_AUTODETECT 2018-06-08 18:56:02 +09:00
irq genirq/proc: Reject invalid affinity masks (again) 2020-02-28 16:38:59 +01:00
livepatch livepatch: Nullify obj->mod in klp_module_coming()'s error path 2019-10-07 18:57:10 +02:00
locking locking/spinlock/debug: Fix various data races 2020-01-12 12:17:05 +01:00
power PM / hibernate: memory_bm_find_bit(): Tighten node optimisation 2020-01-09 10:18:58 +01:00
printk printk: fix exclusive_console replaying 2020-02-11 04:33:51 -08:00
rcu rcu: Avoid data-race in rcu_gp_fqs_check_wake() 2020-02-11 04:33:55 -08:00
sched sched/fair: Fix O(nr_cgroups) in the load balancing path 2020-03-05 16:42:21 +01:00
time clocksource: Prevent double add_timer_on() for watchdog_timer 2020-02-11 04:34:18 -08:00
trace tracing: Disable trace_printk() on post poned tests 2020-03-05 16:42:18 +01:00
.gitignore
acct.c acct_on(): don't mess with freeze protection 2019-05-31 06:46:05 -07:00
async.c kernel/async.c: revert "async: simplify lowest_in_progress()" 2018-02-06 18:32:44 -08:00
audit.c audit: always check the netlink payload length in audit_receive_msg() 2020-03-05 16:42:23 +01:00
audit.h audit: track the owner of the command mutex ourselves 2018-02-23 11:22:22 -05:00
audit_fsnotify.c fsnotify: add fsnotify_add_inode_mark() wrappers 2018-05-18 14:58:22 +02:00
audit_tree.c audit: Embed key into chunk 2019-12-13 08:51:11 +01:00
audit_watch.c audit_get_nd(): don't unlock parent too early 2019-12-13 08:51:02 +01:00
auditfilter.c audit: fix error handling in audit_data_to_entry() 2020-03-05 16:42:17 +01:00
auditsc.c audit: print empty EXECVE args 2019-12-01 09:17:17 +01:00
backtracetest.c
bounds.c kbuild: fix kernel/bounds.c 'W=1' warning 2018-11-13 11:08:47 -08:00
capability.c LSM: generalize flag passing to security_capable 2020-01-23 08:21:29 +01:00
compat.c time: Enable get/put_compat_itimerspec64 always 2018-06-24 14:39:47 +02:00
configs.c
context_tracking.c
cpu.c cpu/hotplug, stop_machine: Fix stop_machine vs hotplug order 2020-02-24 08:34:35 +01:00
cpu_pm.c
crash_core.c kernel/crash_core.c: print timestamp using time64_t 2018-08-22 10:52:47 -07:00
crash_dump.c
cred.c memcg: account security cred as well to kmemcg 2020-01-09 10:19:00 +01:00
delayacct.c delayacct: Use raw_spinlocks 2018-04-27 14:34:51 +02:00
dma.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
elfcore.c kernel/elfcore.c: include proper prototypes 2019-10-11 18:21:23 +02:00
exec_domain.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
exit.c exit: panic before exit_mm() on global init exit 2020-01-09 10:19:02 +01:00
extable.c extable: Make init_kernel_text() global 2018-02-21 16:54:06 +01:00
fail_function.c bpf/error-inject/kprobes: Clear current_kprobe and enable preempt in kprobe 2018-06-21 12:33:19 +02:00
fork.c fork,memcg: alloc_thread_stack_node needs to set tsk->stack 2020-01-27 14:50:58 +01:00
freezer.c PM / reboot: Eliminate race between reboot and suspend 2018-08-06 12:35:20 +02:00
futex.c futex: Prevent robust futex exit race 2019-12-01 09:17:38 +01:00
groups.c kernel: make groups_sort calling a responsibility group_info allocators 2017-12-14 16:00:49 -08:00
hung_task.c kernel: hung_task.c: disable on suspend 2019-04-20 09:16:02 +02:00
iomem.c memremap: split devm_memremap_pages() and memremap() infrastructure 2018-05-15 23:08:33 -07:00
irq_work.c irq_work: Do not raise an IPI when queueing work on the local CPU 2019-05-31 06:46:19 -07:00
jump_label.c jump_label: move 'asm goto' support test to Kconfig 2019-06-04 08:02:34 +02:00
kallsyms.c kallsyms: Don't let kallsyms_lookup_size_offset() fail on retrieving the first symbol 2019-09-21 07:17:02 +02:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt kconfig: include kernel/Kconfig.preempt from init/Kconfig 2018-08-02 08:06:54 +09:00
kcov.c kernel/kcov.c: mark write_comp_data() as notrace 2019-02-12 19:47:20 +01:00
kexec.c kexec: add call to LSM hook in original kexec_load syscall 2018-07-16 12:31:57 -07:00
kexec_core.c kexec: Allocate decrypted control pages for kdump if SME is enabled 2019-11-24 08:20:29 +01:00
kexec_file.c treewide: Use array_size() in vzalloc() 2018-06-12 16:19:22 -07:00
kexec_internal.h
kmod.c
kprobes.c kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic 2020-03-11 14:14:47 +01:00
ksysfs.c
kthread.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-08-13 11:25:07 -07:00
latencytop.c
Makefile y2038: futex: Move compat implementation into futex.c 2019-12-01 09:17:38 +01:00
memremap.c mm/memory_hotplug: shrink zones when offlining memory 2020-01-29 16:43:27 +01:00
module-internal.h modsign: log module name in the event of an error 2018-07-02 11:36:17 +02:00
module.c module: avoid setting info->name early in case we can fall back to info->mod->name 2020-02-24 08:34:49 +01:00
module_signing.c modsign: log module name in the event of an error 2018-07-02 11:36:17 +02:00
notifier.c
nsproxy.c
padata.c padata: fix null pointer deref of pd->pinst 2020-02-14 16:33:28 -05:00
panic.c kernel/panic.c: do not append newline to the stack protector panic string 2019-12-01 09:17:10 +01:00
params.c kernel/params.c: downgrade warning for unsafe parameters 2018-04-11 10:28:37 -07:00
pid.c Fix failure path in alloc_pid() 2019-01-13 09:51:06 +01:00
pid_namespace.c signal/pid_namespace: Fix reboot_pid_ns to use send_sig not force_sig 2019-07-26 09:14:01 +02:00
profile.c
ptrace.c ptrace: reintroduce usage of subjective credentials in ptrace_has_cap() 2020-01-23 08:21:29 +01:00
range.c
reboot.c PM / reboot: Eliminate race between reboot and suspend 2018-08-06 12:35:20 +02:00
relay.c relay: check return of create_buf_file() properly 2019-03-13 14:02:35 -07:00
resource.c resource: fix locking in find_next_iomem_res() 2019-09-16 08:22:20 +02:00
rseq.c rseq: uapi: Declare rseq_cs field as union, update includes 2018-07-10 22:18:52 +02:00
seccomp.c LSM: generalize flag passing to security_capable 2020-01-23 08:21:29 +01:00
signal.c signal: avoid double atomic counter increments for user accounting 2020-03-20 11:55:53 +01:00
smp.c cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM 2019-02-12 19:47:25 +01:00
smpboot.c smpboot: Remove cpumask from the API 2018-07-03 09:20:44 +02:00
smpboot.h
softirq.c nohz: Fix missing tick reprogram when interrupting an inline softirq 2018-08-03 15:52:10 +02:00
stacktrace.c
stop_machine.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-08-13 11:25:07 -07:00
sys.c kernel/sys.c: prctl: fix false positive in validate_prctl_map() 2019-06-15 11:54:01 +02:00
sys_ni.c Merge branch 'core-rseq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-06-10 10:17:09 -07:00
sysctl.c kernel: sysctl: make drop_caches write-only 2020-01-04 19:13:17 +01:00
sysctl_binary.c staging: irda: remove remaining remants of irda code removal 2018-04-16 11:26:49 +02:00
task_work.c locking/barriers: Convert users of lockless_dereference() to READ_ONCE() 2017-12-17 13:57:15 +01:00
taskstats.c taskstats: fix data-race 2020-01-09 10:18:59 +01:00
test_kprobes.c kprobes: Remove jprobe API implementation 2018-06-21 12:33:05 +02:00
torture.c torture: Keep old-school dmesg format 2018-06-25 11:30:10 -07:00
tracepoint.c tracepoint: Fix tracepoint array element size mismatch 2018-10-17 15:35:29 -04:00
tsacct.c
ucount.c headers: untangle kmemleak.h from mm.h 2018-04-05 21:36:27 -07:00
uid16.c fs: add do_fchownat(), ksys_fchown() helpers and ksys_{,l}chown() wrappers 2018-04-02 20:15:59 +02:00
uid16.h kernel: provide ksys_*() wrappers for syscalls called by kernel/uid16.c 2018-04-02 20:15:30 +02:00
umh.c umh: fix race condition 2018-06-07 16:56:28 -04:00
up.c
user-return-notifier.c
user.c userns: use irqsave variant of refcount_dec_and_lock() 2018-08-22 10:52:47 -07:00
user_namespace.c userns: also map extents in the reverse map to kernel IDs 2018-11-13 11:09:00 -08:00
utsname.c uts: create "struct uts_namespace" from kmem_cache 2018-04-11 10:28:35 -07:00
utsname_sysctl.c sys: don't hold uts_sem while accessing userspace memory 2018-08-11 02:05:53 -05:00
watchdog.c watchdog/softlockup: Enforce that timestamp is valid on boot 2020-02-24 08:34:49 +01:00
watchdog_hld.c watchdog: Mark watchdog touch functions as notrace 2018-08-30 12:56:40 +02:00
workqueue.c workqueue: don't use wq_select_unbound_cpu() for bound works 2020-03-18 07:14:20 +01:00
workqueue_internal.h workqueue: Set worker->desc to workqueue name by default 2018-05-18 08:47:13 -07:00