Commit graph

28,854 commits

Author SHA1 Message Date
Quentin Perret
8e9d3d0740 ANDROID: sched/fair: Cap transient util in stune
boosted_cpu_util() sums the CFS and RT util signals before they are used
for frequency selection. While the util_avg signals are in sync,
util_est prevents cpu_util_cfs() from decreasing when a CFS task is
preempted by RT.

This util_est behaviour is beneficial in many scenarios, but it can
cause the sum of rt_util and cfs_util to go above SCHED_CAPACITY_SCALE.
Although benign, this transient value appears in the stune tracepoints,
and can cause confusion.

Work around the problem by capping the util sum to SCHED_CAPACITY_SCALE.

Bug: 120440300
Change-Id: I2be6eb157af86024e52ae11715f5637c77b201a3
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2019-08-16 14:46:14 +01:00
Greg Kroah-Hartman
b1e96f1650 This is the 4.19.67 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl1WZYYACgkQONu9yGCS
 aT5VjRAApdD6wuKcKhZ8j010Ni18w6W+3qs6IuIXv94eav0zFSRaO9Zp93lZq2p0
 h+k+ssZ+P8a4EuDquzDydlagno9hojHFAYr+9loPZlZUw578Jzg9JbJK9Z1MyQCo
 BCRElzZG67E+WjLP0wGHnS0oVhIoHlJaVWP3pEYkTJILY65ErLT/fYGs64YUAEKr
 Ct1pKoIHPEC0606IKx12kmV645ME4z6pI7g4kLDhk2BozglbxGlwdHgVuIe/NzDP
 PraR1gqMoOD2skjK673ozsZ65yuiVeqSTsbs49Xao1lAS6etUMbC/ACU/yrhL48H
 mMM/EFTSKb5TjJSxQAXU1ANQrm4X6n1yPkNW/MdthnPAotDY3Nda4NNVE9X2toM7
 XW0HfFdcVD7aJtpC/h6ckndGTaOGkHSPjhJtSlBEjF76BA+KhZ9hhcjNWng92bWL
 d5Nws4b82wvgM6T99mkZfbMc2MOopPMf+I94W0JcMa49+rXhyhJdrC72GpxKLdSq
 +XtZJupFWg0RrPlZfmc4Az8f/uY0UfR9gNSaHJokaZAkMzH2x4MzMnPxwRiXAw4W
 qz1s+sgZlqUQcWvODzaNvG1l7QtjD5rbdJ+FAjN2+16F8rep52Yl/IQffYr04DDD
 wikYmcUoMh8hCoj6Atj2LAAU9ulhl6ja8s0YpmHz/HQETufHAZc=
 =gOG+
 -----END PGP SIGNATURE-----

Merge 4.19.67 into android-4.19

Changes in 4.19.67
	iio: cros_ec_accel_legacy: Fix incorrect channel setting
	iio: adc: max9611: Fix misuse of GENMASK macro
	staging: gasket: apex: fix copy-paste typo
	staging: android: ion: Bail out upon SIGKILL when allocating memory.
	crypto: ccp - Fix oops by properly managing allocated structures
	crypto: ccp - Add support for valid authsize values less than 16
	crypto: ccp - Ignore tag length when decrypting GCM ciphertext
	usb: usbfs: fix double-free of usb memory upon submiturb error
	usb: iowarrior: fix deadlock on disconnect
	sound: fix a memory leak bug
	mmc: cavium: Set the correct dma max segment size for mmc_host
	mmc: cavium: Add the missing dma unmap when the dma has finished.
	loop: set PF_MEMALLOC_NOIO for the worker thread
	Input: usbtouchscreen - initialize PM mutex before using it
	Input: elantech - enable SMBus on new (2018+) systems
	Input: synaptics - enable RMI mode for HP Spectre X360
	x86/mm: Check for pfn instead of page in vmalloc_sync_one()
	x86/mm: Sync also unmappings in vmalloc_sync_all()
	mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()
	perf annotate: Fix s390 gap between kernel end and module start
	perf db-export: Fix thread__exec_comm()
	perf record: Fix module size on s390
	x86/purgatory: Use CFLAGS_REMOVE rather than reset KBUILD_CFLAGS
	gfs2: gfs2_walk_metadata fix
	usb: host: xhci-rcar: Fix timeout in xhci_suspend()
	usb: yurex: Fix use-after-free in yurex_delete
	usb: typec: tcpm: free log buf memory when remove debug file
	usb: typec: tcpm: remove tcpm dir if no children
	usb: typec: tcpm: Add NULL check before dereferencing config
	usb: typec: tcpm: Ignore unsupported/unknown alternate mode requests
	can: rcar_canfd: fix possible IRQ storm on high load
	can: peak_usb: fix potential double kfree_skb()
	netfilter: nfnetlink: avoid deadlock due to synchronous request_module
	vfio-ccw: Set pa_nr to 0 if memory allocation fails for pa_iova_pfn
	netfilter: Fix rpfilter dropping vrf packets by mistake
	netfilter: conntrack: always store window size un-scaled
	netfilter: nft_hash: fix symhash with modulus one
	scripts/sphinx-pre-install: fix script for RHEL/CentOS
	drm/amd/display: Wait for backlight programming completion in set backlight level
	drm/amd/display: use encoder's engine id to find matched free audio device
	drm/amd/display: Fix dc_create failure handling and 666 color depths
	drm/amd/display: Only enable audio if speaker allocation exists
	drm/amd/display: Increase size of audios array
	iscsi_ibft: make ISCSI_IBFT dependson ACPI instead of ISCSI_IBFT_FIND
	nl80211: fix NL80211_HE_MAX_CAPABILITY_LEN
	mac80211: don't warn about CW params when not using them
	allocate_flower_entry: should check for null deref
	hwmon: (nct6775) Fix register address and added missed tolerance for nct6106
	drm: silence variable 'conn' set but not used
	cpufreq/pasemi: fix use-after-free in pas_cpufreq_cpu_init()
	s390/qdio: add sanity checks to the fast-requeue path
	ALSA: compress: Fix regression on compressed capture streams
	ALSA: compress: Prevent bypasses of set_params
	ALSA: compress: Don't allow paritial drain operations on capture streams
	ALSA: compress: Be more restrictive about when a drain is allowed
	perf tools: Fix proper buffer size for feature processing
	perf probe: Avoid calling freeing routine multiple times for same pointer
	drbd: dynamically allocate shash descriptor
	ACPI/IORT: Fix off-by-one check in iort_dev_find_its_id()
	nvme: fix multipath crash when ANA is deactivated
	ARM: davinci: fix sleep.S build error on ARMv4
	ARM: dts: bcm: bcm47094: add missing #cells for mdio-bus-mux
	scsi: megaraid_sas: fix panic on loading firmware crashdump
	scsi: ibmvfc: fix WARN_ON during event pool release
	scsi: scsi_dh_alua: always use a 2 second delay before retrying RTPG
	test_firmware: fix a memory leak bug
	tty/ldsem, locking/rwsem: Add missing ACQUIRE to read_failed sleep loop
	perf/core: Fix creating kernel counters for PMUs that override event->cpu
	s390/dma: provide proper ARCH_ZONE_DMA_BITS value
	HID: sony: Fix race condition between rumble and device remove.
	x86/purgatory: Do not use __builtin_memcpy and __builtin_memset
	ALSA: usb-audio: fix a memory leak bug
	can: peak_usb: pcan_usb_pro: Fix info-leaks to USB devices
	can: peak_usb: pcan_usb_fd: Fix info-leaks to USB devices
	hwmon: (nct7802) Fix wrong detection of in4 presence
	drm/i915: Fix wrong escape clock divisor init for GLK
	ALSA: firewire: fix a memory leak bug
	ALSA: hiface: fix multiple memory leak bugs
	ALSA: hda - Don't override global PCM hw info flag
	ALSA: hda - Workaround for crackled sound on AMD controller (1022:1457)
	mac80211: don't WARN on short WMM parameters from AP
	dax: dax_layout_busy_page() should not unmap cow pages
	SMB3: Fix deadlock in validate negotiate hits reconnect
	smb3: send CAP_DFS capability during session setup
	NFSv4: Fix an Oops in nfs4_do_setattr
	KVM: Fix leak vCPU's VMCS value into other pCPU
	mwifiex: fix 802.11n/WPA detection
	iwlwifi: don't unmap as page memory that was mapped as single
	iwlwifi: mvm: fix an out-of-bound access
	iwlwifi: mvm: don't send GEO_TX_POWER_LIMIT on version < 41
	iwlwifi: mvm: fix version check for GEO_TX_POWER_LIMIT support
	Linux 4.19.67

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I5ea813ed5ba6d1eeda51eb4031395ee3e8ba54c3
2019-08-16 11:27:10 +02:00
Leonard Crestez
d768173982 perf/core: Fix creating kernel counters for PMUs that override event->cpu
[ Upstream commit 4ce54af8b3 ]

Some hardware PMU drivers will override perf_event.cpu inside their
event_init callback. This causes a lockdep splat when initialized through
the kernel API:

 WARNING: CPU: 0 PID: 250 at kernel/events/core.c:2917 ctx_sched_out+0x78/0x208
 pc : ctx_sched_out+0x78/0x208
 Call trace:
  ctx_sched_out+0x78/0x208
  __perf_install_in_context+0x160/0x248
  remote_function+0x58/0x68
  generic_exec_single+0x100/0x180
  smp_call_function_single+0x174/0x1b8
  perf_install_in_context+0x178/0x188
  perf_event_create_kernel_counter+0x118/0x160

Fix this by calling perf_install_in_context with event->cpu, just like
perf_event_open

Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Frank Li <Frank.li@nxp.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/c4ebe0503623066896d7046def4d6b1e06e0eb2e.1563972056.git.leonard.crestez@nxp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-16 10:12:50 +02:00
Suren Baghdasaryan
0c4addb718 UPSTREAM: pidfd: fix a poll race when setting exit_state
There is a race between reading task->exit_state in pidfd_poll and
writing it after do_notify_parent calls do_notify_pidfd. Expected
sequence of events is:

CPU 0                            CPU 1
------------------------------------------------
exit_notify
  do_notify_parent
    do_notify_pidfd
  tsk->exit_state = EXIT_DEAD
                                  pidfd_poll
                                     if (tsk->exit_state)

However nothing prevents the following sequence:

CPU 0                            CPU 1
------------------------------------------------
exit_notify
  do_notify_parent
    do_notify_pidfd
                                   pidfd_poll
                                      if (tsk->exit_state)
  tsk->exit_state = EXIT_DEAD

This causes a polling task to wait forever, since poll blocks because
exit_state is 0 and the waiting task is not notified again. A stress
test continuously doing pidfd poll and process exits uncovered this bug.

To fix it, we make sure that the task's exit_state is always set before
calling do_notify_pidfd.

Fixes: b53b0b9d9a ("pidfd: add polling support")
Cc: kernel-team@android.com
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Link: https://lore.kernel.org/r/20190717172100.261204-1-joel@joelfernandes.org
[christian@brauner.io: adapt commit message and drop unneeded changes from wait_task_zombie]
Signed-off-by: Christian Brauner <christian@brauner.io>

(cherry picked from commit b191d6491b)

Bug: 135608568
Test: test program using syscall(__NR_sys_pidfd_open,..) and poll()
Change-Id: Ife81348ae3c3b6f5f8caac0c2f9bacd656582b31
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-08-12 13:36:37 -04:00
Christian Brauner
abd43bb345 UPSTREAM: pid: add pidfd_open()
This adds the pidfd_open() syscall. It allows a caller to retrieve pollable
pidfds for a process which did not get created via CLONE_PIDFD, i.e. for a
process that is created via traditional fork()/clone() calls that is only
referenced by a PID:

int pidfd = pidfd_open(1234, 0);
ret = pidfd_send_signal(pidfd, SIGSTOP, NULL, 0);

With the introduction of pidfds through CLONE_PIDFD it is possible to
created pidfds at process creation time.
However, a lot of processes get created with traditional PID-based calls
such as fork() or clone() (without CLONE_PIDFD). For these processes a
caller can currently not create a pollable pidfd. This is a problem for
Android's low memory killer (LMK) and service managers such as systemd.
Both are examples of tools that want to make use of pidfds to get reliable
notification of process exit for non-parents (pidfd polling) and race-free
signal sending (pidfd_send_signal()). They intend to switch to this API for
process supervision/management as soon as possible. Having no way to get
pollable pidfds from PID-only processes is one of the biggest blockers for
them in adopting this api. With pidfd_open() making it possible to retrieve
pidfds for PID-based processes we enable them to adopt this api.

In line with Arnd's recent changes to consolidate syscall numbers across
architectures, I have added the pidfd_open() syscall to all architectures
at the same time.

Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: David Howells <dhowells@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jann Horn <jannh@google.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-api@vger.kernel.org

(cherry picked from commit 32fcb426ec)

Bug: 135608568
Test: test program using syscall(__NR_sys_pidfd_open,..) and poll()
Change-Id: I97583cfa441a08585cbedf9a44f982c0d0ee5583
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-08-12 13:36:37 -04:00
Joel Fernandes (Google)
b3481301f4 UPSTREAM: pidfd: add polling support
This patch adds polling support to pidfd.

Android low memory killer (LMK) needs to know when a process dies once
it is sent the kill signal. It does so by checking for the existence of
/proc/pid which is both racy and slow. For example, if a PID is reused
between when LMK sends a kill signal and checks for existence of the
PID, since the wrong PID is now possibly checked for existence.
Using the polling support, LMK will be able to get notified when a process
exists in race-free and fast way, and allows the LMK to do other things
(such as by polling on other fds) while awaiting the process being killed
to die.

For notification to polling processes, we follow the same existing
mechanism in the kernel used when the parent of the task group is to be
notified of a child's death (do_notify_parent). This is precisely when the
tasks waiting on a poll of pidfd are also awakened in this patch.

We have decided to include the waitqueue in struct pid for the following
reasons:
1. The wait queue has to survive for the lifetime of the poll. Including
   it in task_struct would not be option in this case because the task can
   be reaped and destroyed before the poll returns.

2. By including the struct pid for the waitqueue means that during
   de_thread(), the new thread group leader automatically gets the new
   waitqueue/pid even though its task_struct is different.

Appropriate test cases are added in the second patch to provide coverage of
all the cases the patch is handling.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Daniel Colascione <dancol@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Jonathan Kowalski <bl0pbl33p@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: kernel-team@android.com
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Co-developed-by: Daniel Colascione <dancol@google.com>
Signed-off-by: Daniel Colascione <dancol@google.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Christian Brauner <christian@brauner.io>

(cherry picked from commit b53b0b9d9a)

Bug: 135608568
Test: test program using syscall(__NR_sys_pidfd_open,..) and poll()
Change-Id: I0391acb666b34e33ef60a778dffdf12ed3654393
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-08-12 13:36:37 -04:00
Christian Brauner
2471d6f992 UPSTREAM: signal: improve comments
Improve the comments for pidfd_send_signal().
First, the comment still referred to a file descriptor for a process as a
"task file descriptor" which stems from way back at the beginning of the
discussion. Replace this with "pidfd" for consistency.
Second, the wording for the explanation of the arguments to the syscall
was a bit inconsistent, e.g. some used the past tense some used present
tense. Make the wording more consistent.

Signed-off-by: Christian Brauner <christian@brauner.io>

(cherry picked from commit c732327f04)

Bug: 135608568
Test: test program using syscall(__NR_sys_pidfd_open,..) and poll()
Change-Id: Iaa289e1bc2f96366102e51241accbb97099553d4
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-08-12 13:36:37 -04:00
Christian Brauner
5013eebdf4 UPSTREAM: fork: do not release lock that wasn't taken
Avoid calling cgroup_threadgroup_change_end() without having called
cgroup_threadgroup_change_begin() first.

During process creation we need to check whether the cgroup we are in
allows us to fork. To perform this check the cgroup needs to guard itself
against threadgroup changes and takes a lock.
Prior to CLONE_PIDFD the cleanup target "bad_fork_free_pid" would also need
to call cgroup_threadgroup_change_end() because said lock had already been
taken.
However, this is not the case anymore with the addition of CLONE_PIDFD. We
are now allocating a pidfd before we check whether the cgroup we're in can
fork and thus prior to taking the lock. So when copy_process() fails at the
right step it would release a lock we haven't taken.
This bug is not even very subtle to be honest. It's just not very clear
from the naming of cgroup_threadgroup_change_{begin,end}() that a lock is
taken.

Here's the relevant splat:

entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
RIP: 0023:0xf7fec849
Code: 85 d2 74 02 89 0a 5b 5d c3 8b 04 24 c3 8b 14 24 c3 8b 3c 24 c3 90 90
90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90
90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 002b:00000000ffed5a8c EFLAGS: 00000246 ORIG_RAX: 0000000000000078
RAX: ffffffffffffffda RBX: 0000000000003ffc RCX: 0000000000000000
RDX: 00000000200005c0 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000012 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(depth <= 0)
WARNING: CPU: 1 PID: 7744 at kernel/locking/lockdep.c:4052 __lock_release
kernel/locking/lockdep.c:4052 [inline]
WARNING: CPU: 1 PID: 7744 at kernel/locking/lockdep.c:4052
lock_release+0x667/0xa00 kernel/locking/lockdep.c:4321
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 7744 Comm: syz-executor007 Not tainted 5.1.0+ #4
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x172/0x1f0 lib/dump_stack.c:113
  panic+0x2cb/0x65c kernel/panic.c:214
  __warn.cold+0x20/0x45 kernel/panic.c:566
  report_bug+0x263/0x2b0 lib/bug.c:186
  fixup_bug arch/x86/kernel/traps.c:179 [inline]
  fixup_bug arch/x86/kernel/traps.c:174 [inline]
  do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:272
  do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:291
  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:972
RIP: 0010:__lock_release kernel/locking/lockdep.c:4052 [inline]
RIP: 0010:lock_release+0x667/0xa00 kernel/locking/lockdep.c:4321
Code: 0f 85 a0 03 00 00 8b 35 77 66 08 08 85 f6 75 23 48 c7 c6 a0 55 6b 87
48 c7 c7 40 25 6b 87 4c 89 85 70 ff ff ff e8 b7 a9 eb ff <0f> 0b 4c 8b 85
70 ff ff ff 4c 89 ea 4c 89 e6 4c 89 c7 e8 52 63 ff
RSP: 0018:ffff888094117b48 EFLAGS: 00010086
RAX: 0000000000000000 RBX: 1ffff11012822f6f RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff815af236 RDI: ffffed1012822f5b
RBP: ffff888094117c00 R08: ffff888092bfc400 R09: fffffbfff113301d
R10: fffffbfff113301c R11: ffffffff889980e3 R12: ffffffff8a451df8
R13: ffffffff8142e71f R14: ffffffff8a44cc80 R15: ffff888094117bd8
  percpu_up_read.constprop.0+0xcb/0x110 include/linux/percpu-rwsem.h:92
  cgroup_threadgroup_change_end include/linux/cgroup-defs.h:712 [inline]
  copy_process.part.0+0x47ff/0x6710 kernel/fork.c:2222
  copy_process kernel/fork.c:1772 [inline]
  _do_fork+0x25d/0xfd0 kernel/fork.c:2338
  __do_compat_sys_x86_clone arch/x86/ia32/sys_ia32.c:240 [inline]
  __se_compat_sys_x86_clone arch/x86/ia32/sys_ia32.c:236 [inline]
  __ia32_compat_sys_x86_clone+0xbc/0x140 arch/x86/ia32/sys_ia32.c:236
  do_syscall_32_irqs_on arch/x86/entry/common.c:334 [inline]
  do_fast_syscall_32+0x281/0xd54 arch/x86/entry/common.c:405
  entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
RIP: 0023:0xf7fec849
Code: 85 d2 74 02 89 0a 5b 5d c3 8b 04 24 c3 8b 14 24 c3 8b 3c 24 c3 90 90
90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90
90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 002b:00000000ffed5a8c EFLAGS: 00000246 ORIG_RAX: 0000000000000078
RAX: ffffffffffffffda RBX: 0000000000003ffc RCX: 0000000000000000
RDX: 00000000200005c0 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000012 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Kernel Offset: disabled
Rebooting in 86400 seconds..

Reported-and-tested-by: syzbot+3286e58549edc479faae@syzkaller.appspotmail.com
Fixes: b3e5838252 ("clone: add CLONE_PIDFD")
Signed-off-by: Christian Brauner <christian@brauner.io>

(cherry picked from commit c3b7112df8)

Bug: 135608568
Test: test program using syscall(__NR_sys_pidfd_open,..) and poll()
Change-Id: I3878b8796347676a4d013180c213e5d4830ee30d
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-08-12 13:36:37 -04:00
Christian Brauner
5eecbc68ed UPSTREAM: signal: support CLONE_PIDFD with pidfd_send_signal
Let pidfd_send_signal() use pidfds retrieved via CLONE_PIDFD.  With this
patch pidfd_send_signal() becomes independent of procfs.  This fullfils
the request made when we merged the pidfd_send_signal() patchset.  The
pidfd_send_signal() syscall is now always available allowing for it to
be used by users without procfs mounted or even users without procfs
support compiled into the kernel.

Signed-off-by: Christian Brauner <christian@brauner.io>
Co-developed-by: Jann Horn <jannh@google.com>
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Howells <dhowells@redhat.com>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>

(cherry picked from commit 2151ad1b06)

Bug: 135608568
Test: test program using syscall(__NR_sys_pidfd_open,..) and poll()
Change-Id: If6ee59279f41e0ae3c6d9e24f7b5481f23aab469
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-08-12 13:36:37 -04:00
Christian Brauner
66faab946a UPSTREAM: clone: add CLONE_PIDFD
This patchset makes it possible to retrieve pid file descriptors at
process creation time by introducing the new flag CLONE_PIDFD to the
clone() system call.  Linus originally suggested to implement this as a
new flag to clone() instead of making it a separate system call.  As
spotted by Linus, there is exactly one bit for clone() left.

CLONE_PIDFD creates file descriptors based on the anonymous inode
implementation in the kernel that will also be used to implement the new
mount api.  They serve as a simple opaque handle on pids.  Logically,
this makes it possible to interpret a pidfd differently, narrowing or
widening the scope of various operations (e.g. signal sending).  Thus, a
pidfd cannot just refer to a tgid, but also a tid, or in theory - given
appropriate flag arguments in relevant syscalls - a process group or
session. A pidfd does not represent a privilege.  This does not imply it
cannot ever be that way but for now this is not the case.

A pidfd comes with additional information in fdinfo if the kernel supports
procfs.  The fdinfo file contains the pid of the process in the callers
pid namespace in the same format as the procfs status file, i.e. "Pid:\t%d".

As suggested by Oleg, with CLONE_PIDFD the pidfd is returned in the
parent_tidptr argument of clone.  This has the advantage that we can
give back the associated pid and the pidfd at the same time.

To remove worries about missing metadata access this patchset comes with
a sample program that illustrates how a combination of CLONE_PIDFD, and
pidfd_send_signal() can be used to gain race-free access to process
metadata through /proc/<pid>.  The sample program can easily be
translated into a helper that would be suitable for inclusion in libc so
that users don't have to worry about writing it themselves.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <christian@brauner.io>
Co-developed-by: Jann Horn <jannh@google.com>
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Howells <dhowells@redhat.com>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>

(cherry picked from commit b3e5838252)

Bug: 135608568
Test: test program using syscall(__NR_sys_pidfd_open,..) and poll()
Change-Id: I8a8f87e8fb23de0adb6d6acf2e622926b7a9f55c
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-08-12 13:36:37 -04:00
Christian Brauner
c46b05132a UPSTREAM: signal: use fdget() since we don't allow O_PATH
As stated in the original commit for pidfd_send_signal() we don't allow
to signal processes through O_PATH file descriptors since it is
semantically equivalent to a write on the pidfd.

We already correctly error out right now and return EBADF if an O_PATH
fd is passed.  This is because we use file->f_op to detect whether a
pidfd is passed and O_PATH fds have their file->f_op set to empty_fops
in do_dentry_open() and thus fail the test.

Thus, there is no regression.  It's just semantically correct to use
fdget() and return an error right from there instead of taking a
reference and returning an error later.

Signed-off-by: Christian Brauner <christian@brauner.io>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jann Horn <jann@thejh.net>
Cc: David Howells <dhowells@redhat.com>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 738a7832d2)

Bug: 135608568
Test: test program using syscall(__NR_pidfd_send_signal,..) to send SIGKILL
Change-Id: Ib102922f9793e8610940d34ad5fb1256d4b07476
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-08-12 13:36:37 -04:00
Jann Horn
74c14d6081 UPSTREAM: signal: don't silently convert SI_USER signals to non-current pidfd
The current sys_pidfd_send_signal() silently turns signals with explicit
SI_USER context that are sent to non-current tasks into signals with
kernel-generated siginfo.
This is unlike do_rt_sigqueueinfo(), which returns -EPERM in this case.
If a user actually wants to send a signal with kernel-provided siginfo,
they can do that with pidfd_send_signal(pidfd, sig, NULL, 0); so allowing
this case is unnecessary.

Instead of silently replacing the siginfo, just bail out with an error;
this is consistent with other interfaces and avoids special-casing behavior
based on security checks.

Fixes: 3eb39f4793 ("signal: add pidfd_send_signal() syscall")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Christian Brauner <christian@brauner.io>

(cherry picked from commit 556a888a14)

Bug: 135608568
Test: test program using syscall(__NR_pidfd_send_signal,..) to send SIGKILL
Change-Id: I004452c19c50296730a2c6852a5ef47abd69d819
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-08-12 13:36:37 -04:00
Christian Brauner
1f27ef8d9b BACKPORT: signal: add pidfd_send_signal() syscall
The kill() syscall operates on process identifiers (pid). After a process
has exited its pid can be reused by another process. If a caller sends a
signal to a reused pid it will end up signaling the wrong process. This
issue has often surfaced and there has been a push to address this problem [1].

This patch uses file descriptors (fd) from proc/<pid> as stable handles on
struct pid. Even if a pid is recycled the handle will not change. The fd
can be used to send signals to the process it refers to.
Thus, the new syscall pidfd_send_signal() is introduced to solve this
problem. Instead of pids it operates on process fds (pidfd).

/* prototype and argument /*
long pidfd_send_signal(int pidfd, int sig, siginfo_t *info, unsigned int flags);

/* syscall number 424 */
The syscall number was chosen to be 424 to align with Arnd's rework in his
y2038 to minimize merge conflicts (cf. [25]).

In addition to the pidfd and signal argument it takes an additional
siginfo_t and flags argument. If the siginfo_t argument is NULL then
pidfd_send_signal() is equivalent to kill(<positive-pid>, <signal>). If it
is not NULL pidfd_send_signal() is equivalent to rt_sigqueueinfo().
The flags argument is added to allow for future extensions of this syscall.
It currently needs to be passed as 0. Failing to do so will cause EINVAL.

/* pidfd_send_signal() replaces multiple pid-based syscalls */
The pidfd_send_signal() syscall currently takes on the job of
rt_sigqueueinfo(2) and parts of the functionality of kill(2), Namely, when a
positive pid is passed to kill(2). It will however be possible to also
replace tgkill(2) and rt_tgsigqueueinfo(2) if this syscall is extended.

/* sending signals to threads (tid) and process groups (pgid) */
Specifically, the pidfd_send_signal() syscall does currently not operate on
process groups or threads. This is left for future extensions.
In order to extend the syscall to allow sending signal to threads and
process groups appropriately named flags (e.g. PIDFD_TYPE_PGID, and
PIDFD_TYPE_TID) should be added. This implies that the flags argument will
determine what is signaled and not the file descriptor itself. Put in other
words, grouping in this api is a property of the flags argument not a
property of the file descriptor (cf. [13]). Clarification for this has been
requested by Eric (cf. [19]).
When appropriate extensions through the flags argument are added then
pidfd_send_signal() can additionally replace the part of kill(2) which
operates on process groups as well as the tgkill(2) and
rt_tgsigqueueinfo(2) syscalls.
How such an extension could be implemented has been very roughly sketched
in [14], [15], and [16]. However, this should not be taken as a commitment
to a particular implementation. There might be better ways to do it.
Right now this is intentionally left out to keep this patchset as simple as
possible (cf. [4]).

/* naming */
The syscall had various names throughout iterations of this patchset:
- procfd_signal()
- procfd_send_signal()
- taskfd_send_signal()
In the last round of reviews it was pointed out that given that if the
flags argument decides the scope of the signal instead of different types
of fds it might make sense to either settle for "procfd_" or "pidfd_" as
prefix. The community was willing to accept either (cf. [17] and [18]).
Given that one developer expressed strong preference for the "pidfd_"
prefix (cf. [13]) and with other developers less opinionated about the name
we should settle for "pidfd_" to avoid further bikeshedding.

The  "_send_signal" suffix was chosen to reflect the fact that the syscall
takes on the job of multiple syscalls. It is therefore intentional that the
name is not reminiscent of neither kill(2) nor rt_sigqueueinfo(2). Not the
fomer because it might imply that pidfd_send_signal() is a replacement for
kill(2), and not the latter because it is a hassle to remember the correct
spelling - especially for non-native speakers - and because it is not
descriptive enough of what the syscall actually does. The name
"pidfd_send_signal" makes it very clear that its job is to send signals.

/* zombies */
Zombies can be signaled just as any other process. No special error will be
reported since a zombie state is an unreliable state (cf. [3]). However,
this can be added as an extension through the @flags argument if the need
ever arises.

/* cross-namespace signals */
The patch currently enforces that the signaler and signalee either are in
the same pid namespace or that the signaler's pid namespace is an ancestor
of the signalee's pid namespace. This is done for the sake of simplicity
and because it is unclear to what values certain members of struct
siginfo_t would need to be set to (cf. [5], [6]).

/* compat syscalls */
It became clear that we would like to avoid adding compat syscalls
(cf. [7]).  The compat syscall handling is now done in kernel/signal.c
itself by adding __copy_siginfo_from_user_generic() which lets us avoid
compat syscalls (cf. [8]). It should be noted that the addition of
__copy_siginfo_from_user_any() is caused by a bug in the original
implementation of rt_sigqueueinfo(2) (cf. 12).
With upcoming rework for syscall handling things might improve
significantly (cf. [11]) and __copy_siginfo_from_user_any() will not gain
any additional callers.

/* testing */
This patch was tested on x64 and x86.

/* userspace usage */
An asciinema recording for the basic functionality can be found under [9].
With this patch a process can be killed via:

 #define _GNU_SOURCE
 #include <errno.h>
 #include <fcntl.h>
 #include <signal.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <sys/stat.h>
 #include <sys/syscall.h>
 #include <sys/types.h>
 #include <unistd.h>

 static inline int do_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
                                         unsigned int flags)
 {
 #ifdef __NR_pidfd_send_signal
         return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
 #else
         return -ENOSYS;
 #endif
 }

 int main(int argc, char *argv[])
 {
         int fd, ret, saved_errno, sig;

         if (argc < 3)
                 exit(EXIT_FAILURE);

         fd = open(argv[1], O_DIRECTORY | O_CLOEXEC);
         if (fd < 0) {
                 printf("%s - Failed to open \"%s\"\n", strerror(errno), argv[1]);
                 exit(EXIT_FAILURE);
         }

         sig = atoi(argv[2]);

         printf("Sending signal %d to process %s\n", sig, argv[1]);
         ret = do_pidfd_send_signal(fd, sig, NULL, 0);

         saved_errno = errno;
         close(fd);
         errno = saved_errno;

         if (ret < 0) {
                 printf("%s - Failed to send signal %d to process %s\n",
                        strerror(errno), sig, argv[1]);
                 exit(EXIT_FAILURE);
         }

         exit(EXIT_SUCCESS);
 }

/* Q&A
 * Given that it seems the same questions get asked again by people who are
 * late to the party it makes sense to add a Q&A section to the commit
 * message so it's hopefully easier to avoid duplicate threads.
 *
 * For the sake of progress please consider these arguments settled unless
 * there is a new point that desperately needs to be addressed. Please make
 * sure to check the links to the threads in this commit message whether
 * this has not already been covered.
 */
Q-01: (Florian Weimer [20], Andrew Morton [21])
      What happens when the target process has exited?
A-01: Sending the signal will fail with ESRCH (cf. [22]).

Q-02:  (Andrew Morton [21])
       Is the task_struct pinned by the fd?
A-02:  No. A reference to struct pid is kept. struct pid - as far as I
       understand - was created exactly for the reason to not require to
       pin struct task_struct (cf. [22]).

Q-03: (Andrew Morton [21])
      Does the entire procfs directory remain visible? Just one entry
      within it?
A-03: The same thing that happens right now when you hold a file descriptor
      to /proc/<pid> open (cf. [22]).

Q-04: (Andrew Morton [21])
      Does the pid remain reserved?
A-04: No. This patchset guarantees a stable handle not that pids are not
      recycled (cf. [22]).

Q-05: (Andrew Morton [21])
      Do attempts to signal that fd return errors?
A-05: See {Q,A}-01.

Q-06: (Andrew Morton [22])
      Is there a cleaner way of obtaining the fd? Another syscall perhaps.
A-06: Userspace can already trivially retrieve file descriptors from procfs
      so this is something that we will need to support anyway. Hence,
      there's no immediate need to add another syscalls just to make
      pidfd_send_signal() not dependent on the presence of procfs. However,
      adding a syscalls to get such file descriptors is planned for a
      future patchset (cf. [22]).

Q-07: (Andrew Morton [21] and others)
      This fd-for-a-process sounds like a handy thing and people may well
      think up other uses for it in the future, probably unrelated to
      signals. Are the code and the interface designed to permit such
      future applications?
A-07: Yes (cf. [22]).

Q-08: (Andrew Morton [21] and others)
      Now I think about it, why a new syscall? This thing is looking
      rather like an ioctl?
A-08: This has been extensively discussed. It was agreed that a syscall is
      preferred for a variety or reasons. Here are just a few taken from
      prior threads. Syscalls are safer than ioctl()s especially when
      signaling to fds. Processes are a core kernel concept so a syscall
      seems more appropriate. The layout of the syscall with its four
      arguments would require the addition of a custom struct for the
      ioctl() thereby causing at least the same amount or even more
      complexity for userspace than a simple syscall. The new syscall will
      replace multiple other pid-based syscalls (see description above).
      The file-descriptors-for-processes concept introduced with this
      syscall will be extended with other syscalls in the future. See also
      [22], [23] and various other threads already linked in here.

Q-09: (Florian Weimer [24])
      What happens if you use the new interface with an O_PATH descriptor?
A-09:
      pidfds opened as O_PATH fds cannot be used to send signals to a
      process (cf. [2]). Signaling processes through pidfds is the
      equivalent of writing to a file. Thus, this is not an operation that
      operates "purely at the file descriptor level" as required by the
      open(2) manpage. See also [4].

/* References */
[1]:  https://lore.kernel.org/lkml/20181029221037.87724-1-dancol@google.com/
[2]:  https://lore.kernel.org/lkml/874lbtjvtd.fsf@oldenburg2.str.redhat.com/
[3]:  https://lore.kernel.org/lkml/20181204132604.aspfupwjgjx6fhva@brauner.io/
[4]:  https://lore.kernel.org/lkml/20181203180224.fkvw4kajtbvru2ku@brauner.io/
[5]:  https://lore.kernel.org/lkml/20181121213946.GA10795@mail.hallyn.com/
[6]:  https://lore.kernel.org/lkml/20181120103111.etlqp7zop34v6nv4@brauner.io/
[7]:  https://lore.kernel.org/lkml/36323361-90BD-41AF-AB5B-EE0D7BA02C21@amacapital.net/
[8]:  https://lore.kernel.org/lkml/87tvjxp8pc.fsf@xmission.com/
[9]:  https://asciinema.org/a/IQjuCHew6bnq1cr78yuMv16cy
[11]: https://lore.kernel.org/lkml/F53D6D38-3521-4C20-9034-5AF447DF62FF@amacapital.net/
[12]: https://lore.kernel.org/lkml/87zhtjn8ck.fsf@xmission.com/
[13]: https://lore.kernel.org/lkml/871s6u9z6u.fsf@xmission.com/
[14]: https://lore.kernel.org/lkml/20181206231742.xxi4ghn24z4h2qki@brauner.io/
[15]: https://lore.kernel.org/lkml/20181207003124.GA11160@mail.hallyn.com/
[16]: https://lore.kernel.org/lkml/20181207015423.4miorx43l3qhppfz@brauner.io/
[17]: https://lore.kernel.org/lkml/CAGXu5jL8PciZAXvOvCeCU3wKUEB_dU-O3q0tDw4uB_ojMvDEew@mail.gmail.com/
[18]: https://lore.kernel.org/lkml/20181206222746.GB9224@mail.hallyn.com/
[19]: https://lore.kernel.org/lkml/20181208054059.19813-1-christian@brauner.io/
[20]: https://lore.kernel.org/lkml/8736rebl9s.fsf@oldenburg.str.redhat.com/
[21]: https://lore.kernel.org/lkml/20181228152012.dbf0508c2508138efc5f2bbe@linux-foundation.org/
[22]: https://lore.kernel.org/lkml/20181228233725.722tdfgijxcssg76@brauner.io/
[23]: https://lwn.net/Articles/773459/
[24]: https://lore.kernel.org/lkml/8736rebl9s.fsf@oldenburg.str.redhat.com/
[25]: https://lore.kernel.org/lkml/CAK8P3a0ej9NcJM8wXNPbcGUyOUZYX+VLoDFdbenW3s3114oQZw@mail.gmail.com/

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jann Horn <jannh@google.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Tycho Andersen <tycho@tycho.ws>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Aleksa Sarai <cyphar@cyphar.com>

(cherry picked from commit 3eb39f4793)

Conflicts:
        include/linux/proc_fs.h - trivial manual merge
        include/uapi/asm-generic/unistd.h - trivial manual merge
        kernel/signal.c

(1. manual merges because of 4.19 differences
 2. change prepare_kill_siginfo() to use struct siginfo instead of
kernel_siginfo
 3. change copy_siginfo_from_user_any() to use struct siginfo instead of
kernel_siginfo
 4. change pidfd_send_signal() to use struct siginfo instead of
kernel_siginfo
 5. use copy_from_user() instead of copy_siginfo_from_user() in
copy_siginfo_from_user_any())

Bug: 135608568
Test: test program using syscall(__NR_pidfd_send_signal,..) to send SIGKILL
Change-Id: I24e6298ecf036d1822f3fa6c5286984b4e195c16
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-08-12 13:36:37 -04:00
Greg Kroah-Hartman
487e61785a This is the 4.19.66 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl1NlswACgkQONu9yGCS
 aT6bJBAAhlElptL5xPWWrkZdYBVyfcat3VaeEmtO7ilHxyvhFMFOE8zIYEg/NU5s
 ySM1yqdYvZY1ANawzIEH3es7eXy++GBBIVcRUPBuEVG/yv4/XyeX6MQYlDn3LuzI
 KsTAkkkqy4LLieqZTF5cqavu1EuUkQWkPxKW1ps9O5Dv24FRmKhj+EiIoEGt43Ln
 6d8vChzS+ZWNhWMJd4zP8x5ZXjUUTYlBxveRQQ9yatTbzrQdfkxyD7k6+1K8se8T
 m9koOVJC9LtbR7W2WbVVq4L6ik3l3LQTr592kZVKiwyVgJtlul3LvqS0hNXcQxSj
 pwfbz+4b8UP4aRgRbMxzSRLOkb+hUOvYR+CuGLKx827b9FQcLNAbseRsA0MlPENF
 jJJUQSDollhx4knbYU+8Y2V1WtDi7Dnjt5gRvCvQ6rdrJMp1mFdqRGxfsdBaR10N
 kU+LaSfIheEvuRoTv535NpxvPQhmEqY6NTGwMYDP1xlvu4vQbWF+HAMhCLmTWrif
 i5ITtxYxSZOHNy7lvR7dNTMWsLoYg0KvG/oqJo8kSLmlBPcga+VK1YlPsc6JrfEG
 d4ft7rMz/Cx2oeUQfiURqi/XPwyIBaU6MbovLuQkVx9CJcwgYcmdFD0HRHZIEwgk
 z0q965POxm3DUoJF8HjBFNUOmlaYn9JZBmrwg5aFgwCy5Kot8g0=
 =aeth
 -----END PGP SIGNATURE-----

Merge 4.19.66 into android-4.19

Changes in 4.19.66
	scsi: fcoe: Embed fc_rport_priv in fcoe_rport structure
	gcc-9: don't warn about uninitialized variable
	driver core: Establish order of operations for device_add and device_del via bitflag
	drivers/base: Introduce kill_device()
	libnvdimm/bus: Prevent duplicate device_unregister() calls
	libnvdimm/region: Register badblocks before namespaces
	libnvdimm/bus: Prepare the nd_ioctl() path to be re-entrant
	libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock
	HID: wacom: fix bit shift for Cintiq Companion 2
	HID: Add quirk for HP X1200 PIXART OEM mouse
	IB: directly cast the sockaddr union to aockaddr
	atm: iphase: Fix Spectre v1 vulnerability
	bnx2x: Disable multi-cos feature.
	ife: error out when nla attributes are empty
	ip6_gre: reload ipv6h in prepare_ip6gre_xmit_ipv6
	ip6_tunnel: fix possible use-after-free on xmit
	ipip: validate header length in ipip_tunnel_xmit
	mlxsw: spectrum: Fix error path in mlxsw_sp_module_init()
	mvpp2: fix panic on module removal
	mvpp2: refactor MTU change code
	net: bridge: delete local fdb on device init failure
	net: bridge: mcast: don't delete permanent entries when fast leave is enabled
	net: fix ifindex collision during namespace removal
	net/mlx5e: always initialize frag->last_in_page
	net/mlx5: Use reversed order when unregister devices
	net: phylink: Fix flow control for fixed-link
	net: qualcomm: rmnet: Fix incorrect UL checksum offload logic
	net: sched: Fix a possible null-pointer dereference in dequeue_func()
	net sched: update vlan action for batched events operations
	net: sched: use temporary variable for actions indexes
	net/smc: do not schedule tx_work in SMC_CLOSED state
	NFC: nfcmrvl: fix gpio-handling regression
	ocelot: Cancel delayed work before wq destruction
	tipc: compat: allow tipc commands without arguments
	tun: mark small packets as owned by the tap sock
	net/mlx5: Fix modify_cq_in alignment
	net/mlx5e: Prevent encap flow counter update async to user query
	r8169: don't use MSI before RTL8168d
	compat_ioctl: pppoe: fix PPPOEIOCSFWD handling
	cgroup: Call cgroup_release() before __exit_signal()
	cgroup: Implement css_task_iter_skip()
	cgroup: Include dying leaders with live threads in PROCS iterations
	cgroup: css_task_iter_skip()'d iterators must be advanced before accessed
	cgroup: Fix css_task_iter_advance_css_set() cset skip condition
	spi: bcm2835: Fix 3-wire mode if DMA is enabled
	Linux 4.19.66

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Id33ce169af8bf14a3791040b4cf923832ce84f6c
2019-08-11 15:21:36 +02:00
Tejun Heo
ebda41dd17 cgroup: Fix css_task_iter_advance_css_set() cset skip condition
commit c596687a00 upstream.

While adding handling for dying task group leaders c03cd7738a
("cgroup: Include dying leaders with live threads in PROCS
iterations") added an inverted cset skip condition to
css_task_iter_advance_css_set().  It should skip cset if it's
completely empty but was incorrectly testing for the inverse condition
for the dying_tasks list.  Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: c03cd7738a ("cgroup: Include dying leaders with live threads in PROCS iterations")
Reported-by: syzbot+d4bba5ccd4f9a2a68681@syzkaller.appspotmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-09 17:52:35 +02:00
Tejun Heo
0a9abd2778 cgroup: css_task_iter_skip()'d iterators must be advanced before accessed
commit cee0c33c54 upstream.

b636fd38dc ("cgroup: Implement css_task_iter_skip()") introduced
css_task_iter_skip() which is used to fix task iterations skipping
dying threadgroup leaders with live threads.  Skipping is implemented
as a subportion of full advancing but css_task_iter_next() forgot to
fully advance a skipped iterator before determining the next task to
visit causing it to return invalid task pointers.

Fix it by making css_task_iter_next() fully advance the iterator if it
has been skipped since the previous iteration.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: syzbot
Link: http://lkml.kernel.org/r/00000000000097025d058a7fd785@google.com
Fixes: b636fd38dc ("cgroup: Implement css_task_iter_skip()")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-09 17:52:34 +02:00
Tejun Heo
4340d175b8 cgroup: Include dying leaders with live threads in PROCS iterations
commit c03cd7738a upstream.

CSS_TASK_ITER_PROCS currently iterates live group leaders; however,
this means that a process with dying leader and live threads will be
skipped.  IOW, cgroup.procs might be empty while cgroup.threads isn't,
which is confusing to say the least.

Fix it by making cset track dying tasks and include dying leaders with
live threads in PROCS iteration.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Topi Miettinen <toiwoton@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-09 17:52:34 +02:00
Tejun Heo
370b9e6399 cgroup: Implement css_task_iter_skip()
commit b636fd38dc upstream.

When a task is moved out of a cset, task iterators pointing to the
task are advanced using the normal css_task_iter_advance() call.  This
is fine but we'll be tracking dying tasks on csets and thus moving
tasks from cset->tasks to (to be added) cset->dying_tasks.  When we
remove a task from cset->tasks, if we advance the iterators, they may
move over to the next cset before we had the chance to add the task
back on the dying list, which can allow the task to escape iteration.

This patch separates out skipping from advancing.  Skipping only moves
the affected iterators to the next pointer rather than fully advancing
it and the following advancing will recognize that the cursor has
already been moved forward and do the rest of advancing.  This ensures
that when a task moves from one list to another in its cset, as long
as it moves in the right direction, it's always visible to iteration.

This doesn't cause any visible behavior changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-09 17:52:34 +02:00
Tejun Heo
7528e95b75 cgroup: Call cgroup_release() before __exit_signal()
commit 6b115bf58e upstream.

cgroup_release() calls cgroup_subsys->release() which is used by the
pids controller to uncharge its pid.  We want to use it to manage
iteration of dying tasks which requires putting it before
__unhash_process().  Move cgroup_release() above __exit_signal().
While this makes it uncharge before the pid is freed, pid is RCU freed
anyway and the window is very narrow.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-09 17:52:34 +02:00
Greg Kroah-Hartman
de4c70d6a9 This is the 4.19.65 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl1Js7MACgkQONu9yGCS
 aT4PQxAAo7xa4kYvDxc1RjUY/yIlp6lQ3rpYAAfZB0t8vN+dqivnJZ7m6JHeWX1Y
 CMcxg85zxLVFeuiXdP821Zj68AB5zqlWMhX0bXm2lhw/Eo9+XHzXtnrLZHhz0/Xd
 M5cmfIPmoyPCUQQfzSfUMvch+ZpwzEt5op5pUfSjckSpjHQZ0HFj1WJ4D8Hn9jAJ
 y4+DAKDZgtqhb3GvpS6MoVnBJgcPk9+mBiDkSb12L392+FvHqfeBi3tDRhvyiZAO
 iJrk747SPds7NlNmuRnj7YyUSDhBzaceRCz0Jsv9FT5EKXoPErXdsL3Bkfa9TREM
 pH0OaMgNr6WSXLO9qIMcfxMeaKVIvIbotqBTkBTzhEAGPkHA75dhi0lpixXXFExg
 MaqhLfmHO0dOEr9FrvYGe7f2wUA1Rdw/qRTM3KPEKmHxMqBS7eufIWMHwie1n9Oe
 cYoP6UkxUIvhUyFV2BlMRFdMfaDbtR0iqy8Dqh36NISD6PAYaUGSoVeSO1fEg4Jy
 5GgrKPg6rcz2XNY2cVbsm2zLpqY4dY58SFK9ORfuULdKUQvScvFGrdSSW0CgX+uc
 F/5NmPutUoboHVxFraDPx7yo46pHf1RW0Me4xZ0aJ3e9ituLAN4fmJ9u46nofb5M
 thPelQlMVt30O41uViJ0ADkOjCsiBr3AxOFvc76Ct9Q/BJVxhLk=
 =JVBv
 -----END PGP SIGNATURE-----

Merge 4.19.65 into android-4.19

Changes in 4.19.65
	ARM: riscpc: fix DMA
	ARM: dts: rockchip: Make rk3288-veyron-minnie run at hs200
	ARM: dts: rockchip: Make rk3288-veyron-mickey's emmc work again
	ARM: dts: rockchip: Mark that the rk3288 timer might stop in suspend
	ftrace: Enable trampoline when rec count returns back to one
	dmaengine: tegra-apb: Error out if DMA_PREP_INTERRUPT flag is unset
	arm64: dts: rockchip: fix isp iommu clocks and power domain
	kernel/module.c: Only return -EEXIST for modules that have finished loading
	firmware/psci: psci_checker: Park kthreads before stopping them
	MIPS: lantiq: Fix bitfield masking
	dmaengine: rcar-dmac: Reject zero-length slave DMA requests
	clk: tegra210: fix PLLU and PLLU_OUT1
	fs/adfs: super: fix use-after-free bug
	clk: sprd: Add check for return value of sprd_clk_regmap_init()
	btrfs: fix minimum number of chunk errors for DUP
	btrfs: qgroup: Don't hold qgroup_ioctl_lock in btrfs_qgroup_inherit()
	cifs: Fix a race condition with cifs_echo_request
	ceph: fix improper use of smp_mb__before_atomic()
	ceph: return -ERANGE if virtual xattr value didn't fit in buffer
	ACPI: blacklist: fix clang warning for unused DMI table
	scsi: zfcp: fix GCC compiler warning emitted with -Wmaybe-uninitialized
	perf version: Fix segfault due to missing OPT_END()
	x86: kvm: avoid constant-conversion warning
	ACPI: fix false-positive -Wuninitialized warning
	be2net: Signal that the device cannot transmit during reconfiguration
	x86/apic: Silence -Wtype-limits compiler warnings
	x86: math-emu: Hide clang warnings for 16-bit overflow
	mm/cma.c: fail if fixed declaration can't be honored
	lib/test_overflow.c: avoid tainting the kernel and fix wrap size
	lib/test_string.c: avoid masking memset16/32/64 failures
	coda: add error handling for fget
	coda: fix build using bare-metal toolchain
	uapi linux/coda_psdev.h: move upc_req definition from uapi to kernel side headers
	drivers/rapidio/devices/rio_mport_cdev.c: NUL terminate some strings
	ipc/mqueue.c: only perform resource calculation if user valid
	mlxsw: spectrum_dcb: Configure DSCP map as the last rule is removed
	xen/pv: Fix a boot up hang revealed by int3 self test
	x86/kvm: Don't call kvm_spurious_fault() from .fixup
	x86/paravirt: Fix callee-saved function ELF sizes
	x86, boot: Remove multiple copy of static function sanitize_boot_params()
	drm/nouveau: fix memory leak in nouveau_conn_reset()
	kconfig: Clear "written" flag to avoid data loss
	kbuild: initialize CLANG_FLAGS correctly in the top Makefile
	Btrfs: fix incremental send failure after deduplication
	Btrfs: fix race leading to fs corruption after transaction abort
	mmc: dw_mmc: Fix occasional hang after tuning on eMMC
	mmc: meson-mx-sdio: Fix misuse of GENMASK macro
	gpiolib: fix incorrect IRQ requesting of an active-low lineevent
	IB/hfi1: Fix Spectre v1 vulnerability
	mtd: rawnand: micron: handle on-die "ECC-off" devices correctly
	selinux: fix memory leak in policydb_init()
	ALSA: hda: Fix 1-minute detection delay when i915 module is not available
	mm: vmscan: check if mem cgroup is disabled or not before calling memcg slab shrinker
	s390/dasd: fix endless loop after read unit address configuration
	cgroup: kselftest: relax fs_spec checks
	parisc: Fix build of compressed kernel even with debug enabled
	drivers/perf: arm_pmu: Fix failure path in PM notifier
	arm64: compat: Allow single-byte watchpoints on all addresses
	arm64: cpufeature: Fix feature comparison for CTR_EL0.{CWG,ERG}
	nbd: replace kill_bdev() with __invalidate_device() again
	xen/swiotlb: fix condition for calling xen_destroy_contiguous_region()
	IB/mlx5: Fix unreg_umr to ignore the mkey state
	IB/mlx5: Use direct mkey destroy command upon UMR unreg failure
	IB/mlx5: Move MRs to a kernel PD when freeing them to the MR cache
	IB/mlx5: Fix clean_mr() to work in the expected order
	IB/mlx5: Fix RSS Toeplitz setup to be aligned with the HW specification
	IB/hfi1: Check for error on call to alloc_rsm_map_table
	drm/i915/gvt: fix incorrect cache entry for guest page mapping
	eeprom: at24: make spd world-readable again
	ARC: enable uboot support unconditionally
	objtool: Support GCC 9 cold subfunction naming scheme
	gcc-9: properly declare the {pv,hv}clock_page storage
	x86/vdso: Prevent segfaults due to hoisted vclock reads
	scsi: mpt3sas: Use 63-bit DMA addressing on SAS35 HBA
	x86/cpufeatures: Carve out CQM features retrieval
	x86/cpufeatures: Combine word 11 and 12 into a new scattered features word
	x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations
	x86/speculation: Enable Spectre v1 swapgs mitigations
	x86/entry/64: Use JMP instead of JMPQ
	x86/speculation/swapgs: Exclude ATOMs from speculation through SWAPGS
	Documentation: Add swapgs description to the Spectre v1 documentation
	Linux 4.19.65

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iceeabdb164657e0a616db618e6aa8445d56b0dc1
2019-08-06 20:08:18 +02:00
Prarit Bhargava
09ec6c6783 kernel/module.c: Only return -EEXIST for modules that have finished loading
[ Upstream commit 6e6de3dee5 ]

Microsoft HyperV disables the X86_FEATURE_SMCA bit on AMD systems, and
linux guests boot with repeated errors:

amd64_edac_mod: Unknown symbol amd_unregister_ecc_decoder (err -2)
amd64_edac_mod: Unknown symbol amd_register_ecc_decoder (err -2)
amd64_edac_mod: Unknown symbol amd_report_gart_errors (err -2)
amd64_edac_mod: Unknown symbol amd_unregister_ecc_decoder (err -2)
amd64_edac_mod: Unknown symbol amd_register_ecc_decoder (err -2)
amd64_edac_mod: Unknown symbol amd_report_gart_errors (err -2)

The warnings occur because the module code erroneously returns -EEXIST
for modules that have failed to load and are in the process of being
removed from the module list.

module amd64_edac_mod has a dependency on module edac_mce_amd.  Using
modules.dep, systemd will load edac_mce_amd for every request of
amd64_edac_mod.  When the edac_mce_amd module loads, the module has
state MODULE_STATE_UNFORMED and once the module load fails and the state
becomes MODULE_STATE_GOING.  Another request for edac_mce_amd module
executes and add_unformed_module() will erroneously return -EEXIST even
though the previous instance of edac_mce_amd has MODULE_STATE_GOING.
Upon receiving -EEXIST, systemd attempts to load amd64_edac_mod, which
fails because of unknown symbols from edac_mce_amd.

add_unformed_module() must wait to return for any case other than
MODULE_STATE_LIVE to prevent a race between multiple loads of
dependent modules.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Barret Rhoden <brho@google.com>
Cc: David Arcari <darcari@redhat.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-06 19:06:47 +02:00
Cheng Jian
f486088d38 ftrace: Enable trampoline when rec count returns back to one
[ Upstream commit a124692b69 ]

Custom trampolines can only be enabled if there is only a single ops
attached to it. If there's only a single callback registered to a function,
and the ops has a trampoline registered for it, then we can call the
trampoline directly. This is very useful for improving the performance of
ftrace and livepatch.

If more than one callback is registered to a function, the general
trampoline is used, and the custom trampoline is not restored back to the
direct call even if all the other callbacks were unregistered and we are
back to one callback for the function.

To fix this, set FTRACE_FL_TRAMP flag if rec count is decremented
to one, and the ops that left has a trampoline.

Testing After this patch :

insmod livepatch_unshare_files.ko
cat /sys/kernel/debug/tracing/enabled_functions

	unshare_files (1) R I	tramp: 0xffffffffc0000000(klp_ftrace_handler+0x0/0xa0) ->ftrace_ops_assist_func+0x0/0xf0

echo unshare_files > /sys/kernel/debug/tracing/set_ftrace_filter
echo function > /sys/kernel/debug/tracing/current_tracer
cat /sys/kernel/debug/tracing/enabled_functions

	unshare_files (2) R I ->ftrace_ops_list_func+0x0/0x150

echo nop > /sys/kernel/debug/tracing/current_tracer
cat /sys/kernel/debug/tracing/enabled_functions

	unshare_files (1) R I	tramp: 0xffffffffc0000000(klp_ftrace_handler+0x0/0xa0) ->ftrace_ops_assist_func+0x0/0xf0

Link: http://lkml.kernel.org/r/1556969979-111047-1-git-send-email-cj.chengjian@huawei.com

Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-06 19:06:46 +02:00
Greg Kroah-Hartman
844ecc4634 This is the 4.19.64 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl1GibIACgkQONu9yGCS
 aT7z2hAAmv8AsH9IG43m7t6zLroJVswr/9594xk7yPBQgcY3/PW2aTFBCFbsdOL4
 yXcj2PSwRiq9K6qAJULrvOvncR9fIILHqzWzyXnoaZ30lR/FxaaFmuHZX/5Ix1tB
 e5EEE/EA49UAEjEDaMLq8g2IvibsReDxmSpnXyBJWoyRAdFIElVnMJ2+zvP/wRhF
 NKzQj/bj/qecCbis2lUCaVWJFZ6+P/52UbD8lvIwqR3nk2TKsGDcLU6eY3yg4KrB
 rEHl5T8KIPrkX3KNIEB8EcFREene+rdpZLLVe4fYwf+gOqfiFXSzZZvweauMkplq
 ehlVHkykvQvlsVM2tjBD379z3C4aasZDuMVNMCbAy2FlruLeBQ7gEn77mCJB9VH5
 /n/mlc2yizdoowtARCLWOUMfASpdSbqu2SQ7A/3kwG7l6GrpzKSIU2nQgm+41sUZ
 QJVtZ3IYsPoYjnU4B3JZzgJnf3M9jcRz/3JegviqhSEbF1gaScJX0cqN8C1idN/v
 ZAGCJK9S20/EEEsp5jn+bq2grUehvmD4TVDfot4P+5yRYyBIhMFpbM2RpjydOpwy
 +x8D1Q34LYPFgZfQ0vF62vcSBhMBiJ/7j41rUeo44K+Lg00F3yCOyL6FxK6S8h6j
 wsD0xLbllMrhV5KRYFizb3QbCHoHYiROIJk76uLvB+Tqq2Jg9VQ=
 =qIi2
 -----END PGP SIGNATURE-----

Merge 4.19.64 into android-4.19

Changes in 4.19.64
	hv_sock: Add support for delayed close
	vsock: correct removal of socket from the list
	NFS: Fix dentry revalidation on NFSv4 lookup
	NFS: Refactor nfs_lookup_revalidate()
	NFSv4: Fix lookup revalidate of regular files
	usb: dwc2: Disable all EP's on disconnect
	usb: dwc2: Fix disable all EP's on disconnect
	arm64: compat: Provide definition for COMPAT_SIGMINSTKSZ
	binder: fix possible UAF when freeing buffer
	ISDN: hfcsusb: checking idx of ep configuration
	media: au0828: fix null dereference in error path
	ath10k: Change the warning message string
	media: cpia2_usb: first wake up, then free in disconnect
	media: pvrusb2: use a different format for warnings
	NFS: Cleanup if nfs_match_client is interrupted
	media: radio-raremono: change devm_k*alloc to k*alloc
	iommu/vt-d: Don't queue_iova() if there is no flush queue
	iommu/iova: Fix compilation error with !CONFIG_IOMMU_IOVA
	Bluetooth: hci_uart: check for missing tty operations
	vhost: introduce vhost_exceeds_weight()
	vhost_net: fix possible infinite loop
	vhost: vsock: add weight support
	vhost: scsi: add weight support
	sched/fair: Don't free p->numa_faults with concurrent readers
	sched/fair: Use RCU accessors consistently for ->numa_group
	/proc/<pid>/cmdline: remove all the special cases
	/proc/<pid>/cmdline: add back the setproctitle() special case
	drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl
	Fix allyesconfig output.
	ceph: hold i_ceph_lock when removing caps for freeing inode
	block, scsi: Change the preempt-only flag into a counter
	scsi: core: Avoid that a kernel warning appears during system resume
	ip_tunnel: allow not to count pkts on tstats by setting skb's dev to NULL
	Linux 4.19.64

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3e9055b677bd8ad9d5070307fae0bc765d444e9d
2019-08-04 09:37:11 +02:00
Jann Horn
a5a3915f17 sched/fair: Use RCU accessors consistently for ->numa_group
commit cb361d8cde upstream.

The old code used RCU annotations and accessors inconsistently for
->numa_group, which can lead to use-after-frees and NULL dereferences.

Let all accesses to ->numa_group use proper RCU helpers to prevent such
issues.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Fixes: 8c8a743c50 ("sched/numa: Use {cpu, pid} to create task groups for shared faults")
Link: https://lkml.kernel.org/r/20190716152047.14424-3-jannh@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-04 09:30:56 +02:00
Jann Horn
48046e092a sched/fair: Don't free p->numa_faults with concurrent readers
commit 16d51a590a upstream.

When going through execve(), zero out the NUMA fault statistics instead of
freeing them.

During execve, the task is reachable through procfs and the scheduler. A
concurrent /proc/*/sched reader can read data from a freed ->numa_faults
allocation (confirmed by KASAN) and write it back to userspace.
I believe that it would also be possible for a use-after-free read to occur
through a race between a NUMA fault and execve(): task_numa_fault() can
lead to task_numa_compare(), which invokes task_weight() on the currently
running task of a different CPU.

Another way to fix this would be to make ->numa_faults RCU-managed or add
extra locking, but it seems easier to wipe the NUMA fault statistics on
execve.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Fixes: 82727018b0 ("sched/numa: Call task_numa_free() from do_execve()")
Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-04 09:30:56 +02:00
Greg Kroah-Hartman
75ff56e1a2 This is the 4.19.63 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl1BJrAACgkQONu9yGCS
 aT6MpRAAt2nuozm5Z/MshFAuGFzddAwPoYtkIPSy8BiPHYjf7x0+D5Ew4dz5OihS
 ElfbA94hMOpvhhXlzBU3ZFJsWZIK78gzV6+LiHyb5R97Jdzj/zT4h40y0kxKw+pS
 gghnZ6zx+pGSIXm/EsODW2gg98yTrmhBFpUhXpAGoC/71c1vxVlj3jHcuhd778YK
 NRlj2tFWJGIBpmXrApo1Eg7qQRj4tzbOjthfgANWPq+EP68PgiOaxMDMMp7IUwju
 KrXEFcXlk5aHvfaJ06FSBRBnn45XMyGXPYV/76HsaqkBNmg1r7o02dZKy/0S84Fn
 YXoEFxHt6NFOJ52LiO95z7cQ0xeTSYygNCtYXJ70uyDMnrCPXCrKp7DRyka+vDPs
 RCrcpB1QjcCb3xTL//SPkNWM3oZEW9CawpRFh9bmqbw/h7ZjaUEuwNIJnunISulu
 2fvOjUmFWfUVIARiwKFVuIkXzgf3cSLYZTtiDFC5/yBpkGVNnXqyO7YtWZwnrMHq
 L3DC3pOKuYXMa03KWGEzZoCZEXjtoRhRwCSgV7wbK5o90sZeRj/HC1zXEyDPPD/R
 7A1rePTuwlAH3gHCJGhYkmYqULx62ZdvV6IC2N7xNxeTL1Y7OVNBT7ZUqexxY6WC
 OG1vVxUKNJIvBLYmc6cmQATgR6XH5/B9H2p1YRBLuAQuqHk+jqo=
 =ZQf3
 -----END PGP SIGNATURE-----

Merge 4.19.63 into android-4.19

Changes in 4.19.63
	hvsock: fix epollout hang from race condition
	drm/panel: simple: Fix panel_simple_dsi_probe
	iio: adc: stm32-dfsdm: manage the get_irq error case
	iio: adc: stm32-dfsdm: missing error case during probe
	staging: vt6656: use meaningful error code during buffer allocation
	usb: core: hub: Disable hub-initiated U1/U2
	tty: max310x: Fix invalid baudrate divisors calculator
	pinctrl: rockchip: fix leaked of_node references
	tty: serial: cpm_uart - fix init when SMC is relocated
	drm/amd/display: Fill prescale_params->scale for RGB565
	drm/amdgpu/sriov: Need to initialize the HDP_NONSURFACE_BAStE
	drm/amd/display: Disable ABM before destroy ABM struct
	drm/amdkfd: Fix a potential memory leak
	drm/amdkfd: Fix sdma queue map issue
	drm/edid: Fix a missing-check bug in drm_load_edid_firmware()
	PCI: Return error if cannot probe VF
	drm/bridge: tc358767: read display_props in get_modes()
	drm/bridge: sii902x: pixel clock unit is 10kHz instead of 1kHz
	gpu: host1x: Increase maximum DMA segment size
	drm/crc-debugfs: User irqsafe spinlock in drm_crtc_add_crc_entry
	drm/crc-debugfs: Also sprinkle irqrestore over early exits
	memstick: Fix error cleanup path of memstick_init
	tty/serial: digicolor: Fix digicolor-usart already registered warning
	tty: serial: msm_serial: avoid system lockup condition
	serial: 8250: Fix TX interrupt handling condition
	drm/amd/display: Always allocate initial connector state state
	drm/virtio: Add memory barriers for capset cache.
	phy: renesas: rcar-gen2: Fix memory leak at error paths
	drm/amd/display: fix compilation error
	powerpc/pseries/mobility: prevent cpu hotplug during DT update
	drm/rockchip: Properly adjust to a true clock in adjusted_mode
	serial: imx: fix locking in set_termios()
	tty: serial_core: Set port active bit in uart_port_activate
	usb: gadget: Zero ffs_io_data
	mmc: sdhci: sdhci-pci-o2micro: Check if controller supports 8-bit width
	powerpc/pci/of: Fix OF flags parsing for 64bit BARs
	drm/msm: Depopulate platform on probe failure
	serial: mctrl_gpio: Check if GPIO property exisits before requesting it
	PCI: sysfs: Ignore lockdep for remove attribute
	i2c: stm32f7: fix the get_irq error cases
	kbuild: Add -Werror=unknown-warning-option to CLANG_FLAGS
	genksyms: Teach parser about 128-bit built-in types
	PCI: xilinx-nwl: Fix Multi MSI data programming
	iio: iio-utils: Fix possible incorrect mask calculation
	powerpc/cacheflush: fix variable set but not used
	powerpc/xmon: Fix disabling tracing while in xmon
	recordmcount: Fix spurious mcount entries on powerpc
	mfd: madera: Add missing of table registration
	mfd: core: Set fwnode for created devices
	mfd: arizona: Fix undefined behavior
	mfd: hi655x-pmic: Fix missing return value check for devm_regmap_init_mmio_clk
	mm/swap: fix release_pages() when releasing devmap pages
	um: Silence lockdep complaint about mmap_sem
	powerpc/4xx/uic: clear pending interrupt after irq type/pol change
	RDMA/i40iw: Set queue pair state when being queried
	serial: sh-sci: Terminate TX DMA during buffer flushing
	serial: sh-sci: Fix TX DMA buffer flushing and workqueue races
	IB/mlx5: Fixed reporting counters on 2nd port for Dual port RoCE
	powerpc/mm: Handle page table allocation failures
	IB/ipoib: Add child to parent list only if device initialized
	arm64: assembler: Switch ESB-instruction with a vanilla nop if !ARM64_HAS_RAS
	PCI: mobiveil: Fix PCI base address in MEM/IO outbound windows
	PCI: mobiveil: Fix the Class Code field
	kallsyms: exclude kasan local symbols on s390
	PCI: mobiveil: Initialize Primary/Secondary/Subordinate bus numbers
	PCI: mobiveil: Use the 1st inbound window for MEM inbound transactions
	perf test mmap-thread-lookup: Initialize variable to suppress memory sanitizer warning
	perf stat: Fix use-after-freed pointer detected by the smatch tool
	perf top: Fix potential NULL pointer dereference detected by the smatch tool
	perf session: Fix potential NULL pointer dereference found by the smatch tool
	perf annotate: Fix dereferencing freed memory found by the smatch tool
	perf hists browser: Fix potential NULL pointer dereference found by the smatch tool
	RDMA/rxe: Fill in wc byte_len with IB_WC_RECV_RDMA_WITH_IMM
	PCI: dwc: pci-dra7xx: Fix compilation when !CONFIG_GPIOLIB
	powerpc/boot: add {get, put}_unaligned_be32 to xz_config.h
	block: init flush rq ref count to 1
	f2fs: avoid out-of-range memory access
	mailbox: handle failed named mailbox channel request
	dlm: check if workqueues are NULL before flushing/destroying
	powerpc/eeh: Handle hugepages in ioremap space
	block/bio-integrity: fix a memory leak bug
	sh: prevent warnings when using iounmap
	mm/kmemleak.c: fix check for softirq context
	9p: pass the correct prototype to read_cache_page
	mm/gup.c: mark undo_dev_pagemap as __maybe_unused
	mm/gup.c: remove some BUG_ONs from get_gate_page()
	memcg, fsnotify: no oom-kill for remote memcg charging
	mm/mmu_notifier: use hlist_add_head_rcu()
	proc: use down_read_killable mmap_sem for /proc/pid/smaps_rollup
	proc: use down_read_killable mmap_sem for /proc/pid/pagemap
	proc: use down_read_killable mmap_sem for /proc/pid/clear_refs
	proc: use down_read_killable mmap_sem for /proc/pid/map_files
	cxgb4: reduce kernel stack usage in cudbg_collect_mem_region()
	proc: use down_read_killable mmap_sem for /proc/pid/maps
	locking/lockdep: Fix lock used or unused stats error
	mm: use down_read_killable for locking mmap_sem in access_remote_vm
	locking/lockdep: Hide unused 'class' variable
	usb: wusbcore: fix unbalanced get/put cluster_id
	usb: pci-quirks: Correct AMD PLL quirk detection
	btrfs: inode: Don't compress if NODATASUM or NODATACOW set
	x86/sysfb_efi: Add quirks for some devices with swapped width and height
	x86/speculation/mds: Apply more accurate check on hypervisor platform
	binder: prevent transactions to context manager from its own process.
	fpga-manager: altera-ps-spi: Fix build error
	mei: me: add mule creek canyon (EHL) device ids
	hpet: Fix division by zero in hpet_time_div()
	ALSA: ac97: Fix double free of ac97_codec_device
	ALSA: line6: Fix wrong altsetting for LINE6_PODHD500_1
	ALSA: hda - Add a conexant codec entry to let mute led work
	powerpc/xive: Fix loop exit-condition in xive_find_target_in_mask()
	powerpc/tm: Fix oops on sigreturn on systems without TM
	libnvdimm/bus: Stop holding nvdimm_bus_list_mutex over __nd_ioctl()
	access: avoid the RCU grace period for the temporary subjective credentials
	Linux 4.19.63

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic31529aa6fd283d16d6bfb182187a9402a4db44f
2019-07-31 08:03:42 +02:00
Linus Torvalds
408af82309 access: avoid the RCU grace period for the temporary subjective credentials
commit d7852fbd0f upstream.

It turns out that 'access()' (and 'faccessat()') can cause a lot of RCU
work because it installs a temporary credential that gets allocated and
freed for each system call.

The allocation and freeing overhead is mostly benign, but because
credentials can be accessed under the RCU read lock, the freeing
involves a RCU grace period.

Which is not a huge deal normally, but if you have a lot of access()
calls, this causes a fair amount of seconday damage: instead of having a
nice alloc/free patterns that hits in hot per-CPU slab caches, you have
all those delayed free's, and on big machines with hundreds of cores,
the RCU overhead can end up being enormous.

But it turns out that all of this is entirely unnecessary.  Exactly
because access() only installs the credential as the thread-local
subjective credential, the temporary cred pointer doesn't actually need
to be RCU free'd at all.  Once we're done using it, we can just free it
synchronously and avoid all the RCU overhead.

So add a 'non_rcu' flag to 'struct cred', which can be set by users that
know they only use it in non-RCU context (there are other potential
users for this).  We can make it a union with the rcu freeing list head
that we need for the RCU case, so this doesn't need any extra storage.

Note that this also makes 'get_current_cred()' clear the new non_rcu
flag, in case we have filesystems that take a long-term reference to the
cred and then expect the RCU delayed freeing afterwards.  It's not
entirely clear that this is required, but it makes for clear semantics:
the subjective cred remains non-RCU as long as you only access it
synchronously using the thread-local accessors, but you _can_ use it as
a generic cred if you want to.

It is possible that we should just remove the whole RCU markings for
->cred entirely.  Only ->real_cred is really supposed to be accessed
through RCU, and the long-term cred copies that nfs uses might want to
explicitly re-enable RCU freeing if required, rather than have
get_current_cred() do it implicitly.

But this is a "minimal semantic changes" change for the immediate
problem.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jan Glauber <jglauber@marvell.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Jayachandran Chandrasekharan Nair <jnair@marvell.com>
Cc: Greg KH <greg@kroah.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-31 07:27:11 +02:00
Arnd Bergmann
148959cc64 locking/lockdep: Hide unused 'class' variable
[ Upstream commit 68037aa782 ]

The usage is now hidden in an #ifdef, so we need to move
the variable itself in there as well to avoid this warning:

  kernel/locking/lockdep_proc.c:203:21: error: unused variable 'class' [-Werror,-Wunused-variable]

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yuyang Du <duyuyang@gmail.com>
Cc: frederic@kernel.org
Fixes: 68d41d8c94 ("locking/lockdep: Fix lock used or unused stats error")
Link: https://lkml.kernel.org/r/20190715092809.736834-1-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-31 07:27:09 +02:00
Yuyang Du
4acb04ef5e locking/lockdep: Fix lock used or unused stats error
[ Upstream commit 68d41d8c94 ]

The stats variable nr_unused_locks is incremented every time a new lock
class is register and decremented when the lock is first used in
__lock_acquire(). And after all, it is shown and checked in lockdep_stats.

However, under configurations that either CONFIG_TRACE_IRQFLAGS or
CONFIG_PROVE_LOCKING is not defined:

The commit:

  0918065151 ("locking/lockdep: Consolidate lock usage bit initialization")

missed marking the LOCK_USED flag at IRQ usage initialization because
as mark_usage() is not called. And the commit:

  886532aee3 ("locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING")

further made mark_lock() not defined such that the LOCK_USED cannot be
marked at all when the lock is first acquired.

As a result, we fix this by not showing and checking the stats under such
configurations for lockdep_stats.

Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Yuyang Du <duyuyang@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: arnd@arndb.de
Cc: frederic@kernel.org
Link: https://lkml.kernel.org/r/20190709101522.9117-1-duyuyang@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-31 07:27:09 +02:00
Greg Kroah-Hartman
f232ce65ba This is the 4.19.62 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl09QMoACgkQONu9yGCS
 aT7bBQ//dApu9DLQ3tyfc+tucIc3ViOcif3zlfLWRqClcLXw1HF63/Q/u40kYuf1
 TOnMkVXFWmpeGPgz2/tWw9TN+ibHaPBfyMoPGw/Ehs5dB8qn/F4AnLqkePFimIYQ
 +YOZTbvPqW/xSh8I79i6y2NxrZLGvO9FDY9YUblBKz4lKWc+HULXn9RmjtxlBOLE
 s6epoM0tWKPPeM5MyJIwT+9MZ0o3Cj89YNv+PqYLQSzcvbUbHDUiMt+DO4rqSyyf
 6fQe4mSX4tk+tbviyEruvKH3kJg0rWS1Maw1/h+AOJ8Gmr2WZ+vYBorPEdwdYu5K
 eootc3Jkj4wtcPsvKtNzWVPqJxdd5j651vS8/bxA0DmOVQM7A8dUBWKtMJGGG7nM
 nIRvxoMo+kv9QE/DJpeQ/apE9WnBflSqw6/DYdqs11gTX4E+beR75aRoVD1ue0lS
 if5CfnM0BTiOO1rMdMo42rzcBfh2DLt5a18WXsMmYOaSMGEJuF0KjgaGc5E4N00A
 QlqIJ7PKk+Kb53jz5oLV2vB/SXNvAQRLAvMTMH2Mst4veDonYyiVTfp6C+r8DJYc
 hvyaF1FoCMTWV5XsINmM2mlJ2+/G0nV5kLDmVPUHiFxE0JXnPQ8iCx9a96Ub+Zim
 F//muwohNIUa6EEN6AwrEUsoyVZ7cP/aR5wHQ9PAY3RV5nis474=
 =zWcX
 -----END PGP SIGNATURE-----

Merge 4.19.62 into android-4.19

Changes in 4.19.62
	bnx2x: Prevent load reordering in tx completion processing
	caif-hsi: fix possible deadlock in cfhsi_exit_module()
	hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback()
	igmp: fix memory leak in igmpv3_del_delrec()
	ipv4: don't set IPv6 only flags to IPv4 addresses
	ipv6: rt6_check should return NULL if 'from' is NULL
	ipv6: Unlink sibling route in case of failure
	net: bcmgenet: use promisc for unsupported filters
	net: dsa: mv88e6xxx: wait after reset deactivation
	net: make skb_dst_force return true when dst is refcounted
	net: neigh: fix multiple neigh timer scheduling
	net: openvswitch: fix csum updates for MPLS actions
	net: phy: sfp: hwmon: Fix scaling of RX power
	net: stmmac: Re-work the queue selection for TSO packets
	nfc: fix potential illegal memory access
	r8169: fix issue with confused RX unit after PHY power-down on RTL8411b
	rxrpc: Fix send on a connected, but unbound socket
	sctp: fix error handling on stream scheduler initialization
	sky2: Disable MSI on ASUS P6T
	tcp: be more careful in tcp_fragment()
	tcp: fix tcp_set_congestion_control() use from bpf hook
	tcp: Reset bytes_acked and bytes_received when disconnecting
	vrf: make sure skb->data contains ip header to make routing
	net/mlx5e: IPoIB, Add error path in mlx5_rdma_setup_rn
	macsec: fix use-after-free of skb during RX
	macsec: fix checksumming after decryption
	netrom: fix a memory leak in nr_rx_frame()
	netrom: hold sock when setting skb->destructor
	net_sched: unset TCQ_F_CAN_BYPASS when adding filters
	net/tls: make sure offload also gets the keys wiped
	sctp: not bind the socket in sctp_connect
	net: bridge: mcast: fix stale nsrcs pointer in igmp3/mld2 report handling
	net: bridge: mcast: fix stale ipv6 hdr pointer when handling v6 query
	net: bridge: don't cache ether dest pointer on input
	net: bridge: stp: don't cache eth dest pointer before skb pull
	dma-buf: balance refcount inbalance
	dma-buf: Discard old fence_excl on retrying get_fences_rcu for realloc
	gpio: davinci: silence error prints in case of EPROBE_DEFER
	MIPS: lb60: Fix pin mappings
	perf/core: Fix exclusive events' grouping
	perf/core: Fix race between close() and fork()
	ext4: don't allow any modifications to an immutable file
	ext4: enforce the immutable flag on open files
	mm: add filemap_fdatawait_range_keep_errors()
	jbd2: introduce jbd2_inode dirty range scoping
	ext4: use jbd2_inode dirty range scoping
	ext4: allow directory holes
	KVM: nVMX: do not use dangling shadow VMCS after guest reset
	KVM: nVMX: Clear pending KVM_REQ_GET_VMCS12_PAGES when leaving nested
	mm: vmscan: scan anonymous pages on file refaults
	net: sched: verify that q!=NULL before setting q->flags
	Linux 4.19.62

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2eb23bda9d5294a5c874fe4f403934fd99e84661
2019-07-28 08:43:04 +02:00
Peter Zijlstra
4a5cc64d8a perf/core: Fix race between close() and fork()
commit 1cf8dfe8a6 upstream.

Syzcaller reported the following Use-after-Free bug:

	close()						clone()

							  copy_process()
							    perf_event_init_task()
							      perf_event_init_context()
							        mutex_lock(parent_ctx->mutex)
								inherit_task_group()
								  inherit_group()
								    inherit_event()
								      mutex_lock(event->child_mutex)
								      // expose event on child list
								      list_add_tail()
								      mutex_unlock(event->child_mutex)
							        mutex_unlock(parent_ctx->mutex)

							    ...
							    goto bad_fork_*

							  bad_fork_cleanup_perf:
							    perf_event_free_task()

	  perf_release()
	    perf_event_release_kernel()
	      list_for_each_entry()
		mutex_lock(ctx->mutex)
		mutex_lock(event->child_mutex)
		// event is from the failing inherit
		// on the other CPU
		perf_remove_from_context()
		list_move()
		mutex_unlock(event->child_mutex)
		mutex_unlock(ctx->mutex)

							      mutex_lock(ctx->mutex)
							      list_for_each_entry_safe()
							        // event already stolen
							      mutex_unlock(ctx->mutex)

							    delayed_free_task()
							      free_task()

	     list_for_each_entry_safe()
	       list_del()
	       free_event()
	         _free_event()
		   // and so event->hw.target
		   // is the already freed failed clone()
		   if (event->hw.target)
		     put_task_struct(event->hw.target)
		       // WHOOPSIE, already quite dead

Which puts the lie to the the comment on perf_event_free_task():
'unexposed, unused context' not so much.

Which is a 'fun' confluence of fail; copy_process() doing an
unconditional free_task() and not respecting refcounts, and perf having
creative locking. In particular:

  82d94856fa ("perf/core: Fix lock inversion between perf,trace,cpuhp")

seems to have overlooked this 'fun' parade.

Solve it by using the fact that detached events still have a reference
count on their (previous) context. With this perf_event_free_task()
can detect when events have escaped and wait for their destruction.

Debugged-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Reported-by: syzbot+a24c397a29ad22d86c98@syzkaller.appspotmail.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: <stable@vger.kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 82d94856fa ("perf/core: Fix lock inversion between perf,trace,cpuhp")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28 08:29:28 +02:00
Alexander Shishkin
75100ec5f0 perf/core: Fix exclusive events' grouping
commit 8a58ddae23 upstream.

So far, we tried to disallow grouping exclusive events for the fear of
complications they would cause with moving between contexts. Specifically,
moving a software group to a hardware context would violate the exclusivity
rules if both groups contain matching exclusive events.

This attempt was, however, unsuccessful: the check that we have in the
perf_event_open() syscall is both wrong (looks at wrong PMU) and
insufficient (group leader may still be exclusive), as can be illustrated
by running:

  $ perf record -e '{intel_pt//,cycles}' uname
  $ perf record -e '{cycles,intel_pt//}' uname

ultimately successfully.

Furthermore, we are completely free to trigger the exclusivity violation
by:

   perf -e '{cycles,intel_pt//}' -e '{intel_pt//,instructions}'

even though the helpful perf record will not allow that, the ABI will.

The warning later in the perf_event_open() path will also not trigger, because
it's also wrong.

Fix all this by validating the original group before moving, getting rid
of broken safeguards and placing a useful one to perf_install_in_context().

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: mathieu.poirier@linaro.org
Cc: will.deacon@arm.com
Fixes: bed5b25ad9 ("perf: Add a pmu capability for "exclusive" events")
Link: https://lkml.kernel.org/r/20190701110755.24646-1-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28 08:29:28 +02:00
Greg Kroah-Hartman
71ce27c31a This is the 4.19.61 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl06qFcACgkQONu9yGCS
 aT6O9A/+JZqoVYnItpOnT8Hu//0mYEKvREWqsoTJNpZJhLWtGjPTT9ospHNpVgfC
 GUkFqngWzXHpzCgTYHUV3Mm+SIiVXCM3nkCU1+2YOsPzrKo/lJSfFt3wOYGpKO5V
 qratAQLra5TqR0teR00aQblqKqfmrux05uL9dNcVIwve813m00jFALcpjrXnanpP
 tx5cqCo3uHOou5XLraHx/CMPnfJI/mLegBUTM4DxAmN2vG4gQck2gnrU7s1eg4cy
 1Fqh0Oo2Ycj5p9yoGss02JqR3wGZHOEmF55j2JcTZAPvW6/c55iPd52Trn8kPOHB
 Awq/VwJmP4p10a4TWoZpv7VqpL3PzO8/AW7QWOER8QnDzfOTHGae7YT8LVp5Xqj5
 1NqowuP/Tm0yaZSaDLqkdvhVqTi0oGL8OCYLErpeR9PQ3P+p3paaswopsPqnXURj
 Q4Pahe1vm9WG2NpKh2bHVmmVkQmvwuxxxnaa31HI/IyLd5bYFV1/LbEa/XrSK36W
 VJtO+0AjERO9uTVP/YDloDkQ4R3+3W+m520jYsgf1OwY7v/Kc6iLb7cDwci/ZWMy
 YSMm8hrO0nzuT0SI25TKLDvxjGbANKvxytzOQMOTb8NsIWwaoEKWh+4r9XkdUXNa
 +dx72I5J2Be+3hk+eaDNzCdEae5pgVTxBpwJbzI4RfnK1Doa4uE=
 =hJdd
 -----END PGP SIGNATURE-----

Merge 4.19.61 into android-4.19

Changes in 4.19.61
	MIPS: ath79: fix ar933x uart parity mode
	MIPS: fix build on non-linux hosts
	arm64/efi: Mark __efistub_stext_offset as an absolute symbol explicitly
	scsi: iscsi: set auth_protocol back to NULL if CHAP_A value is not supported
	dmaengine: imx-sdma: fix use-after-free on probe error path
	wil6210: fix potential out-of-bounds read
	ath10k: Do not send probe response template for mesh
	ath9k: Check for errors when reading SREV register
	ath6kl: add some bounds checking
	ath10k: add peer id check in ath10k_peer_find_by_id
	wil6210: fix spurious interrupts in 3-msi
	ath: DFS JP domain W56 fixed pulse type 3 RADAR detection
	regmap: debugfs: Fix memory leak in regmap_debugfs_init
	batman-adv: fix for leaked TVLV handler.
	media: dvb: usb: fix use after free in dvb_usb_device_exit
	media: spi: IR LED: add missing of table registration
	crypto: talitos - fix skcipher failure due to wrong output IV
	media: ov7740: avoid invalid framesize setting
	media: marvell-ccic: fix DMA s/g desc number calculation
	media: vpss: fix a potential NULL pointer dereference
	media: media_device_enum_links32: clean a reserved field
	net: stmmac: dwmac1000: Clear unused address entries
	net: stmmac: dwmac4/5: Clear unused address entries
	qed: Set the doorbell address correctly
	signal/pid_namespace: Fix reboot_pid_ns to use send_sig not force_sig
	af_key: fix leaks in key_pol_get_resp and dump_sp.
	xfrm: Fix xfrm sel prefix length validation
	fscrypt: clean up some BUG_ON()s in block encryption/decryption
	perf annotate TUI browser: Do not use member from variable within its own initialization
	media: mc-device.c: don't memset __user pointer contents
	media: saa7164: fix remove_proc_entry warning
	media: staging: media: davinci_vpfe: - Fix for memory leak if decoder initialization fails.
	net: phy: Check against net_device being NULL
	crypto: talitos - properly handle split ICV.
	crypto: talitos - Align SEC1 accesses to 32 bits boundaries.
	tua6100: Avoid build warnings.
	batman-adv: Fix duplicated OGMs on NETDEV_UP
	locking/lockdep: Fix merging of hlocks with non-zero references
	media: wl128x: Fix some error handling in fm_v4l2_init_video_device()
	net: hns3: set ops to null when unregister ad_dev
	cpupower : frequency-set -r option misses the last cpu in related cpu list
	arm64: mm: make CONFIG_ZONE_DMA32 configurable
	perf jvmti: Address gcc string overflow warning for strncpy()
	net: stmmac: dwmac4: fix flow control issue
	net: stmmac: modify default value of tx-frames
	crypto: inside-secure - do not rely on the hardware last bit for result descriptors
	net: fec: Do not use netdev messages too early
	net: axienet: Fix race condition causing TX hang
	s390/qdio: handle PENDING state for QEBSM devices
	RAS/CEC: Fix pfn insertion
	net: sfp: add mutex to prevent concurrent state checks
	ipset: Fix memory accounting for hash types on resize
	perf cs-etm: Properly set the value of 'old' and 'head' in snapshot mode
	perf test 6: Fix missing kvm module load for s390
	perf report: Fix OOM error in TUI mode on s390
	irqchip/meson-gpio: Add support for Meson-G12A SoC
	media: uvcvideo: Fix access to uninitialized fields on probe error
	media: fdp1: Support M3N and E3 platforms
	iommu: Fix a leak in iommu_insert_resv_region
	gpio: omap: fix lack of irqstatus_raw0 for OMAP4
	gpio: omap: ensure irq is enabled before wakeup
	regmap: fix bulk writes on paged registers
	bpf: silence warning messages in core
	media: s5p-mfc: fix reading min scratch buffer size on MFC v6/v7
	selinux: fix empty write to keycreate file
	x86/cpu: Add Ice Lake NNPI to Intel family
	ASoC: meson: axg-tdm: fix sample clock inversion
	rcu: Force inlining of rcu_read_lock()
	x86/cpufeatures: Add FDP_EXCPTN_ONLY and ZERO_FCS_FDS
	qed: iWARP - Fix tc for MPA ll2 connection
	net: hns3: fix for skb leak when doing selftest
	block: null_blk: fix race condition for null_del_dev
	blkcg, writeback: dead memcgs shouldn't contribute to writeback ownership arbitration
	xfrm: fix sa selector validation
	sched/core: Add __sched tag for io_schedule()
	sched/fair: Fix "runnable_avg_yN_inv" not used warnings
	perf/x86/intel/uncore: Handle invalid event coding for free-running counter
	x86/atomic: Fix smp_mb__{before,after}_atomic()
	perf evsel: Make perf_evsel__name() accept a NULL argument
	vhost_net: disable zerocopy by default
	ipoib: correcly show a VF hardware address
	x86/cacheinfo: Fix a -Wtype-limits warning
	blk-iolatency: only account submitted bios
	ACPICA: Clear status of GPEs on first direct enable
	EDAC/sysfs: Fix memory leak when creating a csrow object
	nvme: fix possible io failures when removing multipathed ns
	nvme-pci: properly report state change failure in nvme_reset_work
	nvme-pci: set the errno on ctrl state change error
	lightnvm: pblk: fix freeing of merged pages
	arm64: Do not enable IRQs for ct_user_exit
	ipsec: select crypto ciphers for xfrm_algo
	ipvs: defer hook registration to avoid leaks
	media: s5p-mfc: Make additional clocks optional
	media: i2c: fix warning same module names
	ntp: Limit TAI-UTC offset
	timer_list: Guard procfs specific code
	acpi/arm64: ignore 5.1 FADTs that are reported as 5.0
	media: coda: fix mpeg2 sequence number handling
	media: coda: fix last buffer handling in V4L2_ENC_CMD_STOP
	media: coda: increment sequence offset for the last returned frame
	media: vimc: cap: check v4l2_fill_pixfmt return value
	media: hdpvr: fix locking and a missing msleep
	net: stmmac: sun8i: force select external PHY when no internal one
	rtlwifi: rtl8192cu: fix error handle when usb probe failed
	mt7601u: do not schedule rx_tasklet when the device has been disconnected
	x86/build: Add 'set -e' to mkcapflags.sh to delete broken capflags.c
	mt7601u: fix possible memory leak when the device is disconnected
	ipvs: fix tinfo memory leak in start_sync_thread
	ath10k: add missing error handling
	ath10k: fix PCIE device wake up failed
	perf tools: Increase MAX_NR_CPUS and MAX_CACHES
	ASoC: Intel: hdac_hdmi: Set ops to NULL on remove
	libata: don't request sense data on !ZAC ATA devices
	clocksource/drivers/exynos_mct: Increase priority over ARM arch timer
	xsk: Properly terminate assignment in xskq_produce_flush_desc
	rslib: Fix decoding of shortened codes
	rslib: Fix handling of of caller provided syndrome
	ixgbe: Check DDM existence in transceiver before access
	crypto: serpent - mark __serpent_setkey_sbox noinline
	crypto: asymmetric_keys - select CRYPTO_HASH where needed
	wil6210: drop old event after wmi_call timeout
	EDAC: Fix global-out-of-bounds write when setting edac_mc_poll_msec
	bcache: check CACHE_SET_IO_DISABLE in allocator code
	bcache: check CACHE_SET_IO_DISABLE bit in bch_journal()
	bcache: acquire bch_register_lock later in cached_dev_free()
	bcache: check c->gc_thread by IS_ERR_OR_NULL in cache_set_flush()
	bcache: fix potential deadlock in cached_def_free()
	net: hns3: fix a -Wformat-nonliteral compile warning
	net: hns3: add some error checking in hclge_tm module
	ath10k: destroy sdio workqueue while remove sdio module
	net: mvpp2: prs: Don't override the sign bit in SRAM parser shift
	igb: clear out skb->tstamp after reading the txtime
	iwlwifi: mvm: Drop large non sta frames
	bpf: fix uapi bpf_prog_info fields alignment
	perf stat: Make metric event lookup more robust
	perf stat: Fix group lookup for metric group
	bnx2x: Prevent ptp_task to be rescheduled indefinitely
	net: usb: asix: init MAC address buffers
	rxrpc: Fix oops in tracepoint
	bpf, libbpf, smatch: Fix potential NULL pointer dereference
	selftests: bpf: fix inlines in test_lwt_seg6local
	bonding: validate ip header before check IPPROTO_IGMP
	gpiolib: Fix references to gpiod_[gs]et_*value_cansleep() variants
	tools: bpftool: Fix json dump crash on powerpc
	Bluetooth: hci_bcsp: Fix memory leak in rx_skb
	Bluetooth: Add new 13d3:3491 QCA_ROME device
	Bluetooth: Add new 13d3:3501 QCA_ROME device
	Bluetooth: 6lowpan: search for destination address in all peers
	perf tests: Fix record+probe_libc_inet_pton.sh for powerpc64
	Bluetooth: Check state in l2cap_disconnect_rsp
	gtp: add missing gtp_encap_disable_sock() in gtp_encap_enable()
	Bluetooth: validate BLE connection interval updates
	gtp: fix suspicious RCU usage
	gtp: fix Illegal context switch in RCU read-side critical section.
	gtp: fix use-after-free in gtp_encap_destroy()
	gtp: fix use-after-free in gtp_newlink()
	net: mvmdio: defer probe of orion-mdio if a clock is not ready
	iavf: fix dereference of null rx_buffer pointer
	floppy: fix div-by-zero in setup_format_params
	floppy: fix out-of-bounds read in next_valid_format
	floppy: fix invalid pointer dereference in drive_name
	floppy: fix out-of-bounds read in copy_buffer
	xen: let alloc_xenballooned_pages() fail if not enough memory free
	scsi: NCR5380: Reduce goto statements in NCR5380_select()
	scsi: NCR5380: Always re-enable reselection interrupt
	Revert "scsi: ncr5380: Increase register polling limit"
	scsi: core: Fix race on creating sense cache
	scsi: megaraid_sas: Fix calculation of target ID
	scsi: mac_scsi: Increase PIO/PDMA transfer length threshold
	scsi: mac_scsi: Fix pseudo DMA implementation, take 2
	crypto: ghash - fix unaligned memory access in ghash_setkey()
	crypto: ccp - Validate the the error value used to index error messages
	crypto: arm64/sha1-ce - correct digest for empty data in finup
	crypto: arm64/sha2-ce - correct digest for empty data in finup
	crypto: chacha20poly1305 - fix atomic sleep when using async algorithm
	crypto: crypto4xx - fix AES CTR blocksize value
	crypto: crypto4xx - fix blocksize for cfb and ofb
	crypto: crypto4xx - block ciphers should only accept complete blocks
	crypto: ccp - memset structure fields to zero before reuse
	crypto: ccp/gcm - use const time tag comparison.
	crypto: crypto4xx - fix a potential double free in ppc4xx_trng_probe
	Revert "bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()"
	bcache: Revert "bcache: fix high CPU occupancy during journal"
	bcache: Revert "bcache: free heap cache_set->flush_btree in bch_journal_free"
	bcache: ignore read-ahead request failure on backing device
	bcache: fix mistaken sysfs entry for io_error counter
	bcache: destroy dc->writeback_write_wq if failed to create dc->writeback_thread
	Input: gtco - bounds check collection indent level
	Input: alps - don't handle ALPS cs19 trackpoint-only device
	Input: synaptics - whitelist Lenovo T580 SMBus intertouch
	Input: alps - fix a mismatch between a condition check and its comment
	regulator: s2mps11: Fix buck7 and buck8 wrong voltages
	arm64: tegra: Update Jetson TX1 GPU regulator timings
	iwlwifi: pcie: don't service an interrupt that was masked
	iwlwifi: pcie: fix ALIVE interrupt handling for gen2 devices w/o MSI-X
	iwlwifi: don't WARN when calling iwl_get_shared_mem_conf with RF-Kill
	iwlwifi: fix RF-Kill interrupt while FW load for gen2 devices
	NFSv4: Handle the special Linux file open access mode
	pnfs/flexfiles: Fix PTR_ERR() dereferences in ff_layout_track_ds_error
	pNFS: Fix a typo in pnfs_update_layout
	pnfs: Fix a problem where we gratuitously start doing I/O through the MDS
	lib/scatterlist: Fix mapping iterator when sg->offset is greater than PAGE_SIZE
	ASoC: dapm: Adapt for debugfs API change
	raid5-cache: Need to do start() part job after adding journal device
	ALSA: seq: Break too long mutex context in the write loop
	ALSA: hda/realtek - Fixed Headphone Mic can't record on Dell platform
	ALSA: hda/realtek: apply ALC891 headset fixup to one Dell machine
	media: v4l2: Test type instead of cfg->type in v4l2_ctrl_new_custom()
	media: coda: Remove unbalanced and unneeded mutex unlock
	media: videobuf2-core: Prevent size alignment wrapping buffer size to 0
	media: videobuf2-dma-sg: Prevent size from overflowing
	KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed
	arm64: tegra: Fix AGIC register range
	fs/proc/proc_sysctl.c: fix the default values of i_uid/i_gid on /proc/sys inodes.
	kconfig: fix missing choice values in auto.conf
	drm/nouveau/i2c: Enable i2c pads & busses during preinit
	padata: use smp_mb in padata_reorder to avoid orphaned padata jobs
	dm zoned: fix zone state management race
	xen/events: fix binding user event channels to cpus
	9p/xen: Add cleanup path in p9_trans_xen_init
	9p/virtio: Add cleanup path in p9_virtio_init
	x86/boot: Fix memory leak in default_get_smp_config()
	perf/x86/intel: Fix spurious NMI on fixed counter
	perf/x86/amd/uncore: Do not set 'ThreadMask' and 'SliceMask' for non-L3 PMCs
	perf/x86/amd/uncore: Set the thread mask for F17h L3 PMCs
	drm/edid: parse CEA blocks embedded in DisplayID
	intel_th: pci: Add Ice Lake NNPI support
	PCI: hv: Fix a use-after-free bug in hv_eject_device_work()
	PCI: Do not poll for PME if the device is in D3cold
	PCI: qcom: Ensure that PERST is asserted for at least 100 ms
	Btrfs: fix data loss after inode eviction, renaming it, and fsync it
	Btrfs: fix fsync not persisting dentry deletions due to inode evictions
	Btrfs: add missing inode version, ctime and mtime updates when punching hole
	IB/mlx5: Report correctly tag matching rendezvous capability
	HID: wacom: generic: only switch the mode on devices with LEDs
	HID: wacom: generic: Correct pad syncing
	HID: wacom: correct touch resolution x/y typo
	libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields
	coda: pass the host file in vma->vm_file on mmap
	include/asm-generic/bug.h: fix "cut here" for WARN_ON for __WARN_TAINT architectures
	xfs: fix pagecache truncation prior to reflink
	xfs: flush removing page cache in xfs_reflink_remap_prep
	xfs: don't overflow xattr listent buffer
	xfs: rename m_inotbt_nores to m_finobt_nores
	xfs: don't ever put nlink > 0 inodes on the unlinked list
	xfs: reserve blocks for ifree transaction during log recovery
	xfs: fix reporting supported extra file attributes for statx()
	xfs: serialize unaligned dio writes against all other dio writes
	xfs: abort unaligned nowait directio early
	gpu: ipu-v3: ipu-ic: Fix saturation bit offset in TPMEM
	crypto: caam - limit output IV to CBC to work around CTR mode DMA issue
	parisc: Ensure userspace privilege for ptraced processes in regset functions
	parisc: Fix kernel panic due invalid values in IAOQ0 or IAOQ1
	powerpc/32s: fix suspend/resume when IBATs 4-7 are used
	powerpc/watchpoint: Restore NV GPRs while returning from exception
	powerpc/powernv/npu: Fix reference leak
	powerpc/pseries: Fix oops in hotplug memory notifier
	mmc: sdhci-msm: fix mutex while in spinlock
	eCryptfs: fix a couple type promotion bugs
	mtd: rawnand: mtk: Correct low level time calculation of r/w cycle
	mtd: spinand: read returns badly if the last page has bitflips
	intel_th: msu: Fix single mode with disabled IOMMU
	Bluetooth: Add SMP workaround Microsoft Surface Precision Mouse bug
	usb: Handle USB3 remote wakeup for LPM enabled devices correctly
	blk-throttle: fix zero wait time for iops throttled group
	blk-iolatency: clear use_delay when io.latency is set to zero
	blkcg: update blkcg_print_stat() to handle larger outputs
	net: mvmdio: allow up to four clocks to be specified for orion-mdio
	dt-bindings: allow up to four clocks for orion-mdio
	dm bufio: fix deadlock with loop device
	Linux 4.19.61

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2f565111b1c16f369fa86e0481527fcc6357fe1b
2019-07-26 10:31:53 +02:00
Daniel Jordan
1e4247d795 padata: use smp_mb in padata_reorder to avoid orphaned padata jobs
commit cf144f81a9 upstream.

Testing padata with the tcrypt module on a 5.2 kernel...

    # modprobe tcrypt alg="pcrypt(rfc4106(gcm(aes)))" type=3
    # modprobe tcrypt mode=211 sec=1

...produces this splat:

    INFO: task modprobe:10075 blocked for more than 120 seconds.
          Not tainted 5.2.0-base+ #16
    modprobe        D    0 10075  10064 0x80004080
    Call Trace:
     ? __schedule+0x4dd/0x610
     ? ring_buffer_unlock_commit+0x23/0x100
     schedule+0x6c/0x90
     schedule_timeout+0x3b/0x320
     ? trace_buffer_unlock_commit_regs+0x4f/0x1f0
     wait_for_common+0x160/0x1a0
     ? wake_up_q+0x80/0x80
     { crypto_wait_req }             # entries in braces added by hand
     { do_one_aead_op }
     { test_aead_jiffies }
     test_aead_speed.constprop.17+0x681/0xf30 [tcrypt]
     do_test+0x4053/0x6a2b [tcrypt]
     ? 0xffffffffa00f4000
     tcrypt_mod_init+0x50/0x1000 [tcrypt]
     ...

The second modprobe command never finishes because in padata_reorder,
CPU0's load of reorder_objects is executed before the unlocking store in
spin_unlock_bh(pd->lock), causing CPU0 to miss CPU1's increment:

CPU0                                 CPU1

padata_reorder                       padata_do_serial
  LOAD reorder_objects  // 0
                                       INC reorder_objects  // 1
                                       padata_reorder
                                         TRYLOCK pd->lock   // failed
  UNLOCK pd->lock

CPU0 deletes the timer before returning from padata_reorder and since no
other job is submitted to padata, modprobe waits indefinitely.

Add a pair of full barriers to guarantee proper ordering:

CPU0                                 CPU1

padata_reorder                       padata_do_serial
  UNLOCK pd->lock
  smp_mb()
  LOAD reorder_objects
                                       INC reorder_objects
                                       smp_mb__after_atomic()
                                       padata_reorder
                                         TRYLOCK pd->lock

smp_mb__after_atomic is needed so the read part of the trylock operation
comes after the INC, as Andrea points out.   Thanks also to Andrea for
help with writing a litmus test.

Fixes: 16295bec63 ("padata: Generic parallelization/serialization interface")
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: <stable@vger.kernel.org>
Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26 09:14:25 +02:00
Nathan Huckleberry
b9f547b7bd timer_list: Guard procfs specific code
[ Upstream commit a9314773a9 ]

With CONFIG_PROC_FS=n the following warning is emitted:

kernel/time/timer_list.c:361:36: warning: unused variable
'timer_list_sops' [-Wunused-const-variable]
   static const struct seq_operations timer_list_sops = {

Add #ifdef guard around procfs specific code.

Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: john.stultz@linaro.org
Cc: sboyd@kernel.org
Cc: clang-built-linux@googlegroups.com
Link: https://github.com/ClangBuiltLinux/linux/issues/534
Link: https://lkml.kernel.org/r/20190614181604.112297-1-nhuck@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-26 09:14:10 +02:00
Miroslav Lichvar
d86c0b73f7 ntp: Limit TAI-UTC offset
[ Upstream commit d897a4ab11 ]

Don't allow the TAI-UTC offset of the system clock to be set by adjtimex()
to a value larger than 100000 seconds.

This prevents an overflow in the conversion to int, prevents the CLOCK_TAI
clock from getting too far ahead of the CLOCK_REALTIME clock, and it is
still large enough to allow leap seconds to be inserted at the maximum rate
currently supported by the kernel (once per day) for the next ~270 years,
however unlikely it is that someone can survive a catastrophic event which
slowed down the rotation of the Earth so much.

Reported-by: Weikang shi <swkhack@gmail.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <sboyd@kernel.org>
Link: https://lkml.kernel.org/r/20190618154713.20929-1-mlichvar@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-26 09:14:10 +02:00
Qian Cai
7fc96cd2b0 sched/fair: Fix "runnable_avg_yN_inv" not used warnings
[ Upstream commit 509466b7d4 ]

runnable_avg_yN_inv[] is only used in kernel/sched/pelt.c but was
included in several other places because they need other macros all
came from kernel/sched/sched-pelt.h which was generated by
Documentation/scheduler/sched-pelt. As the result, it causes compilation
a lot of warnings,

  kernel/sched/sched-pelt.h:4:18: warning: 'runnable_avg_yN_inv' defined but not used [-Wunused-const-variable=]
  kernel/sched/sched-pelt.h:4:18: warning: 'runnable_avg_yN_inv' defined but not used [-Wunused-const-variable=]
  kernel/sched/sched-pelt.h:4:18: warning: 'runnable_avg_yN_inv' defined but not used [-Wunused-const-variable=]
  ...

Silence it by appending the __maybe_unused attribute for it, so all
generated variables and macros can still be kept in the same file.

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1559596304-31581-1-git-send-email-cai@lca.pw
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-26 09:14:08 +02:00
Gao Xiang
d8b7db6c50 sched/core: Add __sched tag for io_schedule()
[ Upstream commit e3b929b0a1 ]

Non-inline io_schedule() was introduced in:

  commit 10ab56434f ("sched/core: Separate out io_schedule_prepare() and io_schedule_finish()")

Keep in line with io_schedule_timeout(), otherwise "/proc/<pid>/wchan" will
report io_schedule() rather than its callers when waiting for IO.

Reported-by: Jilong Kou <koujilong@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Miao Xie <miaoxie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 10ab56434f ("sched/core: Separate out io_schedule_prepare() and io_schedule_finish()")
Link: https://lkml.kernel.org/r/20190603091338.2695-1-gaoxiang25@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-26 09:14:08 +02:00
Valdis Klētnieks
7c10f8941b bpf: silence warning messages in core
[ Upstream commit aee450cbe4 ]

Compiling kernel/bpf/core.c with W=1 causes a flood of warnings:

kernel/bpf/core.c:1198:65: warning: initialized field overwritten [-Woverride-init]
 1198 | #define BPF_INSN_3_TBL(x, y, z) [BPF_##x | BPF_##y | BPF_##z] = true
      |                                                                 ^~~~
kernel/bpf/core.c:1087:2: note: in expansion of macro 'BPF_INSN_3_TBL'
 1087 |  INSN_3(ALU, ADD,  X),   \
      |  ^~~~~~
kernel/bpf/core.c:1202:3: note: in expansion of macro 'BPF_INSN_MAP'
 1202 |   BPF_INSN_MAP(BPF_INSN_2_TBL, BPF_INSN_3_TBL),
      |   ^~~~~~~~~~~~
kernel/bpf/core.c:1198:65: note: (near initialization for 'public_insntable[12]')
 1198 | #define BPF_INSN_3_TBL(x, y, z) [BPF_##x | BPF_##y | BPF_##z] = true
      |                                                                 ^~~~
kernel/bpf/core.c:1087:2: note: in expansion of macro 'BPF_INSN_3_TBL'
 1087 |  INSN_3(ALU, ADD,  X),   \
      |  ^~~~~~
kernel/bpf/core.c:1202:3: note: in expansion of macro 'BPF_INSN_MAP'
 1202 |   BPF_INSN_MAP(BPF_INSN_2_TBL, BPF_INSN_3_TBL),
      |   ^~~~~~~~~~~~

98 copies of the above.

The attached patch silences the warnings, because we *know* we're overwriting
the default initializer. That leaves bpf/core.c with only 6 other warnings,
which become more visible in comparison.

Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-26 09:14:06 +02:00
Imre Deak
2fbde27465 locking/lockdep: Fix merging of hlocks with non-zero references
[ Upstream commit d9349850e1 ]

The sequence

	static DEFINE_WW_CLASS(test_ww_class);

	struct ww_acquire_ctx ww_ctx;
	struct ww_mutex ww_lock_a;
	struct ww_mutex ww_lock_b;
	struct ww_mutex ww_lock_c;
	struct mutex lock_c;

	ww_acquire_init(&ww_ctx, &test_ww_class);

	ww_mutex_init(&ww_lock_a, &test_ww_class);
	ww_mutex_init(&ww_lock_b, &test_ww_class);
	ww_mutex_init(&ww_lock_c, &test_ww_class);

	mutex_init(&lock_c);

	ww_mutex_lock(&ww_lock_a, &ww_ctx);

	mutex_lock(&lock_c);

	ww_mutex_lock(&ww_lock_b, &ww_ctx);
	ww_mutex_lock(&ww_lock_c, &ww_ctx);

	mutex_unlock(&lock_c);	(*)

	ww_mutex_unlock(&ww_lock_c);
	ww_mutex_unlock(&ww_lock_b);
	ww_mutex_unlock(&ww_lock_a);

	ww_acquire_fini(&ww_ctx); (**)

will trigger the following error in __lock_release() when calling
mutex_release() at **:

	DEBUG_LOCKS_WARN_ON(depth <= 0)

The problem is that the hlock merging happening at * updates the
references for test_ww_class incorrectly to 3 whereas it should've
updated it to 4 (representing all the instances for ww_ctx and
ww_lock_[abc]).

Fix this by updating the references during merging correctly taking into
account that we can have non-zero references (both for the hlock that we
merge into another hlock or for the hlock we are merging into).

Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/r/20190524201509.9199-2-imre.deak@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-26 09:14:03 +02:00
Eric W. Biederman
b397462a01 signal/pid_namespace: Fix reboot_pid_ns to use send_sig not force_sig
[ Upstream commit f9070dc945 ]

The locking in force_sig_info is not prepared to deal with a task that
exits or execs (as sighand may change).  The is not a locking problem
in force_sig as force_sig is only built to handle synchronous
exceptions.

Further the function force_sig_info changes the signal state if the
signal is ignored, or blocked or if SIGNAL_UNKILLABLE will prevent the
delivery of the signal.  The signal SIGKILL can not be ignored and can
not be blocked and SIGNAL_UNKILLABLE won't prevent it from being
delivered.

So using force_sig rather than send_sig for SIGKILL is confusing
and pointless.

Because it won't impact the sending of the signal and and because
using force_sig is wrong, replace force_sig with send_sig.

Cc: Daniel Lezcano <daniel.lezcano@free.fr>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Fixes: cf3f89214e ("pidns: add reboot_pid_ns() to handle the reboot syscall")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-26 09:14:01 +02:00
Sami Tolvanen
9a8a0e6fd9 ANDROID: kernel/Makefile: do not disable LTO for sys_ni.c with CFI
sys_ni.c compiles fine with CONFIG_LTO_CLANG, and disabling LTO
for it breaks indirect call checking to functions in this file.

Bug: 138254717
Change-Id: I7947cf3d0283ad37431860739665fee7fb0dfbdb
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
2019-07-25 09:36:26 -07:00
Greg Kroah-Hartman
bafa20fa20 This is the 4.19.60 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl00DjYACgkQONu9yGCS
 aT7O6hAAqqs1jm+vztbAJTyZPR+Vu7yGO1BukoyoqA3iUm7JPW0/Xamp+e/nOjq3
 UrRKcn6WvIdDv22ikrR1qfFTFZYYCZfe4LWvzuUNsscr0dixW6iYoiSr5RDypH0C
 VIYZfEMaZ5G1R07jO7u8HWXAjAm+xqvxZRgARu9H0tk9As1+yW1kYFnQubdpIyoA
 3zsTTQ+Dsyzc5mQQXBi88VnNpnI2PZGDAyaYmqfe7iuiIZ6qvjYZ245GygVb5Qlo
 9yGKuxqRc7Lrd34f6t/0w2CwZuj8lbpt7twcdLXOjg/EjcouwBnX5smoq8oo5UIK
 kNSRsV0pfxhLt7EXViuRFduJIinViaYJY7guzWon3O9HXjO6OlUIhM65WRvwuxhz
 NuM1ctOfDqiyDzJ0NEp7tROsmkV3Un/DFHrePWGvcQ25lFxJMLtXUQDf/39cNkP2
 iiWDSDOAXzgskfzpxmfRYyXO2/u2cjnmdUil27+/B54vYYM4XemBn07uc6zJZhJ/
 spq2Hg/i/7gaAaoqRgoHvYLajlUytvetJMhdAZYhEpHL2/1gxE6SXI9LypV3096a
 FgdEfAghf0yY6FzaOXVb1PlqgbkigWtf8vo7Wmr25mNrg01678UTqGi2soCMhLXz
 OAGtOvPKcmD6wTY3gZlEzzVxoX0eCXUUVgK6TZFsMbmJb3+Y9yA=
 =Uqvz
 -----END PGP SIGNATURE-----

Merge 4.19.60 into android-4.19

Changes in 4.19.60
	Revert "e1000e: fix cyclic resets at link up with active tx"
	e1000e: start network tx queue only when link is up
	Input: synaptics - enable SMBUS on T480 thinkpad trackpad
	nilfs2: do not use unexported cpu_to_le32()/le32_to_cpu() in uapi header
	drivers: base: cacheinfo: Ensure cpu hotplug work is done before Intel RDT
	firmware: improve LSM/IMA security behaviour
	irqchip/gic-v3-its: Fix command queue pointer comparison bug
	clk: ti: clkctrl: Fix returning uninitialized data
	efi/bgrt: Drop BGRT status field reserved bits check
	perf/core: Fix perf_sample_regs_user() mm check
	ARM: dts: gemini Fix up DNS-313 compatible string
	ARM: omap2: remove incorrect __init annotation
	afs: Fix uninitialised spinlock afs_volume::cb_break_lock
	x86/apic: Fix integer overflow on 10 bit left shift of cpu_khz
	be2net: fix link failure after ethtool offline test
	ppp: mppe: Add softdep to arc4
	sis900: fix TX completion
	ARM: dts: imx6ul: fix PWM[1-4] interrupts
	pinctrl: mcp23s08: Fix add_data and irqchip_add_nested call order
	dm table: don't copy from a NULL pointer in realloc_argv()
	dm verity: use message limit for data block corruption message
	x86/boot/64: Fix crash if kernel image crosses page table boundary
	x86/boot/64: Add missing fixup_pointer() for next_early_pgt access
	HID: chicony: add another quirk for PixArt mouse
	HID: multitouch: Add pointstick support for ALPS Touchpad
	pinctrl: mediatek: Ignore interrupts that are wake only during resume
	cpu/hotplug: Fix out-of-bounds read when setting fail state
	pinctrl: mediatek: Update cur_mask in mask/mask ops
	linux/kernel.h: fix overflow for DIV_ROUND_UP_ULL
	genirq: Delay deactivation in free_irq()
	genirq: Fix misleading synchronize_irq() documentation
	genirq: Add optional hardware synchronization for shutdown
	x86/ioapic: Implement irq_get_irqchip_state() callback
	x86/irq: Handle spurious interrupt after shutdown gracefully
	x86/irq: Seperate unused system vectors from spurious entry again
	ARC: hide unused function unw_hdr_alloc
	s390: fix stfle zero padding
	s390/qdio: (re-)initialize tiqdio list entries
	s390/qdio: don't touch the dsci in tiqdio_add_input_queues()
	crypto: talitos - move struct talitos_edesc into talitos.h
	crypto: talitos - fix hash on SEC1.
	crypto/NX: Set receive window credits to max number of CRBs in RxFIFO
	regmap-irq: do not write mask register if mask_base is zero
	drm/udl: introduce a macro to convert dev to udl.
	drm/udl: Replace drm_dev_unref with drm_dev_put
	drm/udl: move to embedding drm device inside udl device.
	x86/entry/32: Fix ENDPROC of common_spurious
	Linux 4.19.60

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I283306f8640e06b3ffe8bcdca1478a0fd3af77db
2019-07-22 14:36:16 +02:00
Thomas Gleixner
6074f6043c genirq: Add optional hardware synchronization for shutdown
commit 62e0468650 upstream

free_irq() ensures that no hardware interrupt handler is executing on a
different CPU before actually releasing resources and deactivating the
interrupt completely in a domain hierarchy.

But that does not catch the case where the interrupt is on flight at the
hardware level but not yet serviced by the target CPU. That creates an
interesing race condition:

   CPU 0                  CPU 1               IRQ CHIP

                                              interrupt is raised
                                              sent to CPU1
			  Unable to handle
			  immediately
			  (interrupts off,
			   deep idle delay)
   mask()
   ...
   free()
     shutdown()
     synchronize_irq()
     release_resources()
                          do_IRQ()
                            -> resources are not available

That might be harmless and just trigger a spurious interrupt warning, but
some interrupt chips might get into a wedged state.

Utilize the existing irq_get_irqchip_state() callback for the
synchronization in free_irq().

synchronize_hardirq() is not using this mechanism as it might actually
deadlock unter certain conditions, e.g. when called with interrupts
disabled and the target CPU is the one on which the synchronization is
invoked. synchronize_irq() uses it because that function cannot be called
from non preemtible contexts as it might sleep.

No functional change intended and according to Marc the existing GIC
implementations where the driver supports the callback should be able
to cope with that core change. Famous last words.

Fixes: 464d12309e ("x86/vector: Switch IOAPIC to global reservation mode")
Reported-by: Robert Hodaszi <Robert.Hodaszi@digi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Link: https://lkml.kernel.org/r/20190628111440.279463375@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-21 09:03:13 +02:00
Thomas Gleixner
3f10ccc297 genirq: Fix misleading synchronize_irq() documentation
commit 1d21f2af85 upstream

The function might sleep, so it cannot be called from interrupt
context. Not even with care.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Link: https://lkml.kernel.org/r/20190628111440.189241552@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-21 09:03:12 +02:00
Thomas Gleixner
578db1aa59 genirq: Delay deactivation in free_irq()
commit 4001d8e876 upstream

When interrupts are shutdown, they are immediately deactivated in the
irqdomain hierarchy. While this looks obviously correct there is a subtle
issue:

There might be an interrupt in flight when free_irq() is invoking the
shutdown. This is properly handled at the irq descriptor / primary handler
level, but the deactivation might completely disable resources which are
required to acknowledge the interrupt.

Split the shutdown code and deactivate the interrupt after synchronization
in free_irq(). Fixup all other usage sites where this is not an issue to
invoke the combined shutdown_and_deactivate() function instead.

This still might be an issue if the interrupt in flight servicing is
delayed on a remote CPU beyond the invocation of synchronize_irq(), but
that cannot be handled at that level and needs to be handled in the
synchronize_irq() context.

Fixes: f8264e3496 ("irqdomain: Introduce new interfaces to support hierarchy irqdomains")
Reported-by: Robert Hodaszi <Robert.Hodaszi@digi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Link: https://lkml.kernel.org/r/20190628111440.098196390@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-21 09:03:12 +02:00
Eiichi Tsukata
f6e01328cb cpu/hotplug: Fix out-of-bounds read when setting fail state
[ Upstream commit 33d4a5a7a5 ]

Setting invalid value to /sys/devices/system/cpu/cpuX/hotplug/fail
can control `struct cpuhp_step *sp` address, results in the following
global-out-of-bounds read.

Reproducer:

  # echo -2 > /sys/devices/system/cpu/cpu0/hotplug/fail

KASAN report:

  BUG: KASAN: global-out-of-bounds in write_cpuhp_fail+0x2cd/0x2e0
  Read of size 8 at addr ffffffff89734438 by task bash/1941

  CPU: 0 PID: 1941 Comm: bash Not tainted 5.2.0-rc6+ #31
  Call Trace:
   write_cpuhp_fail+0x2cd/0x2e0
   dev_attr_store+0x58/0x80
   sysfs_kf_write+0x13d/0x1a0
   kernfs_fop_write+0x2bc/0x460
   vfs_write+0x1e1/0x560
   ksys_write+0x126/0x250
   do_syscall_64+0xc1/0x390
   entry_SYSCALL_64_after_hwframe+0x49/0xbe
  RIP: 0033:0x7f05e4f4c970

  The buggy address belongs to the variable:
   cpu_hotplug_lock+0x98/0xa0

  Memory state around the buggy address:
   ffffffff89734300: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
   ffffffff89734380: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
  >ffffffff89734400: 00 00 00 00 fa fa fa fa 00 00 00 00 fa fa fa fa
                                          ^
   ffffffff89734480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   ffffffff89734500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Add a sanity check for the value written from user space.

Fixes: 1db49484f2 ("smp/hotplug: Hotplug state fail injection")
Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Link: https://lkml.kernel.org/r/20190627024732.31672-1-devel@etsukata.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-21 09:03:11 +02:00
Peter Zijlstra
afda29dc5a perf/core: Fix perf_sample_regs_user() mm check
[ Upstream commit 085ebfe937 ]

perf_sample_regs_user() uses 'current->mm' to test for the presence of
userspace, but this is insufficient, consider use_mm().

A better test is: '!(current->flags & PF_KTHREAD)', exec() clears
PF_KTHREAD after it sets the new ->mm but before it drops to userspace
for the first time.

Possibly obsoletes: bf05fc25f2 ("powerpc/perf: Fix oops when kthread execs user process")

Reported-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Reported-by: Young Xiao <92siuyang@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 4018994f3d ("perf: Add ability to attach user level registers dump to sample")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-21 09:03:05 +02:00
Greg Kroah-Hartman
0f653d9aa3 This is the 4.19.59 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl0qx4sACgkQONu9yGCS
 aT7Wzw/+Ixgza5VeJICnFgLZ80bYEQP5fDDcTD8psGi8fg/yKpUcHM0tv2Fi/ScQ
 dKNKN1zrWtn8e5bC8HE7V5rVFH3iT9gJXL4tebmFg9IOaBoce9wSaDMaptnv4OEw
 Ikb8apdrO2cHRWFhyIj9f35d3WE2OWUA4QYhrL17rptyP+k0eBBdyo572qfnheuf
 4Yp4X6u8pnSR3fl4sgxzcfNLPXfrF8BMAKEx8/I1YyhUORpeJ/QxZkyFKNLMbUHm
 OWIHcw0O4Sfqtx9zWzwmpLk/aF8b98rCieJUDxYakVYD/iLsrdkkCx3IHlvMWdZF
 UtNVQbA26KIIFpXYe5gD1My+56grJaSCxAsO6M+c4PRCZ2BP+e6t+k3eASueadqs
 Ihq2qZyq1cMBQCeT1Sc3zQZgzwTE7lgzqQLVHiMmMukWv1Sx2xyio3GvN0i51gqz
 PCIxslzNhQnpmswCnDXgwaSp7W3YlT6+/zpQnzK1spZsfp8Ab/PkB41WyiPCWBtJ
 /Zx+lkdUd8HU8ZoKBoNMPWErX//MKa3NhKvakliPklVkSUfF12+4aB+Iil9H8vag
 ie4qmJrGvwg0t5PvRqRqy35fij/kcnJnFJJLlywkzRdTXlFUqqV+09N6hhS0BRgf
 YJibc8VptLWXgYRQoQD1J/xF87bcmB7HBnC4jBpdDzCkbTEHoI8=
 =zCPG
 -----END PGP SIGNATURE-----

Merge 4.19.59 into android-4.19

Changes in 4.19.59
	crypto: talitos - rename alternative AEAD algos.
	soc: brcmstb: Fix error path for unsupported CPUs
	soc: bcm: brcmstb: biuctrl: Register writes require a barrier
	Input: elantech - enable middle button support on 2 ThinkPads
	samples, bpf: fix to change the buffer size for read()
	samples, bpf: suppress compiler warning
	mac80211: fix rate reporting inside cfg80211_calculate_bitrate_he()
	bpf: sockmap, fix use after free from sleep in psock backlog workqueue
	soundwire: stream: fix out of boundary access on port properties
	staging:iio:ad7150: fix threshold mode config bit
	mac80211: mesh: fix RCU warning
	mac80211: free peer keys before vif down in mesh
	mwifiex: Fix possible buffer overflows at parsing bss descriptor
	iwlwifi: Fix double-free problems in iwl_req_fw_callback()
	mwifiex: Fix heap overflow in mwifiex_uap_parse_tail_ies()
	soundwire: intel: set dai min and max channels correctly
	dt-bindings: can: mcp251x: add mcp25625 support
	can: mcp251x: add support for mcp25625
	can: m_can: implement errata "Needless activation of MRAF irq"
	can: af_can: Fix error path of can_init()
	net: phy: rename Asix Electronics PHY driver
	ibmvnic: Do not close unopened driver during reset
	ibmvnic: Refresh device multicast list after reset
	ibmvnic: Fix unchecked return codes of memory allocations
	ARM: dts: am335x phytec boards: Fix cd-gpios active level
	s390/boot: disable address-of-packed-member warning
	drm/vmwgfx: Honor the sg list segment size limitation
	drm/vmwgfx: fix a warning due to missing dma_parms
	riscv: Fix udelay in RV32.
	Input: imx_keypad - make sure keyboard can always wake up system
	KVM: arm/arm64: vgic: Fix kvm_device leak in vgic_its_destroy
	mlxsw: spectrum: Disallow prio-tagged packets when PVID is removed
	ARM: davinci: da850-evm: call regulator_has_full_constraints()
	ARM: davinci: da8xx: specify dma_coherent_mask for lcdc
	mac80211: only warn once on chanctx_conf being NULL
	mac80211: do not start any work during reconfigure flow
	bpf, devmap: Fix premature entry free on destroying map
	bpf, devmap: Add missing bulk queue free
	bpf, devmap: Add missing RCU read lock on flush
	bpf, x64: fix stack layout of JITed bpf code
	qmi_wwan: add support for QMAP padding in the RX path
	qmi_wwan: avoid RCU stalls on device disconnect when in QMAP mode
	qmi_wwan: extend permitted QMAP mux_id value range
	mmc: core: complete HS400 before checking status
	md: fix for divide error in status_resync
	bnx2x: Check if transceiver implements DDM before access
	drm: return -EFAULT if copy_to_user() fails
	ip6_tunnel: allow not to count pkts on tstats by passing dev as NULL
	net: lio_core: fix potential sign-extension overflow on large shift
	scsi: qedi: Check targetname while finding boot target information
	quota: fix a problem about transfer quota
	net: dsa: mv88e6xxx: fix shift of FID bits in mv88e6185_g1_vtu_loadpurge()
	NFS4: Only set creation opendata if O_CREAT
	net :sunrpc :clnt :Fix xps refcount imbalance on the error path
	fscrypt: don't set policy for a dead directory
	udf: Fix incorrect final NOT_ALLOCATED (hole) extent length
	media: stv0297: fix frequency range limit
	ALSA: usb-audio: Fix parse of UAC2 Extension Units
	ALSA: hda/realtek - Headphone Mic can't record after S3
	block, bfq: NULL out the bic when it's no longer valid
	perf pmu: Fix uncore PMU alias list for ARM64
	x86/ptrace: Fix possible spectre-v1 in ptrace_get_debugreg()
	x86/tls: Fix possible spectre-v1 in do_get_thread_area()
	Documentation: Add section about CPU vulnerabilities for Spectre
	Documentation/admin: Remove the vsyscall=native documentation
	mwifiex: Abort at too short BSS descriptor element
	mwifiex: Don't abort on small, spec-compliant vendor IEs
	USB: serial: ftdi_sio: add ID for isodebug v1
	USB: serial: option: add support for GosunCn ME3630 RNDIS mode
	Revert "serial: 8250: Don't service RX FIFO if interrupts are disabled"
	p54usb: Fix race between disconnect and firmware loading
	usb: gadget: ether: Fix race between gether_disconnect and rx_submit
	usb: dwc2: use a longer AHB idle timeout in dwc2_core_reset()
	usb: renesas_usbhs: add a workaround for a race condition of workqueue
	drivers/usb/typec/tps6598x.c: fix portinfo width
	drivers/usb/typec/tps6598x.c: fix 4CC cmd write
	staging: comedi: dt282x: fix a null pointer deref on interrupt
	staging: comedi: amplc_pci230: fix null pointer deref on interrupt
	HID: Add another Primax PIXART OEM mouse quirk
	lkdtm: support llvm-objcopy
	binder: fix memory leak in error path
	carl9170: fix misuse of device driver API
	VMCI: Fix integer overflow in VMCI handle arrays
	MIPS: Remove superfluous check for __linux__
	staging: fsl-dpaa2/ethsw: fix memory leak of switchdev_work
	staging: bcm2835-camera: Replace spinlock protecting context_map with mutex
	staging: bcm2835-camera: Ensure all buffers are returned on disable
	staging: bcm2835-camera: Remove check of the number of buffers supplied
	staging: bcm2835-camera: Handle empty EOS buffers whilst streaming
	staging: rtl8712: reduce stack usage, again
	Linux 4.19.59

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I650890ad9d984de0fc729677bd29506cd21338be
2019-07-14 08:44:38 +02:00
Toshiaki Makita
4c2ce7addd bpf, devmap: Add missing RCU read lock on flush
[ Upstream commit 86723c8640 ]

.ndo_xdp_xmit() assumes it is called under RCU. For example virtio_net
uses RCU to detect it has setup the resources for tx. The assumption
accidentally broke when introducing bulk queue in devmap.

Fixes: 5d053f9da4 ("bpf: devmap prepare xdp frames for bulking")
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-14 08:11:11 +02:00