Commit graph

28,932 commits

Author SHA1 Message Date
Alexander Shishkin
bdc1e2cb66 UPSTREAM: perf/core: Fix the address filtering fix
(Upstream commit 52a44f83fc).

The following recent commit:

c60f83b813 ("perf, pt, coresight: Fix address filters for vmas with non-zero offset")

changes the address filtering logic to communicate filter ranges to the PMU driver
via a single address range object, instead of having the driver do the final bit of
math.

That change forgets to take into account kernel filters, which are not calculated
the same way as DSO based filters.

Fix that by passing the kernel filters the same way as file-based filters.
This doesn't require any additional changes in the drivers.

Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: c60f83b813 ("perf, pt, coresight: Fix address filters for vmas with non-zero offset")
Link: https://lkml.kernel.org/r/20190329091212.29870-1-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Bug: 140266694
Change-Id: I28472be39c9ee0f1157b51014880fc66926a39e7
Signed-off-by: Yabin Cui <yabinc@google.com>
2019-11-06 14:05:02 -08:00
Alexander Shishkin
93b9a8a987 UPSTREAM: perf, pt, coresight: Fix address filters for vmas with non-zero offset
(Upstream commit c60f83b813).

Currently, the address range calculation for file-based filters works as
long as the vma that maps the matching part of the object file starts
from offset zero into the file (vm_pgoff==0). Otherwise, the resulting
filter range would be off by vm_pgoff pages. Another related problem is
that in case of a partially matching vma, that is, a vma that matches
part of a filter region, the filter range size wouldn't be adjusted.

Fix the arithmetics around address filter range calculations, taking
into account vma offset, so that the entire calculation is done before
the filter configuration is passed to the PMU drivers instead of having
those drivers do the final bit of arithmetics.

Based on the patch by Adrian Hunter <adrian.hunter.intel.com>.

Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Tested-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Fixes: 375637bc52 ("perf/core: Introduce address range filtering")
Link: http://lkml.kernel.org/r/20190215115655.63469-3-alexander.shishkin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Bug: 140266694
Change-Id: I248219cff1573281196d455cf39d028b68ec17a7
Signed-off-by: Yabin Cui <yabinc@google.com>
2019-11-06 14:05:02 -08:00
Alexander Shishkin
547690a5ac UPSTREAM: perf: Copy parent's address filter offsets on clone
(Upstream commit 18736eef12).

When a child event is allocated in the inherit_event() path, the VMA
based filter offsets are not copied from the parent, even though the
address space mapping of the new task remains the same, which leads to
no trace for the new task until exec.

Reported-by: Mansour Alharthi <malharthi9@gatech.edu>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Tested-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Fixes: 375637bc52 ("perf/core: Introduce address range filtering")
Link: http://lkml.kernel.org/r/20190215115655.63469-2-alexander.shishkin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Bug: 140266694
Change-Id: I14af7095b0712ee6fc427eafc3d3486f0fc1b07c
Signed-off-by: Yabin Cui <yabinc@google.com>
2019-11-06 14:05:02 -08:00
Greg Kroah-Hartman
aa4d6b3489 This is the 4.19.82 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl3Ct0gACgkQONu9yGCS
 aT46EQ//SsO3zq9K1P9HVRCQh5+ZPrk2uynVQIlMunhvhix8+SA+UopfNWwqM30n
 aEUPHk9snHTiRm5VRCBip8ea3/uZCpLTAwm/L0OKSyHpZ/GDGIQxNP5svjMQePYp
 57mmhVEV387gHoJiXxi8OiOYuPagscw809UkMBTIgl1g3B+vicy6IYEjlvmwr+vy
 6ghqEDkrR+2+25n/yrPPfesL+rlpE4nB6QvNkYnDSzyJTTKP76Wh21bP4r0mV2RN
 U4X5irbdfTSEYcK5DbNTUgMsUEk1ixxY6vHy7yWSe8ED9oMHdfjzfUS15pjbyrxD
 GLXw3o7Lv/ES7HGZpG073QLIp9oJPtvhrFHvIwBicE2pvBE3++zmmCFdQMx/tY25
 sUUctWvpeizX0qD+7mH6VrMXZB7DbHTUGfxdtfPJfk3l+c4NtSxe6hPaOC1MDJaY
 qBj7tDCoeB1HkLSqKkCGlMu9v1/V6+8E0Gqb6DbPC3IRwezHLq0U/LDCoTf874Qs
 0Fx5t+9Fxk5oeG6ifEmaYP0xT9untUi5EfFAYGCdEGYv5GZskmQgi7BLrnzUCfqM
 T+oruajADfSfe0ylgcXp9vdVkVWx/arbftOH3IVAZhe6Qj1jq57sMmSaLii7QyCC
 Z+rRwGNCO3WhkobBbLpF75uLMYulMqtlsep0Y2JIxdhcticiQbE=
 =1W43
 -----END PGP SIGNATURE-----

Merge 4.19.82 into android-4.19

Changes in 4.19.82
	zram: fix race between backing_dev_show and backing_dev_store
	dm snapshot: introduce account_start_copy() and account_end_copy()
	dm snapshot: rework COW throttling to fix deadlock
	Btrfs: fix inode cache block reserve leak on failure to allocate data space
	Btrfs: fix memory leak due to concurrent append writes with fiemap
	btrfs: qgroup: Always free PREALLOC META reserve in btrfs_delalloc_release_extents()
	btrfs: tracepoints: Fix wrong parameter order for qgroup events
	wil6210: fix freeing of rx buffers in EDMA mode
	f2fs: flush quota blocks after turnning it off
	scsi: lpfc: Fix a duplicate 0711 log message number.
	sc16is7xx: Fix for "Unexpected interrupt: 8"
	powerpc/powernv: hold device_hotplug_lock when calling memtrace_offline_pages()
	f2fs: fix to recover inode's i_gc_failures during POR
	f2fs: fix to recover inode->i_flags of inode block during POR
	HID: i2c-hid: add Direkt-Tek DTLAPY133-1 to descriptor override
	usb: dwc2: fix unbalanced use of external vbus-supply
	tools/power turbostat: fix goldmont C-state limit decoding
	x86/cpu: Add Atom Tremont (Jacobsville)
	drm/msm/dpu: handle failures while initializing displays
	bcache: fix input overflow to writeback_rate_minimum
	PCI: Fix Switchtec DMA aliasing quirk dmesg noise
	Btrfs: fix deadlock on tree root leaf when finding free extent
	netfilter: ipset: Make invalid MAC address checks consistent
	HID: i2c-hid: Disable runtime PM for LG touchscreen
	HID: i2c-hid: Ignore input report if there's no data present on Elan touchpanels
	HID: i2c-hid: Add Odys Winbook 13 to descriptor override
	platform/x86: Add the VLV ISP PCI ID to atomisp2_pm
	platform/x86: Fix config space access for intel_atomisp2_pm
	ath10k: assign 'n_cipher_suites = 11' for WCN3990 to enable WPA3
	clk: boston: unregister clks on failure in clk_boston_setup()
	scripts/setlocalversion: Improve -dirty check with git-status --no-optional-locks
	staging: mt7621-pinctrl: use pinconf-generic for 'dt_node_to_map' and 'dt_free_map'
	HID: Add ASUS T100CHI keyboard dock battery quirks
	NFSv4: Ensure that the state manager exits the loop on SIGKILL
	HID: steam: fix boot loop with bluetooth firmware
	HID: steam: fix deadlock with input devices.
	samples: bpf: fix: seg fault with NULL pointer arg
	usb: dwc3: gadget: early giveback if End Transfer already completed
	usb: dwc3: gadget: clear DWC3_EP_TRANSFER_STARTED on cmd complete
	ALSA: usb-audio: Cleanup DSD whitelist
	usb: handle warm-reset port requests on hub resume
	rtc: pcf8523: set xtal load capacitance from DT
	arm64: Add MIDR encoding for HiSilicon Taishan CPUs
	arm64: kpti: Whitelist HiSilicon Taishan v110 CPUs
	mlxsw: spectrum: Set LAG port collector only when active
	scsi: lpfc: Correct localport timeout duration error
	CIFS: Respect SMB2 hdr preamble size in read responses
	cifs: add credits from unmatched responses/messages
	ALSA: hda/realtek - Apply ALC294 hp init also for S4 resume
	media: vimc: Remove unused but set variables
	ext4: disallow files with EXT4_JOURNAL_DATA_FL from EXT4_IOC_SWAP_BOOT
	exec: load_script: Do not exec truncated interpreter path
	net: dsa: mv88e6xxx: Release lock while requesting IRQ
	PCI/PME: Fix possible use-after-free on remove
	drm/amd/display: fix odm combine pipe reset
	power: supply: max14656: fix potential use-after-free
	iio: adc: meson_saradc: Fix memory allocation order
	iio: fix center temperature of bmc150-accel-core
	libsubcmd: Make _FORTIFY_SOURCE defines dependent on the feature
	perf tests: Avoid raising SEGV using an obvious NULL dereference
	perf map: Fix overlapped map handling
	perf script brstackinsn: Fix recovery from LBR/binary mismatch
	perf jevents: Fix period for Intel fixed counters
	perf tools: Propagate get_cpuid() error
	perf annotate: Propagate perf_env__arch() error
	perf annotate: Fix the signedness of failure returns
	perf annotate: Propagate the symbol__annotate() error return
	perf annotate: Return appropriate error code for allocation failures
	staging: rtl8188eu: fix null dereference when kzalloc fails
	RDMA/hfi1: Prevent memory leak in sdma_init
	RDMA/iwcm: Fix a lock inversion issue
	HID: hyperv: Use in-place iterator API in the channel callback
	nfs: Fix nfsi->nrequests count error on nfs_inode_remove_request
	arm64: ftrace: Ensure synchronisation in PLT setup for Neoverse-N1 #1542419
	tty: serial: owl: Fix the link time qualifier of 'owl_uart_exit()'
	tty: n_hdlc: fix build on SPARC
	gpio: max77620: Use correct unit for debounce times
	fs: cifs: mute -Wunused-const-variable message
	serial: mctrl_gpio: Check for NULL pointer
	efi/cper: Fix endianness of PCIe class code
	efi/x86: Do not clean dummy variable in kexec path
	MIPS: include: Mark __cmpxchg as __always_inline
	x86/xen: Return from panic notifier
	ocfs2: clear zero in unaligned direct IO
	fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry()
	fs: ocfs2: fix a possible null-pointer dereference in ocfs2_write_end_nolock()
	fs: ocfs2: fix a possible null-pointer dereference in ocfs2_info_scan_inode_alloc()
	arm64: armv8_deprecated: Checking return value for memory allocation
	x86/cpu: Add Comet Lake to the Intel CPU models header
	sched/vtime: Fix guest/system mis-accounting on task switch
	perf/x86/amd: Change/fix NMI latency mitigation to use a timestamp
	drm/amdgpu: fix memory leak
	iio: imu: adis16400: release allocated memory on failure
	MIPS: include: Mark __xchg as __always_inline
	MIPS: fw: sni: Fix out of bounds init of o32 stack
	virt: vbox: fix memory leak in hgcm_call_preprocess_linaddr
	nbd: fix possible sysfs duplicate warning
	NFSv4: Fix leak of clp->cl_acceptor string
	s390/uaccess: avoid (false positive) compiler warnings
	tracing: Initialize iter->seq after zeroing in tracing_read_pipe()
	ARM: 8914/1: NOMMU: Fix exc_ret for XIP
	ALSA: hda/realtek: Reduce the Headphone static noise on XPS 9350/9360
	iwlwifi: exclude GEO SAR support for 3168
	nbd: verify socket is supported during setup
	USB: legousbtower: fix a signedness bug in tower_probe()
	thunderbolt: Use 32-bit writes when writing ring producer/consumer
	ath6kl: fix a NULL-ptr-deref bug in ath6kl_usb_alloc_urb_from_pipe()
	fuse: flush dirty data/metadata before non-truncate setattr
	fuse: truncate pending writes on O_TRUNC
	ALSA: bebob: Fix prototype of helper function to return negative value
	ALSA: hda/realtek - Fix 2 front mics of codec 0x623
	ALSA: hda/realtek - Add support for ALC623
	UAS: Revert commit 3ae62a4209 ("UAS: fix alignment of scatter/gather segments")
	USB: gadget: Reject endpoints with 0 maxpacket value
	usb-storage: Revert commit 747668dbc0 ("usb-storage: Set virt_boundary_mask to avoid SG overflows")
	USB: ldusb: fix ring-buffer locking
	USB: ldusb: fix control-message timeout
	usb: xhci: fix __le32/__le64 accessors in debugfs code
	USB: serial: whiteheat: fix potential slab corruption
	USB: serial: whiteheat: fix line-speed endianness
	scsi: target: cxgbit: Fix cxgbit_fw4_ack()
	HID: i2c-hid: add Trekstor Primebook C11B to descriptor override
	HID: Fix assumption that devices have inputs
	HID: fix error message in hid_open_report()
	nl80211: fix validation of mesh path nexthop
	s390/cmm: fix information leak in cmm_timeout_handler()
	s390/idle: fix cpu idle time calculation
	arm64: Ensure VM_WRITE|VM_SHARED ptes are clean by default
	rtlwifi: Fix potential overflow on P2P code
	dmaengine: qcom: bam_dma: Fix resource leak
	dmaengine: cppi41: Fix cppi41_dma_prep_slave_sg() when idle
	drm/amdgpu/powerplay/vega10: allow undervolting in p7
	NFS: Fix an RCU lock leak in nfs4_refresh_delegation_stateid()
	batman-adv: Avoid free/alloc race when handling OGM buffer
	llc: fix sk_buff leak in llc_sap_state_process()
	llc: fix sk_buff leak in llc_conn_service()
	rxrpc: Fix call ref leak
	rxrpc: rxrpc_peer needs to hold a ref on the rxrpc_local record
	rxrpc: Fix trace-after-put looking at the put peer record
	NFC: pn533: fix use-after-free and memleaks
	bonding: fix potential NULL deref in bond_update_slave_arr
	net: usb: sr9800: fix uninitialized local variable
	sch_netem: fix rcu splat in netem_enqueue()
	ALSA: timer: Simplify error path in snd_timer_open()
	ALSA: timer: Fix mutex deadlock at releasing card
	ALSA: usb-audio: DSD auto-detection for Playback Designs
	ALSA: usb-audio: Update DSD support quirks for Oppo and Rotel
	ALSA: usb-audio: Add DSD support for Gustard U16/X26 USB Interface
	powerpc/powernv: Fix CPU idle to be called with IRQs disabled
	Revert "ALSA: hda: Flush interrupts on disabling"
	Linux 4.19.82

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I79ced3dcffed0086af7d8a77116e8061915677a1
2019-11-06 13:21:58 +01:00
Petr Mladek
394c90d9ce tracing: Initialize iter->seq after zeroing in tracing_read_pipe()
[ Upstream commit d303de1fcf ]

A customer reported the following softlockup:

[899688.160002] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [test.sh:16464]
[899688.160002] CPU: 0 PID: 16464 Comm: test.sh Not tainted 4.12.14-6.23-azure #1 SLE12-SP4
[899688.160002] RIP: 0010:up_write+0x1a/0x30
[899688.160002] Kernel panic - not syncing: softlockup: hung tasks
[899688.160002] RIP: 0010:up_write+0x1a/0x30
[899688.160002] RSP: 0018:ffffa86784d4fde8 EFLAGS: 00000257 ORIG_RAX: ffffffffffffff12
[899688.160002] RAX: ffffffff970fea00 RBX: 0000000000000001 RCX: 0000000000000000
[899688.160002] RDX: ffffffff00000001 RSI: 0000000000000080 RDI: ffffffff970fea00
[899688.160002] RBP: ffffffffffffffff R08: ffffffffffffffff R09: 0000000000000000
[899688.160002] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b59014720d8
[899688.160002] R13: ffff8b59014720c0 R14: ffff8b5901471090 R15: ffff8b5901470000
[899688.160002]  tracing_read_pipe+0x336/0x3c0
[899688.160002]  __vfs_read+0x26/0x140
[899688.160002]  vfs_read+0x87/0x130
[899688.160002]  SyS_read+0x42/0x90
[899688.160002]  do_syscall_64+0x74/0x160

It caught the process in the middle of trace_access_unlock(). There is
no loop. So, it must be looping in the caller tracing_read_pipe()
via the "waitagain" label.

Crashdump analyze uncovered that iter->seq was completely zeroed
at this point, including iter->seq.seq.size. It means that
print_trace_line() was never able to print anything and
there was no forward progress.

The culprit seems to be in the code:

	/* reset all but tr, trace, and overruns */
	memset(&iter->seq, 0,
	       sizeof(struct trace_iterator) -
	       offsetof(struct trace_iterator, seq));

It was added by the commit 53d0aa7730 ("ftrace:
add logic to record overruns"). It was v2.6.27-rc1.
It was the time when iter->seq looked like:

     struct trace_seq {
	unsigned char		buffer[PAGE_SIZE];
	unsigned int		len;
     };

There was no "size" variable and zeroing was perfectly fine.

The solution is to reinitialize the structure after or without
zeroing.

Link: http://lkml.kernel.org/r/20191011142134.11997-1-pmladek@suse.com

Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-06 13:06:09 +01:00
Frederic Weisbecker
2cd003a820 sched/vtime: Fix guest/system mis-accounting on task switch
[ Upstream commit 68e7a4d66b ]

vtime_account_system() assumes that the target task to account cputime
to is always the current task. This is most often true indeed except on
task switch where we call:

	vtime_common_task_switch(prev)
		vtime_account_system(prev)

Here prev is the scheduling-out task where we account the cputime to. It
doesn't match current that is already the scheduling-in task at this
stage of the context switch.

So we end up checking the wrong task flags to determine if we are
accounting guest or system time to the previous task.

As a result the wrong task is used to check if the target is running in
guest mode. We may then spuriously account or leak either system or
guest time on task switch.

Fix this assumption and also turn vtime_guest_enter/exit() to use the
task passed in parameter as well to avoid future similar issues.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpengli@tencent.com>
Fixes: 2a42eb9594 ("sched/cputime: Accumulate vtime on top of nsec clocksource")
Link: https://lkml.kernel.org/r/20190925214242.21873-1-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-06 13:06:01 +01:00
Greg Kroah-Hartman
c8532f7af6 This is the 4.19.81 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl239joACgkQONu9yGCS
 aT4C8g//docfwEU2Ezqy7G2049bm0ulF99clghNoT47idndkBGgrZp1fJ9X8H7dO
 dxR12NoKQcYtlxdLvY1XTrHYXnxhMgECjltLeI0sd2sA90hteWx9lXS9DDcWcwWr
 zE4vtteCEE6b7DlMtMV3IeSqtkLIAewdlYYepmsX2xNVnRJXwCbEbvQ0TVHan0P/
 1QtuiiiO97q8NkPAOAJCqfs6l+LFfWMuKN1cESBQ7fVUg0PppP+dfqrBhyJ/C98S
 kA+iFkFKkRdYr7052sJtm0mSvOrBmCU9KGixHzQ9gmN7LlfxgfnQqH5i1qCL/mC2
 +/9jozmZ2vhyH75VV2CT3hsVgyseptURudrmOcQC/iiCF8nDypcqrcYBYYNMNg8z
 u4b4bvxxXv+XzJQkgIWjiKPa1+BQh3JBubbNHxVhJecxyd6dwHH498z/9t3e1XRn
 9pdW7YdVDWNI5HgxVJRTmaN157N0mQSg9ghEDQ10G/KOhY7+pbCGCEIOursSvou3
 ov01uMeip2TE+OcjcEMyspcKJZkxaEvfS0iAvCYsXebPjyVwtVrT9Nph8MTJ1zPw
 W3YN5wt/kT3pJII2owR5SG3ti+LindvaK+jAEKsLcA9CZ3DeS2tSjwr5KQR0w8Px
 5wtfVo5yFG7YRQCXKq3mhuZEeXHXOXASNEuxD4fmgXqSYwj2SZU=
 =bugE
 -----END PGP SIGNATURE-----

Merge 4.19.81 into android-4.19

Changes in 4.19.81
	nvme-pci: Fix a race in controller removal
	scsi: ufs: skip shutdown if hba is not powered
	scsi: megaraid: disable device when probe failed after enabled device
	scsi: qla2xxx: Fix unbound sleep in fcport delete path.
	ARM: OMAP2+: Fix missing reset done flag for am3 and am43
	ARM: OMAP2+: Fix warnings with broken omap2_set_init_voltage()
	ieee802154: ca8210: prevent memory leak
	ARM: dts: am4372: Set memory bandwidth limit for DISPC
	net: dsa: qca8k: Use up to 7 ports for all operations
	MIPS: dts: ar9331: fix interrupt-controller size
	xen/efi: Set nonblocking callbacks
	nl80211: fix null pointer dereference
	mac80211: fix txq null pointer dereference
	netfilter: nft_connlimit: disable bh on garbage collection
	net: dsa: rtl8366rb: add missing of_node_put after calling of_get_child_by_name
	mips: Loongson: Fix the link time qualifier of 'serial_exit()'
	net: hisilicon: Fix usage of uninitialized variable in function mdio_sc_cfg_reg_write()
	lib: textsearch: fix escapes in example code
	r8152: Set macpassthru in reset_resume callback
	namespace: fix namespace.pl script to support relative paths
	libata/ahci: Fix PCS quirk application
	md/raid0: fix warning message for parameter default_layout
	Revert "drm/radeon: Fix EEH during kexec"
	ocfs2: fix panic due to ocfs2_wq is null
	ipv4: fix race condition between route lookup and invalidation
	ipv4: Return -ENETUNREACH if we can't create route but saddr is valid
	net: avoid potential infinite loop in tc_ctl_action()
	net: bcmgenet: Fix RGMII_MODE_EN value for GENET v1/2/3
	net: bcmgenet: Set phydev->dev_flags only for internal PHYs
	net: i82596: fix dma_alloc_attr for sni_82596
	net/ibmvnic: Fix EOI when running in XIVE mode.
	net: ipv6: fix listify ip6_rcv_finish in case of forwarding
	net: stmmac: disable/enable ptp_ref_clk in suspend/resume flow
	sctp: change sctp_prot .no_autobind with true
	memfd: Fix locking when tagging pins
	USB: legousbtower: fix memleak on disconnect
	ALSA: hda/realtek - Add support for ALC711
	ALSA: hda/realtek - Enable headset mic on Asus MJ401TA
	ALSA: usb-audio: Disable quirks for BOSS Katana amplifiers
	ALSA: hda - Force runtime PM on Nvidia HDMI codecs
	usb: udc: lpc32xx: fix bad bit shift operation
	USB: serial: ti_usb_3410_5052: fix port-close races
	USB: ldusb: fix memleak on disconnect
	USB: usblp: fix use-after-free on disconnect
	USB: ldusb: fix read info leaks
	MIPS: tlbex: Fix build_restore_pagemask KScratch restore
	staging: wlan-ng: fix exit return when sme->key_idx >= NUM_WEPKEYS
	scsi: zfcp: fix reaction on bit error threshold notification
	scsi: sd: Ignore a failure to sync cache due to lack of authorization
	scsi: core: save/restore command resid for error handling
	scsi: core: try to get module before removing device
	scsi: ch: Make it possible to open a ch device multiple times again
	Input: da9063 - fix capability and drop KEY_SLEEP
	Input: synaptics-rmi4 - avoid processing unknown IRQs
	ASoC: rsnd: Reinitialize bit clock inversion flag for every format setting
	ACPI: CPPC: Set pcc_data[pcc_ss_id] to NULL in acpi_cppc_processor_exit()
	cfg80211: wext: avoid copying malformed SSIDs
	mac80211: Reject malformed SSID elements
	drm/edid: Add 6 bpc quirk for SDC panel in Lenovo G50
	drm/ttm: Restore ttm prefaulting
	drm/amdgpu: Bail earlier when amdgpu.cik_/si_support is not set to 1
	drivers/base/memory.c: don't access uninitialized memmaps in soft_offline_page_store()
	fs/proc/page.c: don't access uninitialized memmaps in fs/proc/page.c
	mmc: cqhci: Commit descriptors before setting the doorbell
	mm/memory-failure.c: don't access uninitialized memmaps in memory_failure()
	mm/slub: fix a deadlock in show_slab_objects()
	mm/page_owner: don't access uninitialized memmaps when reading /proc/pagetypeinfo
	hugetlbfs: don't access uninitialized memmaps in pfn_range_valid_gigantic()
	mm/memory-failure: poison read receives SIGKILL instead of SIGBUS if mmaped more than once
	xtensa: drop EXPORT_SYMBOL for outs*/ins*
	parisc: Fix vmap memory leak in ioremap()/iounmap()
	EDAC/ghes: Fix Use after free in ghes_edac remove path
	arm64: Enable workaround for Cavium TX2 erratum 219 when running SMT
	CIFS: avoid using MID 0xFFFF
	CIFS: Fix use after free of file info structures
	perf/aux: Fix AUX output stopping
	tracing: Fix race in perf_trace_buf initialization
	dm cache: fix bugs when a GFP_NOWAIT allocation fails
	x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area
	x86/apic/x2apic: Fix a NULL pointer deref when handling a dying cpu
	pinctrl: cherryview: restore Strago DMI workaround for all versions
	pinctrl: armada-37xx: fix control of pins 32 and up
	pinctrl: armada-37xx: swap polarity on LED group
	btrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group()
	Btrfs: add missing extents release on file extent cluster relocation error
	Btrfs: check for the full sync flag while holding the inode lock during fsync
	btrfs: tracepoints: Fix bad entry members of qgroup events
	memstick: jmb38x_ms: Fix an error handling path in 'jmb38x_ms_probe()'
	cpufreq: Avoid cpufreq_suspend() deadlock on system shutdown
	xen/netback: fix error path of xenvif_connect_data()
	PCI: PM: Fix pci_power_up()
	blk-rq-qos: fix first node deletion of rq_qos_del()
	RDMA/cxgb4: Do not dma memory off of the stack
	Linux 4.19.81

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I96ed6b6f11919bc6dd4b560b160b8e7059ec574e
2019-10-29 09:41:48 +01:00
Prateek Sood
5ce7528c4d tracing: Fix race in perf_trace_buf initialization
commit 6b1340cc00 upstream.

A race condition exists while initialiazing perf_trace_buf from
perf_trace_init() and perf_kprobe_init().

      CPU0                                        CPU1
perf_trace_init()
  mutex_lock(&event_mutex)
    perf_trace_event_init()
      perf_trace_event_reg()
        total_ref_count == 0
	buf = alloc_percpu()
        perf_trace_buf[i] = buf
        tp_event->class->reg() //fails       perf_kprobe_init()
	goto fail                              perf_trace_event_init()
                                                 perf_trace_event_reg()
        fail:
	  total_ref_count == 0

                                                   total_ref_count == 0
                                                   buf = alloc_percpu()
                                                   perf_trace_buf[i] = buf
                                                   tp_event->class->reg()
                                                   total_ref_count++

          free_percpu(perf_trace_buf[i])
          perf_trace_buf[i] = NULL

Any subsequent call to perf_trace_event_reg() will observe total_ref_count > 0,
causing the perf_trace_buf to be always NULL. This can result in perf_trace_buf
getting accessed from perf_trace_buf_alloc() without being initialized. Acquiring
event_mutex in perf_kprobe_init() before calling perf_trace_event_init() should
fix this race.

The race caused the following bug:

 Unable to handle kernel paging request at virtual address 0000003106f2003c
 Mem abort info:
   ESR = 0x96000045
   Exception class = DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
 Data abort info:
   ISV = 0, ISS = 0x00000045
   CM = 0, WnR = 1
 user pgtable: 4k pages, 39-bit VAs, pgdp = ffffffc034b9b000
 [0000003106f2003c] pgd=0000000000000000, pud=0000000000000000
 Internal error: Oops: 96000045 [#1] PREEMPT SMP
 Process syz-executor (pid: 18393, stack limit = 0xffffffc093190000)
 pstate: 80400005 (Nzcv daif +PAN -UAO)
 pc : __memset+0x20/0x1ac
 lr : memset+0x3c/0x50
 sp : ffffffc09319fc50

  __memset+0x20/0x1ac
  perf_trace_buf_alloc+0x140/0x1a0
  perf_trace_sys_enter+0x158/0x310
  syscall_trace_enter+0x348/0x7c0
  el0_svc_common+0x11c/0x368
  el0_svc_handler+0x12c/0x198
  el0_svc+0x8/0xc

Ramdumps showed the following:
  total_ref_count = 3
  perf_trace_buf = (
      0x0 -> NULL,
      0x0 -> NULL,
      0x0 -> NULL,
      0x0 -> NULL)

Link: http://lkml.kernel.org/r/1571120245-4186-1-git-send-email-prsood@codeaurora.org

Cc: stable@vger.kernel.org
Fixes: e12f03d703 ("perf/core: Implement the 'perf_kprobe' PMU")
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Prateek Sood <prsood@codeaurora.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-29 09:20:03 +01:00
Alexander Shishkin
96202569b9 perf/aux: Fix AUX output stopping
commit f3a519e4ad upstream.

Commit:

  8a58ddae23 ("perf/core: Fix exclusive events' grouping")

allows CAP_EXCLUSIVE events to be grouped with other events. Since all
of those also happen to be AUX events (which is not the case the other
way around, because arch/s390), this changes the rules for stopping the
output: the AUX event may not be on its PMU's context any more, if it's
grouped with a HW event, in which case it will be on that HW event's
context instead. If that's the case, munmap() of the AUX buffer can't
find and stop the AUX event, potentially leaving the last reference with
the atomic context, which will then end up freeing the AUX buffer. This
will then trip warnings:

Fix this by using the context's PMU context when looking for events
to stop, instead of the event's PMU context.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20191022073940.61814-1-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-29 09:20:03 +01:00
Greg Kroah-Hartman
d8a623cfbb This is the 4.19.80 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl2o0vkACgkQONu9yGCS
 aT50rQ//So/ShGy5+D6cmsPVbd8p/9X0fM5D6VjgJ2pj+HbalxW7JhJ0tyVdfkO1
 AS6p/24bmUMWI3pxJBbJjfggbyleV2UFqlHIei+SjfHd0sghZ+E/XY7mLdcKvx0a
 dGdXuxF+enb18pNiUKUrJUgqksgfO40yMXyZhjGf5A+/JmYsNaJhkyCWgPfGIB5i
 YuowOWC4Rkds/CQxyRSRJl+YSQWh8j7Ke0tdjhMQxnmyzFD0wcE5vUtq2vIP7yxO
 ypPrz9zWTase3y1yCcPsXd0m4Ji1VpF99hlgAr0HLFNcZn2kNYFpyblfOrI/zfoK
 RqUJgVVlZaWvhMOrueQO+XbebXdCWTJHiNTUDxIFakEAPNz+XkTQQf6LvzOjcFxB
 oLHnp1qBCn72y6o6Q3XmauFcmYBokyfBvXDAJsXYr+X5B4akn/fpyzzBCInX0h+r
 og5zpLbXDLszG3j76TPSpcjd+t4xqMy+MG+jkH/mLtMAFyaVi2sfMzsZJAQCACr2
 wNHRFQQGjDmQjovkwSRYENQBUuITSj+VrBUD0M7lnfA/Fni3BAJaxY5I6WeEvbeW
 ejzPVFZB8dfEy7hAKKiVEuI8/akcp8MUXfLJOfTxytLIpYllOBoVC4LtjgO5FYu3
 grSbRuowjrlUCtnCU9H18avLE1ScFFEGTmPOqI6xDqKiZf59QsQ=
 =sKVA
 -----END PGP SIGNATURE-----

Merge 4.19.80 into android-4.19

Changes in 4.19.80
	panic: ensure preemption is disabled during panic()
	f2fs: use EINVAL for superblock with invalid magic
	USB: rio500: Remove Rio 500 kernel driver
	USB: yurex: Don't retry on unexpected errors
	USB: yurex: fix NULL-derefs on disconnect
	USB: usb-skeleton: fix runtime PM after driver unbind
	USB: usb-skeleton: fix NULL-deref on disconnect
	xhci: Fix false warning message about wrong bounce buffer write length
	xhci: Prevent device initiated U1/U2 link pm if exit latency is too long
	xhci: Check all endpoints for LPM timeout
	xhci: Fix USB 3.1 capability detection on early xHCI 1.1 spec based hosts
	usb: xhci: wait for CNR controller not ready bit in xhci resume
	xhci: Prevent deadlock when xhci adapter breaks during init
	xhci: Increase STS_SAVE timeout in xhci_suspend()
	USB: adutux: fix use-after-free on disconnect
	USB: adutux: fix NULL-derefs on disconnect
	USB: adutux: fix use-after-free on release
	USB: iowarrior: fix use-after-free on disconnect
	USB: iowarrior: fix use-after-free on release
	USB: iowarrior: fix use-after-free after driver unbind
	USB: usblp: fix runtime PM after driver unbind
	USB: chaoskey: fix use-after-free on release
	USB: ldusb: fix NULL-derefs on driver unbind
	serial: uartlite: fix exit path null pointer
	USB: serial: keyspan: fix NULL-derefs on open() and write()
	USB: serial: ftdi_sio: add device IDs for Sienna and Echelon PL-20
	USB: serial: option: add Telit FN980 compositions
	USB: serial: option: add support for Cinterion CLS8 devices
	USB: serial: fix runtime PM after driver unbind
	USB: usblcd: fix I/O after disconnect
	USB: microtek: fix info-leak at probe
	USB: dummy-hcd: fix power budget for SuperSpeed mode
	usb: renesas_usbhs: gadget: Do not discard queues in usb_ep_set_{halt,wedge}()
	usb: renesas_usbhs: gadget: Fix usb_ep_set_{halt,wedge}() behavior
	USB: legousbtower: fix slab info leak at probe
	USB: legousbtower: fix deadlock on disconnect
	USB: legousbtower: fix potential NULL-deref on disconnect
	USB: legousbtower: fix open after failed reset request
	USB: legousbtower: fix use-after-free on release
	mei: me: add comet point (lake) LP device ids
	mei: avoid FW version request on Ibex Peak and earlier
	gpio: eic: sprd: Fix the incorrect EIC offset when toggling
	Staging: fbtft: fix memory leak in fbtft_framebuffer_alloc
	staging: vt6655: Fix memory leak in vt6655_probe
	iio: adc: hx711: fix bug in sampling of data
	iio: adc: ad799x: fix probe error handling
	iio: adc: axp288: Override TS pin bias current for some models
	iio: light: opt3001: fix mutex unlock race
	efivar/ssdt: Don't iterate over EFI vars if no SSDT override was specified
	perf llvm: Don't access out-of-scope array
	perf inject jit: Fix JIT_CODE_MOVE filename
	blk-wbt: fix performance regression in wbt scale_up/scale_down
	CIFS: Gracefully handle QueryInfo errors during open
	CIFS: Force revalidate inode when dentry is stale
	CIFS: Force reval dentry if LOOKUP_REVAL flag is set
	kernel/sysctl.c: do not override max_threads provided by userspace
	mm/vmpressure.c: fix a signedness bug in vmpressure_register_event()
	firmware: google: increment VPD key_len properly
	gpiolib: don't clear FLAG_IS_OUT when emulating open-drain/open-source
	iio: adc: stm32-adc: move registers definitions
	iio: adc: stm32-adc: fix a race when using several adcs with dma and irq
	cifs: use cifsInodeInfo->open_file_lock while iterating to avoid a panic
	btrfs: fix incorrect updating of log root tree
	btrfs: fix uninitialized ret in ref-verify
	NFS: Fix O_DIRECT accounting of number of bytes read/written
	MIPS: Disable Loongson MMI instructions for kernel build
	MIPS: elf_hwcap: Export userspace ASEs
	ACPICA: ACPI 6.3: PPTT add additional fields in Processor Structure Flags
	ACPI/PPTT: Add support for ACPI 6.3 thread flag
	arm64: topology: Use PPTT to determine if PE is a thread
	Fix the locking in dcache_readdir() and friends
	media: stkwebcam: fix runtime PM after driver unbind
	arm64/sve: Fix wrong free for task->thread.sve_state
	tracing/hwlat: Report total time spent in all NMIs during the sample
	tracing/hwlat: Don't ignore outer-loop duration when calculating max_latency
	ftrace: Get a reference counter for the trace_array on filter files
	tracing: Get trace_array reference for available_tracers files
	hwmon: Fix HWMON_P_MIN_ALARM mask
	x86/asm: Fix MWAITX C-state hint value
	PCI: vmd: Fix config addressing when using bus offsets
	perf/hw_breakpoint: Fix arch_hw_breakpoint use-before-initialization
	Linux 4.19.80

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8ef1835585f79b752712a835852a4bc960ae3b97
2019-10-17 15:33:07 -07:00
Mark-PK Tsai
0603d82bca perf/hw_breakpoint: Fix arch_hw_breakpoint use-before-initialization
commit 310aa0a25b upstream.

If we disable the compiler's auto-initialization feature, if
-fplugin-arg-structleak_plugin-byref or -ftrivial-auto-var-init=pattern
are disabled, arch_hw_breakpoint may be used before initialization after:

  9a4903dde2 ("perf/hw_breakpoint: Split attribute parse and commit")

On our ARM platform, the struct step_ctrl in arch_hw_breakpoint, which
used to be zero-initialized by kzalloc(), may be used in
arch_install_hw_breakpoint() without initialization.

Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alix Wu <alix.wu@mediatek.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: YJ Chiang <yj.chiang@mediatek.com>
Link: https://lkml.kernel.org/r/20190906060115.9460-1-mark-pk.tsai@mediatek.com
[ Minor edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Doug Anderson <dianders@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-17 13:45:44 -07:00
Steven Rostedt (VMware)
b9040fab5f tracing: Get trace_array reference for available_tracers files
commit 194c2c74f5 upstream.

As instances may have different tracers available, we need to look at the
trace_array descriptor that shows the list of the available tracers for the
instance. But there's a race between opening the file and an admin
deleting the instance. The trace_array_get() needs to be called before
accessing the trace_array.

Cc: stable@vger.kernel.org
Fixes: 607e2ea167 ("tracing: Set up infrastructure to allow tracers for instances")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-17 13:45:41 -07:00
Steven Rostedt (VMware)
a6c9fb2c2c ftrace: Get a reference counter for the trace_array on filter files
commit 9ef16693af upstream.

The ftrace set_ftrace_filter and set_ftrace_notrace files are specific for
an instance now. They need to take a reference to the instance otherwise
there could be a race between accessing the files and deleting the instance.

It wasn't until the :mod: caching where these file operations started
referencing the trace_array directly.

Cc: stable@vger.kernel.org
Fixes: 673feb9d76 ("ftrace: Add :mod: caching infrastructure to trace_array")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-17 13:45:40 -07:00
Srivatsa S. Bhat (VMware)
b7f758631d tracing/hwlat: Don't ignore outer-loop duration when calculating max_latency
commit fc64e4ad80 upstream.

max_latency is intended to record the maximum ever observed hardware
latency, which may occur in either part of the loop (inner/outer). So
we need to also consider the outer-loop sample when updating
max_latency.

Link: http://lkml.kernel.org/r/157073345463.17189.18124025522664682811.stgit@srivatsa-ubuntu

Fixes: e7c15cd8a1 ("tracing: Added hardware latency tracer")
Cc: stable@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-17 13:45:39 -07:00
Srivatsa S. Bhat (VMware)
6271cbff93 tracing/hwlat: Report total time spent in all NMIs during the sample
commit 98dc19c114 upstream.

nmi_total_ts is supposed to record the total time spent in *all* NMIs
that occur on the given CPU during the (active portion of the)
sampling window. However, the code seems to be overwriting this
variable for each NMI, thereby only recording the time spent in the
most recent NMI. Fix it by accumulating the duration instead.

Link: http://lkml.kernel.org/r/157073343544.17189.13911783866738671133.stgit@srivatsa-ubuntu

Fixes: 7b2c862501 ("tracing: Add NMI tracing in hwlat detector")
Cc: stable@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-17 13:45:38 -07:00
Michal Hocko
7bbe6eefdb kernel/sysctl.c: do not override max_threads provided by userspace
commit b0f53dbc4b upstream.

Partially revert 16db3d3f11 ("kernel/sysctl.c: threads-max observe
limits") because the patch is causing a regression to any workload which
needs to override the auto-tuning of the limit provided by kernel.

set_max_threads is implementing a boot time guesstimate to provide a
sensible limit of the concurrently running threads so that runaways will
not deplete all the memory.  This is a good thing in general but there
are workloads which might need to increase this limit for an application
to run (reportedly WebSpher MQ is affected) and that is simply not
possible after the mentioned change.  It is also very dubious to
override an admin decision by an estimation that doesn't have any direct
relation to correctness of the kernel operation.

Fix this by dropping set_max_threads from sysctl_max_threads so any
value is accepted as long as it fits into MAX_THREADS which is important
to check because allowing more threads could break internal robust futex
restriction.  While at it, do not use MIN_THREADS as the lower boundary
because it is also only a heuristic for automatic estimation and admin
might have a good reason to stop new threads to be created even when
below this limit.

This became more severe when we switched x86 from 4k to 8k kernel
stacks.  Starting since 6538b8ea88 ("x86_64: expand kernel stack to
16K") (3.16) we use THREAD_SIZE_ORDER = 2 and that halved the auto-tuned
value.

In the particular case

  3.12
  kernel.threads-max = 515561

  4.4
  kernel.threads-max = 200000

Neither of the two values is really insane on 32GB machine.

I am not sure we want/need to tune the max_thread value further.  If
anything the tuning should be removed altogether if proven not useful in
general.  But we definitely need a way to override this auto-tuning.

Link: http://lkml.kernel.org/r/20190922065801.GB18814@dhcp22.suse.cz
Fixes: 16db3d3f11 ("kernel/sysctl.c: threads-max observe limits")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-17 13:45:19 -07:00
Will Deacon
7d1688c673 panic: ensure preemption is disabled during panic()
commit 20bb759a66 upstream.

Calling 'panic()' on a kernel with CONFIG_PREEMPT=y can leave the
calling CPU in an infinite loop, but with interrupts and preemption
enabled.  From this state, userspace can continue to be scheduled,
despite the system being "dead" as far as the kernel is concerned.

This is easily reproducible on arm64 when booting with "nosmp" on the
command line; a couple of shell scripts print out a periodic "Ping"
message whilst another triggers a crash by writing to
/proc/sysrq-trigger:

  | sysrq: Trigger a crash
  | Kernel panic - not syncing: sysrq triggered crash
  | CPU: 0 PID: 1 Comm: init Not tainted 5.2.15 #1
  | Hardware name: linux,dummy-virt (DT)
  | Call trace:
  |  dump_backtrace+0x0/0x148
  |  show_stack+0x14/0x20
  |  dump_stack+0xa0/0xc4
  |  panic+0x140/0x32c
  |  sysrq_handle_reboot+0x0/0x20
  |  __handle_sysrq+0x124/0x190
  |  write_sysrq_trigger+0x64/0x88
  |  proc_reg_write+0x60/0xa8
  |  __vfs_write+0x18/0x40
  |  vfs_write+0xa4/0x1b8
  |  ksys_write+0x64/0xf0
  |  __arm64_sys_write+0x14/0x20
  |  el0_svc_common.constprop.0+0xb0/0x168
  |  el0_svc_handler+0x28/0x78
  |  el0_svc+0x8/0xc
  | Kernel Offset: disabled
  | CPU features: 0x0002,24002004
  | Memory Limit: none
  | ---[ end Kernel panic - not syncing: sysrq triggered crash ]---
  |  Ping 2!
  |  Ping 1!
  |  Ping 1!
  |  Ping 2!

The issue can also be triggered on x86 kernels if CONFIG_SMP=n,
otherwise local interrupts are disabled in 'smp_send_stop()'.

Disable preemption in 'panic()' before re-enabling interrupts.

Link: http://lkml.kernel.org/r/20191002123538.22609-1-will@kernel.org
Link: https://lore.kernel.org/r/BX1W47JXPMR8.58IYW53H6M5N@dragonstone
Signed-off-by: Will Deacon <will@kernel.org>
Reported-by: Xogium <contact@xogium.me>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-17 13:44:46 -07:00
Kalesh Singh
d112bff3b7 BACKPORT: PM/sleep: Expose suspend stats in sysfs
Userspace can get suspend stats from the suspend stats debugfs node.
Since debugfs doesn't have stable ABI, expose suspend stats in
sysfs under /sys/power/suspend_stats.

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 2c8db5bef9)
[ fixed conflicts in Documentation ]
Bug: 129087298
Change-Id: Ia8f0d8551ae693cb5f3e1abc609fb9bbfeb695f5
Signed-off-by: Tri Vo <trong@google.com>
2019-10-11 14:04:42 -07:00
Tri Vo
2c9f5fa9c3 UPSTREAM: PM / wakeup: Show wakeup sources stats in sysfs
Add an ID and a device pointer to 'struct wakeup_source'. Use them to to
expose wakeup sources statistics in sysfs under
/sys/class/wakeup/wakeup<ID>/*.

Co-developed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Co-developed-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Tri Vo <trong@android.com>
Tested-by: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit c8377adfa7)
Bug: 129087298
Signed-off-by: Tri Vo <trong@google.com>
Change-Id: Iecd3412423f9d499981f44d3b69507eaa62a2cd9
2019-10-11 14:04:42 -07:00
Tri Vo
5bc2bdfb22 UPSTREAM: PM / wakeup: Use wakeup_source_register() in wakelock.c
kernel/power/wakelock.c duplicates wakeup source creation and
registration code from drivers/base/power/wakeup.c.

Change struct wakelock's wakeup source to a pointer and use
wakeup_source_register() function to create and register said wakeup
source. Use wakeup_source_unregister() on cleanup path.

Signed-off-by: Tri Vo <trong@android.com>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 2434aea58e)
Bug: 129087298
Signed-off-by: Tri Vo <trong@google.com>
Change-Id: I4e6b3c613c561fb382f17c3c31b6584aebabfb5d
2019-10-11 14:04:42 -07:00
Greg Kroah-Hartman
ef55d5261c This is the 4.19.79 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl2grBgACgkQONu9yGCS
 aT6xRBAA0pTW2W/VvzBHBLeVlmNtwQZb8x7civVb72iZkltKR9tTPim90PULpz/P
 iO7kh8KqkgVUqdgBE0VzkHGWUSThggfSTQiqzCqOgTwV8WQWqSF8ET0HU8zbglYB
 5pXSojoRYmurGVznd4Ll6aWa5brXIKwf1mDSrFHagOyOLxQmyggHaTRSLx36BSfj
 gunE2ideB1oTaPmd/2aTI03CU3jRwXmowe8rZIDa8pJEpplZPFdk0YOPXg2t6uRI
 bjJGO8bhfR/14r/3h76IwsEiVVXIcCeEVm0fos/H6NUypedfi7jlT0Ldzg1/zZti
 mUMkbPGHcJbOWfBYPQq8xQzviCa+MFraA4Tek5h/Lf7kf3NpjE20AnH3pb9TaqQf
 mJYUGziCoOOOz8k+0eNtIjIZiCysOnf9sI5rGhMYb9qfZoZGG6RiitqyVYNa+rzJ
 wvIUQZ4vSnYmQMAXqxyayfSZvFbMxv6pAdeH0NrXVRgFF6dnKG9TSsCnIuQaJxAE
 OQRaYEJktMUBs81hS0IjnJNDFLW3r++s87xEYvCt4L7XGSrxMJ3jW6xLZlmET68G
 4UIddJ81zIuqpGY1qoWdWZAp3nfRfSX4ehOnoNmIDyC9pRhiCKc+N6j5rX8gBNO/
 SO8YOaNf9RTphhEG6Op7u4ZbU+UR4pYP+rjKveyT2HKPH6D/Tv0=
 =wt6H
 -----END PGP SIGNATURE-----

Merge 4.19.79 into android-4.19

Changes in 4.19.79
	s390/process: avoid potential reading of freed stack
	KVM: s390: Test for bad access register and size at the start of S390_MEM_OP
	s390/topology: avoid firing events before kobjs are created
	s390/cio: exclude subchannels with no parent from pseudo check
	KVM: PPC: Book3S HV: Fix race in re-enabling XIVE escalation interrupts
	KVM: PPC: Book3S HV: Check for MMU ready on piggybacked virtual cores
	KVM: PPC: Book3S HV: Don't lose pending doorbell request on migration on P9
	KVM: X86: Fix userspace set invalid CR4
	KVM: nVMX: handle page fault in vmread fix
	nbd: fix max number of supported devs
	PM / devfreq: tegra: Fix kHz to Hz conversion
	ASoC: Define a set of DAPM pre/post-up events
	ASoC: sgtl5000: Improve VAG power and mute control
	powerpc/mce: Fix MCE handling for huge pages
	powerpc/mce: Schedule work from irq_work
	powerpc/powernv: Restrict OPAL symbol map to only be readable by root
	powerpc/powernv/ioda: Fix race in TCE level allocation
	powerpc/book3s64/mm: Don't do tlbie fixup for some hardware revisions
	can: mcp251x: mcp251x_hw_reset(): allow more time after a reset
	tools lib traceevent: Fix "robust" test of do_generate_dynamic_list_file
	crypto: qat - Silence smp_processor_id() warning
	crypto: skcipher - Unmap pages after an external error
	crypto: cavium/zip - Add missing single_release()
	crypto: caam - fix concurrency issue in givencrypt descriptor
	crypto: ccree - account for TEE not ready to report
	crypto: ccree - use the full crypt length value
	MIPS: Treat Loongson Extensions as ASEs
	power: supply: sbs-battery: use correct flags field
	power: supply: sbs-battery: only return health when battery present
	tracing: Make sure variable reference alias has correct var_ref_idx
	usercopy: Avoid HIGHMEM pfn warning
	timer: Read jiffies once when forwarding base clk
	PCI: vmd: Fix shadow offsets to reflect spec changes
	PCI: Restore Resizable BAR size bits correctly for 1MB BARs
	watchdog: imx2_wdt: fix min() calculation in imx2_wdt_set_timeout
	perf stat: Fix a segmentation fault when using repeat forever
	drm/omap: fix max fclk divider for omap36xx
	drm/msm/dsi: Fix return value check for clk_get_parent
	drm/nouveau/kms/nv50-: Don't create MSTMs for eDP connectors
	drm/i915/gvt: update vgpu workload head pointer correctly
	mmc: sdhci: improve ADMA error reporting
	mmc: sdhci-of-esdhc: set DMA snooping based on DMA coherence
	Revert "locking/pvqspinlock: Don't wait if vCPU is preempted"
	xen/xenbus: fix self-deadlock after killing user process
	ieee802154: atusb: fix use-after-free at disconnect
	s390/cio: avoid calling strlen on null pointer
	cfg80211: initialize on-stack chandefs
	arm64: cpufeature: Detect SSBS and advertise to userspace
	ima: always return negative code for error
	ima: fix freeing ongoing ahash_request
	fs: nfs: Fix possible null-pointer dereferences in encode_attrs()
	9p: Transport error uninitialized
	9p: avoid attaching writeback_fid on mmap with type PRIVATE
	xen/pci: reserve MCFG areas earlier
	ceph: fix directories inode i_blkbits initialization
	ceph: reconnect connection if session hang in opening state
	watchdog: aspeed: Add support for AST2600
	netfilter: nf_tables: allow lookups in dynamic sets
	drm/amdgpu: Fix KFD-related kernel oops on Hawaii
	drm/amdgpu: Check for valid number of registers to read
	pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors
	pwm: stm32-lp: Add check in case requested period cannot be achieved
	x86/purgatory: Disable the stackleak GCC plugin for the purgatory
	ntb: point to right memory window index
	thermal: Fix use-after-free when unregistering thermal zone device
	thermal_hwmon: Sanitize thermal_zone type
	libnvdimm/region: Initialize bad block for volatile namespaces
	fuse: fix memleak in cuse_channel_open
	libnvdimm/nfit_test: Fix acpi_handle redefinition
	sched/membarrier: Call sync_core only before usermode for same mm
	sched/membarrier: Fix private expedited registration check
	sched/core: Fix migration to invalid CPU in __set_cpus_allowed_ptr()
	perf build: Add detection of java-11-openjdk-devel package
	kernel/elfcore.c: include proper prototypes
	perf unwind: Fix libunwind build failure on i386 systems
	nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs
	drm/radeon: Bail earlier when radeon.cik_/si_support=0 is passed
	KVM: PPC: Book3S HV: XIVE: Free escalation interrupts before disabling the VP
	KVM: nVMX: Fix consistency check on injected exception error code
	nbd: fix crash when the blksize is zero
	powerpc/pseries: Fix cpu_hotplug_lock acquisition in resize_hpt()
	powerpc/book3s64/radix: Rename CPU_FTR_P9_TLBIE_BUG feature flag
	tools lib traceevent: Do not free tep->cmdlines in add_new_comm() on failure
	tick: broadcast-hrtimer: Fix a race in bc_set_next
	perf tools: Fix segfault in cpu_cache_level__read()
	perf stat: Reset previous counts on repeat with interval
	riscv: Avoid interrupts being erroneously enabled in handle_exception()
	arm64: ssbd: Add support for PSTATE.SSBS rather than trapping to EL3
	KVM: arm64: Set SCTLR_EL2.DSSBS if SSBD is forcefully disabled and !vhe
	arm64: docs: Document SSBS HWCAP
	arm64: fix SSBS sanitization
	arm64: Add sysfs vulnerability show for spectre-v1
	arm64: add sysfs vulnerability show for meltdown
	arm64: enable generic CPU vulnerabilites support
	arm64: Always enable ssb vulnerability detection
	arm64: Provide a command line to disable spectre_v2 mitigation
	arm64: Advertise mitigation of Spectre-v2, or lack thereof
	arm64: Always enable spectre-v2 vulnerability detection
	arm64: add sysfs vulnerability show for spectre-v2
	arm64: add sysfs vulnerability show for speculative store bypass
	arm64: ssbs: Don't treat CPUs with SSBS as unaffected by SSB
	arm64: Force SSBS on context switch
	arm64: Use firmware to detect CPUs that are not affected by Spectre-v2
	arm64/speculation: Support 'mitigations=' cmdline option
	vfs: Fix EOVERFLOW testing in put_compat_statfs64
	coresight: etm4x: Use explicit barriers on enable/disable
	staging: erofs: fix an error handling in erofs_readdir()
	staging: erofs: some compressed cluster should be submitted for corrupted images
	staging: erofs: add two missing erofs_workgroup_put for corrupted images
	staging: erofs: detect potential multiref due to corrupted images
	cfg80211: add and use strongly typed element iteration macros
	cfg80211: Use const more consistently in for_each_element macros
	nl80211: validate beacon head
	Linux 4.19.79

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie4f85994b5f3e53658c42833d0dc712575d0902e
2019-10-11 19:13:57 +02:00
Balasubramani Vivekanandan
e5331c37c0 tick: broadcast-hrtimer: Fix a race in bc_set_next
[ Upstream commit b9023b91dd ]

When a cpu requests broadcasting, before starting the tick broadcast
hrtimer, bc_set_next() checks if the timer callback (bc_handler) is active
using hrtimer_try_to_cancel(). But hrtimer_try_to_cancel() does not provide
the required synchronization when the callback is active on other core.

The callback could have already executed tick_handle_oneshot_broadcast()
and could have also returned. But still there is a small time window where
the hrtimer_try_to_cancel() returns -1. In that case bc_set_next() returns
without doing anything, but the next_event of the tick broadcast clock
device is already set to a timeout value.

In the race condition diagram below, CPU #1 is running the timer callback
and CPU #2 is entering idle state and so calls bc_set_next().

In the worst case, the next_event will contain an expiry time, but the
hrtimer will not be started which happens when the racing callback returns
HRTIMER_NORESTART. The hrtimer might never recover if all further requests
from the CPUs to subscribe to tick broadcast have timeout greater than the
next_event of tick broadcast clock device. This leads to cascading of
failures and finally noticed as rcu stall warnings

Here is a depiction of the race condition

CPU #1 (Running timer callback)                   CPU #2 (Enter idle
                                                  and subscribe to
                                                  tick broadcast)
---------------------                             ---------------------

__run_hrtimer()                                   tick_broadcast_enter()

  bc_handler()                                      __tick_broadcast_oneshot_control()

    tick_handle_oneshot_broadcast()

      raw_spin_lock(&tick_broadcast_lock);

      dev->next_event = KTIME_MAX;                  //wait for tick_broadcast_lock
      //next_event for tick broadcast clock
      set to KTIME_MAX since no other cores
      subscribed to tick broadcasting

      raw_spin_unlock(&tick_broadcast_lock);

    if (dev->next_event == KTIME_MAX)
      return HRTIMER_NORESTART
    // callback function exits without
       restarting the hrtimer                      //tick_broadcast_lock acquired
                                                   raw_spin_lock(&tick_broadcast_lock);

                                                   tick_broadcast_set_event()

                                                     clockevents_program_event()

                                                       dev->next_event = expires;

                                                       bc_set_next()

                                                         hrtimer_try_to_cancel()
                                                         //returns -1 since the timer
                                                         callback is active. Exits without
                                                         restarting the timer
  cpu_base->running = NULL;

The comment that hrtimer cannot be armed from within the callback is
wrong. It is fine to start the hrtimer from within the callback. Also it is
safe to start the hrtimer from the enter/exit idle code while the broadcast
handler is active. The enter/exit idle code and the broadcast handler are
synchronized using tick_broadcast_lock. So there is no need for the
existing try to cancel logic. All this can be removed which will eliminate
the race condition as well.

Fixes: 5d1638acb9 ("tick: Introduce hrtimer based broadcast")
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Balasubramani Vivekanandan <balasubramani_vivekanandan@mentor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190926135101.12102-2-balasubramani_vivekanandan@mentor.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-11 18:21:28 +02:00
Valdis Kletnieks
b0aaf65bb1 kernel/elfcore.c: include proper prototypes
[ Upstream commit 0f74914071 ]

When building with W=1, gcc properly complains that there's no prototypes:

  CC      kernel/elfcore.o
kernel/elfcore.c:7:17: warning: no previous prototype for 'elf_core_extra_phdrs' [-Wmissing-prototypes]
    7 | Elf_Half __weak elf_core_extra_phdrs(void)
      |                 ^~~~~~~~~~~~~~~~~~~~
kernel/elfcore.c:12:12: warning: no previous prototype for 'elf_core_write_extra_phdrs' [-Wmissing-prototypes]
   12 | int __weak elf_core_write_extra_phdrs(struct coredump_params *cprm, loff_t offset)
      |            ^~~~~~~~~~~~~~~~~~~~~~~~~~
kernel/elfcore.c:17:12: warning: no previous prototype for 'elf_core_write_extra_data' [-Wmissing-prototypes]
   17 | int __weak elf_core_write_extra_data(struct coredump_params *cprm)
      |            ^~~~~~~~~~~~~~~~~~~~~~~~~
kernel/elfcore.c:22:15: warning: no previous prototype for 'elf_core_extra_data_size' [-Wmissing-prototypes]
   22 | size_t __weak elf_core_extra_data_size(void)
      |               ^~~~~~~~~~~~~~~~~~~~~~~~

Provide the include file so gcc is happy, and we don't have potential code drift

Link: http://lkml.kernel.org/r/29875.1565224705@turing-police
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-11 18:21:23 +02:00
KeMeng Shi
46ff0e2f86 sched/core: Fix migration to invalid CPU in __set_cpus_allowed_ptr()
[ Upstream commit 714e501e16 ]

An oops can be triggered in the scheduler when running qemu on arm64:

 Unable to handle kernel paging request at virtual address ffff000008effe40
 Internal error: Oops: 96000007 [#1] SMP
 Process migration/0 (pid: 12, stack limit = 0x00000000084e3736)
 pstate: 20000085 (nzCv daIf -PAN -UAO)
 pc : __ll_sc___cmpxchg_case_acq_4+0x4/0x20
 lr : move_queued_task.isra.21+0x124/0x298
 ...
 Call trace:
  __ll_sc___cmpxchg_case_acq_4+0x4/0x20
  __migrate_task+0xc8/0xe0
  migration_cpu_stop+0x170/0x180
  cpu_stopper_thread+0xec/0x178
  smpboot_thread_fn+0x1ac/0x1e8
  kthread+0x134/0x138
  ret_from_fork+0x10/0x18

__set_cpus_allowed_ptr() will choose an active dest_cpu in affinity mask to
migrage the process if process is not currently running on any one of the
CPUs specified in affinity mask. __set_cpus_allowed_ptr() will choose an
invalid dest_cpu (dest_cpu >= nr_cpu_ids, 1024 in my virtual machine) if
CPUS in an affinity mask are deactived by cpu_down after cpumask_intersects
check. cpumask_test_cpu() of dest_cpu afterwards is overflown and may pass if
corresponding bit is coincidentally set. As a consequence, kernel will
access an invalid rq address associate with the invalid CPU in
migration_cpu_stop->__migrate_task->move_queued_task and the Oops occurs.

The reproduce the crash:

  1) A process repeatedly binds itself to cpu0 and cpu1 in turn by calling
  sched_setaffinity.

  2) A shell script repeatedly does "echo 0 > /sys/devices/system/cpu/cpu1/online"
  and "echo 1 > /sys/devices/system/cpu/cpu1/online" in turn.

  3) Oops appears if the invalid CPU is set in memory after tested cpumask.

Signed-off-by: KeMeng Shi <shikemeng@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1568616808-16808-1-git-send-email-shikemeng@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-11 18:21:22 +02:00
Mathieu Desnoyers
6cb7aa1b4f sched/membarrier: Fix private expedited registration check
[ Upstream commit fc0d77387c ]

Fix a logic flaw in the way membarrier_register_private_expedited()
handles ready state checks for private expedited sync core and private
expedited registrations.

If a private expedited membarrier registration is first performed, and
then a private expedited sync_core registration is performed, the ready
state check will skip the second registration when it really should not.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190919173705.2181-2-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-11 18:21:22 +02:00
Wanpeng Li
e409b81d9d Revert "locking/pvqspinlock: Don't wait if vCPU is preempted"
commit 89340d0935 upstream.

This patch reverts commit 75437bb304 (locking/pvqspinlock: Don't
wait if vCPU is preempted).  A large performance regression was caused
by this commit.  on over-subscription scenarios.

The test was run on a Xeon Skylake box, 2 sockets, 40 cores, 80 threads,
with three VMs of 80 vCPUs each.  The score of ebizzy -M is reduced from
13000-14000 records/s to 1700-1800 records/s:

          Host                Guest                score

vanilla w/o kvm optimizations     upstream    1700-1800 records/s
vanilla w/o kvm optimizations     revert      13000-14000 records/s
vanilla w/ kvm optimizations      upstream    4500-5000 records/s
vanilla w/ kvm optimizations      revert      14000-15500 records/s

Exit from aggressive wait-early mechanism can result in premature yield
and extra scheduling latency.

Actually, only 6% of wait_early events are caused by vcpu_is_preempted()
being true.  However, when one vCPU voluntarily releases its vCPU, all
the subsequently waiters in the queue will do the same and the cascading
effect leads to bad performance.

kvm optimizations:
[1] commit d73eb57b80 (KVM: Boost vCPUs that are delivering interrupts)
[2] commit 266e85a5ec (KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption)

Tested-by: loobinliu@tencent.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: loobinliu@tencent.com
Cc: stable@vger.kernel.org
Fixes: 75437bb304 (locking/pvqspinlock: Don't wait if vCPU is preempted)
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-11 18:21:06 +02:00
Li RongQing
06f250215b timer: Read jiffies once when forwarding base clk
commit e430d802d6 upstream.

The timer delayed for more than 3 seconds warning was triggered during
testing.

  Workqueue: events_unbound sched_tick_remote
  RIP: 0010:sched_tick_remote+0xee/0x100
  ...
  Call Trace:
   process_one_work+0x18c/0x3a0
   worker_thread+0x30/0x380
   kthread+0x113/0x130
   ret_from_fork+0x22/0x40

The reason is that the code in collect_expired_timers() uses jiffies
unprotected:

    if (next_event > jiffies)
        base->clk = jiffies;

As the compiler is allowed to reload the value base->clk can advance
between the check and the store and in the worst case advance farther than
next event. That causes the timer expiry to be delayed until the wheel
pointer wraps around.

Convert the code to use READ_ONCE()

Fixes: 236968383c ("timers: Optimize collect_expired_timers() for NOHZ")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Liang ZhiCheng <liangzhicheng@baidu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1568894687-14499-1-git-send-email-lirongqing@baidu.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-11 18:20:59 +02:00
Tom Zanussi
e010c98351 tracing: Make sure variable reference alias has correct var_ref_idx
commit 17f8607a16 upstream.

Original changelog from Steve Rostedt (except last sentence which
explains the problem, and the Fixes: tag):

I performed a three way histogram with the following commands:

echo 'irq_lat u64 lat pid_t pid' > synthetic_events
echo 'wake_lat u64 lat u64 irqlat pid_t pid' >> synthetic_events
echo 'hist:keys=common_pid:irqts=common_timestamp.usecs if function == 0xffffffff81200580' > events/timer/hrtimer_start/trigger
echo 'hist:keys=common_pid:lat=common_timestamp.usecs-$irqts:onmatch(timer.hrtimer_start).irq_lat($lat,pid) if common_flags & 1' > events/sched/sched_waking/trigger
echo 'hist:keys=pid:wakets=common_timestamp.usecs,irqlat=lat' > events/synthetic/irq_lat/trigger
echo 'hist:keys=next_pid:lat=common_timestamp.usecs-$wakets,irqlat=$irqlat:onmatch(synthetic.irq_lat).wake_lat($lat,$irqlat,next_pid)' > events/sched/sched_switch/trigger
echo 1 > events/synthetic/wake_lat/enable

Basically I wanted to see:

 hrtimer_start (calling function tick_sched_timer)

Note:

  # grep tick_sched_timer /proc/kallsyms
ffffffff81200580 t tick_sched_timer

And save the time of that, and then record sched_waking if it is called
in interrupt context and with the same pid as the hrtimer_start, it
will record the latency between that and the waking event.

I then look at when the task that is woken is scheduled in, and record
the latency between the wakeup and the task running.

At the end, the wake_lat synthetic event will show the wakeup to
scheduled latency, as well as the irq latency in from hritmer_start to
the wakeup. The problem is that I found this:

          <idle>-0     [007] d...   190.485261: wake_lat: lat=27 irqlat=190485230 pid=698
          <idle>-0     [005] d...   190.485283: wake_lat: lat=40 irqlat=190485239 pid=10
          <idle>-0     [002] d...   190.488327: wake_lat: lat=56 irqlat=190488266 pid=335
          <idle>-0     [005] d...   190.489330: wake_lat: lat=64 irqlat=190489262 pid=10
          <idle>-0     [003] d...   190.490312: wake_lat: lat=43 irqlat=190490265 pid=77
          <idle>-0     [005] d...   190.493322: wake_lat: lat=54 irqlat=190493262 pid=10
          <idle>-0     [005] d...   190.497305: wake_lat: lat=35 irqlat=190497267 pid=10
          <idle>-0     [005] d...   190.501319: wake_lat: lat=50 irqlat=190501264 pid=10

The irqlat seemed quite large! Investigating this further, if I had
enabled the irq_lat synthetic event, I noticed this:

          <idle>-0     [002] d.s.   249.429308: irq_lat: lat=164968 pid=335
          <idle>-0     [002] d...   249.429369: wake_lat: lat=55 irqlat=249429308 pid=335

Notice that the timestamp of the irq_lat "249.429308" is awfully
similar to the reported irqlat variable. In fact, all instances were
like this. It appeared that:

  irqlat=$irqlat

Wasn't assigning the old $irqlat to the new irqlat variable, but
instead was assigning the $irqts to it.

The issue is that assigning the old $irqlat to the new irqlat variable
creates a variable reference alias, but the alias creation code
forgets to make sure the alias uses the same var_ref_idx to access the
reference.

Link: http://lkml.kernel.org/r/1567375321.5282.12.camel@kernel.org

Cc: Linux Trace Devel <linux-trace-devel@vger.kernel.org>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Cc: stable@vger.kernel.org
Fixes: 7e8b88a30b ("tracing: Add hist trigger support for variable reference aliases")
Reported-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-11 18:20:57 +02:00
Catalin Marinas
75443b7002 UPSTREAM: arm64: Tighten the PR_{SET, GET}_TAGGED_ADDR_CTRL prctl() unused arguments
(Upstream commit 3e91ec89f5).

Require that arg{3,4,5} of the PR_{SET,GET}_TAGGED_ADDR_CTRL prctl and
arg2 of the PR_GET_TAGGED_ADDR_CTRL prctl() are zero rather than ignored
for future extensions.

Acked-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 135692346
Change-Id: I8bb5c3eb4728440880c971d77904f7e45b571ddc
2019-10-07 15:27:39 -04:00
Catalin Marinas
f077ee2609 BACKPORT: arm64: Introduce prctl() options to control the tagged user addresses ABI
(Upstream commit 63f0c60379).

It is not desirable to relax the ABI to allow tagged user addresses into
the kernel indiscriminately. This patch introduces a prctl() interface
for enabling or disabling the tagged ABI with a global sysctl control
for preventing applications from enabling the relaxed ABI (meant for
testing user-space prctl() return error checking without reconfiguring
the kernel). The ABI properties are inherited by threads of the same
application and fork()'ed children but cleared on execve(). A Kconfig
option allows the overall disabling of the relaxed ABI.

The PR_SET_TAGGED_ADDR_CTRL will be expanded in the future to handle
MTE-specific settings like imprecise vs precise exceptions.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Change-Id: I2d52c5589b05415faab315c116245f1058d64750
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 135692346
2019-10-07 15:27:39 -04:00
Greg Kroah-Hartman
75337a6f96 This is the 4.19.78 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl2bbnkACgkQONu9yGCS
 aT6bng/+Jvj4gXLq2w+KmeN1SRbNu2ee+GjQsgQR6JZ3/dY5+rzPhuL37Op0fQd6
 UwnLhY4TL3PUiRCE8pNaVYI8nDfpxRkohYP+SMtGyoQKmoiy3W/SWe3CgEwniLwg
 k9TsuqxUsIUeEdSr6Bjbry0IU4VoZ3MP0cmMc1SrFUqJzFoGoUMHsHmQJvDPiy1f
 l7oZUgYrXArcnPhCda6peD9AJUfuIRKAM4BW47WN6Z9moqAAAa60eXN/u/hHp+Qc
 w55AZTSxel7CMbLDMnZ6/xWDgY/FHTLjkmhdIl9H6Qi8SbrJwq23zXnBO5xVameQ
 4MaLIgrp7M5sohAVFdqAVZYyZrkX91ssVujYwc+I6O1TYdja1Usj2TzdyL/MPqzY
 FLM9s3P0C3xKLdlg5gq9BxxnohIIhNmBy069NGsmfFd9jP2o6vFUEjiVxSY7Hp3r
 GpZP1ETAfMNHCG3jN3o5EkwyLoQHegFWfLpCIEF++k9gjo2CP00+Lf16RVrSsG47
 ILobFW5Wy4RXFyd7M8PrjSyuAtuzkUTzxg6P+6zuEtPwwvZPtadd97dGbA90pKvv
 eB1UPHu8/emMhW/8fwDBbpeQbIh0pHtX2yQq/LTItEHGr+YjRTIyH3Z/fST5itre
 ZDofsls4A+70TQ5/XOgjlDjco93iUs8KULDzQqvTFIuUIlvk3hQ=
 =7fm2
 -----END PGP SIGNATURE-----

Merge 4.19.78 into android-4.19

Changes in 4.19.78
	tpm: use tpm_try_get_ops() in tpm-sysfs.c.
	tpm: Fix TPM 1.2 Shutdown sequence to prevent future TPM operations
	drm/bridge: tc358767: Increase AUX transfer length limit
	drm/panel: simple: fix AUO g185han01 horizontal blanking
	video: ssd1307fb: Start page range at page_offset
	drm/stm: attach gem fence to atomic state
	drm/panel: check failure cases in the probe func
	drm/rockchip: Check for fast link training before enabling psr
	drm/radeon: Fix EEH during kexec
	gpu: drm: radeon: Fix a possible null-pointer dereference in radeon_connector_set_property()
	PCI: rpaphp: Avoid a sometimes-uninitialized warning
	ipmi_si: Only schedule continuously in the thread in maintenance mode
	clk: qoriq: Fix -Wunused-const-variable
	clk: sunxi-ng: v3s: add missing clock slices for MMC2 module clocks
	drm/amd/display: fix issue where 252-255 values are clipped
	drm/amd/display: reprogram VM config when system resume
	powerpc/powernv/ioda2: Allocate TCE table levels on demand for default DMA window
	clk: actions: Don't reference clk_init_data after registration
	clk: sirf: Don't reference clk_init_data after registration
	clk: sprd: Don't reference clk_init_data after registration
	clk: zx296718: Don't reference clk_init_data after registration
	powerpc/xmon: Check for HV mode when dumping XIVE info from OPAL
	powerpc/rtas: use device model APIs and serialization during LPM
	powerpc/futex: Fix warning: 'oldval' may be used uninitialized in this function
	powerpc/pseries/mobility: use cond_resched when updating device tree
	pinctrl: tegra: Fix write barrier placement in pmx_writel
	powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag
	vfio_pci: Restore original state on release
	drm/nouveau/volt: Fix for some cards having 0 maximum voltage
	pinctrl: amd: disable spurious-firing GPIO IRQs
	clk: renesas: mstp: Set GENPD_FLAG_ALWAYS_ON for clock domain
	clk: renesas: cpg-mssr: Set GENPD_FLAG_ALWAYS_ON for clock domain
	drm/amd/display: support spdif
	drm/amdgpu/si: fix ASIC tests
	powerpc/64s/exception: machine check use correct cfar for late handler
	pstore: fs superblock limits
	clk: qcom: gcc-sdm845: Use floor ops for sdcc clks
	powerpc/pseries: correctly track irq state in default idle
	pinctrl: meson-gxbb: Fix wrong pinning definition for uart_c
	arm64: fix unreachable code issue with cmpxchg
	clk: at91: select parent if main oscillator or bypass is enabled
	powerpc: dump kernel log before carrying out fadump or kdump
	mbox: qcom: add APCS child device for QCS404
	clk: sprd: add missing kfree
	scsi: core: Reduce memory required for SCSI logging
	dma-buf/sw_sync: Synchronize signal vs syncpt free
	ext4: fix potential use after free after remounting with noblock_validity
	MIPS: Ingenic: Disable broken BTB lookup optimization.
	MIPS: tlbex: Explicitly cast _PAGE_NO_EXEC to a boolean
	i2c-cht-wc: Fix lockdep warning
	mfd: intel-lpss: Remove D3cold delay
	PCI: tegra: Fix OF node reference leak
	HID: wacom: Fix several minor compiler warnings
	livepatch: Nullify obj->mod in klp_module_coming()'s error path
	ARM: 8898/1: mm: Don't treat faults reported from cache maintenance as writes
	soundwire: intel: fix channel number reported by hardware
	ARM: 8875/1: Kconfig: default to AEABI w/ Clang
	rtc: snvs: fix possible race condition
	rtc: pcf85363/pcf85263: fix regmap error in set_time
	HID: apple: Fix stuck function keys when using FN
	PCI: rockchip: Propagate errors for optional regulators
	PCI: histb: Propagate errors for optional regulators
	PCI: imx6: Propagate errors for optional regulators
	PCI: exynos: Propagate errors for optional PHYs
	security: smack: Fix possible null-pointer dereferences in smack_socket_sock_rcv_skb()
	ARM: 8903/1: ensure that usable memory in bank 0 starts from a PMD-aligned address
	fat: work around race with userspace's read via blockdev while mounting
	pktcdvd: remove warning on attempting to register non-passthrough dev
	hypfs: Fix error number left in struct pointer member
	crypto: hisilicon - Fix double free in sec_free_hw_sgl()
	kbuild: clean compressed initramfs image
	ocfs2: wait for recovering done after direct unlock request
	kmemleak: increase DEBUG_KMEMLEAK_EARLY_LOG_SIZE default to 16K
	arm64: consider stack randomization for mmap base only when necessary
	mips: properly account for stack randomization and stack guard gap
	arm: properly account for stack randomization and stack guard gap
	arm: use STACK_TOP when computing mmap base address
	block: mq-deadline: Fix queue restart handling
	bpf: fix use after free in prog symbol exposure
	cxgb4:Fix out-of-bounds MSI-X info array access
	erspan: remove the incorrect mtu limit for erspan
	hso: fix NULL-deref on tty open
	ipv6: drop incoming packets having a v4mapped source address
	ipv6: Handle missing host route in __ipv6_ifa_notify
	net: ipv4: avoid mixed n_redirects and rate_tokens usage
	net: qlogic: Fix memory leak in ql_alloc_large_buffers
	net: Unpublish sk from sk_reuseport_cb before call_rcu
	nfc: fix memory leak in llcp_sock_bind()
	qmi_wwan: add support for Cinterion CLS8 devices
	rxrpc: Fix rxrpc_recvmsg tracepoint
	sch_dsmark: fix potential NULL deref in dsmark_init()
	udp: fix gso_segs calculations
	vsock: Fix a lockdep warning in __vsock_release()
	net: dsa: rtl8366: Check VLAN ID and not ports
	udp: only do GSO if # of segs > 1
	net/rds: Fix error handling in rds_ib_add_one()
	xen-netfront: do not use ~0U as error return value for xennet_fill_frags()
	tipc: fix unlimited bundling of small messages
	sch_cbq: validate TCA_CBQ_WRROPT to avoid crash
	soundwire: Kconfig: fix help format
	soundwire: fix regmap dependencies and align with other serial links
	Smack: Don't ignore other bprm->unsafe flags if LSM_UNSAFE_PTRACE is set
	smack: use GFP_NOFS while holding inode_smack::smk_lock
	NFC: fix attrs checks in netlink interface
	kexec: bail out upon SIGKILL when allocating memory.
	9p/cache.c: Fix memory leak in v9fs_cache_session_get_cookie
	Linux 4.19.78

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I02db9c4a0cc1f784e6ac1523599fcbed747a6375
2019-10-07 19:17:35 +02:00
Tetsuo Handa
d85bc11a68 kexec: bail out upon SIGKILL when allocating memory.
commit 7c3a6aedcd upstream.

syzbot found that a thread can stall for minutes inside kexec_load() after
that thread was killed by SIGKILL [1].  It turned out that the reproducer
was trying to allocate 2408MB of memory using kimage_alloc_page() from
kimage_load_normal_segment().  Let's check for SIGKILL before doing memory
allocation.

[1] https://syzkaller.appspot.com/bug?id=a0e3436829698d5824231251fad9d8e998f94f5e

Link: http://lkml.kernel.org/r/993c9185-d324-2640-d061-bed2dd18b1f7@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+8ab2d0f39fb79fe6ca40@syzkaller.appspotmail.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-07 18:57:28 +02:00
Daniel Borkmann
ed568ca736 bpf: fix use after free in prog symbol exposure
commit c751798aa2 upstream.

syzkaller managed to trigger the warning in bpf_jit_free() which checks via
bpf_prog_kallsyms_verify_off() for potentially unlinked JITed BPF progs
in kallsyms, and subsequently trips over GPF when walking kallsyms entries:

  [...]
  8021q: adding VLAN 0 to HW filter on device batadv0
  8021q: adding VLAN 0 to HW filter on device batadv0
  WARNING: CPU: 0 PID: 9869 at kernel/bpf/core.c:810 bpf_jit_free+0x1e8/0x2a0
  Kernel panic - not syncing: panic_on_warn set ...
  CPU: 0 PID: 9869 Comm: kworker/0:7 Not tainted 5.0.0-rc8+ #1
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Workqueue: events bpf_prog_free_deferred
  Call Trace:
   __dump_stack lib/dump_stack.c:77 [inline]
   dump_stack+0x113/0x167 lib/dump_stack.c:113
   panic+0x212/0x40b kernel/panic.c:214
   __warn.cold.8+0x1b/0x38 kernel/panic.c:571
   report_bug+0x1a4/0x200 lib/bug.c:186
   fixup_bug arch/x86/kernel/traps.c:178 [inline]
   do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
   do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
   invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
  RIP: 0010:bpf_jit_free+0x1e8/0x2a0
  Code: 02 4c 89 e2 83 e2 07 38 d0 7f 08 84 c0 0f 85 86 00 00 00 48 ba 00 02 00 00 00 00 ad de 0f b6 43 02 49 39 d6 0f 84 5f fe ff ff <0f> 0b e9 58 fe ff ff 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1
  RSP: 0018:ffff888092f67cd8 EFLAGS: 00010202
  RAX: 0000000000000007 RBX: ffffc90001947000 RCX: ffffffff816e9d88
  RDX: dead000000000200 RSI: 0000000000000008 RDI: ffff88808769f7f0
  RBP: ffff888092f67d00 R08: fffffbfff1394059 R09: fffffbfff1394058
  R10: fffffbfff1394058 R11: ffffffff89ca02c7 R12: ffffc90001947002
  R13: ffffc90001947020 R14: ffffffff881eca80 R15: ffff88808769f7e8
  BUG: unable to handle kernel paging request at fffffbfff400d000
  #PF error: [normal kernel read fault]
  PGD 21ffee067 P4D 21ffee067 PUD 21ffed067 PMD 9f942067 PTE 0
  Oops: 0000 [#1] PREEMPT SMP KASAN
  CPU: 0 PID: 9869 Comm: kworker/0:7 Not tainted 5.0.0-rc8+ #1
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Workqueue: events bpf_prog_free_deferred
  RIP: 0010:bpf_get_prog_addr_region kernel/bpf/core.c:495 [inline]
  RIP: 0010:bpf_tree_comp kernel/bpf/core.c:558 [inline]
  RIP: 0010:__lt_find include/linux/rbtree_latch.h:115 [inline]
  RIP: 0010:latch_tree_find include/linux/rbtree_latch.h:208 [inline]
  RIP: 0010:bpf_prog_kallsyms_find+0x107/0x2e0 kernel/bpf/core.c:632
  Code: 00 f0 ff ff 44 38 c8 7f 08 84 c0 0f 85 fa 00 00 00 41 f6 45 02 01 75 02 0f 0b 48 39 da 0f 82 92 00 00 00 48 89 d8 48 c1 e8 03 <42> 0f b6 04 30 84 c0 74 08 3c 03 0f 8e 45 01 00 00 8b 03 48 c1 e0
  [...]

Upon further debugging, it turns out that whenever we trigger this
issue, the kallsyms removal in bpf_prog_ksym_node_del() was /skipped/
but yet bpf_jit_free() reported that the entry is /in use/.

Problem is that symbol exposure via bpf_prog_kallsyms_add() but also
perf_event_bpf_event() were done /after/ bpf_prog_new_fd(). Once the
fd is exposed to the public, a parallel close request came in right
before we attempted to do the bpf_prog_kallsyms_add().

Given at this time the prog reference count is one, we start to rip
everything underneath us via bpf_prog_release() -> bpf_prog_put().
The memory is eventually released via deferred free, so we're seeing
that bpf_jit_free() has a kallsym entry because we added it from
bpf_prog_load() but /after/ bpf_prog_put() from the remote CPU.

Therefore, move both notifications /before/ we install the fd. The
issue was never seen between bpf_prog_alloc_id() and bpf_prog_new_fd()
because upon bpf_prog_get_fd_by_id() we'll take another reference to
the BPF prog, so we're still holding the original reference from the
bpf_prog_load().

Fixes: 6ee52e2a3f ("perf, bpf: Introduce PERF_RECORD_BPF_EVENT")
Fixes: 74451e66d5 ("bpf: make jited programs visible in traces")
Reported-by: syzbot+bd3bba6ff3fcea7a6ec6@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Song Liu <songliubraving@fb.com>
Signed-off-by: Zubin Mithra <zsm@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-07 18:57:19 +02:00
Miroslav Benes
0f0ced702d livepatch: Nullify obj->mod in klp_module_coming()'s error path
[ Upstream commit 4ff96fb52c ]

klp_module_coming() is called for every module appearing in the system.
It sets obj->mod to a patched module for klp_object obj. Unfortunately
it leaves it set even if an error happens later in the function and the
patched module is not allowed to be loaded.

klp_is_object_loaded() uses obj->mod variable and could currently give a
wrong return value. The bug is probably harmless as of now.

Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-07 18:57:10 +02:00
Greg Kroah-Hartman
7f1f24fed2 This is the 4.19.77 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl2YehoACgkQONu9yGCS
 aT536hAA0w6XF1ogPi23xS5BQ386c3BjWaOJddu0PFkR9utguuZaDG5AKjnkHtm2
 foxKeCGyChCc1h6qFD2wLavsQMephzLWXkMw1/uXfwAPXGWY1hquD46zrFO0IYOc
 dsmKh5V8ZT70lmPTBHCDVowp1xUczNFhuNRHz1KzVQ3lsKMAhsRiy6vpsXWzIeOM
 VCrayI4jxTVZvoA9Odm9fTXpau0Qb9k4pqGGU84dz/xWfGzrz/AOmp+kWuZRGepS
 fQi+46HK3idmGGNBVGkJYkDdpkefdrYdjlVHkkoAZHlaB+QaOrre0trNArK+knRu
 jJIl8R/M50yYbkn97vVIpwoh19rV0BI7KUFKO4C4NCByOOV+9vjJ5g1FuZ7c3pmj
 XKZiBmFBJVqbI1uwgqEn+/Tv6DyEz4gUzo5GHOe2i/Bh2nSguHZzO+Yhat1U+J+Y
 3QVfrIS10jICWBAqm147FepGtNxbCPP3plyqbJdrMwgM6GOMd83v+rY/5okB2YK4
 SHhzuevQoxujjeoOgHKjIqTiCIaJMt5ZvlCFqODg8NgpjTNDAbCA/UUQWs7MlrqX
 6uwH2QLwk3h+bEXUfGzsbsk5isTDpHr1ch6SI9T675JHRZCE461RoR82XpjLOyHD
 90tBJk9pfKlLqSEyiaWPPpXErhoS5WCVKM5yuT/rSS/i2Ve4VH4=
 =Q+c9
 -----END PGP SIGNATURE-----

Merge 4.19.77 into android-4.19

Changes in 4.19.77
	arcnet: provide a buffer big enough to actually receive packets
	cdc_ncm: fix divide-by-zero caused by invalid wMaxPacketSize
	macsec: drop skb sk before calling gro_cells_receive
	net/phy: fix DP83865 10 Mbps HDX loopback disable function
	net: qrtr: Stop rx_worker before freeing node
	net/sched: act_sample: don't push mac header on ip6gre ingress
	net_sched: add max len check for TCA_KIND
	nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs
	openvswitch: change type of UPCALL_PID attribute to NLA_UNSPEC
	ppp: Fix memory leak in ppp_write
	sch_netem: fix a divide by zero in tabledist()
	skge: fix checksum byte order
	usbnet: ignore endpoints with invalid wMaxPacketSize
	usbnet: sanity checking of packet sizes and device mtu
	net: sched: fix possible crash in tcf_action_destroy()
	tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state
	net/mlx5: Add device ID of upcoming BlueField-2
	mISDN: enforce CAP_NET_RAW for raw sockets
	appletalk: enforce CAP_NET_RAW for raw sockets
	ax25: enforce CAP_NET_RAW for raw sockets
	ieee802154: enforce CAP_NET_RAW for raw sockets
	nfc: enforce CAP_NET_RAW for raw sockets
	nfp: flower: prevent memory leak in nfp_flower_spawn_phy_reprs
	ALSA: hda: Flush interrupts on disabling
	regulator: lm363x: Fix off-by-one n_voltages for lm3632 ldo_vpos/ldo_vneg
	ASoC: tlv320aic31xx: suppress error message for EPROBE_DEFER
	ASoC: sgtl5000: Fix of unmute outputs on probe
	ASoC: sgtl5000: Fix charge pump source assignment
	firmware: qcom_scm: Use proper types for dma mappings
	dmaengine: bcm2835: Print error in case setting DMA mask fails
	leds: leds-lp5562 allow firmware files up to the maximum length
	media: dib0700: fix link error for dibx000_i2c_set_speed
	media: mtk-cir: lower de-glitch counter for rc-mm protocol
	media: exynos4-is: fix leaked of_node references
	media: hdpvr: Add device num check and handling
	media: i2c: ov5640: Check for devm_gpiod_get_optional() error
	time/tick-broadcast: Fix tick_broadcast_offline() lockdep complaint
	sched/fair: Fix imbalance due to CPU affinity
	sched/core: Fix CPU controller for !RT_GROUP_SCHED
	x86/apic: Make apic_pending_intr_clear() more robust
	sched/deadline: Fix bandwidth accounting at all levels after offline migration
	x86/reboot: Always use NMI fallback when shutdown via reboot vector IPI fails
	x86/apic: Soft disable APIC before initializing it
	ALSA: hda - Show the fatal CORB/RIRB error more clearly
	ALSA: i2c: ak4xxx-adda: Fix a possible null pointer dereference in build_adc_controls()
	EDAC/mc: Fix grain_bits calculation
	media: iguanair: add sanity checks
	base: soc: Export soc_device_register/unregister APIs
	ALSA: usb-audio: Skip bSynchAddress endpoint check if it is invalid
	ia64:unwind: fix double free for mod->arch.init_unw_table
	EDAC/altera: Use the proper type for the IRQ status bits
	ASoC: rsnd: don't call clk_get_rate() under atomic context
	arm64/prefetch: fix a -Wtype-limits warning
	md/raid1: end bio when the device faulty
	md: don't call spare_active in md_reap_sync_thread if all member devices can't work
	md: don't set In_sync if array is frozen
	media: media/platform: fsl-viu.c: fix build for MICROBLAZE
	ACPI / processor: don't print errors for processorIDs == 0xff
	loop: Add LOOP_SET_DIRECT_IO to compat ioctl
	EDAC, pnd2: Fix ioremap() size in dnv_rd_reg()
	efi: cper: print AER info of PCIe fatal error
	firmware: arm_scmi: Check if platform has released shmem before using
	sched/fair: Use rq_lock/unlock in online_fair_sched_group
	idle: Prevent late-arriving interrupts from disrupting offline
	media: gspca: zero usb_buf on error
	perf config: Honour $PERF_CONFIG env var to specify alternate .perfconfig
	perf test vfs_getname: Disable ~/.perfconfig to get default output
	media: mtk-mdp: fix reference count on old device tree
	media: fdp1: Reduce FCP not found message level to debug
	media: em28xx: modules workqueue not inited for 2nd device
	media: rc: imon: Allow iMON RC protocol for ffdc 7e device
	dmaengine: iop-adma: use correct printk format strings
	perf record: Support aarch64 random socket_id assignment
	media: vsp1: fix memory leak of dl on error return path
	media: i2c: ov5645: Fix power sequence
	media: omap3isp: Don't set streaming state on random subdevs
	media: imx: mipi csi-2: Don't fail if initial state times-out
	net: lpc-enet: fix printk format strings
	m68k: Prevent some compiler warnings in Coldfire builds
	ARM: dts: imx7d: cl-som-imx7: make ethernet work again
	ARM: dts: imx7-colibri: disable HS400
	media: radio/si470x: kill urb on error
	media: hdpvr: add terminating 0 at end of string
	ASoC: uniphier: Fix double reset assersion when transitioning to suspend state
	tools headers: Fixup bitsperlong per arch includes
	ASoC: sun4i-i2s: Don't use the oversample to calculate BCLK
	led: triggers: Fix a memory leak bug
	nbd: add missing config put
	media: mceusb: fix (eliminate) TX IR signal length limit
	media: dvb-frontends: use ida for pll number
	posix-cpu-timers: Sanitize bogus WARNONS
	media: dvb-core: fix a memory leak bug
	libperf: Fix alignment trap with xyarray contents in 'perf stat'
	EDAC/amd64: Recognize DRAM device type ECC capability
	EDAC/amd64: Decode syndrome before translating address
	PM / devfreq: passive: Use non-devm notifiers
	PM / devfreq: exynos-bus: Correct clock enable sequence
	media: cec-notifier: clear cec_adap in cec_notifier_unregister
	media: saa7146: add cleanup in hexium_attach()
	media: cpia2_usb: fix memory leaks
	media: saa7134: fix terminology around saa7134_i2c_eeprom_md7134_gate()
	perf trace beauty ioctl: Fix off-by-one error in cmd->string table
	media: ov9650: add a sanity check
	ASoC: es8316: fix headphone mixer volume table
	ACPI / CPPC: do not require the _PSD method
	sched/cpufreq: Align trace event behavior of fast switching
	x86/apic/vector: Warn when vector space exhaustion breaks affinity
	arm64: kpti: ensure patched kernel text is fetched from PoU
	x86/mm/pti: Do not invoke PTI functions when PTI is disabled
	ASoC: fsl_ssi: Fix clock control issue in master mode
	x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable()
	nvmet: fix data units read and written counters in SMART log
	nvme-multipath: fix ana log nsid lookup when nsid is not found
	ALSA: firewire-motu: add support for MOTU 4pre
	iommu/amd: Silence warnings under memory pressure
	libata/ahci: Drop PCS quirk for Denverton and beyond
	iommu/iova: Avoid false sharing on fq_timer_on
	libtraceevent: Change users plugin directory
	ARM: dts: exynos: Mark LDO10 as always-on on Peach Pit/Pi Chromebooks
	ACPI: custom_method: fix memory leaks
	ACPI / PCI: fix acpi_pci_irq_enable() memory leak
	closures: fix a race on wakeup from closure_sync
	hwmon: (acpi_power_meter) Change log level for 'unsafe software power cap'
	md/raid1: fail run raid1 array when active disk less than one
	dmaengine: ti: edma: Do not reset reserved paRAM slots
	kprobes: Prohibit probing on BUG() and WARN() address
	s390/crypto: xts-aes-s390 fix extra run-time crypto self tests finding
	x86/cpu: Add Tiger Lake to Intel family
	platform/x86: intel_pmc_core: Do not ioremap RAM
	ASoC: dmaengine: Make the pcm->name equal to pcm->id if the name is not set
	raid5: don't set STRIPE_HANDLE to stripe which is in batch list
	mmc: core: Clarify sdio_irq_pending flag for MMC_CAP2_SDIO_IRQ_NOTHREAD
	mmc: sdhci: Fix incorrect switch to HS mode
	mmc: core: Add helper function to indicate if SDIO IRQs is enabled
	mmc: dw_mmc: Re-store SDIO IRQs mask at system resume
	raid5: don't increment read_errors on EILSEQ return
	libertas: Add missing sentinel at end of if_usb.c fw_table
	e1000e: add workaround for possible stalled packet
	ALSA: hda - Drop unsol event handler for Intel HDMI codecs
	drm/amd/powerplay/smu7: enforce minimal VBITimeout (v2)
	media: ttusb-dec: Fix info-leak in ttusb_dec_send_command()
	ALSA: hda/realtek - Blacklist PC beep for Lenovo ThinkCentre M73/93
	iommu/amd: Override wrong IVRS IOAPIC on Raven Ridge systems
	btrfs: extent-tree: Make sure we only allocate extents from block groups with the same type
	media: omap3isp: Set device on omap3isp subdevs
	PM / devfreq: passive: fix compiler warning
	iwlwifi: fw: don't send GEO_TX_POWER_LIMIT command to FW version 36
	ALSA: firewire-tascam: handle error code when getting current source of clock
	ALSA: firewire-tascam: check intermediate state of clock status and retry
	scsi: scsi_dh_rdac: zero cdb in send_mode_select()
	scsi: qla2xxx: Fix Relogin to prevent modifying scan_state flag
	printk: Do not lose last line in kmsg buffer dump
	IB/mlx5: Free mpi in mp_slave mode
	IB/hfi1: Define variables as unsigned long to fix KASAN warning
	randstruct: Check member structs in is_pure_ops_struct()
	Revert "ceph: use ceph_evict_inode to cleanup inode's resource"
	ceph: use ceph_evict_inode to cleanup inode's resource
	ALSA: hda/realtek - PCI quirk for Medion E4254
	blk-mq: add callback of .cleanup_rq
	scsi: implement .cleanup_rq callback
	powerpc/imc: Dont create debugfs files for cpu-less nodes
	fuse: fix missing unlock_page in fuse_writepage()
	parisc: Disable HP HSC-PCI Cards to prevent kernel crash
	KVM: x86: always stop emulation on page fault
	KVM: x86: set ctxt->have_exception in x86_decode_insn()
	KVM: x86: Manually calculate reserved bits when loading PDPTRS
	media: sn9c20x: Add MSI MS-1039 laptop to flip_dmi_table
	media: don't drop front-end reference count for ->detach
	binfmt_elf: Do not move brk for INTERP-less ET_EXEC
	ASoC: Intel: NHLT: Fix debug print format
	ASoC: Intel: Skylake: Use correct function to access iomem space
	ASoC: Intel: Fix use of potentially uninitialized variable
	ARM: samsung: Fix system restart on S3C6410
	ARM: zynq: Use memcpy_toio instead of memcpy on smp bring-up
	Revert "arm64: Remove unnecessary ISBs from set_{pte,pmd,pud}"
	arm64: tlb: Ensure we execute an ISB following walk cache invalidation
	arm64: dts: rockchip: limit clock rate of MMC controllers for RK3328
	alarmtimer: Use EOPNOTSUPP instead of ENOTSUPP
	regulator: Defer init completion for a while after late_initcall
	efifb: BGRT: Improve efifb_bgrt_sanity_check
	gfs2: clear buf_in_tr when ending a transaction in sweep_bh_for_rgrps
	memcg, oom: don't require __GFP_FS when invoking memcg OOM killer
	memcg, kmem: do not fail __GFP_NOFAIL charges
	i40e: check __I40E_VF_DISABLE bit in i40e_sync_filters_subtask
	block: fix null pointer dereference in blk_mq_rq_timed_out()
	smb3: allow disabling requesting leases
	ovl: Fix dereferencing possible ERR_PTR()
	ovl: filter of trusted xattr results in audit
	btrfs: fix allocation of free space cache v1 bitmap pages
	Btrfs: fix use-after-free when using the tree modification log
	btrfs: Relinquish CPUs in btrfs_compare_trees
	btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space
	btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls
	Btrfs: fix race setting up and completing qgroup rescan workers
	md/raid6: Set R5_ReadError when there is read failure on parity disk
	md: don't report active array_state until after revalidate_disk() completes.
	md: only call set_in_sync() when it is expected to succeed.
	cfg80211: Purge frame registrations on iftype change
	/dev/mem: Bail out upon SIGKILL.
	ext4: fix warning inside ext4_convert_unwritten_extents_endio
	ext4: fix punch hole for inline_data file systems
	quota: fix wrong condition in is_quota_modification()
	hwrng: core - don't wait on add_early_randomness()
	i2c: riic: Clear NACK in tend isr
	CIFS: fix max ea value size
	CIFS: Fix oplock handling for SMB 2.1+ protocols
	md/raid0: avoid RAID0 data corruption due to layout confusion.
	fuse: fix deadlock with aio poll and fuse_iqueue::waitq.lock
	mm/compaction.c: clear total_{migrate,free}_scanned before scanning a new zone
	drm/amd/display: Restore backlight brightness after system resume
	Linux 4.19.77

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2c74f09f497a4b45b244a7dd263c5116533dfccb
2019-10-06 11:27:45 +02:00
Thadeu Lima de Souza Cascardo
3784576fc6 alarmtimer: Use EOPNOTSUPP instead of ENOTSUPP
commit f18ddc13af upstream.

ENOTSUPP is not supposed to be returned to userspace. This was found on an
OpenPower machine, where the RTC does not support set_alarm.

On that system, a clock_nanosleep(CLOCK_REALTIME_ALARM, ...) results in
"524 Unknown error 524"

Replace it with EOPNOTSUPP which results in the expected "95 Operation not
supported" error.

Fixes: 1c6b39ad3f (alarmtimers: Return -ENOTSUPP if no RTC device is present)
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190903171802.28314-1-cascardo@canonical.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-05 13:10:07 +02:00
Vincent Whitchurch
40b071992c printk: Do not lose last line in kmsg buffer dump
commit c9dccacfcc upstream.

kmsg_dump_get_buffer() is supposed to select all the youngest log
messages which fit into the provided buffer.  It determines the correct
start index by using msg_print_text() with a NULL buffer to calculate
the size of each entry.  However, when performing the actual writes,
msg_print_text() only writes the entry to the buffer if the written len
is lesser than the size of the buffer.  So if the lengths of the
selected youngest log messages happen to precisely fill up the provided
buffer, the last log message is not included.

We don't want to modify msg_print_text() to fill up the buffer and start
returning a length which is equal to the size of the buffer, since
callers of its other users, such as kmsg_dump_get_line(), depend upon
the current behaviour.

Instead, fix kmsg_dump_get_buffer() to compensate for this.

For example, with the following two final prints:

[    6.427502] AAAAAAAAAAAAA
[    6.427769] BBBBBBBB12345

A dump of a 64-byte buffer filled by kmsg_dump_get_buffer(), before this
patch:

 00000000: 3c 30 3e 5b 20 20 20 20 36 2e 35 32 32 31 39 37  <0>[    6.522197
 00000010: 5d 20 41 41 41 41 41 41 41 41 41 41 41 41 41 0a  ] AAAAAAAAAAAAA.
 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

After this patch:

 00000000: 3c 30 3e 5b 20 20 20 20 36 2e 34 35 36 36 37 38  <0>[    6.456678
 00000010: 5d 20 42 42 42 42 42 42 42 42 31 32 33 34 35 0a  ] BBBBBBBB12345.
 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

Link: http://lkml.kernel.org/r/20190711142937.4083-1-vincent.whitchurch@axis.com
Fixes: e2ae715d66 ("kmsg - kmsg_dump() use iterator to receive log buffer content")
To: rostedt@goodmis.org
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org> # v3.5+
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-05 13:10:01 +02:00
Masami Hiramatsu
fad90d4bfa kprobes: Prohibit probing on BUG() and WARN() address
[ Upstream commit e336b40277 ]

Since BUG() and WARN() may use a trap (e.g. UD2 on x86) to
get the address where the BUG() has occurred, kprobes can not
do single-step out-of-line that instruction. So prohibit
probing on such address.

Without this fix, if someone put a kprobe on WARN(), the
kernel will crash with invalid opcode error instead of
outputing warning message, because kernel can not find
correct bug address.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naveen N . Rao <naveen.n.rao@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/156750890133.19112.3393666300746167111.stgit@devnote2
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05 13:09:54 +02:00
Douglas RAILLARD
01e8f487ce sched/cpufreq: Align trace event behavior of fast switching
[ Upstream commit 77c84dd188 ]

Fast switching path only emits an event for the CPU of interest, whereas the
regular path emits an event for all the CPUs that had their frequency changed,
i.e. all the CPUs sharing the same policy.

With the current behavior, looking at cpu_frequency event for a given CPU that
is using the fast switching path will not give the correct frequency signal.

Signed-off-by: Douglas RAILLARD <douglas.raillard@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05 13:09:51 +02:00
Thomas Gleixner
8d5fccff7b posix-cpu-timers: Sanitize bogus WARNONS
[ Upstream commit 692117c1f7 ]

Warning when p == NULL and then proceeding and dereferencing p does not
make any sense as the kernel will crash with a NULL pointer dereference
right away.

Bailing out when p == NULL and returning an error code does not cure the
underlying problem which caused p to be NULL. Though it might allow to
do proper debugging.

Same applies to the clock id check in set_process_cpu_timer().

Clean them up and make them return without trying to do further damage.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lkml.kernel.org/r/20190819143801.846497772@linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05 13:09:47 +02:00
Peter Zijlstra
5111102360 idle: Prevent late-arriving interrupts from disrupting offline
[ Upstream commit e78a7614f3 ]

Scheduling-clock interrupts can arrive late in the CPU-offline process,
after idle entry and the subsequent call to cpuhp_report_idle_dead().
Once execution passes the call to rcu_report_dead(), RCU is ignoring
the CPU, which results in lockdep complaints when the interrupt handler
uses RCU:

------------------------------------------------------------------------

=============================
WARNING: suspicious RCU usage
5.2.0-rc1+ #681 Not tainted
-----------------------------
kernel/sched/fair.c:9542 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

RCU used illegally from offline CPU!
rcu_scheduler_active = 2, debug_locks = 1
no locks held by swapper/5/0.

stack backtrace:
CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.2.0-rc1+ #681
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Bochs 01/01/2011
Call Trace:
 <IRQ>
 dump_stack+0x5e/0x8b
 trigger_load_balance+0xa8/0x390
 ? tick_sched_do_timer+0x60/0x60
 update_process_times+0x3b/0x50
 tick_sched_handle+0x2f/0x40
 tick_sched_timer+0x32/0x70
 __hrtimer_run_queues+0xd3/0x3b0
 hrtimer_interrupt+0x11d/0x270
 ? sched_clock_local+0xc/0x74
 smp_apic_timer_interrupt+0x79/0x200
 apic_timer_interrupt+0xf/0x20
 </IRQ>
RIP: 0010:delay_tsc+0x22/0x50
Code: ff 0f 1f 80 00 00 00 00 65 44 8b 05 18 a7 11 48 0f ae e8 0f 31 48 89 d6 48 c1 e6 20 48 09 c6 eb 0e f3 90 65 8b 05 fe a6 11 48 <41> 39 c0 75 18 0f ae e8 0f 31 48 c1 e2 20 48 09 c2 48 89 d0 48 29
RSP: 0000:ffff8f92c0157ed0 EFLAGS: 00000212 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000005 RBX: ffff8c861f356400 RCX: ffff8f92c0157e64
RDX: 000000321214c8cc RSI: 00000032120daa7f RDI: 0000000000260f15
RBP: 0000000000000005 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: ffff8c861ee18000 R15: ffff8c861ee18000
 cpuhp_report_idle_dead+0x31/0x60
 do_idle+0x1d5/0x200
 ? _raw_spin_unlock_irqrestore+0x2d/0x40
 cpu_startup_entry+0x14/0x20
 start_secondary+0x151/0x170
 secondary_startup_64+0xa4/0xb0

------------------------------------------------------------------------

This happens rarely, but can be forced by happen more often by
placing delays in cpuhp_report_idle_dead() following the call to
rcu_report_dead().  With this in place, the following rcutorture
scenario reproduces the problem within a few minutes:

tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --duration 5 --kconfig "CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y" --configs "TREE04"

This commit uses the crude but effective expedient of moving the disabling
of interrupts within the idle loop to precede the cpu_is_offline()
check.  It also invokes tick_nohz_idle_stop_tick() instead of
tick_nohz_idle_stop_tick_protected() to shut off the scheduling-clock
interrupt.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
[ paulmck: Revert tick_nohz_idle_stop_tick_protected() removal, new callers. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05 13:09:40 +02:00
Phil Auld
9addfbd409 sched/fair: Use rq_lock/unlock in online_fair_sched_group
[ Upstream commit a46d14eca7 ]

Enabling WARN_DOUBLE_CLOCK in /sys/kernel/debug/sched_features causes
warning to fire in update_rq_clock. This seems to be caused by onlining
a new fair sched group not using the rq lock wrappers.

  [] rq->clock_update_flags & RQCF_UPDATED
  [] WARNING: CPU: 5 PID: 54385 at kernel/sched/core.c:210 update_rq_clock+0xec/0x150

  [] Call Trace:
  []  online_fair_sched_group+0x53/0x100
  []  cpu_cgroup_css_online+0x16/0x20
  []  online_css+0x1c/0x60
  []  cgroup_apply_control_enable+0x231/0x3b0
  []  cgroup_mkdir+0x41b/0x530
  []  kernfs_iop_mkdir+0x61/0xa0
  []  vfs_mkdir+0x108/0x1a0
  []  do_mkdirat+0x77/0xe0
  []  do_syscall_64+0x55/0x1d0
  []  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Using the wrappers in online_fair_sched_group instead of the raw locking
removes this warning.

[ tglx: Use rq_*lock_irq() ]

Signed-off-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20190801133749.11033-1-pauld@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05 13:09:40 +02:00
Juri Lelli
0f30856944 sched/deadline: Fix bandwidth accounting at all levels after offline migration
[ Upstream commit 59d06cea11 ]

If a task happens to be throttled while the CPU it was running on gets
hotplugged off, the bandwidth associated with the task is not correctly
migrated with it when the replenishment timer fires (offline_migration).

Fix things up, for this_bw, running_bw and total_bw, when replenishment
timer fires and task is migrated (dl_task_offline_migration()).

Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bristot@redhat.com
Cc: claudio@evidence.eu.com
Cc: lizefan@huawei.com
Cc: longman@redhat.com
Cc: luca.abeni@santannapisa.it
Cc: mathieu.poirier@linaro.org
Cc: rostedt@goodmis.org
Cc: tj@kernel.org
Cc: tommaso.cucinotta@santannapisa.it
Link: https://lkml.kernel.org/r/20190719140000.31694-5-juri.lelli@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05 13:09:36 +02:00
Juri Lelli
f381d3d2c3 sched/core: Fix CPU controller for !RT_GROUP_SCHED
[ Upstream commit a07db5c086 ]

On !CONFIG_RT_GROUP_SCHED configurations it is currently not possible to
move RT tasks between cgroups to which CPU controller has been attached;
but it is oddly possible to first move tasks around and then make them
RT (setschedule to FIFO/RR).

E.g.:

  # mkdir /sys/fs/cgroup/cpu,cpuacct/group1
  # chrt -fp 10 $$
  # echo $$ > /sys/fs/cgroup/cpu,cpuacct/group1/tasks
  bash: echo: write error: Invalid argument
  # chrt -op 0 $$
  # echo $$ > /sys/fs/cgroup/cpu,cpuacct/group1/tasks
  # chrt -fp 10 $$
  # cat /sys/fs/cgroup/cpu,cpuacct/group1/tasks
  2345
  2598
  # chrt -p 2345
  pid 2345's current scheduling policy: SCHED_FIFO
  pid 2345's current scheduling priority: 10

Also, as Michal noted, it is currently not possible to enable CPU
controller on unified hierarchy with !CONFIG_RT_GROUP_SCHED (if there
are any kernel RT threads in root cgroup, they can't be migrated to the
newly created CPU controller's root in cgroup_update_dfl_csses()).

Existing code comes with a comment saying the "we don't support RT-tasks
being in separate groups". Such comment is however stale and belongs to
pre-RT_GROUP_SCHED times. Also, it doesn't make much sense for
!RT_GROUP_ SCHED configurations, since checks related to RT bandwidth
are not performed at all in these cases.

Make moving RT tasks between CPU controller groups viable by removing
special case check for RT (and DEADLINE) tasks.

Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: lizefan@huawei.com
Cc: longman@redhat.com
Cc: luca.abeni@santannapisa.it
Cc: rostedt@goodmis.org
Link: https://lkml.kernel.org/r/20190719063455.27328-1-juri.lelli@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05 13:09:36 +02:00
Vincent Guittot
417cf53b4b sched/fair: Fix imbalance due to CPU affinity
[ Upstream commit f6cad8df6b ]

The load_balance() has a dedicated mecanism to detect when an imbalance
is due to CPU affinity and must be handled at parent level. In this case,
the imbalance field of the parent's sched_group is set.

The description of sg_imbalanced() gives a typical example of two groups
of 4 CPUs each and 4 tasks each with a cpumask covering 1 CPU of the first
group and 3 CPUs of the second group. Something like:

	{ 0 1 2 3 } { 4 5 6 7 }
	        *     * * *

But the load_balance fails to fix this UC on my octo cores system
made of 2 clusters of quad cores.

Whereas the load_balance is able to detect that the imbalanced is due to
CPU affinity, it fails to fix it because the imbalance field is cleared
before letting parent level a chance to run. In fact, when the imbalance is
detected, the load_balance reruns without the CPU with pinned tasks. But
there is no other running tasks in the situation described above and
everything looks balanced this time so the imbalance field is immediately
cleared.

The imbalance field should not be cleared if there is no other task to move
when the imbalance is detected.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1561996022-28829-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05 13:09:36 +02:00
Paul E. McKenney
7cebdfa62f time/tick-broadcast: Fix tick_broadcast_offline() lockdep complaint
[ Upstream commit 84ec3a0787 ]

time/tick-broadcast: Fix tick_broadcast_offline() lockdep complaint

The TASKS03 and TREE04 rcutorture scenarios produce the following
lockdep complaint:

	WARNING: inconsistent lock state
	5.2.0-rc1+ #513 Not tainted
	--------------------------------
	inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
	migration/1/14 [HC0[0]:SC0[0]:HE1:SE1] takes:
	(____ptrval____) (tick_broadcast_lock){?...}, at: tick_broadcast_offline+0xf/0x70
	{IN-HARDIRQ-W} state was registered at:
	  lock_acquire+0xb0/0x1c0
	  _raw_spin_lock_irqsave+0x3c/0x50
	  tick_broadcast_switch_to_oneshot+0xd/0x40
	  tick_switch_to_oneshot+0x4f/0xd0
	  hrtimer_run_queues+0xf3/0x130
	  run_local_timers+0x1c/0x50
	  update_process_times+0x1c/0x50
	  tick_periodic+0x26/0xc0
	  tick_handle_periodic+0x1a/0x60
	  smp_apic_timer_interrupt+0x80/0x2a0
	  apic_timer_interrupt+0xf/0x20
	  _raw_spin_unlock_irqrestore+0x4e/0x60
	  rcu_nocb_gp_kthread+0x15d/0x590
	  kthread+0xf3/0x130
	  ret_from_fork+0x3a/0x50
	irq event stamp: 171
	hardirqs last  enabled at (171): [<ffffffff8a201a37>] trace_hardirqs_on_thunk+0x1a/0x1c
	hardirqs last disabled at (170): [<ffffffff8a201a53>] trace_hardirqs_off_thunk+0x1a/0x1c
	softirqs last  enabled at (0): [<ffffffff8a264ee0>] copy_process.part.56+0x650/0x1cb0
	softirqs last disabled at (0): [<0000000000000000>] 0x0

        [...]

To reproduce, run the following rcutorture test:

 $ tools/testing/selftests/rcutorture/bin/kvm.sh --duration 5 --kconfig "CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y" --configs "TASKS03 TREE04"

It turns out that tick_broadcast_offline() was an innocent bystander.
After all, interrupts are supposed to be disabled throughout
take_cpu_down(), and therefore should have been disabled upon entry to
tick_offline_cpu() and thus to tick_broadcast_offline().  This suggests
that one of the CPU-hotplug notifiers was incorrectly enabling interrupts,
and leaving them enabled on return.

Some debugging code showed that the culprit was sched_cpu_dying().
It had irqs enabled after return from sched_tick_stop().  Which in turn
had irqs enabled after return from cancel_delayed_work_sync().  Which is a
wrapper around __cancel_work_timer().  Which can sleep in the case where
something else is concurrently trying to cancel the same delayed work,
and as Thomas Gleixner pointed out on IRC, sleeping is a decidedly bad
idea when you are invoked from take_cpu_down(), regardless of the state
you leave interrupts in upon return.

Code inspection located no reason why the delayed work absolutely
needed to be canceled from sched_tick_stop():  The work is not
bound to the outgoing CPU by design, given that the whole point is
to collect statistics without disturbing the outgoing CPU.

This commit therefore simply drops the cancel_delayed_work_sync() from
sched_tick_stop().  Instead, a new ->state field is added to the tick_work
structure so that the delayed-work handler function sched_tick_remote()
can avoid reposting itself.  A cpu_is_offline() check is also added to
sched_tick_remote() to avoid mucking with the state of an offlined CPU
(though it does appear safe to do so).  The sched_tick_start() and
sched_tick_stop() functions also update ->state, and sched_tick_start()
also schedules the delayed work if ->state indicates that it is not
already in flight.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
[ paulmck: Apply Peter Zijlstra and Frederic Weisbecker atomics feedback. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190625165238.GJ26519@linux.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05 13:09:35 +02:00
Greg Kroah-Hartman
abfd9e9bf7 This is the 4.19.76 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl2S8YUACgkQONu9yGCS
 aT73mA/+OFA6raC1+AwctfsJTqwGjcQ45VFNdTl2odA+/o9MQwfKFkphJaYppxoz
 vm542K1/JBko4gZ5Kb9rz5kqbFvrMdlDfRBHNlhzakhaU0u3fLkKt2LeoTtSIPEA
 kkog51CZScZGTAxUW72t069k1szdP6sj0NzLHNGtaqOLIKdYaTegm5RuiQTk6L/5
 VIMc2ASMAnzPWCO4NPjmrDN7wHPkMQusyG+ja+C2VDlJTZl288eP+eWTWixZXsnh
 nepyuUTUa5pbutpKmeUbNsTek+UYKfk/rcTzaSj7OY2t34jbWBP5ezq/WVlLW88G
 kb3HliXydUljvrAOhAZTKmunGFb12g2TER/h7MUDvs9P6YC38KoCCqk5lMlix+et
 oS9mYLPwbPhOLB42whc0PH3dPLuOFCQLxSKeTBNVRumm4ddpwVHGCJWpCkAhR07S
 h/uLK8aMbrgPifWL3eohe+XsgTdpTXXFApcV8C2wdK9E/dw4BSrbYRG1oJtMSGjC
 KLYYA5CnXWD6oBX/at+aJAx2DZKDESexB8i0jx8ot1Wc0Ac4TFPtET88WGmNyCnm
 ssW37j5Z07SFCEUEW+yujMtiYNbqlVy1nRCRjXcoMqWA2kOQZdo7hr+ihcd2aGgD
 GxzNw8yZaH/CwaKU9kziQsWE0GFEz/supWGu5BgBTO0PZDyUmwg=
 =eUlg
 -----END PGP SIGNATURE-----

Merge 4.19.76 into android-4.19

Changes in 4.19.76
	Revert "Bluetooth: validate BLE connection interval updates"
	net/ibmvnic: free reset work of removed device from queue
	RDMA/restrack: Protect from reentry to resource return path
	powerpc/xive: Fix bogus error code returned by OPAL
	drm/amd/display: readd -msse2 to prevent Clang from emitting libcalls to undefined SW FP routines
	IB/core: Add an unbound WQ type to the new CQ API
	HID: prodikeys: Fix general protection fault during probe
	HID: sony: Fix memory corruption issue on cleanup.
	HID: logitech: Fix general protection fault caused by Logitech driver
	HID: hidraw: Fix invalid read in hidraw_ioctl
	HID: Add quirk for HP X500 PIXART OEM mouse
	mtd: cfi_cmdset_0002: Use chip_good() to retry in do_write_oneword()
	crypto: talitos - fix missing break in switch statement
	CIFS: fix deadlock in cached root handling
	net/mlx5e: Set ECN for received packets using CQE indication
	net/mlx5e: don't set CHECKSUM_COMPLETE on SCTP packets
	mlx5: fix get_ip_proto()
	net/mlx5e: Allow reporting of checksum unnecessary
	net/mlx5e: XDP, Avoid checksum complete when XDP prog is loaded
	net/mlx5e: Rx, Fixup skb checksum for packets with tail padding
	net/mlx5e: Rx, Check ip headers sanity
	iwlwifi: mvm: send BCAST management frames to the right station
	iwlwifi: mvm: always init rs_fw with 20MHz bandwidth rates
	media: tvp5150: fix switch exit in set control handler
	ASoC: Intel: cht_bsw_max98090_ti: Enable codec clock once and keep it enabled
	ASoC: fsl: Fix of-node refcount unbalance in fsl_ssi_probe_from_dt()
	ALSA: usb-audio: Add Hiby device family to quirks for native DSD support
	ALSA: usb-audio: Add DSD support for EVGA NU Audio
	ALSA: dice: fix wrong packet parameter for Alesis iO26
	ALSA: hda - Add laptop imic fixup for ASUS M9V laptop
	ALSA: hda - Apply AMD controller workaround for Raven platform
	objtool: Clobber user CFLAGS variable
	pinctrl: sprd: Use define directive for sprd_pinconf_params values
	power: supply: sysfs: ratelimit property read error message
	locking/lockdep: Add debug_locks check in __lock_downgrade()
	scsi: qla2xxx: Turn off IOCB timeout timer on IOCB completion
	scsi: qla2xxx: Remove all rports if fabric scan retry fails
	scsi: qla2xxx: Return switch command on a timeout
	Revert "drm/amd/powerplay: Enable/Disable NBPSTATE on On/OFF of UVD"
	bpf: libbpf: retry loading program on EAGAIN
	irqchip/gic-v3-its: Fix LPI release for Multi-MSI devices
	f2fs: check all the data segments against all node ones
	PCI: hv: Avoid use of hv_pci_dev->pci_slot after freeing it
	bcache: remove redundant LIST_HEAD(journal) from run_cache_set()
	initramfs: don't free a non-existent initrd
	blk-mq: change gfp flags to GFP_NOIO in blk_mq_realloc_hw_ctxs
	blk-mq: move cancel of requeue_work to the front of blk_exit_queue
	Revert "f2fs: avoid out-of-range memory access"
	dm zoned: fix invalid memory access
	net/ibmvnic: Fix missing { in __ibmvnic_reset
	f2fs: fix to do sanity check on segment bitmap of LFS curseg
	drm: Flush output polling on shutdown
	net: don't warn in inet diag when IPV6 is disabled
	Bluetooth: btrtl: HCI reset on close for Realtek BT chip
	ACPI: video: Add new hw_changes_brightness quirk, set it on PB Easynote MZ35
	drm/nouveau/disp/nv50-: fix center/aspect-corrected scaling
	xfs: don't crash on null attr fork xfs_bmapi_read
	netfilter: nft_socket: fix erroneous socket assignment
	Bluetooth: btrtl: Additional Realtek 8822CE Bluetooth devices
	net_sched: check cops->tcf_block in tc_bind_tclass()
	net/rds: An rds_sock is added too early to the hash table
	net/rds: Check laddr_check before calling it
	f2fs: use generic EFSBADCRC/EFSCORRUPTED
	Linux 4.19.76

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iee7c7dfb1bfcf16ad7eaffd9974d056622479030
2019-10-01 08:51:37 +02:00
Waiman Long
9423770eb3 locking/lockdep: Add debug_locks check in __lock_downgrade()
[ Upstream commit 7149258057 ]

Tetsuo Handa had reported he saw an incorrect "downgrading a read lock"
warning right after a previous lockdep warning. It is likely that the
previous warning turned off lock debugging causing the lockdep to have
inconsistency states leading to the lock downgrade warning.

Fix that by add a check for debug_locks at the beginning of
__lock_downgrade().

Debugged-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Reported-by: syzbot+53383ae265fb161ef488@syzkaller.appspotmail.com
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/r/1547093005-26085-1-git-send-email-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-01 08:26:07 +02:00
Greg Kroah-Hartman
de5730eaef This is the 4.19.75 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl2FslsACgkQONu9yGCS
 aT4kYBAAkOZ1wVwFD4mFkUmKvLmsGlwwkY/5/kQneBDUj4VQG9/1PFSN7Cfb9DdJ
 zdIcsdsfx/J+41FKJe9rxgJL6ttB1L8ob6GYdCI/8uA23TUGCQB5RSF/cwGeZUSz
 RRvqm1gstRimh4c+kibgkr3yxwBIUTzBMlBz1OMTsbP9YVzheGPahml2/mJAyb6C
 z6ETlLmrw0VixyyyvAF6r210K9qftjK4nMMDeFvftgU/eJUr59jBhSkEirS3jo5G
 KKP0kD3wDiOzqhZ83qU0bEG9EIiayap6k9H3r1u4Qu0xjyc095Jta+3JFpOqd66u
 CLfAKO0wf/jVx3/3EzWLtnxfXIpcfWi7Vj6rcTjASOsH8PrCLageHbyoA5JmKGsW
 gp4HUgwdgQPtMU7rFXCVEcoLqu0uU3PUGkOQlcx9AYLoaE2LsTijcLLJqb0tZztr
 IetrhXFVmeMnz2/ejqvORZw3mLNYMTD6OfNATMEgh1LkXqaCCWXdTVj2Bsp4IcD8
 d63E8ftILxxanfNjRS0T5+kc+yCkQs8oNRqZGXQQ9zjVzXiu0kyKDIh93lC7V+yF
 EM4pO/+kEljtc6vP+2hdpCG7buwvhSklOs2TvWJpU7umwEfHfxeetvnQajDzk5n0
 XLPDc+B/ZThND8+DrlhHvkx4dMU7xtR6IDvix9XpME65pWiB7nk=
 =ebAT
 -----END PGP SIGNATURE-----

Merge 4.19.75 into android-4.19

Changes in 4.19.75
	netfilter: nf_flow_table: set default timeout after successful insertion
	HID: wacom: generic: read HID_DG_CONTACTMAX from any feature report
	RDMA/restrack: Release task struct which was hold by CM_ID object
	Input: elan_i2c - remove Lenovo Legion Y7000 PnpID
	powerpc/mm/radix: Use the right page size for vmemmap mapping
	USB: usbcore: Fix slab-out-of-bounds bug during device reset
	media: tm6000: double free if usb disconnect while streaming
	phy: renesas: rcar-gen3-usb2: Disable clearing VBUS in over-current
	ip6_gre: fix a dst leak in ip6erspan_tunnel_xmit
	udp: correct reuseport selection with connected sockets
	xen-netfront: do not assume sk_buff_head list is empty in error handling
	net_sched: let qdisc_put() accept NULL pointer
	KVM: coalesced_mmio: add bounds checking
	firmware: google: check if size is valid when decoding VPD data
	serial: sprd: correct the wrong sequence of arguments
	tty/serial: atmel: reschedule TX after RX was started
	mwifiex: Fix three heap overflow at parsing element in cfg80211_ap_settings
	nl80211: Fix possible Spectre-v1 for CQM RSSI thresholds
	ieee802154: hwsim: Fix error handle path in hwsim_init_module
	ieee802154: hwsim: unregister hw while hwsim_subscribe_all_others fails
	ARM: dts: am57xx: Disable voltage switching for SD card
	ARM: OMAP2+: Fix missing SYSC_HAS_RESET_STATUS for dra7 epwmss
	bus: ti-sysc: Fix using configured sysc mask value
	s390/bpf: fix lcgr instruction encoding
	ARM: OMAP2+: Fix omap4 errata warning on other SoCs
	ARM: dts: dra74x: Fix iodelay configuration for mmc3
	ARM: OMAP1: ams-delta-fiq: Fix missing irq_ack
	bus: ti-sysc: Simplify cleanup upon failures in sysc_probe()
	s390/bpf: use 32-bit index for tail calls
	selftests/bpf: fix "bind{4, 6} deny specific IP & port" on s390
	tools: bpftool: close prog FD before exit on showing a single program
	fpga: altera-ps-spi: Fix getting of optional confd gpio
	netfilter: ebtables: Fix argument order to ADD_COUNTER
	netfilter: nft_flow_offload: missing netlink attribute policy
	netfilter: xt_nfacct: Fix alignment mismatch in xt_nfacct_match_info
	NFSv4: Fix return values for nfs4_file_open()
	NFSv4: Fix return value in nfs_finish_open()
	NFS: Fix initialisation of I/O result struct in nfs_pgio_rpcsetup
	Kconfig: Fix the reference to the IDT77105 Phy driver in the description of ATM_NICSTAR_USE_IDT77105
	xdp: unpin xdp umem pages in error path
	qed: Add cleanup in qed_slowpath_start()
	ARM: 8874/1: mm: only adjust sections of valid mm structures
	batman-adv: Only read OGM2 tvlv_len after buffer len check
	bpf: allow narrow loads of some sk_reuseport_md fields with offset > 0
	r8152: Set memory to all 0xFFs on failed reg reads
	x86/apic: Fix arch_dynirq_lower_bound() bug for DT enabled machines
	netfilter: xt_physdev: Fix spurious error message in physdev_mt_check
	netfilter: nf_conntrack_ftp: Fix debug output
	NFSv2: Fix eof handling
	NFSv2: Fix write regression
	kallsyms: Don't let kallsyms_lookup_size_offset() fail on retrieving the first symbol
	cifs: set domainName when a domain-key is used in multiuser
	cifs: Use kzfree() to zero out the password
	usb: host: xhci-tegra: Set DMA mask correctly
	ARM: 8901/1: add a criteria for pfn_valid of arm
	ibmvnic: Do not process reset during or after device removal
	sky2: Disable MSI on yet another ASUS boards (P6Xxxx)
	i2c: designware: Synchronize IRQs when unregistering slave client
	perf/x86/intel: Restrict period on Nehalem
	perf/x86/amd/ibs: Fix sample bias for dispatched micro-ops
	amd-xgbe: Fix error path in xgbe_mod_init()
	tools/power x86_energy_perf_policy: Fix "uninitialized variable" warnings at -O2
	tools/power x86_energy_perf_policy: Fix argument parsing
	tools/power turbostat: fix buffer overrun
	net: aquantia: fix out of memory condition on rx side
	net: seeq: Fix the function used to release some memory in an error handling path
	dmaengine: ti: dma-crossbar: Fix a memory leak bug
	dmaengine: ti: omap-dma: Add cleanup in omap_dma_probe()
	x86/uaccess: Don't leak the AC flags into __get_user() argument evaluation
	x86/hyper-v: Fix overflow bug in fill_gva_list()
	keys: Fix missing null pointer check in request_key_auth_describe()
	iommu/amd: Flush old domains in kdump kernel
	iommu/amd: Fix race in increase_address_space()
	PCI: kirin: Fix section mismatch warning
	ovl: fix regression caused by overlapping layers detection
	floppy: fix usercopy direction
	binfmt_elf: move brk out of mmap when doing direct loader exec
	arm64: kpti: Whitelist Cortex-A CPUs that don't implement the CSV3 field
	media: technisat-usb2: break out of loop at end of buffer
	Linux 4.19.75

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I1dd841f112ee81497cd085b102979f45ee5e6b9d
2019-09-21 07:55:26 +02:00
Marc Zyngier
9a74f799b9 kallsyms: Don't let kallsyms_lookup_size_offset() fail on retrieving the first symbol
[ Upstream commit 2a1a3fa0f2 ]

An arm64 kernel configured with

  CONFIG_KPROBES=y
  CONFIG_KALLSYMS=y
  # CONFIG_KALLSYMS_ALL is not set
  CONFIG_KALLSYMS_BASE_RELATIVE=y

reports the following kprobe failure:

  [    0.032677] kprobes: failed to populate blacklist: -22
  [    0.033376] Please take care of using kprobes.

It appears that kprobe fails to retrieve the symbol at address
0xffff000010081000, despite this symbol being in System.map:

  ffff000010081000 T __exception_text_start

This symbol is part of the first group of aliases in the
kallsyms_offsets array (symbol names generated using ugly hacks in
scripts/kallsyms.c):

  kallsyms_offsets:
          .long   0x1000 // do_undefinstr
          .long   0x1000 // efi_header_end
          .long   0x1000 // _stext
          .long   0x1000 // __exception_text_start
          .long   0x12b0 // do_cp15instr

Looking at the implementation of get_symbol_pos(), it returns the
lowest index for aliasing symbols. In this case, it return 0.

But kallsyms_lookup_size_offset() considers 0 as a failure, which
is obviously wrong (there is definitely a valid symbol living there).
In turn, the kprobe blacklisting stops abruptly, hence the original
error.

A CONFIG_KALLSYMS_ALL kernel wouldn't fail as there is always
some random symbols at the beginning of this array, which are never
looked up via kallsyms_lookup_size_offset.

Fix it by considering that get_symbol_pos() is always successful
(which is consistent with the other uses of this function).

Fixes: ffc5089196 ("[PATCH] Create kallsyms_lookup_size_offset()")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-21 07:17:02 +02:00