Commit graph

29,125 commits

Author SHA1 Message Date
Johannes Weiner
cf46cf40bc UPSTREAM: psi: Fix a division error in psi poll()
The psi window size is a u64 an can be up to 10 seconds right now,
which exceeds the lower 32 bits of the variable. We currently use
div_u64 for it, which is meant only for 32-bit divisors. The result is
garbage pressure sampling values and even potential div0 crashes.

Use div64_u64.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Jingfeng Xie <xiejingfeng@linux.alibaba.com>
Link: https://lkml.kernel.org/r/20191203183524.41378-3-hannes@cmpxchg.org
Signed-off-by: Sasha Levin <sashal@kernel.org>

(cherry picked from commit c3466952ca)

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I49fdfd55751d1a2cde19666624c9c5d76dc78dad
2020-02-28 15:30:04 +00:00
Johannes Weiner
55013802e8 UPSTREAM: sched/psi: Fix sampling error and rare div0 crashes with cgroups and high uptime
Jingfeng reports rare div0 crashes in psi on systems with some uptime:

[58914.066423] divide error: 0000 [#1] SMP
[58914.070416] Modules linked in: ipmi_poweroff ipmi_watchdog toa overlay fuse tcp_diag inet_diag binfmt_misc aisqos(O) aisqos_hotfixes(O)
[58914.083158] CPU: 94 PID: 140364 Comm: kworker/94:2 Tainted: G W OE K 4.9.151-015.ali3000.alios7.x86_64 #1
[58914.093722] Hardware name: Alibaba Alibaba Cloud ECS/Alibaba Cloud ECS, BIOS 3.23.34 02/14/2019
[58914.102728] Workqueue: events psi_update_work
[58914.107258] task: ffff8879da83c280 task.stack: ffffc90059dcc000
[58914.113336] RIP: 0010:[] [] psi_update_stats+0x1c1/0x330
[58914.122183] RSP: 0018:ffffc90059dcfd60 EFLAGS: 00010246
[58914.127650] RAX: 0000000000000000 RBX: ffff8858fe98be50 RCX: 000000007744d640
[58914.134947] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00003594f700648e
[58914.142243] RBP: ffffc90059dcfdf8 R08: 0000359500000000 R09: 0000000000000000
[58914.149538] R10: 0000000000000000 R11: 0000000000000000 R12: 0000359500000000
[58914.156837] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8858fe98bd78
[58914.164136] FS: 0000000000000000(0000) GS:ffff887f7f380000(0000) knlGS:0000000000000000
[58914.172529] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[58914.178467] CR2: 00007f2240452090 CR3: 0000005d5d258000 CR4: 00000000007606f0
[58914.185765] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[58914.193061] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[58914.200360] PKRU: 55555554
[58914.203221] Stack:
[58914.205383] ffff8858fe98bd48 00000000000002f0 0000002e81036d09 ffffc90059dcfde8
[58914.213168] ffff8858fe98bec8 0000000000000000 0000000000000000 0000000000000000
[58914.220951] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[58914.228734] Call Trace:
[58914.231337] [] psi_update_work+0x22/0x60
[58914.237067] [] process_one_work+0x189/0x420
[58914.243063] [] worker_thread+0x4e/0x4b0
[58914.248701] [] ? process_one_work+0x420/0x420
[58914.254869] [] kthread+0xe6/0x100
[58914.259994] [] ? kthread_park+0x60/0x60
[58914.265640] [] ret_from_fork+0x39/0x50
[58914.271193] Code: 41 29 c3 4d 39 dc 4d 0f 42 dc <49> f7 f1 48 8b 13 48 89 c7 48 c1
[58914.279691] RIP [] psi_update_stats+0x1c1/0x330

The crashing instruction is trying to divide the observed stall time
by the sampling period. The period, stored in R8, is not 0, but we are
dividing by the lower 32 bits only, which are all 0 in this instance.

We could switch to a 64-bit division, but the period shouldn't be that
big in the first place. It's the time between the last update and the
next scheduled one, and so should always be around 2s and comfortably
fit into 32 bits.

The bug is in the initialization of new cgroups: we schedule the first
sampling event in a cgroup as an offset of sched_clock(), but fail to
initialize the last_update timestamp, and it defaults to 0. That
results in a bogusly large sampling period the first time we run the
sampling code, and consequently we underreport pressure for the first
2s of a cgroup's life. But worse, if sched_clock() is sufficiently
advanced on the system, and the user gets unlucky, the period's lower
32 bits can all be 0 and the sampling division will crash.

Fix this by initializing the last update timestamp to the creation
time of the cgroup, thus correctly marking the start of the first
pressure sampling period in a new cgroup.

Reported-by: Jingfeng Xie <xiejingfeng@linux.alibaba.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Link: https://lkml.kernel.org/r/20191203183524.41378-2-hannes@cmpxchg.org
Signed-off-by: Sasha Levin <sashal@kernel.org>

(cherry picked from commit 3dfbe25c27)

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Iaada5c2f1a03cf38cbb053adde478f762ce40843
2020-02-28 15:29:50 +00:00
Miles Chen
88a47f1659 UPSTREAM: sched/psi: Correct overly pessimistic size calculation
When passing a equal or more then 32 bytes long string to psi_write(),
psi_write() copies 31 bytes to its buf and overwrites buf[30]
with '\0'. Which makes the input string 1 byte shorter than
it should be.

Fix it by copying sizeof(buf) bytes when nbytes >= sizeof(buf).

This does not cause problems in normal use case like:
"some 500000 10000000" or "full 500000 10000000" because they
are less than 32 bytes in length.

	/* assuming nbytes == 35 */
	char buf[32];

	buf_size = min(nbytes, (sizeof(buf) - 1)); /* buf_size = 31 */
	if (copy_from_user(buf, user_buf, buf_size))
		return -EFAULT;

	buf[buf_size - 1] = '\0'; /* buf[30] = '\0' */

Before:

 %cd /proc/pressure/
 %echo "123456789|123456789|123456789|1234" > memory
 [   22.473497] nbytes=35,buf_size=31
 [   22.473775] 123456789|123456789|123456789| (print 30 chars)
 %sh: write error: Invalid argument

 %echo "123456789|123456789|123456789|1" > memory
 [   64.916162] nbytes=32,buf_size=31
 [   64.916331] 123456789|123456789|123456789| (print 30 chars)
 %sh: write error: Invalid argument

After:

 %cd /proc/pressure/
 %echo "123456789|123456789|123456789|1234" > memory
 [  254.837863] nbytes=35,buf_size=32
 [  254.838541] 123456789|123456789|123456789|1 (print 31 chars)
 %sh: write error: Invalid argument

 %echo "123456789|123456789|123456789|1" > memory
 [ 9965.714935] nbytes=32,buf_size=32
 [ 9965.715096] 123456789|123456789|123456789|1 (print 31 chars)
 %sh: write error: Invalid argument

Also remove the superfluous parentheses.

Signed-off-by: Miles Chen <miles.chen@mediatek.com>
Cc: <linux-mediatek@lists.infradead.org>
Cc: <wsd_upstream@mediatek.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190912103452.13281-1-miles.chen@mediatek.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

(cherry picked from commit 4adcdcea71)

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I9371b4d5e465bb8b84ff7adf5f40f30696c6ff70
2020-02-28 15:28:23 +00:00
Sami Tolvanen
8028f78053 ANDROID: Disable wq fp check in CFI builds
With non-canonical CFI, LLVM generates jump table entries for external
symbols in modules and as a result, a function pointer passed from a
module to the core kernel will have a different address.

Disable the warning for now.

Bug: 145210207
Change-Id: Ifdcee3479280f7b97abdee6b4c746f447e0944e6
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Alistair Delva <adelva@google.com>
2020-02-27 00:09:59 +00:00
Todd Kjos
08256862e0 ANDROID: increase limit on sched-tune boost groups
Some devices need an additional sched-tune boost group to
optimize performance for key tasks

Bug: 150302001
Change-Id: I392c8cc05a8851f1d416c381b4a27242924c2c27
Signed-off-by: Todd Kjos <tkjos@google.com>
2020-02-26 21:55:29 +00:00
Greg Kroah-Hartman
4dc4199770 This is the 4.19.106 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl5TfLwACgkQONu9yGCS
 aT5wlRAAhZELK39c78NMCTZKHtKGLsGb2os2IiI7zIRbqNNwnvJi+jAc3kgbS9jP
 +W+wnhYFtFisDvqdCQ009I6A0NA1p3Nqy166JplW0iIg1e7rgUKKUfabCN9sJmjh
 HGK913cJlHwGmkSxq//sBucBwWhYYGaHec28pZ7uCFATjWrTaH3G4VrvLStuicYR
 YgS9MH261tWJKJm5+V2MxnOOI0103+Uey+xVqwSnLlV+qmasxwDCMU5ae+SK7e7f
 cXIkNZwvDph1zunekHg+jd64GN3GYswXVcRighWP0n7Lr+0tGPN7SY5pvZIjZLv/
 sdroyrqAxytTYP32hypIUgsToVvJr7zXD09LGdsgOCKVwFVn8yl1e4zgGKH3L9Xu
 OK2krI90v1MVevibyaNndZ4UDKilF75oE2YYDOFW/BU1lorFAIzk4hh15CfKc8s1
 KHRjePfcgQREs/SGK8k2BAmf/JwxFN1/Ro5dl7MvKn07ZYqx6QOwUoMhgxspIntN
 9TlFw6elu1RSwu2BFts9wvoHO1tr7GZBa1cVkNF8qV1rzaGVY68aLDvvHGdffD6W
 JgX+BCfr6vcN7R4izak1RxzAoqDrRxS0vWoC1vVsPqeIIZydSxpYDquaFnbZm+Wc
 MRuh5gpQ2PzTXuMLeBB+ig6UnzsAO3x+3yIG/l5ZmmYxJbMFBKU=
 =zE/i
 -----END PGP SIGNATURE-----

Merge 4.19.106 into android-4.19

Changes in 4.19.106
	core: Don't skip generic XDP program execution for cloned SKBs
	enic: prevent waking up stopped tx queues over watchdog reset
	net/smc: fix leak of kernel memory to user space
	net: dsa: tag_qca: Make sure there is headroom for tag
	net/sched: matchall: add missing validation of TCA_MATCHALL_FLAGS
	net/sched: flower: add missing validation of TCA_FLOWER_FLAGS
	Revert "KVM: nVMX: Use correct root level for nested EPT shadow page tables"
	Revert "KVM: VMX: Add non-canonical check on writes to RTIT address MSRs"
	KVM: nVMX: Use correct root level for nested EPT shadow page tables
	drm/gma500: Fixup fbdev stolen size usage evaluation
	cpu/hotplug, stop_machine: Fix stop_machine vs hotplug order
	brcmfmac: Fix use after free in brcmf_sdio_readframes()
	leds: pca963x: Fix open-drain initialization
	ext4: fix ext4_dax_read/write inode locking sequence for IOCB_NOWAIT
	ALSA: ctl: allow TLV read operation for callback type of element in locked case
	gianfar: Fix TX timestamping with a stacked DSA driver
	pinctrl: sh-pfc: sh7264: Fix CAN function GPIOs
	pxa168fb: Fix the function used to release some memory in an error handling path
	media: i2c: mt9v032: fix enum mbus codes and frame sizes
	powerpc/powernv/iov: Ensure the pdn for VFs always contains a valid PE number
	gpio: gpio-grgpio: fix possible sleep-in-atomic-context bugs in grgpio_irq_map/unmap()
	iommu/vt-d: Fix off-by-one in PASID allocation
	char/random: silence a lockdep splat with printk()
	media: sti: bdisp: fix a possible sleep-in-atomic-context bug in bdisp_device_run()
	pinctrl: baytrail: Do not clear IRQ flags on direct-irq enabled pins
	efi/x86: Map the entire EFI vendor string before copying it
	MIPS: Loongson: Fix potential NULL dereference in loongson3_platform_init()
	sparc: Add .exit.data section.
	uio: fix a sleep-in-atomic-context bug in uio_dmem_genirq_irqcontrol()
	usb: gadget: udc: fix possible sleep-in-atomic-context bugs in gr_probe()
	usb: dwc2: Fix IN FIFO allocation
	clocksource/drivers/bcm2835_timer: Fix memory leak of timer
	kselftest: Minimise dependency of get_size on C library interfaces
	jbd2: clear JBD2_ABORT flag before journal_reset to update log tail info when load journal
	x86/sysfb: Fix check for bad VRAM size
	pwm: omap-dmtimer: Simplify error handling
	s390/pci: Fix possible deadlock in recover_store()
	powerpc/iov: Move VF pdev fixup into pcibios_fixup_iov()
	tracing: Fix tracing_stat return values in error handling paths
	tracing: Fix very unlikely race of registering two stat tracers
	ARM: 8952/1: Disable kmemleak on XIP kernels
	ext4, jbd2: ensure panic when aborting with zero errno
	ath10k: Correct the DMA direction for management tx buffers
	drm/amd/display: Retrain dongles when SINK_COUNT becomes non-zero
	nbd: add a flush_workqueue in nbd_start_device
	KVM: s390: ENOTSUPP -> EOPNOTSUPP fixups
	kconfig: fix broken dependency in randconfig-generated .config
	clk: qcom: rcg2: Don't crash if our parent can't be found; return an error
	drm/amdgpu: remove 4 set but not used variable in amdgpu_atombios_get_connector_info_from_object_table
	drm/amdgpu: Ensure ret is always initialized when using SOC15_WAIT_ON_RREG
	regulator: rk808: Lower log level on optional GPIOs being not available
	net/wan/fsl_ucc_hdlc: reject muram offsets above 64K
	NFC: port100: Convert cpu_to_le16(le16_to_cpu(E1) + E2) to use le16_add_cpu().
	selinux: fall back to ref-walk if audit is required
	arm64: dts: allwinner: H6: Add PMU mode
	arm: dts: allwinner: H3: Add PMU node
	selinux: ensure we cleanup the internal AVC counters on error in avc_insert()
	arm64: dts: qcom: msm8996: Disable USB2 PHY suspend by core
	ARM: dts: imx6: rdu2: Disable WP for USDHC2 and USDHC3
	ARM: dts: imx6: rdu2: Limit USBH1 to Full Speed
	PCI: iproc: Apply quirk_paxc_bridge() for module as well as built-in
	media: cx23885: Add support for AVerMedia CE310B
	PCI: Add generic quirk for increasing D3hot delay
	PCI: Increase D3 delay for AMD Ryzen5/7 XHCI controllers
	media: v4l2-device.h: Explicitly compare grp{id,mask} to zero in v4l2_device macros
	reiserfs: Fix spurious unlock in reiserfs_fill_super() error handling
	r8169: check that Realtek PHY driver module is loaded
	fore200e: Fix incorrect checks of NULL pointer dereference
	netfilter: nft_tunnel: add the missing ERSPAN_VERSION nla_policy
	ALSA: usx2y: Adjust indentation in snd_usX2Y_hwdep_dsp_status
	b43legacy: Fix -Wcast-function-type
	ipw2x00: Fix -Wcast-function-type
	iwlegacy: Fix -Wcast-function-type
	rtlwifi: rtl_pci: Fix -Wcast-function-type
	orinoco: avoid assertion in case of NULL pointer
	ACPICA: Disassembler: create buffer fields in ACPI_PARSE_LOAD_PASS1
	scsi: ufs: Complete pending requests in host reset and restore path
	scsi: aic7xxx: Adjust indentation in ahc_find_syncrate
	drm/mediatek: handle events when enabling/disabling crtc
	ARM: dts: r8a7779: Add device node for ARM global timer
	selinux: ensure we cleanup the internal AVC counters on error in avc_update()
	dmaengine: Store module owner in dma_device struct
	dmaengine: imx-sdma: Fix memory leak
	crypto: chtls - Fixed memory leak
	x86/vdso: Provide missing include file
	PM / devfreq: rk3399_dmc: Add COMPILE_TEST and HAVE_ARM_SMCCC dependency
	pinctrl: sh-pfc: sh7269: Fix CAN function GPIOs
	reset: uniphier: Add SCSSI reset control for each channel
	RDMA/rxe: Fix error type of mmap_offset
	clk: sunxi-ng: add mux and pll notifiers for A64 CPU clock
	ALSA: sh: Fix unused variable warnings
	clk: uniphier: Add SCSSI clock gate for each channel
	ALSA: sh: Fix compile warning wrt const
	tools lib api fs: Fix gcc9 stringop-truncation compilation error
	ACPI: button: Add DMI quirk for Razer Blade Stealth 13 late 2019 lid switch
	mlx5: work around high stack usage with gcc
	drm: remove the newline for CRC source name.
	ARM: dts: stm32: Add power-supply for DSI panel on stm32f469-disco
	usbip: Fix unsafe unaligned pointer usage
	udf: Fix free space reporting for metadata and virtual partitions
	staging: rtl8188: avoid excessive stack usage
	IB/hfi1: Add software counter for ctxt0 seq drop
	soc/tegra: fuse: Correct straps' address for older Tegra124 device trees
	efi/x86: Don't panic or BUG() on non-critical error conditions
	rcu: Use WRITE_ONCE() for assignments to ->pprev for hlist_nulls
	Input: edt-ft5x06 - work around first register access error
	x86/nmi: Remove irq_work from the long duration NMI handler
	wan: ixp4xx_hss: fix compile-testing on 64-bit
	ASoC: atmel: fix build error with CONFIG_SND_ATMEL_SOC_DMA=m
	tty: synclinkmp: Adjust indentation in several functions
	tty: synclink_gt: Adjust indentation in several functions
	visorbus: fix uninitialized variable access
	driver core: platform: Prevent resouce overflow from causing infinite loops
	driver core: Print device when resources present in really_probe()
	bpf: Return -EBADRQC for invalid map type in __bpf_tx_xdp_map
	vme: bridges: reduce stack usage
	drm/nouveau/secboot/gm20b: initialize pointer in gm20b_secboot_new()
	drm/nouveau/gr/gk20a,gm200-: add terminators to method lists read from fw
	drm/nouveau: Fix copy-paste error in nouveau_fence_wait_uevent_handler
	drm/nouveau/drm/ttm: Remove set but not used variable 'mem'
	drm/nouveau/fault/gv100-: fix memory leak on module unload
	drm/vmwgfx: prevent memory leak in vmw_cmdbuf_res_add
	usb: musb: omap2430: Get rid of musb .set_vbus for omap2430 glue
	iommu/arm-smmu-v3: Use WRITE_ONCE() when changing validity of an STE
	f2fs: set I_LINKABLE early to avoid wrong access by vfs
	f2fs: free sysfs kobject
	scsi: iscsi: Don't destroy session if there are outstanding connections
	arm64: fix alternatives with LLVM's integrated assembler
	drm/amd/display: fixup DML dependencies
	watchdog/softlockup: Enforce that timestamp is valid on boot
	f2fs: fix memleak of kobject
	x86/mm: Fix NX bit clearing issue in kernel_map_pages_in_pgd
	pwm: omap-dmtimer: Remove PWM chip in .remove before making it unfunctional
	cmd64x: potential buffer overflow in cmd64x_program_timings()
	ide: serverworks: potential overflow in svwks_set_pio_mode()
	pwm: Remove set but not set variable 'pwm'
	btrfs: fix possible NULL-pointer dereference in integrity checks
	btrfs: safely advance counter when looking up bio csums
	btrfs: device stats, log when stats are zeroed
	module: avoid setting info->name early in case we can fall back to info->mod->name
	remoteproc: Initialize rproc_class before use
	irqchip/mbigen: Set driver .suppress_bind_attrs to avoid remove problems
	ALSA: hda/hdmi - add retry logic to parse_intel_hdmi()
	kbuild: use -S instead of -E for precise cc-option test in Kconfig
	x86/decoder: Add TEST opcode to Group3-2
	s390: adjust -mpacked-stack support check for clang 10
	s390/ftrace: generate traced function stack frame
	driver core: platform: fix u32 greater or equal to zero comparison
	ALSA: hda - Add docking station support for Lenovo Thinkpad T420s
	drm/nouveau/mmu: fix comptag memory leak
	powerpc/sriov: Remove VF eeh_dev state when disabling SR-IOV
	bcache: cached_dev_free needs to put the sb page
	iommu/vt-d: Remove unnecessary WARN_ON_ONCE()
	selftests: bpf: Reset global state between reuseport test runs
	jbd2: switch to use jbd2_journal_abort() when failed to submit the commit record
	jbd2: make sure ESHUTDOWN to be recorded in the journal superblock
	ARM: 8951/1: Fix Kexec compilation issue.
	hostap: Adjust indentation in prism2_hostapd_add_sta
	iwlegacy: ensure loop counter addr does not wrap and cause an infinite loop
	cifs: fix NULL dereference in match_prepath
	bpf: map_seq_next should always increase position index
	ceph: check availability of mds cluster on mount after wait timeout
	rbd: work around -Wuninitialized warning
	irqchip/gic-v3: Only provision redistributors that are enabled in ACPI
	drm/nouveau/disp/nv50-: prevent oops when no channel method map provided
	ftrace: fpid_next() should increase position index
	trigger_next should increase position index
	radeon: insert 10ms sleep in dce5_crtc_load_lut
	ocfs2: fix a NULL pointer dereference when call ocfs2_update_inode_fsync_trans()
	lib/scatterlist.c: adjust indentation in __sg_alloc_table
	reiserfs: prevent NULL pointer dereference in reiserfs_insert_item()
	bcache: explicity type cast in bset_bkey_last()
	irqchip/gic-v3-its: Reference to its_invall_cmd descriptor when building INVALL
	iwlwifi: mvm: Fix thermal zone registration
	microblaze: Prevent the overflow of the start
	brd: check and limit max_part par
	drm/amdgpu/smu10: fix smu10_get_clock_by_type_with_latency
	drm/amdgpu/smu10: fix smu10_get_clock_by_type_with_voltage
	NFS: Fix memory leaks
	help_next should increase position index
	cifs: log warning message (once) if out of disk space
	virtio_balloon: prevent pfn array overflow
	mlxsw: spectrum_dpipe: Add missing error path
	drm/amdgpu/display: handle multiple numbers of fclks in dcn_calcs.c (v2)
	Linux 4.19.106

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia1032b50dd82b42e13973120dcbf94ae7b864648
2020-02-24 09:13:25 +01:00
Vasily Averin
9ed840b756 trigger_next should increase position index
[ Upstream commit 6722b23e7a ]

if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.

Without patch:
 # dd bs=30 skip=1 if=/sys/kernel/tracing/events/sched/sched_switch/trigger
 dd: /sys/kernel/tracing/events/sched/sched_switch/trigger: cannot skip to specified offset
 n traceoff snapshot stacktrace enable_event disable_event enable_hist disable_hist hist
 # Available triggers:
 # traceon traceoff snapshot stacktrace enable_event disable_event enable_hist disable_hist hist
 6+1 records in
 6+1 records out
 206 bytes copied, 0.00027916 s, 738 kB/s

Notice the printing of "# Available triggers:..." after the line.

With the patch:
 # dd bs=30 skip=1 if=/sys/kernel/tracing/events/sched/sched_switch/trigger
 dd: /sys/kernel/tracing/events/sched/sched_switch/trigger: cannot skip to specified offset
 n traceoff snapshot stacktrace enable_event disable_event enable_hist disable_hist hist
 2+1 records in
 2+1 records out
 88 bytes copied, 0.000526867 s, 167 kB/s

It only prints the end of the file, and does not restart.

Link: http://lkml.kernel.org/r/3c35ee24-dd3a-8119-9c19-552ed253388a@virtuozzo.com

https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24 08:34:52 +01:00
Vasily Averin
ddb005d906 ftrace: fpid_next() should increase position index
[ Upstream commit e4075e8bdf ]

if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.

Without patch:
 # dd bs=4 skip=1 if=/sys/kernel/tracing/set_ftrace_pid
 dd: /sys/kernel/tracing/set_ftrace_pid: cannot skip to specified offset
 id
 no pid
 2+1 records in
 2+1 records out
 10 bytes copied, 0.000213285 s, 46.9 kB/s

Notice the "id" followed by "no pid".

With the patch:
 # dd bs=4 skip=1 if=/sys/kernel/tracing/set_ftrace_pid
 dd: /sys/kernel/tracing/set_ftrace_pid: cannot skip to specified offset
 id
 0+1 records in
 0+1 records out
 3 bytes copied, 0.000202112 s, 14.8 kB/s

Notice that it only prints "id" and not the "no pid" afterward.

Link: http://lkml.kernel.org/r/4f87c6ad-f114-30bb-8506-c32274ce2992@virtuozzo.com

https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24 08:34:52 +01:00
Vasily Averin
ca2b459365 bpf: map_seq_next should always increase position index
[ Upstream commit 90435a7891 ]

If seq_file .next fuction does not change position index,
read after some lseek can generate an unexpected output.

See also: https://bugzilla.kernel.org/show_bug.cgi?id=206283

v1 -> v2: removed missed increment in end of function

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/eca84fdd-c374-a154-d874-6c7b55fc3bc4@virtuozzo.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24 08:34:51 +01:00
Jessica Yu
c371b1e41f module: avoid setting info->name early in case we can fall back to info->mod->name
[ Upstream commit 708e0ada19 ]

In setup_load_info(), info->name (which contains the name of the module,
mostly used for early logging purposes before the module gets set up)
gets unconditionally assigned if .modinfo is missing despite the fact
that there is an if (!info->name) check near the end of the function.
Avoid assigning a placeholder string to info->name if .modinfo doesn't
exist, so that we can fall back to info->mod->name later on.

Fixes: 5fdc7db644 ("module: setup load info before module_sig_check()")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24 08:34:49 +01:00
Thomas Gleixner
c2913e2c50 watchdog/softlockup: Enforce that timestamp is valid on boot
[ Upstream commit 11e31f608b ]

Robert reported that during boot the watchdog timestamp is set to 0 for one
second which is the indicator for a watchdog reset.

The reason for this is that the timestamp is in seconds and the time is
taken from sched clock and divided by ~1e9. sched clock starts at 0 which
means that for the first second during boot the watchdog timestamp is 0,
i.e. reset.

Use ULONG_MAX as the reset indicator value so the watchdog works correctly
right from the start. ULONG_MAX would only conflict with a real timestamp
if the system reaches an uptime of 136 years on 32bit and almost eternity
on 64bit.

Reported-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/87o8v3uuzl.fsf@nanos.tec.linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24 08:34:49 +01:00
Steven Rostedt (VMware)
56d3793229 tracing: Fix very unlikely race of registering two stat tracers
[ Upstream commit dfb6cd1e65 ]

Looking through old emails in my INBOX, I came across a patch from Luis
Henriques that attempted to fix a race of two stat tracers registering the
same stat trace (extremely unlikely, as this is done in the kernel, and
probably doesn't even exist). The submitted patch wasn't quite right as it
needed to deal with clean up a bit better (if two stat tracers were the
same, it would have the same files).

But to make the code cleaner, all we needed to do is to keep the
all_stat_sessions_mutex held for most of the registering function.

Link: http://lkml.kernel.org/r/1410299375-20068-1-git-send-email-luis.henriques@canonical.com

Fixes: 002bb86d8d ("tracing/ftrace: separate events tracing and stats tracing engine")
Reported-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24 08:34:39 +01:00
Luis Henriques
fb0085070a tracing: Fix tracing_stat return values in error handling paths
[ Upstream commit afccc00f75 ]

tracing_stat_init() was always returning '0', even on the error paths.  It
now returns -ENODEV if tracing_init_dentry() fails or -ENOMEM if it fails
to created the 'trace_stat' debugfs directory.

Link: http://lkml.kernel.org/r/1410299381-20108-1-git-send-email-luis.henriques@canonical.com

Fixes: ed6f1c996b ("tracing: Check return value of tracing_init_dentry()")
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
[ Pulled from the archeological digging of my INBOX ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24 08:34:39 +01:00
Peter Zijlstra
b9dc4d61b5 cpu/hotplug, stop_machine: Fix stop_machine vs hotplug order
[ Upstream commit 45178ac0ce ]

Paul reported a very sporadic, rcutorture induced, workqueue failure.
When the planets align, the workqueue rescuer's self-migrate fails and
then triggers a WARN for running a work on the wrong CPU.

Tejun then figured that set_cpus_allowed_ptr()'s stop_one_cpu() call
could be ignored! When stopper->enabled is false, stop_machine will
insta complete the work, without actually doing the work. Worse, it
will not WARN about this (we really should fix this).

It turns out there is a small window where a freshly online'ed CPU is
marked 'online' but doesn't yet have the stopper task running:

	BP				AP

	bringup_cpu()
	  __cpu_up(cpu, idle)	 -->	start_secondary()
					...
					cpu_startup_entry()
	  bringup_wait_for_ap()
	    wait_for_ap_thread() <--	  cpuhp_online_idle()
					  while (1)
					    do_idle()

					... available to run kthreads ...

	    stop_machine_unpark()
	      stopper->enable = true;

Close this by moving the stop_machine_unpark() into
cpuhp_online_idle(), such that the stopper thread is ready before we
start the idle loop and schedule.

Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
Debugged-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24 08:34:35 +01:00
Quentin Perret
2a557de670 UPSTREAM: sched/topology: Introduce a sysctl for Energy Aware Scheduling
In its current state, Energy Aware Scheduling (EAS) starts automatically
on asymmetric platforms having an Energy Model (EM). However, there are
users who want to have an EM (for thermal management for example), but
don't want EAS with it.

In order to let users disable EAS explicitly, introduce a new sysctl
called 'sched_energy_aware'. It is enabled by default so that EAS can
start automatically on platforms where it makes sense. Flipping it to 0
rebuilds the scheduling domains and disables EAS.

Bug: 120440300
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: adharmap@codeaurora.org
Cc: chris.redpath@arm.com
Cc: currojerez@riseup.net
Cc: dietmar.eggemann@arm.com
Cc: edubezval@gmail.com
Cc: gregkh@linuxfoundation.org
Cc: javi.merino@kernel.org
Cc: joel@joelfernandes.org
Cc: juri.lelli@redhat.com
Cc: morten.rasmussen@arm.com
Cc: patrick.bellasi@arm.com
Cc: pkondeti@codeaurora.org
Cc: rjw@rjwysocki.net
Cc: skannan@codeaurora.org
Cc: smuckle@google.com
Cc: srinivas.pandruvada@linux.intel.com
Cc: thara.gopinath@linaro.org
Cc: tkjos@google.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Cc: viresh.kumar@linaro.org
Link: https://lkml.kernel.org/r/20181203095628.11858-11-quentin.perret@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 8d5d0cfb63)
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I4ca842d07b82869cfab7542c8c4351f631e1024d
2020-02-19 10:50:59 +00:00
Greg Kroah-Hartman
4eee97caec This is the 4.19.104 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl5HEigACgkQONu9yGCS
 aT7Gcw/6AkDGkK5U/aDpKMqWiRmZUqDIg8U9xR+44Gl57Q71vicrzq8NGPHxxbsF
 slWoCyXLVSD7bMWGsTD0qJR8muROAraMxDl8dCxojEXHnXFMx4A4Cf0h1E0lY0mu
 Jq/O9m33ZMSppjio88sCcLpo0pbXF+cCX1CY87NI5QUitUzHgRh18W8BtyFpMMI8
 eC0Fc+hMWax3+qqHt/hFVpufaTKm35zLCpGjGAJiHd7GFvqUJnuAzBYCs1Cf8NO1
 KrrL3l/IWk8z3Z0Wc9PbBz309a9H6FVpjrXSXj6URkxjtqJ0F0mBMaIYxhaUF8PD
 CHY5xLyqKodC8/7O5zNOrP80oT9nqJvsmKwUwlG34IJuMVaq/o+hZu+88JVB02Yw
 v9XVcaQda5aZgWF9cBWzFQEcNwHFDCQ9VNidLDcHJLGPyFo/BogvMo8T4yPM9tI0
 O0PSFm/yYu0airZSCzIbPzuF2Iv+iilVtq+o10VRDsGtEYAOzTL7nA01MkdXFhwy
 4V+Q51C90TGo13BnnZ6xpEqjspuDWgeOD71/xkQ5cnyFgam0XQq/5R6JJghJIHOP
 7p8NMMyNhK2FnOGrFUgqvwBCp6Dap1ISZyKvie1Z8vuCJsZcwMVIw8fxAzoZWOjj
 MlmmePjlbC7XTFxjdo0jrQTdvBwq+gFgNitD7UAlfHAdqKJKKA4=
 =8ktI
 -----END PGP SIGNATURE-----

Merge 4.19.104 into android-4.19

Changes in 4.19.104
	ASoC: pcm: update FE/BE trigger order based on the command
	hv_sock: Remove the accept port restriction
	IB/mlx4: Fix memory leak in add_gid error flow
	RDMA/netlink: Do not always generate an ACK for some netlink operations
	RDMA/core: Fix locking in ib_uverbs_event_read
	RDMA/uverbs: Verify MR access flags
	scsi: ufs: Fix ufshcd_probe_hba() reture value in case ufshcd_scsi_add_wlus() fails
	PCI/IOV: Fix memory leak in pci_iov_add_virtfn()
	ath10k: pci: Only dump ATH10K_MEM_REGION_TYPE_IOREG when safe
	PCI/switchtec: Fix vep_vector_number ioread width
	PCI: Don't disable bridge BARs when assigning bus resources
	nfs: NFS_SWAP should depend on SWAP
	NFS: Revalidate the file size on a fatal write error
	NFS/pnfs: Fix pnfs_generic_prepare_to_resend_writes()
	NFSv4: try lease recovery on NFS4ERR_EXPIRED
	serial: uartps: Add a timeout to the tx empty wait
	gpio: zynq: Report gpio direction at boot
	spi: spi-mem: Add extra sanity checks on the op param
	spi: spi-mem: Fix inverted logic in op sanity check
	rtc: hym8563: Return -EINVAL if the time is known to be invalid
	rtc: cmos: Stop using shared IRQ
	ARC: [plat-axs10x]: Add missing multicast filter number to GMAC node
	platform/x86: intel_mid_powerbtn: Take a copy of ddata
	ARM: dts: at91: Reenable UART TX pull-ups
	ARM: dts: am43xx: add support for clkout1 clock
	ARM: dts: at91: sama5d3: fix maximum peripheral clock rates
	ARM: dts: at91: sama5d3: define clock rate range for tcb1
	tools/power/acpi: fix compilation error
	powerpc/pseries/vio: Fix iommu_table use-after-free refcount warning
	powerpc/pseries: Allow not having ibm, hypertas-functions::hcall-multi-tce for DDW
	iommu/arm-smmu-v3: Populate VMID field for CMDQ_OP_TLBI_NH_VA
	KVM: arm/arm64: vgic-its: Fix restoration of unmapped collections
	ARM: 8949/1: mm: mark free_memmap as __init
	arm64: cpufeature: Fix the type of no FP/SIMD capability
	arm64: ptrace: nofpsimd: Fail FP/SIMD regset operations
	KVM: arm/arm64: Fix young bit from mmu notifier
	KVM: arm: Fix DFSR setting for non-LPAE aarch32 guests
	KVM: arm: Make inject_abt32() inject an external abort instead
	KVM: arm64: pmu: Don't increment SW_INCR if PMCR.E is unset
	mtd: onenand_base: Adjust indentation in onenand_read_ops_nolock
	mtd: sharpslpart: Fix unsigned comparison to zero
	crypto: artpec6 - return correct error code for failed setkey()
	crypto: atmel-sha - fix error handling when setting hmac key
	media: i2c: adv748x: Fix unsafe macros
	pinctrl: sh-pfc: r8a7778: Fix duplicate SDSELF_B and SD1_CLK_B
	mwifiex: Fix possible buffer overflows in mwifiex_ret_wmm_get_status()
	mwifiex: Fix possible buffer overflows in mwifiex_cmd_append_vsie_tlv()
	libertas: don't exit from lbs_ibss_join_existing() with RCU read lock held
	libertas: make lbs_ibss_join_existing() return error code on rates overflow
	scsi: megaraid_sas: Do not initiate OCR if controller is not in ready state
	x86/stackframe: Move ENCODE_FRAME_POINTER to asm/frame.h
	x86/stackframe, x86/ftrace: Add pt_regs frame annotations
	serial: uartps: Move the spinlock after the read of the tx empty
	padata: fix null pointer deref of pd->pinst
	Linux 4.19.104

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I42a465b140183dcc8cf49e19903d0e8f4b688930
2020-02-19 08:31:05 +01:00
Daniel Jordan
cad926f70b padata: fix null pointer deref of pd->pinst
The 4.19 backport dc34710a7a ("padata: Remove broken queue flushing")
removed padata_alloc_pd()'s assignment to pd->pinst, resulting in:

    Unable to handle kernel NULL pointer dereference ...
    ...
    pc : padata_reorder+0x144/0x2e0
    ...
    Call trace:
     padata_reorder+0x144/0x2e0
     padata_do_serial+0xc8/0x128
     pcrypt_aead_enc+0x60/0x70 [pcrypt]
     padata_parallel_worker+0xd8/0x138
     process_one_work+0x1bc/0x4b8
     worker_thread+0x164/0x580
     kthread+0x134/0x138
     ret_from_fork+0x10/0x18

This happened because the backport was based on an enhancement that
moved this assignment but isn't in 4.19:

  bfde23ce20 ("padata: unbind parallel jobs from specific CPUs")

Simply restore the assignment to fix the crash.

Fixes: dc34710a7a ("padata: Remove broken queue flushing")
Reported-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-14 16:33:28 -05:00
Greg Kroah-Hartman
3389e56d31 This is the 4.19.103 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl5Cn0wACgkQONu9yGCS
 aT584xAAtePSlzTxst/jukREoyrpAfTM1BeovMdsZEBpKh+/F3n1udqHeo+iNAAN
 qSOig012aW2qP7b5/4CrEU9ZRTvd0AM4fog7ABLJVahMYMqoJgod8TRaE4v0nVut
 eRans6w3NbZJCZwdw2aiu5gwFfjwJLSUckBNmj4XVYdyfh7q0BgnZV5OY0V+zhuG
 1MWXaylbRqjguR/ZFk0UPAmRaqNKHbwfCJ1V0ygL9xQkJM0cUn7hX9/CqM4aYnm6
 m1oux4ektLAmF1XK4NiQEuRBMeFO74XlKcsZqQHf/b4FZfcPergcPwIj8ugtCHzJ
 kx2QgURDjgH4Tnu+Q0ScPrjj2kjU8rWmjqlcv1PcUyOWm+MR0OK9bW7TLEntMSF8
 HOEe9j6SsjQNIOoYh1YcMnuGjKNIZjl2L3VbDzpVN2GxZxwAutY6G68tV7sbA2pu
 wtsrAVOqdcjoo0ruRmwognBqQAdNdsbiBx7bgcNjVEXWL0N3Ddiv6CNYwnehA5Hq
 cvQwVQpFGP9ZGYUcCMbdwR+7kJzVy6V2S615M8GkE9FouOwTfV60zM/sZ1rFVt1J
 70zxfRX5ys19aTAVkbi6pHHCUJ0ZAiTgWujp5Hp4kPt7gEz01Ur0s1kI3b7b6iWh
 cuycRFULvqeXCApQacs//lOVDoUV20uFcL/zqOFM33v/+YzkyjA=
 =3D8z
 -----END PGP SIGNATURE-----

Merge 4.19.103 into android-4.19

Changes in 4.19.103
	Revert "drm/sun4i: dsi: Change the start delay calculation"
	ovl: fix lseek overflow on 32bit
	kernel/module: Fix memleak in module_add_modinfo_attrs()
	media: iguanair: fix endpoint sanity check
	ocfs2: fix oops when writing cloned file
	x86/cpu: Update cached HLE state on write to TSX_CTRL_CPUID_CLEAR
	udf: Allow writing to 'Rewritable' partitions
	printk: fix exclusive_console replaying
	iwlwifi: mvm: fix NVM check for 3168 devices
	sparc32: fix struct ipc64_perm type definition
	cls_rsvp: fix rsvp_policy
	gtp: use __GFP_NOWARN to avoid memalloc warning
	l2tp: Allow duplicate session creation with UDP
	net: hsr: fix possible NULL deref in hsr_handle_frame()
	net_sched: fix an OOB access in cls_tcindex
	net: stmmac: Delete txtimer in suspend()
	bnxt_en: Fix TC queue mapping.
	tcp: clear tp->total_retrans in tcp_disconnect()
	tcp: clear tp->delivered in tcp_disconnect()
	tcp: clear tp->data_segs{in|out} in tcp_disconnect()
	tcp: clear tp->segs_{in|out} in tcp_disconnect()
	rxrpc: Fix use-after-free in rxrpc_put_local()
	rxrpc: Fix insufficient receive notification generation
	rxrpc: Fix missing active use pinning of rxrpc_local object
	rxrpc: Fix NULL pointer deref due to call->conn being cleared on disconnect
	media: uvcvideo: Avoid cyclic entity chains due to malformed USB descriptors
	mfd: dln2: More sanity checking for endpoints
	ipc/msg.c: consolidate all xxxctl_down() functions
	tracing: Fix sched switch start/stop refcount racy updates
	rcu: Avoid data-race in rcu_gp_fqs_check_wake()
	brcmfmac: Fix memory leak in brcmf_usbdev_qinit
	usb: typec: tcpci: mask event interrupts when remove driver
	usb: gadget: legacy: set max_speed to super-speed
	usb: gadget: f_ncm: Use atomic_t to track in-flight request
	usb: gadget: f_ecm: Use atomic_t to track in-flight request
	ALSA: usb-audio: Fix endianess in descriptor validation
	ALSA: dummy: Fix PCM format loop in proc output
	mm/memory_hotplug: fix remove_memory() lockdep splat
	mm: move_pages: report the number of non-attempted pages
	media/v4l2-core: set pages dirty upon releasing DMA buffers
	media: v4l2-core: compat: ignore native command codes
	media: v4l2-rect.h: fix v4l2_rect_map_inside() top/left adjustments
	lib/test_kasan.c: fix memory leak in kmalloc_oob_krealloc_more()
	irqdomain: Fix a memory leak in irq_domain_push_irq()
	platform/x86: intel_scu_ipc: Fix interrupt support
	ALSA: hda: Add Clevo W65_67SB the power_save blacklist
	KVM: arm64: Correct PSTATE on exception entry
	KVM: arm/arm64: Correct CPSR on exception entry
	KVM: arm/arm64: Correct AArch32 SPSR on exception entry
	KVM: arm64: Only sign-extend MMIO up to register width
	MIPS: fix indentation of the 'RELOCS' message
	MIPS: boot: fix typo in 'vmlinux.lzma.its' target
	s390/mm: fix dynamic pagetable upgrade for hugetlbfs
	powerpc/xmon: don't access ASDR in VMs
	powerpc/pseries: Advance pfn if section is not present in lmb_is_removable()
	smb3: fix signing verification of large reads
	PCI: tegra: Fix return value check of pm_runtime_get_sync()
	mmc: spi: Toggle SPI polarity, do not hardcode it
	ACPI: video: Do not export a non working backlight interface on MSI MS-7721 boards
	ACPI / battery: Deal with design or full capacity being reported as -1
	ACPI / battery: Use design-cap for capacity calculations if full-cap is not available
	ACPI / battery: Deal better with neither design nor full capacity not being reported
	alarmtimer: Unregister wakeup source when module get fails
	ubifs: Reject unsupported ioctl flags explicitly
	ubifs: don't trigger assertion on invalid no-key filename
	ubifs: Fix FS_IOC_SETFLAGS unexpectedly clearing encrypt flag
	ubifs: Fix deadlock in concurrent bulk-read and writepage
	crypto: geode-aes - convert to skcipher API and make thread-safe
	PCI: keystone: Fix link training retries initiation
	mmc: sdhci-of-at91: fix memleak on clk_get failure
	hv_balloon: Balloon up according to request page number
	mfd: axp20x: Mark AXP20X_VBUS_IPSOUT_MGMT as volatile
	crypto: api - Check spawn->alg under lock in crypto_drop_spawn
	crypto: ccree - fix backlog memory leak
	crypto: ccree - fix pm wrongful error reporting
	crypto: ccree - fix PM race condition
	scripts/find-unused-docs: Fix massive false positives
	scsi: qla2xxx: Fix mtcp dump collection failure
	power: supply: ltc2941-battery-gauge: fix use-after-free
	ovl: fix wrong WARN_ON() in ovl_cache_update_ino()
	f2fs: choose hardlimit when softlimit is larger than hardlimit in f2fs_statfs_project()
	f2fs: fix miscounted block limit in f2fs_statfs_project()
	f2fs: code cleanup for f2fs_statfs_project()
	PM: core: Fix handling of devices deleted during system-wide resume
	of: Add OF_DMA_DEFAULT_COHERENT & select it on powerpc
	dm zoned: support zone sizes smaller than 128MiB
	dm space map common: fix to ensure new block isn't already in use
	dm crypt: fix benbi IV constructor crash if used in authenticated mode
	dm: fix potential for q->make_request_fn NULL pointer
	dm writecache: fix incorrect flush sequence when doing SSD mode commit
	padata: Remove broken queue flushing
	tracing: Annotate ftrace_graph_hash pointer with __rcu
	tracing: Annotate ftrace_graph_notrace_hash pointer with __rcu
	ftrace: Add comment to why rcu_dereference_sched() is open coded
	ftrace: Protect ftrace_graph_hash with ftrace_sync
	samples/bpf: Don't try to remove user's homedir on clean
	crypto: ccp - set max RSA modulus size for v3 platform devices as well
	crypto: pcrypt - Do not clear MAY_SLEEP flag in original request
	crypto: atmel-aes - Fix counter overflow in CTR mode
	crypto: api - Fix race condition in crypto_spawn_alg
	crypto: picoxcell - adjust the position of tasklet_init and fix missed tasklet_kill
	scsi: qla2xxx: Fix unbound NVME response length
	NFS: Fix memory leaks and corruption in readdir
	NFS: Directory page cache pages need to be locked when read
	jbd2_seq_info_next should increase position index
	Btrfs: fix missing hole after hole punching and fsync when using NO_HOLES
	btrfs: set trans->drity in btrfs_commit_transaction
	Btrfs: fix race between adding and putting tree mod seq elements and nodes
	ARM: tegra: Enable PLLP bypass during Tegra124 LP1
	iwlwifi: don't throw error when trying to remove IGTK
	mwifiex: fix unbalanced locking in mwifiex_process_country_ie()
	sunrpc: expiry_time should be seconds not timeval
	gfs2: move setting current->backing_dev_info
	gfs2: fix O_SYNC write handling
	drm/rect: Avoid division by zero
	media: rc: ensure lirc is initialized before registering input device
	tools/kvm_stat: Fix kvm_exit filter name
	xen/balloon: Support xend-based toolstack take two
	watchdog: fix UAF in reboot notifier handling in watchdog core code
	bcache: add readahead cache policy options via sysfs interface
	eventfd: track eventfd_signal() recursion depth
	aio: prevent potential eventfd recursion on poll
	KVM: x86: Refactor picdev_write() to prevent Spectre-v1/L1TF attacks
	KVM: x86: Refactor prefix decoding to prevent Spectre-v1/L1TF attacks
	KVM: x86: Protect pmu_intel.c from Spectre-v1/L1TF attacks
	KVM: x86: Protect DR-based index computations from Spectre-v1/L1TF attacks
	KVM: x86: Protect kvm_lapic_reg_write() from Spectre-v1/L1TF attacks
	KVM: x86: Protect kvm_hv_msr_[get|set]_crash_data() from Spectre-v1/L1TF attacks
	KVM: x86: Protect ioapic_write_indirect() from Spectre-v1/L1TF attacks
	KVM: x86: Protect MSR-based index computations in pmu.h from Spectre-v1/L1TF attacks
	KVM: x86: Protect ioapic_read_indirect() from Spectre-v1/L1TF attacks
	KVM: x86: Protect MSR-based index computations from Spectre-v1/L1TF attacks in x86.c
	KVM: x86: Protect x86_decode_insn from Spectre-v1/L1TF attacks
	KVM: x86: Protect MSR-based index computations in fixed_msr_to_seg_unit() from Spectre-v1/L1TF attacks
	KVM: x86: Fix potential put_fpu() w/o load_fpu() on MPX platform
	KVM: PPC: Book3S HV: Uninit vCPU if vcore creation fails
	KVM: PPC: Book3S PR: Free shared page if mmu initialization fails
	x86/kvm: Be careful not to clear KVM_VCPU_FLUSH_TLB bit
	KVM: x86: Don't let userspace set host-reserved cr4 bits
	KVM: x86: Free wbinvd_dirty_mask if vCPU creation fails
	KVM: s390: do not clobber registers during guest reset/store status
	clk: tegra: Mark fuse clock as critical
	drm/amd/dm/mst: Ignore payload update failures
	percpu: Separate decrypted varaibles anytime encryption can be enabled
	scsi: qla2xxx: Fix the endianness of the qla82xx_get_fw_size() return type
	scsi: csiostor: Adjust indentation in csio_device_reset
	scsi: qla4xxx: Adjust indentation in qla4xxx_mem_free
	scsi: ufs: Recheck bkops level if bkops is disabled
	phy: qualcomm: Adjust indentation in read_poll_timeout
	ext2: Adjust indentation in ext2_fill_super
	powerpc/44x: Adjust indentation in ibm4xx_denali_fixup_memsize
	drm: msm: mdp4: Adjust indentation in mdp4_dsi_encoder_enable
	NFC: pn544: Adjust indentation in pn544_hci_check_presence
	ppp: Adjust indentation into ppp_async_input
	net: smc911x: Adjust indentation in smc911x_phy_configure
	net: tulip: Adjust indentation in {dmfe, uli526x}_init_module
	IB/mlx5: Fix outstanding_pi index for GSI qps
	IB/core: Fix ODP get user pages flow
	nfsd: fix delay timer on 32-bit architectures
	nfsd: fix jiffies/time_t mixup in LRU list
	nfsd: Return the correct number of bytes written to the file
	ubi: fastmap: Fix inverted logic in seen selfcheck
	ubi: Fix an error pointer dereference in error handling code
	mfd: da9062: Fix watchdog compatible string
	mfd: rn5t618: Mark ADC control register volatile
	bonding/alb: properly access headers in bond_alb_xmit()
	net: dsa: bcm_sf2: Only 7278 supports 2Gb/sec IMP port
	net: mvneta: move rx_dropped and rx_errors in per-cpu stats
	net_sched: fix a resource leak in tcindex_set_parms()
	net: systemport: Avoid RBUF stuck in Wake-on-LAN mode
	net/mlx5: IPsec, Fix esp modify function attribute
	net/mlx5: IPsec, fix memory leak at mlx5_fpga_ipsec_delete_sa_ctx
	net: macb: Remove unnecessary alignment check for TSO
	net: macb: Limit maximum GEM TX length in TSO
	net: dsa: b53: Always use dev->vlan_enabled in b53_configure_vlan()
	ext4: fix deadlock allocating crypto bounce page from mempool
	btrfs: use bool argument in free_root_pointers()
	btrfs: free block groups after free'ing fs trees
	drm: atmel-hlcdc: enable clock before configuring timing engine
	drm/dp_mst: Remove VCPI while disabling topology mgr
	btrfs: flush write bio if we loop in extent_write_cache_pages
	KVM: x86/mmu: Apply max PA check for MMIO sptes to 32-bit KVM
	KVM: x86: Use gpa_t for cr2/gpa to fix TDP support on 32-bit KVM
	KVM: VMX: Add non-canonical check on writes to RTIT address MSRs
	KVM: nVMX: vmread should not set rflags to specify success in case of #PF
	KVM: Use vcpu-specific gva->hva translation when querying host page size
	KVM: Play nice with read-only memslots when querying host page size
	mm: zero remaining unavailable struct pages
	mm: return zero_resv_unavail optimization
	mm/page_alloc.c: fix uninitialized memmaps on a partially populated last section
	cifs: fail i/o on soft mounts if sessionsetup errors out
	x86/apic/msi: Plug non-maskable MSI affinity race
	clocksource: Prevent double add_timer_on() for watchdog_timer
	perf/core: Fix mlock accounting in perf_mmap()
	rxrpc: Fix service call disconnection
	Linux 4.19.103

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I0d7f09085c3541373e0fd6b2e3ffacc5e34f7d55
2020-02-11 15:05:03 -08:00
Song Liu
a3623db43a perf/core: Fix mlock accounting in perf_mmap()
commit 003461559e upstream.

Decreasing sysctl_perf_event_mlock between two consecutive perf_mmap()s of
a perf ring buffer may lead to an integer underflow in locked memory
accounting. This may lead to the undesired behaviors, such as failures in
BPF map creation.

Address this by adjusting the accounting logic to take into account the
possibility that the amount of already locked memory may exceed the
current limit.

Fixes: c4b7547974 ("perf/core: Make the mlock accounting simple again")
Suggested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Link: https://lkml.kernel.org/r/20200123181146.2238074-1-songliubraving@fb.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11 04:34:19 -08:00
Konstantin Khlebnikov
6284d30e96 clocksource: Prevent double add_timer_on() for watchdog_timer
commit febac332a8 upstream.

Kernel crashes inside QEMU/KVM are observed:

  kernel BUG at kernel/time/timer.c:1154!
  BUG_ON(timer_pending(timer) || !timer->function) in add_timer_on().

At the same time another cpu got:

  general protection fault: 0000 [#1] SMP PTI of poinson pointer 0xdead000000000200 in:

  __hlist_del at include/linux/list.h:681
  (inlined by) detach_timer at kernel/time/timer.c:818
  (inlined by) expire_timers at kernel/time/timer.c:1355
  (inlined by) __run_timers at kernel/time/timer.c:1686
  (inlined by) run_timer_softirq at kernel/time/timer.c:1699

Unfortunately kernel logs are badly scrambled, stacktraces are lost.

Printing the timer->function before the BUG_ON() pointed to
clocksource_watchdog().

The execution of clocksource_watchdog() can race with a sequence of
clocksource_stop_watchdog() .. clocksource_start_watchdog():

expire_timers()
 detach_timer(timer, true);
  timer->entry.pprev = NULL;
 raw_spin_unlock_irq(&base->lock);
 call_timer_fn
  clocksource_watchdog()

					clocksource_watchdog_kthread() or
					clocksource_unbind()

					spin_lock_irqsave(&watchdog_lock, flags);
					clocksource_stop_watchdog();
					 del_timer(&watchdog_timer);
					 watchdog_running = 0;
					spin_unlock_irqrestore(&watchdog_lock, flags);

					spin_lock_irqsave(&watchdog_lock, flags);
					clocksource_start_watchdog();
					 add_timer_on(&watchdog_timer, ...);
					 watchdog_running = 1;
					spin_unlock_irqrestore(&watchdog_lock, flags);

  spin_lock(&watchdog_lock);
  add_timer_on(&watchdog_timer, ...);
   BUG_ON(timer_pending(timer) || !timer->function);
    timer_pending() -> true
    BUG()

I.e. inside clocksource_watchdog() watchdog_timer could be already armed.

Check timer_pending() before calling add_timer_on(). This is sufficient as
all operations are synchronized by watchdog_lock.

Fixes: 75c5158f70 ("timekeeping: Update clocksource with stop_machine")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/158048693917.4378.13823603769948933793.stgit@buzz
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11 04:34:18 -08:00
Thomas Gleixner
032a2bf978 x86/apic/msi: Plug non-maskable MSI affinity race
commit 6f1a4891a5 upstream.

Evan tracked down a subtle race between the update of the MSI message and
the device raising an interrupt internally on PCI devices which do not
support MSI masking. The update of the MSI message is non-atomic and
consists of either 2 or 3 sequential 32bit wide writes to the PCI config
space.

   - Write address low 32bits
   - Write address high 32bits (If supported by device)
   - Write data

When an interrupt is migrated then both address and data might change, so
the kernel attempts to mask the MSI interrupt first. But for MSI masking is
optional, so there exist devices which do not provide it. That means that
if the device raises an interrupt internally between the writes then a MSI
message is sent built from half updated state.

On x86 this can lead to spurious interrupts on the wrong interrupt
vector when the affinity setting changes both address and data. As a
consequence the device interrupt can be lost causing the device to
become stuck or malfunctioning.

Evan tried to handle that by disabling MSI accross an MSI message
update. That's not feasible because disabling MSI has issues on its own:

 If MSI is disabled the PCI device is routing an interrupt to the legacy
 INTx mechanism. The INTx delivery can be disabled, but the disablement is
 not working on all devices.

 Some devices lose interrupts when both MSI and INTx delivery are disabled.

Another way to solve this would be to enforce the allocation of the same
vector on all CPUs in the system for this kind of screwed devices. That
could be done, but it would bring back the vector space exhaustion problems
which got solved a few years ago.

Fortunately the high address (if supported by the device) is only relevant
when X2APIC is enabled which implies interrupt remapping. In the interrupt
remapping case the affinity setting is happening at the interrupt remapping
unit and the PCI MSI message is programmed only once when the PCI device is
initialized.

That makes it possible to solve it with a two step update:

  1) Target the MSI msg to the new vector on the current target CPU

  2) Target the MSI msg to the new vector on the new target CPU

In both cases writing the MSI message is only changing a single 32bit word
which prevents the issue of inconsistency.

After writing the final destination it is necessary to check whether the
device issued an interrupt while the intermediate state #1 (new vector,
current CPU) was in effect.

This is possible because the affinity change is always happening on the
current target CPU. The code runs with interrupts disabled, so the
interrupt can be detected by checking the IRR of the local APIC. If the
vector is pending in the IRR then the interrupt is retriggered on the new
target CPU by sending an IPI for the associated vector on the target CPU.

This can cause spurious interrupts on both the local and the new target
CPU.

 1) If the new vector is not in use on the local CPU and the device
    affected by the affinity change raised an interrupt during the
    transitional state (step #1 above) then interrupt entry code will
    ignore that spurious interrupt. The vector is marked so that the
    'No irq handler for vector' warning is supressed once.

 2) If the new vector is in use already on the local CPU then the IRR check
    might see an pending interrupt from the device which is using this
    vector. The IPI to the new target CPU will then invoke the handler of
    the device, which got the affinity change, even if that device did not
    issue an interrupt

 3) If the new vector is in use already on the local CPU and the device
    affected by the affinity change raised an interrupt during the
    transitional state (step #1 above) then the handler of the device which
    uses that vector on the local CPU will be invoked.

expose issues in device driver interrupt handlers which are not prepared to
handle a spurious interrupt correctly. This not a regression, it's just
exposing something which was already broken as spurious interrupts can
happen for a lot of reasons and all driver handlers need to be able to deal
with them.

Reported-by: Evan Green <evgreen@chromium.org>
Debugged-by: Evan Green <evgreen@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Evan Green <evgreen@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87imkr4s7n.fsf@nanos.tec.linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11 04:34:18 -08:00
Steven Rostedt (VMware)
0948d6294d ftrace: Protect ftrace_graph_hash with ftrace_sync
[ Upstream commit 54a16ff6f2 ]

As function_graph tracer can run when RCU is not "watching", it can not be
protected by synchronize_rcu() it requires running a task on each CPU before
it can be freed. Calling schedule_on_each_cpu(ftrace_sync) needs to be used.

Link: https://lore.kernel.org/r/20200205131110.GT2935@paulmck-ThinkPad-P72

Cc: stable@vger.kernel.org
Fixes: b9b0c831be ("ftrace: Convert graph filter to use hash tables")
Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-11 04:34:05 -08:00
Steven Rostedt (VMware)
c03d235980 ftrace: Add comment to why rcu_dereference_sched() is open coded
[ Upstream commit 16052dd5bd ]

Because the function graph tracer can execute in sections where RCU is not
"watching", the rcu_dereference_sched() for the has needs to be open coded.
This is fine because the RCU "flavor" of the ftrace hash is protected by
its own RCU handling (it does its own little synchronization on every CPU
and does not rely on RCU sched).

Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-11 04:34:04 -08:00
Amol Grover
30afa80b0f tracing: Annotate ftrace_graph_notrace_hash pointer with __rcu
[ Upstream commit fd0e6852c4 ]

Fix following instances of sparse error
kernel/trace/ftrace.c:5667:29: error: incompatible types in comparison
kernel/trace/ftrace.c:5813:21: error: incompatible types in comparison
kernel/trace/ftrace.c:5868:36: error: incompatible types in comparison
kernel/trace/ftrace.c:5870:25: error: incompatible types in comparison

Use rcu_dereference_protected to dereference the newly annotated pointer.

Link: http://lkml.kernel.org/r/20200205055701.30195-1-frextrite@gmail.com

Signed-off-by: Amol Grover <frextrite@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-11 04:34:04 -08:00
Amol Grover
f144ad2e84 tracing: Annotate ftrace_graph_hash pointer with __rcu
[ Upstream commit 24a9729f83 ]

Fix following instances of sparse error
kernel/trace/ftrace.c:5664:29: error: incompatible types in comparison
kernel/trace/ftrace.c:5785:21: error: incompatible types in comparison
kernel/trace/ftrace.c:5864:36: error: incompatible types in comparison
kernel/trace/ftrace.c:5866:25: error: incompatible types in comparison

Use rcu_dereference_protected to access the __rcu annotated pointer.

Link: http://lkml.kernel.org/r/20200201072703.17330-1-frextrite@gmail.com

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Amol Grover <frextrite@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-11 04:34:04 -08:00
Herbert Xu
dc34710a7a padata: Remove broken queue flushing
[ Upstream commit 07928d9bfc ]

The function padata_flush_queues is fundamentally broken because
it cannot force padata users to complete the request that is
underway.  IOW padata has to passively wait for the completion
of any outstanding work.

As it stands flushing is used in two places.  Its use in padata_stop
is simply unnecessary because nothing depends on the queues to
be flushed afterwards.

The other use in padata_replace is more substantial as we depend
on it to free the old pd structure.  This patch instead uses the
pd->refcnt to dynamically free the pd structure once all requests
are complete.

Fixes: 2b73b07ab8 ("padata: Flush the padata queues actively")
Cc: <stable@vger.kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-11 04:34:04 -08:00
Stephen Boyd
b522ff023e alarmtimer: Unregister wakeup source when module get fails
commit 6b6d188aae upstream.

The alarmtimer_rtc_add_device() function creates a wakeup source and then
tries to grab a module reference. If that fails the function returns early
with an error code, but fails to remove the wakeup source.

Cleanup this exit path so there is no dangling wakeup source, which is
named 'alarmtime' left allocated which will conflict with another RTC
device that may be registered later.

Fixes: 51218298a2 ("alarmtimer: Ensure RTC module is not unloaded")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200109155910.907-2-swboyd@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11 04:33:59 -08:00
Kevin Hao
4f7d834cec irqdomain: Fix a memory leak in irq_domain_push_irq()
commit 0f394daef8 upstream.

Fix a memory leak reported by kmemleak:
unreferenced object 0xffff000bc6f50e80 (size 128):
  comm "kworker/23:2", pid 201, jiffies 4294894947 (age 942.132s)
  hex dump (first 32 bytes):
    00 00 00 00 41 00 00 00 86 c0 03 00 00 00 00 00  ....A...........
    00 a0 b2 c6 0b 00 ff ff 40 51 fd 10 00 80 ff ff  ........@Q......
  backtrace:
    [<00000000e62d2240>] kmem_cache_alloc_trace+0x1a4/0x320
    [<00000000279143c9>] irq_domain_push_irq+0x7c/0x188
    [<00000000d9f4c154>] thunderx_gpio_probe+0x3ac/0x438
    [<00000000fd09ec22>] pci_device_probe+0xe4/0x198
    [<00000000d43eca75>] really_probe+0xdc/0x320
    [<00000000d3ebab09>] driver_probe_device+0x5c/0xf0
    [<000000005b3ecaa0>] __device_attach_driver+0x88/0xc0
    [<000000004e5915f5>] bus_for_each_drv+0x7c/0xc8
    [<0000000079d4db41>] __device_attach+0xe4/0x140
    [<00000000883bbda9>] device_initial_probe+0x18/0x20
    [<000000003be59ef6>] bus_probe_device+0x98/0xa0
    [<0000000039b03d3f>] deferred_probe_work_func+0x74/0xa8
    [<00000000870934ce>] process_one_work+0x1c8/0x470
    [<00000000e3cce570>] worker_thread+0x1f8/0x428
    [<000000005d64975e>] kthread+0xfc/0x128
    [<00000000f0eaa764>] ret_from_fork+0x10/0x18

Fixes: 495c38d300 ("irqdomain: Add irq_domain_{push,pop}_irq() functions")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200120043547.22271-1-haokexin@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11 04:33:57 -08:00
Eric Dumazet
00b13445f9 rcu: Avoid data-race in rcu_gp_fqs_check_wake()
commit 6935c3983b upstream.

The rcu_gp_fqs_check_wake() function uses rcu_preempt_blocked_readers_cgp()
to read ->gp_tasks while other cpus might overwrite this field.

We need READ_ONCE()/WRITE_ONCE() pairs to avoid compiler
tricks and KCSAN splats like the following :

BUG: KCSAN: data-race in rcu_gp_fqs_check_wake / rcu_preempt_deferred_qs_irqrestore

write to 0xffffffff85a7f190 of 8 bytes by task 7317 on cpu 0:
 rcu_preempt_deferred_qs_irqrestore+0x43d/0x580 kernel/rcu/tree_plugin.h:507
 rcu_read_unlock_special+0xec/0x370 kernel/rcu/tree_plugin.h:659
 __rcu_read_unlock+0xcf/0xe0 kernel/rcu/tree_plugin.h:394
 rcu_read_unlock include/linux/rcupdate.h:645 [inline]
 __ip_queue_xmit+0x3b0/0xa40 net/ipv4/ip_output.c:533
 ip_queue_xmit+0x45/0x60 include/net/ip.h:236
 __tcp_transmit_skb+0xdeb/0x1cd0 net/ipv4/tcp_output.c:1158
 __tcp_send_ack+0x246/0x300 net/ipv4/tcp_output.c:3685
 tcp_send_ack+0x34/0x40 net/ipv4/tcp_output.c:3691
 tcp_cleanup_rbuf+0x130/0x360 net/ipv4/tcp.c:1575
 tcp_recvmsg+0x633/0x1a30 net/ipv4/tcp.c:2179
 inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
 sock_recvmsg_nosec net/socket.c:871 [inline]
 sock_recvmsg net/socket.c:889 [inline]
 sock_recvmsg+0x92/0xb0 net/socket.c:885
 sock_read_iter+0x15f/0x1e0 net/socket.c:967
 call_read_iter include/linux/fs.h:1864 [inline]
 new_sync_read+0x389/0x4f0 fs/read_write.c:414

read to 0xffffffff85a7f190 of 8 bytes by task 10 on cpu 1:
 rcu_gp_fqs_check_wake kernel/rcu/tree.c:1556 [inline]
 rcu_gp_fqs_check_wake+0x93/0xd0 kernel/rcu/tree.c:1546
 rcu_gp_fqs_loop+0x36c/0x580 kernel/rcu/tree.c:1611
 rcu_gp_kthread+0x143/0x220 kernel/rcu/tree.c:1768
 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 10 Comm: rcu_preempt Not tainted 5.3.0+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
[ paulmck:  Added another READ_ONCE() for RCU CPU stall warnings. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11 04:33:55 -08:00
Mathieu Desnoyers
62bfa26e4d tracing: Fix sched switch start/stop refcount racy updates
commit 64ae572bc7 upstream.

Reading the sched_cmdline_ref and sched_tgid_ref initial state within
tracing_start_sched_switch without holding the sched_register_mutex is
racy against concurrent updates, which can lead to tracepoint probes
being registered more than once (and thus trigger warnings within
tracepoint.c).

[ May be the fix for this bug ]
Link: https://lore.kernel.org/r/000000000000ab6f84056c786b93@google.com

Link: http://lkml.kernel.org/r/20190817141208.15226-1-mathieu.desnoyers@efficios.com

Cc: stable@vger.kernel.org
CC: Steven Rostedt (VMware) <rostedt@goodmis.org>
CC: Joel Fernandes (Google) <joel@joelfernandes.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul E. McKenney <paulmck@linux.ibm.com>
Reported-by: syzbot+774fddf07b7ab29a1e55@syzkaller.appspotmail.com
Fixes: d914ba37d7 ("tracing: Add support for recording tgid of tasks")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11 04:33:55 -08:00
John Ogness
8360063bfa printk: fix exclusive_console replaying
[ Upstream commit def97da136 ]

Commit f92b070f2d ("printk: Do not miss new messages when replaying
the log") introduced a new variable @exclusive_console_stop_seq to
store when an exclusive console should stop printing. It should be
set to the @console_seq value at registration. However, @console_seq
is previously set to @syslog_seq so that the exclusive console knows
where to begin. This results in the exclusive console immediately
reactivating all the other consoles and thus repeating the messages
for those consoles.

Set @console_seq after @exclusive_console_stop_seq has stored the
current @console_seq value.

Fixes: f92b070f2d ("printk: Do not miss new messages when replaying the log")
Link: http://lkml.kernel.org/r/20191219115322.31160-1-john.ogness@linutronix.de
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-11 04:33:51 -08:00
YueHaibing
bdfaaf35ac kernel/module: Fix memleak in module_add_modinfo_attrs()
[ Upstream commit f6d061d617 ]

In module_add_modinfo_attrs() if sysfs_create_file() fails
on the first iteration of the loop (so i = 0), we forget to
free the modinfo_attrs.

Fixes: bc6f2a757d ("kernel/module: Fix mem leak in module_add_modinfo_attrs")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-11 04:33:50 -08:00
Greg Kroah-Hartman
83b584a64c This is the 4.19.102 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl461NIACgkQONu9yGCS
 aT6Mqw//W5xIIcs0Ut+P+QYNN6lCTRJ0AvFUolz79M3pyK/rHUluwTvYJbDAeGE3
 sckv96rE1pxj5ZSf6LegXIoALrA4RlYHS8xXkYnRrt6xfrb7UwpqsJtt4Mx+IrJ3
 9uFfaWRSvuDfRCraZxLiE2Bl9xVYvaPfFJYBmH383VB+deYNfpwORFsqNDQT+gR6
 PZLuV0x//Kerwmd4OvaaHR/fIl8YVKmIz5lu3+3WIuVKxTK6Bbd3YzVu13dhVaX2
 mETflLEAO/sYsUQiS4SO22ejLAiWyD8LyMV8s9KeTFQXzML3JpibKnt3ySDfzsFE
 m8VRlaLcQwB0Ca2AVGHA5QV0+V+2+6qh/IcZl630feBueGQX59qLQkOurD4e/9lm
 Na6ZkLPTh9UipIfTu9fvA5HY5lPt2VcSWwG2nLluckfJIpKNFVQEB7vuk9zd7468
 qkXmj/J1YDdJzt2YgD0WZuKu3f1/No7rXbNmT2Oj0+HNWWvIU9xFNFlIPAxo7pJy
 kwekd9+gHI0n1OhLRjzYUyf0pD+j0o75ZHsYYsUW0y6cGoWX/LmQ8JPFi+waHiov
 FOe8FJz/uDtfQnJ4+izAM5Jjbu1LE+L8uGoIExYAv4DuXgPZtI2wtHvP4HHM3Aov
 mDWLesMgizsroViv57aXC0C1ZPksPpGeHT+HcH7RnDQ0kQmpe3E=
 =2XGW
 -----END PGP SIGNATURE-----

Merge 4.19.102 into android-4.19

Changes in 4.19.102
	vfs: fix do_last() regression
	x86/resctrl: Fix use-after-free when deleting resource groups
	x86/resctrl: Fix use-after-free due to inaccurate refcount of rdtgroup
	x86/resctrl: Fix a deadlock due to inaccurate reference
	crypto: pcrypt - Fix user-after-free on module unload
	rsi: add hci detach for hibernation and poweroff
	rsi: fix use-after-free on failed probe and unbind
	perf c2c: Fix return type for histogram sorting comparision functions
	PM / devfreq: Add new name attribute for sysfs
	tools lib: Fix builds when glibc contains strlcpy()
	arm64: kbuild: remove compressed images on 'make ARCH=arm64 (dist)clean'
	ext4: validate the debug_want_extra_isize mount option at parse time
	mm/mempolicy.c: fix out of bounds write in mpol_parse_str()
	reiserfs: Fix memory leak of journal device string
	media: digitv: don't continue if remote control state can't be read
	media: af9005: uninitialized variable printked
	media: vp7045: do not read uninitialized values if usb transfer fails
	media: gspca: zero usb_buf
	media: dvb-usb/dvb-usb-urb.c: initialize actlen to 0
	tomoyo: Use atomic_t for statistics counter
	ttyprintk: fix a potential deadlock in interrupt context issue
	Bluetooth: Fix race condition in hci_release_sock()
	cgroup: Prevent double killing of css when enabling threaded cgroup
	media: si470x-i2c: Move free() past last use of 'radio'
	ARM: dts: sun8i: a83t: Correct USB3503 GPIOs polarity
	ARM: dts: am57xx-beagle-x15/am57xx-idk: Remove "gpios" for endpoint dt nodes
	ARM: dts: beagle-x15-common: Model 5V0 regulator
	soc: ti: wkup_m3_ipc: Fix race condition with rproc_boot
	tools lib traceevent: Fix memory leakage in filter_event
	rseq: Unregister rseq for clone CLONE_VM
	clk: sunxi-ng: h6-r: Fix AR100/R_APB2 parent order
	mac80211: mesh: restrict airtime metric to peered established plinks
	clk: mmp2: Fix the order of timer mux parents
	ASoC: rt5640: Fix NULL dereference on module unload
	ixgbevf: Remove limit of 10 entries for unicast filter list
	ixgbe: Fix calculation of queue with VFs and flow director on interface flap
	igb: Fix SGMII SFP module discovery for 100FX/LX.
	platform/x86: GPD pocket fan: Allow somewhat lower/higher temperature limits
	ASoC: sti: fix possible sleep-in-atomic
	qmi_wwan: Add support for Quectel RM500Q
	parisc: Use proper printk format for resource_size_t
	wireless: fix enabling channel 12 for custom regulatory domain
	cfg80211: Fix radar event during another phy CAC
	mac80211: Fix TKIP replay protection immediately after key setup
	wireless: wext: avoid gcc -O3 warning
	netfilter: nft_tunnel: ERSPAN_VERSION must not be null
	net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec
	bnxt_en: Fix ipv6 RFS filter matching logic.
	riscv: delete temporary files
	iwlwifi: Don't ignore the cap field upon mcc update
	ARM: dts: am335x-boneblack-common: fix memory size
	vti[6]: fix packet tx through bpf_redirect()
	xfrm interface: fix packet tx through bpf_redirect()
	xfrm: interface: do not confirm neighbor when do pmtu update
	scsi: fnic: do not queue commands during fwreset
	ARM: 8955/1: virt: Relax arch timer version check during early boot
	tee: optee: Fix compilation issue with nommu
	airo: Fix possible info leak in AIROOLDIOCTL/SIOCDEVPRIVATE
	airo: Add missing CAP_NET_ADMIN check in AIROOLDIOCTL/SIOCDEVPRIVATE
	r8152: get default setting of WOL before initializing
	ARM: dts: am43x-epos-evm: set data pin directions for spi0 and spi1
	qlcnic: Fix CPU soft lockup while collecting firmware dump
	powerpc/fsl/dts: add fsl,erratum-a011043
	net/fsl: treat fsl,erratum-a011043
	net: fsl/fman: rename IF_MODE_XGMII to IF_MODE_10G
	seq_tab_next() should increase position index
	l2t_seq_next should increase position index
	net: Fix skb->csum update in inet_proto_csum_replace16().
	btrfs: do not zero f_bavail if we have available space
	perf report: Fix no libunwind compiled warning break s390 issue
	mm/migrate.c: also overwrite error when it is bigger than zero
	Linux 4.19.102

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia9b63c7932b66f469ab0e88467e1e07741408f0b
2020-02-05 19:20:26 +00:00
Michal Koutný
6d26630912 cgroup: Prevent double killing of css when enabling threaded cgroup
commit 3bc0bb36fa upstream.

The test_cgcore_no_internal_process_constraint_on_threads selftest when
running with subsystem controlling noise triggers two warnings:

> [  597.443115] WARNING: CPU: 1 PID: 28167 at kernel/cgroup/cgroup.c:3131 cgroup_apply_control_enable+0xe0/0x3f0
> [  597.443413] WARNING: CPU: 1 PID: 28167 at kernel/cgroup/cgroup.c:3177 cgroup_apply_control_disable+0xa6/0x160

Both stem from a call to cgroup_type_write. The first warning was also
triggered by syzkaller.

When we're switching cgroup to threaded mode shortly after a subsystem
was disabled on it, we can see the respective subsystem css dying there.

The warning in cgroup_apply_control_enable is harmless in this case
since we're not adding new subsys anyway.
The warning in cgroup_apply_control_disable indicates an attempt to kill
css of recently disabled subsystem repeatedly.

The commit prevents these situations by making cgroup_type_write wait
for all dying csses to go away before re-applying subtree controls.
When at it, the locations of WARN_ON_ONCE calls are moved so that
warning is triggered only when we are about to misuse the dying css.

Reported-by: syzbot+5493b2a54d31d6aea629@syzkaller.appspotmail.com
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-05 14:43:39 +00:00
Greg Kroah-Hartman
1b44c9bd91 This is the 4.19.101 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl41RsgACgkQONu9yGCS
 aT4P7A/+PZVt4c6phHZ9tj0OV4TjAWfu3IX9nLypzyBxjmBeJu8yt1pkNrfKj6fT
 +N3MjDlmAYss5CV6SOACPWXdhAQF3SsM6PR+CSrzwpS3+iAVTqNTaHpZqJFBgr3R
 cDe+MksbMLDpw3x+hXWV1E6WKcJZZJVeANuaD09HQDRVqKw1hRGxGEdyPChEjT71
 Ml3o9a2TYzOvRClBtBHPRQNy/MP4cVv06kS7jefDNh1z9PMsD2w01W54ur44WFJb
 aujt6bLyJlcs0cPdSkU7D8pmgzs/0cxW8N+4gCpfW66P6FJL8SU4RDTujUARlyvC
 oP5d62XrARXAO0hh1NYdWyUqpQjOFJRTWfEqW+lFGo5s9yL9oPW8vcCBKBuZfg+u
 HlVCCTCyU/IJN0DMeqdneThDg8sxirlzHu/NllgGIf7rhyMRqRmruQZXc1W3/7e8
 UgqqAEFkgVmJgq3mVWlHsV5Fmgb+PQlqj4rSB05wlAbXsQwF0nbSS/lsvwDR8qqE
 8nO/PQoxpQyAOYJ+iyaCsq51IsJUCwWOto8L/RpdYSbFpLTn+BRmNdDr7jHOVnPl
 FshugoXijE6IrVGIJhJBGGy/E+eG8Dru7IZEsi2UZLsw+bBvucqv7raIHAJ2YRaL
 8ZuwwmvpZpCOdYSWa7lIgqZb0qOTyR+b6UQ57X8hS5U3MZ2jMOE=
 =+bpt
 -----END PGP SIGNATURE-----

Merge 4.19.101 into android-4.19

Changes in 4.19.101
	orinoco_usb: fix interface sanity check
	rsi_91x_usb: fix interface sanity check
	usb: dwc3: pci: add ID for the Intel Comet Lake -V variant
	USB: serial: ir-usb: add missing endpoint sanity check
	USB: serial: ir-usb: fix link-speed handling
	USB: serial: ir-usb: fix IrLAP framing
	usb: dwc3: turn off VBUS when leaving host mode
	staging: most: net: fix buffer overflow
	staging: wlan-ng: ensure error return is actually returned
	staging: vt6656: correct packet types for CTS protect, mode.
	staging: vt6656: use NULLFUCTION stack on mac80211
	staging: vt6656: Fix false Tx excessive retries reporting.
	serial: 8250_bcm2835aux: Fix line mismatch on driver unbind
	component: do not dereference opaque pointer in debugfs
	mei: me: add comet point (lake) H device ids
	iio: st_gyro: Correct data for LSM9DS0 gyro
	crypto: chelsio - fix writing tfm flags to wrong place
	cifs: Fix memory allocation in __smb2_handle_cancelled_cmd()
	ath9k: fix storage endpoint lookup
	brcmfmac: fix interface sanity check
	rtl8xxxu: fix interface sanity check
	zd1211rw: fix storage endpoint lookup
	net_sched: ematch: reject invalid TCF_EM_SIMPLE
	net_sched: fix ops->bind_class() implementations
	HID: multitouch: Add LG MELF0410 I2C touchscreen support
	arc: eznps: fix allmodconfig kconfig warning
	HID: Add quirk for Xin-Mo Dual Controller
	HID: ite: Add USB id match for Acer SW5-012 keyboard dock
	HID: Add quirk for incorrect input length on Lenovo Y720
	drivers/hid/hid-multitouch.c: fix a possible null pointer access.
	phy: qcom-qmp: Increase PHY ready timeout
	phy: cpcap-usb: Prevent USB line glitches from waking up modem
	watchdog: max77620_wdt: fix potential build errors
	watchdog: rn5t618_wdt: fix module aliases
	spi: spi-dw: Add lock protect dw_spi rx/tx to prevent concurrent calls
	drivers/net/b44: Change to non-atomic bit operations on pwol_mask
	net: wan: sdla: Fix cast from pointer to integer of different size
	gpio: max77620: Add missing dependency on GPIOLIB_IRQCHIP
	atm: eni: fix uninitialized variable warning
	HID: steam: Fix input device disappearing
	platform/x86: dell-laptop: disable kbd backlight on Inspiron 10xx
	PCI: Add DMA alias quirk for Intel VCA NTB
	iommu/amd: Support multiple PCI DMA aliases in IRQ Remapping
	ARM: OMAP2+: SmartReflex: add omap_sr_pdata definition
	usb-storage: Disable UAS on JMicron SATA enclosure
	sched/fair: Add tmp_alone_branch assertion
	sched/fair: Fix insertion in rq->leaf_cfs_rq_list
	rsi: fix use-after-free on probe errors
	rsi: fix memory leak on failed URB submission
	rsi: fix non-atomic allocation in completion handler
	crypto: af_alg - Use bh_lock_sock in sk_destruct
	random: try to actively add entropy rather than passively wait for it
	block: cleanup __blkdev_issue_discard()
	block: fix 32 bit overflow in __blkdev_issue_discard()
	KVM: arm64: Write arch.mdcr_el2 changes since last vcpu_load on VHE
	Linux 4.19.101

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I801cd8d04eea35b4b53957cc69c0987d88094992
2020-02-02 20:22:38 +00:00
Patrick Bellasi
8b2fbd9076 UPSTREAM: sched/fair/util_est: Implement faster ramp-up EWMA on utilization increases
The estimated utilization for a task:

   util_est = max(util_avg, est.enqueue, est.ewma)

is defined based on:

 - util_avg: the PELT defined utilization
 - est.enqueued: the util_avg at the end of the last activation
 - est.ewma:     a exponential moving average on the est.enqueued samples

According to this definition, when a task suddenly changes its bandwidth
requirements from small to big, the EWMA will need to collect multiple
samples before converging up to track the new big utilization.

This slow convergence towards bigger utilization values is not
aligned to the default scheduler behavior, which is to optimize for
performance. Moreover, the est.ewma component fails to compensate for
temporarely utilization drops which spans just few est.enqueued samples.

To let util_est do a better job in the scenario depicted above, change
its definition by making util_est directly follow upward motion and
only decay the est.ewma on downward.

Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@matbug.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Douglas Raillard <douglas.raillard@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <qperret@google.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191023205630.14469-1-patrick.bellasi@matbug.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit b8c9636140)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I5c0bdd401f3fe599a2b7b9215c9a3a621f91002d
2020-02-01 17:35:00 +00:00
Qais Yousef
f503db1178 ANDROID: Re-use SUGOV_RT_MAX_FREQ to control uclamp rt behavior
By default uclamp RT tasks will use the max frequency, which is not the
desired default behavior on mobile devices.

Re-use the SUGOV_RT_MAX_FREQ sched_feat to control the default behavior.

When SUGOV_RT_MAX_FREQ is NOT selected, the uclamp_min value of the RT
tasks will be 0.

Note, since now we use SUGOV_RT_MAX_FREQ to enforce the default max
frequency for RT when uclamp is compiled in; the condition in
schedutil_cpu_util() needs to be inverted so that max no longer
unconditionally applied when uclamp is compiled in && SUGOV_RT_MAX_FREQ
is true. This unconditional application means uclamp values are always
ignored which is not what we want when uclamp is compiled in.

Bug: 120440300
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I3d36f1ebed6ef35a6299af32bbf4462d0353e783
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 16:14:12 +00:00
Valentin Schneider
ecce1cf84a BACKPORT: sched/fair: Make EAS wakeup placement consider uclamp restrictions
task_fits_capacity() has just been made uclamp-aware, and
find_energy_efficient_cpu() needs to go through the same treatment.

Things are somewhat different here however - using the task max clamp isn't
sufficient. Consider the following setup:

  The target runqueue, rq:
    rq.cpu_capacity_orig = 512
    rq.cfs.avg.util_avg = 200
    rq.uclamp.max = 768 // the max p.uclamp.max of all enqueued p's is 768

  The waking task, p (not yet enqueued on rq):
    p.util_est = 600
    p.uclamp.max = 100

Now, consider the following code which doesn't use the rq clamps:

  util = uclamp_task_util(p);
  // Does the task fit in the spare CPU capacity?
  cpu = cpu_of(rq);
  fits_capacity(util, cpu_capacity(cpu) - cpu_util(cpu))

This would lead to:

  util = 100;
  fits_capacity(100, 512 - 200)

fits_capacity() would return true. However, enqueuing p on that CPU *will*
cause it to become overutilized since rq clamp values are max-aggregated,
so we'd remain with

  rq.uclamp.max = 768

which comes from the other tasks already enqueued on rq. Thus, we could
select a high enough frequency to reach beyond 0.8 * 512 utilization
(== overutilized) after enqueuing p on rq. What find_energy_efficient_cpu()
needs here is uclamp_rq_util_with() which lets us peek at the future
utilization landscape, including rq-wide uclamp values.

Make find_energy_efficient_cpu() use uclamp_rq_util_with() for its
fits_capacity() check. This is in line with what compute_energy() ends up
using for estimating utilization.

[QP: moved changes to select_cpu_candidates(), which is the equivalent
to the mainline path, and fix missing dependency on fits_capacity() by
using the open coded version]

Bug: 120440300
Tested-By: Dietmar Eggemann <dietmar.eggemann@arm.com>
Suggested-by: Quentin Perret <qperret@google.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191211113851.24241-6-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 1d42509e47)
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: Ibe1643cd5e6c97daceceae9733344e54bf4a4857
2020-02-01 16:14:11 +00:00
Valentin Schneider
50262f741b BACKPORT: sched/fair: Make task_fits_capacity() consider uclamp restrictions
task_fits_capacity() drives CPU selection at wakeup time, and is also used
to detect misfit tasks. Right now it does so by comparing task_util_est()
with a CPU's capacity, but doesn't take into account uclamp restrictions.

There's a few interesting uses that can come out of doing this. For
instance, a low uclamp.max value could prevent certain tasks from being
flagged as misfit tasks, so they could merrily remain on low-capacity CPUs.
Similarly, a high uclamp.min value would steer tasks towards high capacity
CPUs at wakeup (and, should that fail, later steered via misfit balancing),
so such "boosted" tasks would favor CPUs of higher capacity.

Introduce uclamp_task_util() and make task_fits_capacity() use it.

[QP: fixed missing dependency on fits_capacity() by using the open coded
alternative]

Bug: 120440300
Tested-By: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Quentin Perret <qperret@google.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191211113851.24241-5-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit a7008c07a5)
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: Iabde2eda7252c3bcc273e61260a7a12a7de991b1
2020-02-01 16:14:11 +00:00
Patrick Bellasi
f609a2239f ANDROID: sched/core: Move SchedTune task API into UtilClamp wrappers
The main SchedTune API calls realted to task tuning attributes are now
wrapped by more generic and mainlinish UtilClamp calls.

The new APIs are:

 - uclamp_task(p)               <= boosted_task_util(p)
 - uclamp_boosted(p)            <= schedtune_task_boost(p) > 0
 - uclamp_latency_sensitive(p)  <= schedtune_prefer_idle(p)

Let's provide also an implementation of the same API based on the new
uclamp.uclamp_latency_sensitive flag.

Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
[Modified the patch to use uclamp.latency_sensitive instead mainline
attributes]
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: Ib1a6902e1c07a82a370e36bf1776d895b7528cbc
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 16:14:11 +00:00
Quentin Perret
752b47b84d ANDROID: sched/core: Add a latency-sensitive flag to uclamp
Add a 'latency_sensitive' flag to uclamp in order to express the need
for some tasks to find a CPU where they can wake-up quickly. This is not
expected to be used without cgroup support, so add solely a cgroup
interface for it.

As this flag represents a boolean attribute and not an amount of
resources to be shared, it is not clear what the delegation logic should
be. As such, it is kept simple: every new cgroup starts with
latency_sensitive set to false, regardless of the parent.

In essence, this is similar to SchedTune's prefer-idle flag which was
used in android-4.19 and prior.

Bug: 120440300
Change-Id: I722d8ecabb428bb7b95a5b54bc70a87f182dde2a
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
(cherry picked from commit ad7dd648fc7dbe11f23673a3463af2468a274998)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:17 +00:00
Patrick Bellasi
9a05300da0 ANDROID: sched/tune: Move SchedTune cpu API into UtilClamp wrappers
The SchedTune CPU boosting API is currently used from sugov_get_util()
to get the boosted utilization and to pass it into schedutil_cpu_util().

When UtilClamp is in use instead we call schedutil_cpu_util() by
passing in just the CFS utilization and the clamping is done internally
on the aggregated CFS+RT utilization for FREQUENCY_UTIL calls.

This asymmetry is not required moreover, schedutil code is polluted by
non-mainline SchedTune code.

Wrap SchedTune API call related to cpu utilization boosting with a more
generic and mainlinish UtilClamp call:

 - uclamp_rq_util_with(cpu, util, p)  <= boosted_cpu_util(cpu)

This new API is already used in schedutil_cpu_util() to clamp the
aggregated RT+CFS utilization on FREQUENCY_UTIL calls.

Move the cpu boosting into uclamp_rq_util_with() so that we remove any
SchedTune specific bit from kernel/sched/cpufreq_schedutil.c.

Get rid of the no more required boosted_cpu_util(cpu) method and replace
it with a stune_util(cpu, util) which signature is better aligned with
its uclamp_rq_util_with(cpu, util, p) counterpart.

Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I45b0f0f54123fe0a2515fa9f1683842e6b99234f
[Removed superfluous __maybe_unused for capacity_orig_of]
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:17 +00:00
Li Guanglei
7e1c333ed1 FROMGIT: sched/core: Fix size of rq::uclamp initialization
rq::uclamp is an array of struct uclamp_rq, make sure we clear the
whole thing.

Bug: 120440300
Fixes: 69842cba9a ("sched/uclamp: Add CPU's clamp buckets refcountinga")
Signed-off-by: Li Guanglei <guanglei.li@unisoc.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Qais Yousef <qais.yousef@arm.com>
Link: https://lkml.kernel.org/r/1577259844-12677-1-git-send-email-guangleix.li@gmail.com
(cherry picked from commit dcd6dffb0a
 https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: Id36a2b77c45e586535e8fadfb7d66868ca8fe8c7
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:16 +00:00
Qais Yousef
45b9d34bec FROMGIT: sched/uclamp: Fix a bug in propagating uclamp value in new cgroups
When a new cgroup is created, the effective uclamp value wasn't updated
with a call to cpu_util_update_eff() that looks at the hierarchy and
update to the most restrictive values.

Fix it by ensuring to call cpu_util_update_eff() when a new cgroup
becomes online.

Without this change, the newly created cgroup uses the default
root_task_group uclamp values, which is 1024 for both uclamp_{min, max},
which will cause the rq to to be clamped to max, hence cause the
system to run at max frequency.

The problem was observed on Ubuntu server and was reproduced on Debian
and Buildroot rootfs.

By default, Ubuntu and Debian create a cpu controller cgroup hierarchy
and add all tasks to it - which creates enough noise to keep the rq
uclamp value at max most of the time. Imitating this behavior makes the
problem visible in Buildroot too which otherwise looks fine since it's a
minimal userspace.

Bug: 120440300
Fixes: 0b60ba2dd3 ("sched/uclamp: Propagate parent clamps")
Reported-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Doug Smythies <dsmythies@telus.net>
Link: https://lore.kernel.org/lkml/000701d5b965$361b6c60$a2524520$@net/
(cherry picked from commit 7226017ad3
 https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I9636c60e04d58bbfc5041df1305b34a12b5a3f46
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:16 +00:00
Valentin Schneider
f59dfad8f9 FROMGIT: sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with()
The current helper returns (CPU) rq utilization with uclamp restrictions
taken into account. A uclamp task utilization helper would be quite
helpful, but this requires some renaming.

Prepare the code for the introduction of a uclamp_task_util() by renaming
the existing uclamp_util_with() to uclamp_rq_util_with().

Bug: 120440300
Tested-By: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Quentin Perret <qperret@google.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191211113851.24241-4-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit d2b58a286e
 https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I3e7146b788e079e400167203df5e5dadee2fd232
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:16 +00:00
Valentin Schneider
254e090f3a FROMGIT: sched/uclamp: Make uclamp util helpers use and return UL values
Vincent pointed out recently that the canonical type for utilization
values is 'unsigned long'. Internally uclamp uses 'unsigned int' values for
cache optimization, but this doesn't have to be exported to its users.

Make the uclamp helpers that deal with utilization use and return unsigned
long values.

Bug: 120440300
Tested-By: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Quentin Perret <qperret@google.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191211113851.24241-3-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 686516b55e
 https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: Id3837f12237e5b77eb3a236bd32457dcd7de743e
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:16 +00:00
Valentin Schneider
6477d90135 FROMGIT: sched/uclamp: Remove uclamp_util()
The sole user of uclamp_util(), schedutil_cpu_util(), was made to use
uclamp_util_with() instead in commit:

  af24bde8df ("sched/uclamp: Add uclamp support to energy_compute()")

From then on, uclamp_util() has remained unused. Being a simple wrapper
around uclamp_util_with(), we can get rid of it and win back a few lines.

Bug: 120440300
Tested-By: Dietmar Eggemann <dietmar.eggemann@arm.com>
Suggested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191211113851.24241-2-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 59fe675248
 https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I11dbff80c6c4be9666438800b2527aca8cd24025
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:16 +00:00
Qais Yousef
cdadd91444 BACKPORT: sched/rt: Make RT capacity-aware
Capacity Awareness refers to the fact that on heterogeneous systems
(like Arm big.LITTLE), the capacity of the CPUs is not uniform, hence
when placing tasks we need to be aware of this difference of CPU
capacities.

In such scenarios we want to ensure that the selected CPU has enough
capacity to meet the requirement of the running task. Enough capacity
means here that capacity_orig_of(cpu) >= task.requirement.

The definition of task.requirement is dependent on the scheduling class.

For CFS, utilization is used to select a CPU that has >= capacity value
than the cfs_task.util.

	capacity_orig_of(cpu) >= cfs_task.util

DL isn't capacity aware at the moment but can make use of the bandwidth
reservation to implement that in a similar manner CFS uses utilization.
The following patchset implements that:

https://lore.kernel.org/lkml/20190506044836.2914-1-luca.abeni@santannapisa.it/

	capacity_orig_of(cpu)/SCHED_CAPACITY >= dl_deadline/dl_runtime

For RT we don't have a per task utilization signal and we lack any
information in general about what performance requirement the RT task
needs. But with the introduction of uclamp, RT tasks can now control
that by setting uclamp_min to guarantee a minimum performance point.

ATM the uclamp value are only used for frequency selection; but on
heterogeneous systems this is not enough and we need to ensure that the
capacity of the CPU is >= uclamp_min. Which is what implemented here.

	capacity_orig_of(cpu) >= rt_task.uclamp_min

Note that by default uclamp.min is 1024, which means that RT tasks will
always be biased towards the big CPUs, which make for a better more
predictable behavior for the default case.

Must stress that the bias acts as a hint rather than a definite
placement strategy. For example, if all big cores are busy executing
other RT tasks we can't guarantee that a new RT task will be placed
there.

On non-heterogeneous systems the original behavior of RT should be
retained. Similarly if uclamp is not selected in the config.

[ mingo: Minor edits to comments. ]

Bug: 120440300
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191009104611.15363-1-qais.yousef@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 804d402fb6
 https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git)
[Qais: resolved minor conflict in kernel/sched/cpupri.c]
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: Ifc9da1c47de1aec9b4d87be2614e4c8968366900
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:16 +00:00
Valentin Schneider
ea9ce42997 UPSTREAM: sched/uclamp: Fix overzealous type replacement
Some uclamp helpers had their return type changed from 'unsigned int' to
'enum uclamp_id' by commit

  0413d7f33e ("sched/uclamp: Always use 'enum uclamp_id' for clamp_id values")

but it happens that some do return a value in the [0, SCHED_CAPACITY_SCALE]
range, which should really be unsigned int. The affected helpers are
uclamp_none(), uclamp_rq_max_value() and uclamp_eff_value(). Fix those up.

Note that this doesn't lead to any obj diff using a relatively recent
aarch64 compiler (8.3-2019.03). The current code of e.g. uclamp_eff_value()
properly returns an 11 bit value (bits_per(1024)) and doesn't seem to do
anything funny. I'm still marking this as fixing the above commit to be on
the safe side.

Bug: 120440300
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Qais Yousef <qais.yousef@arm.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar.Eggemann@arm.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: patrick.bellasi@matbug.net
Cc: qperret@google.com
Cc: surenb@google.com
Cc: tj@kernel.org
Fixes: 0413d7f33e ("sched/uclamp: Always use 'enum uclamp_id' for clamp_id values")
Link: https://lkml.kernel.org/r/20191115103908.27610-1-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 7763baace1)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I924a99c125372a8fca81cb4bc0c82e6a7183fc8a
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:15 +00:00
Qais Yousef
7125c7cfca UPSTREAM: sched/uclamp: Fix incorrect condition
uclamp_update_active() should perform the update when
p->uclamp[clamp_id].active is true. But when the logic was inverted in
[1], the if condition wasn't inverted correctly too.

[1] https://lore.kernel.org/lkml/20190902073836.GO2369@hirez.programming.kicks-ass.net/

Bug: 120440300
Reported-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Patrick Bellasi <patrick.bellasi@matbug.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: babbe170e0 ("sched/uclamp: Update CPU's refcount on TG's clamp changes")
Link: https://lkml.kernel.org/r/20191114211052.15116-1-qais.yousef@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 6e1ff0773f)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I51b58a6089290277e08a0aaa72b86f852eec1512
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:15 +00:00