Commit graph

252 commits

Author SHA1 Message Date
Quentin Perret
2a557de670 UPSTREAM: sched/topology: Introduce a sysctl for Energy Aware Scheduling
In its current state, Energy Aware Scheduling (EAS) starts automatically
on asymmetric platforms having an Energy Model (EM). However, there are
users who want to have an EM (for thermal management for example), but
don't want EAS with it.

In order to let users disable EAS explicitly, introduce a new sysctl
called 'sched_energy_aware'. It is enabled by default so that EAS can
start automatically on platforms where it makes sense. Flipping it to 0
rebuilds the scheduling domains and disables EAS.

Bug: 120440300
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: adharmap@codeaurora.org
Cc: chris.redpath@arm.com
Cc: currojerez@riseup.net
Cc: dietmar.eggemann@arm.com
Cc: edubezval@gmail.com
Cc: gregkh@linuxfoundation.org
Cc: javi.merino@kernel.org
Cc: joel@joelfernandes.org
Cc: juri.lelli@redhat.com
Cc: morten.rasmussen@arm.com
Cc: patrick.bellasi@arm.com
Cc: pkondeti@codeaurora.org
Cc: rjw@rjwysocki.net
Cc: skannan@codeaurora.org
Cc: smuckle@google.com
Cc: srinivas.pandruvada@linux.intel.com
Cc: thara.gopinath@linaro.org
Cc: tkjos@google.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Cc: viresh.kumar@linaro.org
Link: https://lkml.kernel.org/r/20181203095628.11858-11-quentin.perret@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 8d5d0cfb63)
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I4ca842d07b82869cfab7542c8c4351f631e1024d
2020-02-19 10:50:59 +00:00
Patrick Bellasi
613eecebf9 UPSTREAM: sched/uclamp: Add system default clamps
Tasks without a user-defined clamp value are considered not clamped
and by default their utilization can have any value in the
[0..SCHED_CAPACITY_SCALE] range.

Tasks with a user-defined clamp value are allowed to request any value
in that range, and the required clamp is unconditionally enforced.
However, a "System Management Software" could be interested in limiting
the range of clamp values allowed for all tasks.

Add a privileged interface to define a system default configuration via:

  /proc/sys/kernel/sched_uclamp_util_{min,max}

which works as an unconditional clamp range restriction for all tasks.

With the default configuration, the full SCHED_CAPACITY_SCALE range of
values is allowed for each clamp index. Otherwise, the task-specific
clamp is capped by the corresponding system default value.

Do that by tracking, for each task, the "effective" clamp value and
bucket the task has been refcounted in at enqueue time. This
allows to lazy aggregate "requested" and "system default" values at
enqueue time and simplifies refcounting updates at dequeue time.

The cached bucket ids are used to avoid (relatively) more expensive
integer divisions every time a task is enqueued.

An active flag is used to report when the "effective" value is valid and
thus the task is actually refcounted in the corresponding rq's bucket.

Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190621084217.8167-5-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit e8f14172c6)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I4f014c5ec9c312aaad606518f6e205fd0cfbcaa2
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 14:39:37 +00:00
Patrick Bellasi
d96bd1d5fc UPSTREAM: sched/uclamp: Add CPU's clamp buckets refcounting
Utilization clamping allows to clamp the CPU's utilization within a
[util_min, util_max] range, depending on the set of RUNNABLE tasks on
that CPU. Each task references two "clamp buckets" defining its minimum
and maximum (util_{min,max}) utilization "clamp values". A CPU's clamp
bucket is active if there is at least one RUNNABLE tasks enqueued on
that CPU and refcounting that bucket.

When a task is {en,de}queued {on,from} a rq, the set of active clamp
buckets on that CPU can change. If the set of active clamp buckets
changes for a CPU a new "aggregated" clamp value is computed for that
CPU. This is because each clamp bucket enforces a different utilization
clamp value.

Clamp values are always MAX aggregated for both util_min and util_max.
This ensures that no task can affect the performance of other
co-scheduled tasks which are more boosted (i.e. with higher util_min
clamp) or less capped (i.e. with higher util_max clamp).

A task has:
   task_struct::uclamp[clamp_id]::bucket_id
to track the "bucket index" of the CPU's clamp bucket it refcounts while
enqueued, for each clamp index (clamp_id).

A runqueue has:
   rq::uclamp[clamp_id]::bucket[bucket_id].tasks
to track how many RUNNABLE tasks on that CPU refcount each
clamp bucket (bucket_id) of a clamp index (clamp_id).
It also has a:
   rq::uclamp[clamp_id]::bucket[bucket_id].value
to track the clamp value of each clamp bucket (bucket_id) of a clamp
index (clamp_id).

The rq::uclamp::bucket[clamp_id][] array is scanned every time it's
needed to find a new MAX aggregated clamp value for a clamp_id. This
operation is required only when it's dequeued the last task of a clamp
bucket tracking the current MAX aggregated clamp value. In this case,
the CPU is either entering IDLE or going to schedule a less boosted or
more clamped task.
The expected number of different clamp values configured at build time
is small enough to fit the full unordered array into a single cache
line, for configurations of up to 7 buckets.

Add to struct rq the basic data structures required to refcount the
number of RUNNABLE tasks for each clamp bucket. Add also the max
aggregation required to update the rq's clamp value at each
enqueue/dequeue event.

Use a simple linear mapping of clamp values into clamp buckets.
Pre-compute and cache bucket_id to avoid integer divisions at
enqueue/dequeue time.

Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190621084217.8167-2-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 69842cba9a)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I2c2c23572fb82e004f815cc9c783881355df6836
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 14:39:37 +00:00
Greg Kroah-Hartman
c8a465e614 This is the 4.19.92 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl4La9gACgkQONu9yGCS
 aT6hlA//TDpj9rdEwkaKyg/Ge4TCOJSOiwlp2/5lg2Sroiuizz527hVybGOOYAHl
 gMA2Syt73PWStyfgl5B3AimcBvPADX8h/b1KiSoIdHFkq5rPFyneB6aEj+5jSK1V
 63UnnTV0T49wt0Jvs6nN0FxI4ZCXbfjzaSVz4BGIflz6h9UUkPAu91CJTKtPmrAp
 pliH20cMOykxyS/KfKa6zDcpIfU0k+DxL5U0Y5F1YRDKc1iPg8e6I3cNLgwKSja6
 21BgdoTyZdvbC85HxSY7V6Dswp4YQPBY3y8crp8npZ9apbYV7eNU3L1+WVQvxpFg
 kahhyjalqwqkKq+cTEsIFj7cjPksSlH/qytTS+lnN3BScXbFPp8GdzIazhQNSCv3
 S/7T51CcvNoVcs9Qeu+nwyvx+H1LH4MYO4C7RYWZhPnMcA+/MxvT5WXNKfjf2ekM
 N5h8xNATllzDuDkX+zVwW8i80SCyhVqQIKbXLn8ugGYW3G5TNdy8Ysh0kdrq26Y+
 LAELsbQhK/Kt8WF+XNBpb9LLbeUGn1GTwhnbEuD7IKI+bVxnmsGk8QUu3h+a9xFh
 lI7bsj8Ku9T+59/9xqAnoStEto+0tdTPB9Cx1jNdWlLiVdkewiDKiUbloFpDFS1n
 L3SvqB68DC/IznQcK970g3aIx9zbkb2KZRdj2Fu7apaY5D9q85I=
 =W+5k
 -----END PGP SIGNATURE-----

Merge 4.19.92 into android-4.19

Changes in 4.19.92
	af_packet: set defaule value for tmo
	fjes: fix missed check in fjes_acpi_add
	mod_devicetable: fix PHY module format
	net: dst: Force 4-byte alignment of dst_metrics
	net: gemini: Fix memory leak in gmac_setup_txqs
	net: hisilicon: Fix a BUG trigered by wrong bytes_compl
	net: nfc: nci: fix a possible sleep-in-atomic-context bug in nci_uart_tty_receive()
	net: qlogic: Fix error paths in ql_alloc_large_buffers()
	net: usb: lan78xx: Fix suspend/resume PHY register access error
	qede: Disable hardware gro when xdp prog is installed
	qede: Fix multicast mac configuration
	sctp: fully initialize v4 addr in some functions
	selftests: forwarding: Delete IPv6 address at the end
	btrfs: don't double lock the subvol_sem for rename exchange
	btrfs: do not call synchronize_srcu() in inode_tree_del
	Btrfs: fix missing data checksums after replaying a log tree
	btrfs: send: remove WARN_ON for readonly mount
	btrfs: abort transaction after failed inode updates in create_subvol
	btrfs: skip log replay on orphaned roots
	btrfs: do not leak reloc root if we fail to read the fs root
	btrfs: handle ENOENT in btrfs_uuid_tree_iterate
	Btrfs: fix removal logic of the tree mod log that leads to use-after-free issues
	ALSA: pcm: Avoid possible info leaks from PCM stream buffers
	ALSA: hda/ca0132 - Keep power on during processing DSP response
	ALSA: hda/ca0132 - Avoid endless loop
	ALSA: hda/ca0132 - Fix work handling in delayed HP detection
	drm: mst: Fix query_payload ack reply struct
	drm/panel: Add missing drm_panel_init() in panel drivers
	drm/bridge: analogix-anx78xx: silence -EPROBE_DEFER warnings
	iio: light: bh1750: Resolve compiler warning and make code more readable
	drm/amdgpu: grab the id mgr lock while accessing passid_mapping
	spi: Add call to spi_slave_abort() function when spidev driver is released
	staging: rtl8192u: fix multiple memory leaks on error path
	staging: rtl8188eu: fix possible null dereference
	rtlwifi: prevent memory leak in rtl_usb_probe
	libertas: fix a potential NULL pointer dereference
	ath10k: fix backtrace on coredump
	IB/iser: bound protection_sg size by data_sg size
	media: am437x-vpfe: Setting STD to current value is not an error
	media: i2c: ov2659: fix s_stream return value
	media: ov6650: Fix crop rectangle alignment not passed back
	media: i2c: ov2659: Fix missing 720p register config
	media: ov6650: Fix stored frame format not in sync with hardware
	media: ov6650: Fix stored crop rectangle not in sync with hardware
	tools/power/cpupower: Fix initializer override in hsw_ext_cstates
	media: venus: core: Fix msm8996 frequency table
	ath10k: fix offchannel tx failure when no ath10k_mac_tx_frm_has_freq
	pinctrl: devicetree: Avoid taking direct reference to device name string
	drm/amdkfd: fix a potential NULL pointer dereference (v2)
	selftests/bpf: Correct path to include msg + path
	media: venus: Fix occasionally failures to suspend
	usb: renesas_usbhs: add suspend event support in gadget mode
	hwrng: omap3-rom - Call clk_disable_unprepare() on exit only if not idled
	regulator: max8907: Fix the usage of uninitialized variable in max8907_regulator_probe()
	media: flexcop-usb: fix NULL-ptr deref in flexcop_usb_transfer_init()
	media: cec-funcs.h: add status_req checks
	drm/bridge: dw-hdmi: Refuse DDC/CI transfers on the internal I2C controller
	samples: pktgen: fix proc_cmd command result check logic
	block: Fix writeback throttling W=1 compiler warnings
	mwifiex: pcie: Fix memory leak in mwifiex_pcie_init_evt_ring
	drm/drm_vblank: Change EINVAL by the correct errno
	media: cx88: Fix some error handling path in 'cx8800_initdev()'
	media: ti-vpe: vpe: Fix Motion Vector vpdma stride
	media: ti-vpe: vpe: fix a v4l2-compliance warning about invalid pixel format
	media: ti-vpe: vpe: fix a v4l2-compliance failure about frame sequence number
	media: ti-vpe: vpe: Make sure YUYV is set as default format
	media: ti-vpe: vpe: fix a v4l2-compliance failure causing a kernel panic
	media: ti-vpe: vpe: ensure buffers are cleaned up properly in abort cases
	media: ti-vpe: vpe: fix a v4l2-compliance failure about invalid sizeimage
	syscalls/x86: Use the correct function type in SYSCALL_DEFINE0
	drm/amd/display: Fix dongle_caps containing stale information.
	extcon: sm5502: Reset registers during initialization
	x86/mm: Use the correct function type for native_set_fixmap()
	ath10k: Correct error handling of dma_map_single()
	drm/bridge: dw-hdmi: Restore audio when setting a mode
	perf test: Report failure for mmap events
	perf report: Add warning when libunwind not compiled in
	usb: usbfs: Suppress problematic bind and unbind uevents.
	iio: adc: max1027: Reset the device at probe time
	Bluetooth: missed cpu_to_le16 conversion in hci_init4_req
	Bluetooth: Workaround directed advertising bug in Broadcom controllers
	Bluetooth: hci_core: fix init for HCI_USER_CHANNEL
	bpf/stackmap: Fix deadlock with rq_lock in bpf_get_stack()
	x86/mce: Lower throttling MCE messages' priority to warning
	perf tests: Disable bp_signal testing for arm64
	drm/gma500: fix memory disclosures due to uninitialized bytes
	rtl8xxxu: fix RTL8723BU connection failure issue after warm reboot
	ipmi: Don't allow device module unload when in use
	x86/ioapic: Prevent inconsistent state when moving an interrupt
	media: smiapp: Register sensor after enabling runtime PM on the device
	md/bitmap: avoid race window between md_bitmap_resize and bitmap_file_clear_bit
	arm64: psci: Reduce the waiting time for cpu_psci_cpu_kill()
	i40e: initialize ITRN registers with correct values
	net: phy: dp83867: enable robust auto-mdix
	drm/tegra: sor: Use correct SOR index on Tegra210
	spi: sprd: adi: Add missing lock protection when rebooting
	ACPI: button: Add DMI quirk for Medion Akoya E2215T
	RDMA/qedr: Fix memory leak in user qp and mr
	gpu: host1x: Allocate gather copy for host1x
	net: dsa: LAN9303: select REGMAP when LAN9303 enable
	phy: qcom-usb-hs: Fix extcon double register after power cycle
	s390/time: ensure get_clock_monotonic() returns monotonic values
	s390/mm: add mm_pxd_folded() checks to pxd_free()
	net: hns3: add struct netdev_queue debug info for TX timeout
	libata: Ensure ata_port probe has completed before detach
	loop: fix no-unmap write-zeroes request behavior
	pinctrl: sh-pfc: sh7734: Fix duplicate TCLK1_B
	iio: dln2-adc: fix iio_triggered_buffer_postenable() position
	libbpf: Fix error handling in bpf_map__reuse_fd()
	Bluetooth: Fix advertising duplicated flags
	pinctrl: amd: fix __iomem annotation in amd_gpio_irq_handler()
	ixgbe: protect TX timestamping from API misuse
	media: rcar_drif: fix a memory disclosure
	media: v4l2-core: fix touch support in v4l_g_fmt
	nvmem: imx-ocotp: reset error status on probe
	rfkill: allocate static minor
	bnx2x: Fix PF-VF communication over multi-cos queues.
	spi: img-spfi: fix potential double release
	ALSA: timer: Limit max amount of slave instances
	rtlwifi: fix memory leak in rtl92c_set_fw_rsvdpagepkt()
	perf probe: Fix to find range-only function instance
	perf probe: Fix to list probe event with correct line number
	perf jevents: Fix resource leak in process_mapfile() and main()
	perf probe: Walk function lines in lexical blocks
	perf probe: Fix to probe an inline function which has no entry pc
	perf probe: Fix to show ranges of variables in functions without entry_pc
	perf probe: Fix to show inlined function callsite without entry_pc
	libsubcmd: Use -O0 with DEBUG=1
	perf probe: Fix to probe a function which has no entry pc
	perf tools: Splice events onto evlist even on error
	drm/amdgpu: disallow direct upload save restore list from gfx driver
	drm/amdgpu: fix potential double drop fence reference
	xen/gntdev: Use select for DMA_SHARED_BUFFER
	perf parse: If pmu configuration fails free terms
	perf probe: Skip overlapped location on searching variables
	perf probe: Return a better scope DIE if there is no best scope
	perf probe: Fix to show calling lines of inlined functions
	perf probe: Skip end-of-sequence and non statement lines
	perf probe: Filter out instances except for inlined subroutine and subprogram
	ath10k: fix get invalid tx rate for Mesh metric
	fsi: core: Fix small accesses and unaligned offsets via sysfs
	media: pvrusb2: Fix oops on tear-down when radio support is not present
	soundwire: intel: fix PDI/stream mapping for Bulk
	crypto: atmel - Fix authenc support when it is set to m
	ice: delay less
	media: si470x-i2c: add missed operations in remove
	EDAC/ghes: Fix grain calculation
	spi: pxa2xx: Add missed security checks
	ASoC: rt5677: Mark reg RT5677_PWR_ANLG2 as volatile
	iio: dac: ad5446: Add support for new AD5600 DAC
	ASoC: Intel: kbl_rt5663_rt5514_max98927: Add dmic format constraint
	s390/disassembler: don't hide instruction addresses
	nvme: Discard workaround for non-conformant devices
	parport: load lowlevel driver if ports not found
	bcache: fix static checker warning in bcache_device_free()
	cpufreq: Register drivers only after CPU devices have been registered
	x86/crash: Add a forward declaration of struct kimage
	tracing: use kvcalloc for tgid_map array allocation
	tracing/kprobe: Check whether the non-suffixed symbol is notrace
	bcache: fix deadlock in bcache_allocator
	iwlwifi: mvm: fix unaligned read of rx_pkt_status
	ASoC: wm8904: fix regcache handling
	spi: tegra20-slink: add missed clk_unprepare
	tun: fix data-race in gro_normal_list()
	crypto: virtio - deal with unsupported input sizes
	mmc: tmio: Add MMC_CAP_ERASE to allow erase/discard/trim requests
	btrfs: don't prematurely free work in end_workqueue_fn()
	btrfs: don't prematurely free work in run_ordered_work()
	ASoC: wm2200: add missed operations in remove and probe failure
	spi: st-ssc4: add missed pm_runtime_disable
	ASoC: wm5100: add missed pm_runtime_disable
	ASoC: Intel: bytcr_rt5640: Update quirk for Acer Switch 10 SW5-012 2-in-1
	x86/insn: Add some Intel instructions to the opcode map
	brcmfmac: remove monitor interface when detaching
	iwlwifi: check kasprintf() return value
	fbtft: Make sure string is NULL terminated
	net: ethernet: ti: ale: clean ale tbl on init and intf restart
	crypto: sun4i-ss - Fix 64-bit size_t warnings
	crypto: sun4i-ss - Fix 64-bit size_t warnings on sun4i-ss-hash.c
	mac80211: consider QoS Null frames for STA_NULLFUNC_ACKED
	crypto: vmx - Avoid weird build failures
	libtraceevent: Fix memory leakage in copy_filter_type
	mips: fix build when "48 bits virtual memory" is enabled
	drm/amdgpu: fix bad DMA from INTERRUPT_CNTL2
	net: phy: initialise phydev speed and duplex sanely
	btrfs: don't prematurely free work in reada_start_machine_worker()
	btrfs: don't prematurely free work in scrub_missing_raid56_worker()
	Revert "mmc: sdhci: Fix incorrect switch to HS mode"
	mmc: mediatek: fix CMD_TA to 2 for MT8173 HS200/HS400 mode
	can: kvaser_usb: kvaser_usb_leaf: Fix some info-leaks to USB devices
	usb: xhci: Fix build warning seen with CONFIG_PM=n
	drm/amdgpu: fix uninitialized variable pasid_mapping_needed
	s390/ftrace: fix endless recursion in function_graph tracer
	btrfs: return error pointer from alloc_test_extent_buffer
	usbip: Fix receive error in vhci-hcd when using scatter-gather
	usbip: Fix error path of vhci_recv_ret_submit()
	cpufreq: Avoid leaving stale IRQ work items during CPU offline
	USB: EHCI: Do not return -EPIPE when hub is disconnected
	intel_th: pci: Add Comet Lake PCH-V support
	intel_th: pci: Add Elkhart Lake SOC support
	platform/x86: hp-wmi: Make buffer for HPWMI_FEATURE2_QUERY 128 bytes
	staging: comedi: gsc_hpdi: check dma_alloc_coherent() return value
	ext4: fix ext4_empty_dir() for directories with holes
	ext4: check for directory entries too close to block end
	ext4: unlock on error in ext4_expand_extra_isize()
	KVM: arm64: Ensure 'params' is initialised when looking up sys register
	x86/MCE/AMD: Do not use rdmsr_safe_on_cpu() in smca_configure()
	x86/MCE/AMD: Allow Reserved types to be overwritten in smca_banks[]
	powerpc/vcpu: Assume dedicated processors as non-preempt
	powerpc/irq: fix stack overflow verification
	mmc: sdhci-msm: Correct the offset and value for DDR_CONFIG register
	mmc: sdhci-of-esdhc: Revert "mmc: sdhci-of-esdhc: add erratum A-009204 support"
	mmc: sdhci: Update the tuning failed messages to pr_debug level
	mmc: sdhci-of-esdhc: fix P2020 errata handling
	mmc: sdhci: Workaround broken command queuing on Intel GLK
	mmc: sdhci: Add a quirk for broken command queuing
	nbd: fix shutdown and recv work deadlock v2
	perf probe: Fix to show function entry line as probe-able
	Linux 4.19.92

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic4c7f9c713549ebb3319cd0275e88678bfa0e53d
2019-12-31 17:11:54 +01:00
Rafael J. Wysocki
e82f0540a0 cpufreq: Avoid leaving stale IRQ work items during CPU offline
commit 85572c2c4a upstream.

The scheduler code calling cpufreq_update_util() may run during CPU
offline on the target CPU after the IRQ work lists have been flushed
for it, so the target CPU should be prevented from running code that
may queue up an IRQ work item on it at that point.

Unfortunately, that may not be the case if dvfs_possible_from_any_cpu
is set for at least one cpufreq policy in the system, because that
allows the CPU going offline to run the utilization update callback
of the cpufreq governor on behalf of another (online) CPU in some
cases.

If that happens, the cpufreq governor callback may queue up an IRQ
work on the CPU running it, which is going offline, and the IRQ work
may not be flushed after that point.  Moreover, that IRQ work cannot
be flushed until the "offlining" CPU goes back online, so if any
other CPU calls irq_work_sync() to wait for the completion of that
IRQ work, it will have to wait until the "offlining" CPU is back
online and that may not happen forever.  In particular, a system-wide
deadlock may occur during CPU online as a result of that.

The failing scenario is as follows.  CPU0 is the boot CPU, so it
creates a cpufreq policy and becomes the "leader" of it
(policy->cpu).  It cannot go offline, because it is the boot CPU.
Next, other CPUs join the cpufreq policy as they go online and they
leave it when they go offline.  The last CPU to go offline, say CPU3,
may queue up an IRQ work while running the governor callback on
behalf of CPU0 after leaving the cpufreq policy because of the
dvfs_possible_from_any_cpu effect described above.  Then, CPU0 is
the only online CPU in the system and the stale IRQ work is still
queued on CPU3.  When, say, CPU1 goes back online, it will run
irq_work_sync() to wait for that IRQ work to complete and so it
will wait for CPU3 to go back online (which may never happen even
in principle), but (worse yet) CPU0 is waiting for CPU1 at that
point too and a system-wide deadlock occurs.

To address this problem notice that CPUs which cannot run cpufreq
utilization update code for themselves (for example, because they
have left the cpufreq policies that they belonged to), should also
be prevented from running that code on behalf of the other CPUs that
belong to a cpufreq policy with dvfs_possible_from_any_cpu set and so
in that case the cpufreq_update_util_data pointer of the CPU running
the code must not be NULL as well as for the CPU which is the target
of the cpufreq utilization update in progress.

Accordingly, change cpufreq_this_cpu_can_update() into a regular
function in kernel/sched/cpufreq.c (instead of a static inline in a
header file) and make it check the cpufreq_update_util_data pointer
of the local CPU if dvfs_possible_from_any_cpu is set for the target
cpufreq policy.

Also update the schedutil governor to do the
cpufreq_this_cpu_can_update() check in the non-fast-switch
case too to avoid the stale IRQ work issues.

Fixes: 99d14d0e16 ("cpufreq: Process remote callbacks from any CPU if the platform permits")
Link: https://lore.kernel.org/linux-pm/20191121093557.bycvdo4xyinbc5cb@vireshk-i7/
Reported-by: Anson Huang <anson.huang@nxp.com>
Tested-by: Anson Huang <anson.huang@nxp.com>
Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Peng Fan <peng.fan@nxp.com> (i.MX8QXP-MEK)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-31 16:36:22 +01:00
Greg Kroah-Hartman
291d853dff This is the 4.19.88 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl3owgEACgkQONu9yGCS
 aT43zw//SS1As83XXuHr4mdWIVDjXo6RMJ6Ib7YbRi/uhBmQuUuGVFcqGxUIA9Kl
 eSXu5Kt8TNmInzHq9AMYgegrELAEwPD2XfptALGDwiUHonQuiFaqOQn/bltJOm1L
 PsG15A7+/gFhuhPJDp2ZfNBmZGdpXdIwD27oUDqF1XD64dMa/HPbFUVgxWn3HHkd
 sm0J6Ez0eNA+BmLnHXYDiSaEYIiwvy1nN6XpyIfOyb2Tz6kPoe0vVWU00Cmy8KAU
 EIWB+TBRunspgMsShL5Cl1MSFOxf9QOmgnZxcrODAQfb1TbLMACB1FGMjK4nLm+3
 wPlSnC7L49ARl/pvmN5NOUrjHi8S8qq/Od9QW+UIckRI6KzOU832h99v4gFuHjSC
 KFiLi5K9+uTIMgNOETmINBiKKUcUzYXYVajvm4tuAUq3HO8wy6jeALtt34OiJZQZ
 DV8wyBdL9NDUFqBymFaMFA4Us/fGIREzvPgI0E0jth2ANuLFLtScrnStuWv8buwJ
 JT3V9xCxHZtZ3Ctevx/Jp6OaQtnbSnWjMjrO0UDzZ6N7+g5UKmh9/R3xL6sBpFVU
 Vu49J+qWU3VmbY3EIulel+yARNe7xS4ExK185JmNzpYFyOpXum14FHhhtQ6xNSeu
 dRqyITI0KYP7jWtBDKCgVAWF5jC9gHP1ksrHSZMhyGrv1dC1XZM=
 =KnJW
 -----END PGP SIGNATURE-----

Merge 4.19.88 into android-4.19

Changes in 4.19.88
	clk: meson: gxbb: let sar_adc_clk_div set the parent clock rate
	clocksource/drivers/mediatek: Fix error handling
	ASoC: msm8916-wcd-analog: Fix RX1 selection in RDAC2 MUX
	ASoC: compress: fix unsigned integer overflow check
	reset: Fix memory leak in reset_control_array_put()
	clk: samsung: exynos5433: Fix error paths
	ASoC: kirkwood: fix external clock probe defer
	ASoC: kirkwood: fix device remove ordering
	clk: samsung: exynos5420: Preserve PLL configuration during suspend/resume
	pinctrl: cherryview: Allocate IRQ chip dynamic
	ARM: dts: imx6qdl-sabreauto: Fix storm of accelerometer interrupts
	reset: fix reset_control_ops kerneldoc comment
	clk: at91: avoid sleeping early
	clk: sunxi: Fix operator precedence in sunxi_divs_clk_setup
	clk: sunxi-ng: a80: fix the zero'ing of bits 16 and 18
	ARM: dts: sun8i-a83t-tbs-a711: Fix WiFi resume from suspend
	samples/bpf: fix build by setting HAVE_ATTR_TEST to zero
	powerpc/bpf: Fix tail call implementation
	idr: Fix integer overflow in idr_for_each_entry
	idr: Fix idr_alloc_u32 on 32-bit systems
	x86/resctrl: Prevent NULL pointer dereference when reading mondata
	clk: ti: dra7-atl-clock: Remove ti_clk_add_alias call
	clk: ti: clkctrl: Fix failed to enable error with double udelay timeout
	net: fec: add missed clk_disable_unprepare in remove
	bridge: ebtables: don't crash when using dnat target in output chains
	can: peak_usb: report bus recovery as well
	can: c_can: D_CAN: c_can_chip_config(): perform a sofware reset on open
	can: rx-offload: can_rx_offload_queue_tail(): fix error handling, avoid skb mem leak
	can: rx-offload: can_rx_offload_offload_one(): do not increase the skb_queue beyond skb_queue_len_max
	can: rx-offload: can_rx_offload_offload_one(): increment rx_fifo_errors on queue overflow or OOM
	can: rx-offload: can_rx_offload_offload_one(): use ERR_PTR() to propagate error value in case of errors
	can: rx-offload: can_rx_offload_irq_offload_timestamp(): continue on error
	can: rx-offload: can_rx_offload_irq_offload_fifo(): continue on error
	can: flexcan: increase error counters if skb enqueueing via can_rx_offload_queue_sorted() fails
	can: mcp251x: mcp251x_restart_work_handler(): Fix potential force_quit race condition
	watchdog: meson: Fix the wrong value of left time
	ASoC: stm32: sai: add restriction on mmap support
	scripts/gdb: fix debugging modules compiled with hot/cold partitioning
	net: bcmgenet: use RGMII loopback for MAC reset
	net: bcmgenet: reapply manual settings to the PHY
	net: mscc: ocelot: fix __ocelot_rmw_ix prototype
	ceph: return -EINVAL if given fsc mount option on kernel w/o support
	net/fq_impl: Switch to kvmalloc() for memory allocation
	mac80211: fix station inactive_time shortly after boot
	block: drbd: remove a stray unlock in __drbd_send_protocol()
	pwm: bcm-iproc: Prevent unloading the driver module while in use
	scsi: target/tcmu: Fix queue_cmd_ring() declaration
	scsi: lpfc: Fix kernel Oops due to null pring pointers
	scsi: lpfc: Fix dif and first burst use in write commands
	ARM: dts: Fix up SQ201 flash access
	tracing: Lock event_mutex before synth_event_mutex
	ARM: debug-imx: only define DEBUG_IMX_UART_PORT if needed
	ARM: dts: imx51: Fix memory node duplication
	ARM: dts: imx53: Fix memory node duplication
	ARM: dts: imx31: Fix memory node duplication
	ARM: dts: imx35: Fix memory node duplication
	ARM: dts: imx7: Fix memory node duplication
	ARM: dts: imx6ul: Fix memory node duplication
	ARM: dts: imx6sx: Fix memory node duplication
	ARM: dts: imx6sl: Fix memory node duplication
	ARM: dts: imx50: Fix memory node duplication
	ARM: dts: imx23: Fix memory node duplication
	ARM: dts: imx1: Fix memory node duplication
	ARM: dts: imx27: Fix memory node duplication
	ARM: dts: imx25: Fix memory node duplication
	ARM: dts: imx53-voipac-dmm-668: Fix memory node duplication
	parisc: Fix serio address output
	parisc: Fix HP SDC hpa address output
	ARM: dts: Fix hsi gdd range for omap4
	arm64: mm: Prevent mismatched 52-bit VA support
	arm64: smp: Handle errors reported by the firmware
	bus: ti-sysc: Check for no-reset and no-idle flags at the child level
	platform/x86: mlx-platform: Fix LED configuration
	ARM: OMAP1: fix USB configuration for device-only setups
	RDMA/hns: Fix the bug while use multi-hop of pbl
	arm64: preempt: Fix big-endian when checking preempt count in assembly
	RDMA/vmw_pvrdma: Use atomic memory allocation in create AH
	PM / AVS: SmartReflex: NULL check before some freeing functions is not needed
	xfs: zero length symlinks are not valid
	ARM: ks8695: fix section mismatch warning
	ACPI / LPSS: Ignore acpi_device_fix_up_power() return value
	scsi: lpfc: Enable Management features for IF_TYPE=6
	scsi: qla2xxx: Fix NPIV handling for FC-NVMe
	scsi: qla2xxx: Fix for FC-NVMe discovery for NPIV port
	nvme: provide fallback for discard alloc failure
	s390/zcrypt: make sysfs reset attribute trigger queue reset
	crypto: user - support incremental algorithm dumps
	arm64: dts: renesas: draak: Fix CVBS input
	mwifiex: fix potential NULL dereference and use after free
	mwifiex: debugfs: correct histogram spacing, formatting
	brcmfmac: set F2 watermark to 256 for 4373
	brcmfmac: set SDIO F1 MesBusyCtrl for CYW4373
	rtl818x: fix potential use after free
	bcache: do not check if debug dentry is ERR or NULL explicitly on remove
	bcache: do not mark writeback_running too early
	xfs: require both realtime inodes to mount
	nvme: fix kernel paging oops
	ubifs: Fix default compression selection in ubifs
	ubi: Put MTD device after it is not used
	ubi: Do not drop UBI device reference before using
	microblaze: adjust the help to the real behavior
	microblaze: move "... is ready" messages to arch/microblaze/Makefile
	microblaze: fix multiple bugs in arch/microblaze/boot/Makefile
	iwlwifi: move iwl_nvm_check_version() into dvm
	iwlwifi: mvm: force TCM re-evaluation on TCM resume
	iwlwifi: pcie: fix erroneous print
	iwlwifi: pcie: set cmd_len in the correct place
	gpio: pca953x: Fix AI overflow on PCAL6524
	gpiolib: Fix return value of gpio_to_desc() stub if !GPIOLIB
	kvm: vmx: Set IA32_TSC_AUX for legacy mode guests
	Revert "KVM: nVMX: reset cache/shadows when switching loaded VMCS"
	Revert "KVM: nVMX: move check_vmentry_postreqs() call to nested_vmx_enter_non_root_mode()"
	crypto/chelsio/chtls: listen fails with multiadapt
	VSOCK: bind to random port for VMADDR_PORT_ANY
	mmc: meson-gx: make sure the descriptor is stopped on errors
	mtd: rawnand: sunxi: Write pageprog related opcodes to WCMD_SET
	usb: ehci-omap: Fix deferred probe for phy handling
	btrfs: Check for missing device before bio submission in btrfs_map_bio
	btrfs: fix ncopies raid_attr for RAID56
	btrfs: dev-replace: set result code of cancel by status of scrub
	Btrfs: allow clear_extent_dirty() to receive a cached extent state record
	btrfs: only track ref_heads in delayed_ref_updates
	serial: sh-sci: Fix crash in rx_timer_fn() on PIO fallback
	HID: intel-ish-hid: fixes incorrect error handling
	gpio: raspberrypi-exp: decrease refcount on firmware dt node
	serial: 8250: Rate limit serial port rx interrupts during input overruns
	kprobes/x86/xen: blacklist non-attachable xen interrupt functions
	xen/pciback: Check dev_data before using it
	kprobes: Blacklist symbols in arch-defined prohibited area
	kprobes/x86: Show x86-64 specific blacklisted symbols correctly
	vfio-mdev/samples: Use u8 instead of char for handle functions
	memory: omap-gpmc: Get the header of the enum
	pinctrl: xway: fix gpio-hog related boot issues
	net/mlx5: Continue driver initialization despite debugfs failure
	netfilter: nf_nat_sip: fix RTP/RTCP source port translations
	exofs_mount(): fix leaks on failure exits
	bnxt_en: Return linux standard errors in bnxt_ethtool.c
	bnxt_en: Save ring statistics before reset.
	bnxt_en: query force speeds before disabling autoneg mode.
	KVM: s390: unregister debug feature on failing arch init
	pinctrl: sh-pfc: r8a77990: Fix MOD_SEL0 SEL_I2C1 field width
	pinctrl: sh-pfc: sh7264: Fix PFCR3 and PFCR0 register configuration
	pinctrl: sh-pfc: sh7734: Fix shifted values in IPSR10
	HID: doc: fix wrong data structure reference for UHID_OUTPUT
	dm flakey: Properly corrupt multi-page bios.
	gfs2: take jdata unstuff into account in do_grow
	dm raid: fix false -EBUSY when handling check/repair message
	xfs: Align compat attrlist_by_handle with native implementation.
	xfs: Fix bulkstat compat ioctls on x32 userspace.
	IB/qib: Fix an error code in qib_sdma_verbs_send()
	clocksource/drivers/fttmr010: Fix invalid interrupt register access
	vxlan: Fix error path in __vxlan_dev_create()
	powerpc/book3s/32: fix number of bats in p/v_block_mapped()
	powerpc/xmon: fix dump_segments()
	drivers/regulator: fix a missing check of return value
	Bluetooth: hci_bcm: Handle specific unknown packets after firmware loading
	serial: max310x: Fix tx_empty() callback
	openrisc: Fix broken paths to arch/or32
	RDMA/srp: Propagate ib_post_send() failures to the SCSI mid-layer
	scsi: qla2xxx: deadlock by configfs_depend_item
	scsi: csiostor: fix incorrect dma device in case of vport
	brcmfmac: Fix access point mode
	ath6kl: Only use match sets when firmware supports it
	ath6kl: Fix off by one error in scan completion
	powerpc/perf: Fix unit_sel/cache_sel checks
	powerpc/32: Avoid unsupported flags with clang
	powerpc/prom: fix early DEBUG messages
	powerpc/mm: Make NULL pointer deferences explicit on bad page faults.
	powerpc/44x/bamboo: Fix PCI range
	vfio/spapr_tce: Get rid of possible infinite loop
	powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status
	drbd: ignore "all zero" peer volume sizes in handshake
	drbd: reject attach of unsuitable uuids even if connected
	drbd: do not block when adjusting "disk-options" while IO is frozen
	drbd: fix print_st_err()'s prototype to match the definition
	IB/rxe: Make counters thread safe
	bpf/cpumap: make sure frame_size for build_skb is aligned if headroom isn't
	regulator: tps65910: fix a missing check of return value
	powerpc/83xx: handle machine check caused by watchdog timer
	powerpc/pseries: Fix node leak in update_lmb_associativity_index()
	powerpc: Fix HMIs on big-endian with CONFIG_RELOCATABLE=y
	crypto: mxc-scc - fix build warnings on ARM64
	pwm: clps711x: Fix period calculation
	net/netlink_compat: Fix a missing check of nla_parse_nested
	net/net_namespace: Check the return value of register_pernet_subsys()
	f2fs: fix block address for __check_sit_bitmap
	f2fs: fix to dirty inode synchronously
	um: Include sys/uio.h to have writev()
	um: Make GCOV depend on !KCOV
	net: (cpts) fix a missing check of clk_prepare
	net: stmicro: fix a missing check of clk_prepare
	net: dsa: bcm_sf2: Propagate error value from mdio_write
	atl1e: checking the status of atl1e_write_phy_reg
	tipc: fix a missing check of genlmsg_put
	net: marvell: fix a missing check of acpi_match_device
	net/wan/fsl_ucc_hdlc: Avoid double free in ucc_hdlc_probe()
	ocfs2: clear journal dirty flag after shutdown journal
	vmscan: return NODE_RECLAIM_NOSCAN in node_reclaim() when CONFIG_NUMA is n
	mm/page_alloc.c: free order-0 pages through PCP in page_frag_free()
	mm/page_alloc.c: use a single function to free page
	mm/page_alloc.c: deduplicate __memblock_free_early() and memblock_free()
	tools/vm/page-types.c: fix "kpagecount returned fewer pages than expected" failures
	netfilter: nf_tables: fix a missing check of nla_put_failure
	xprtrdma: Prevent leak of rpcrdma_rep objects
	infiniband: bnxt_re: qplib: Check the return value of send_message
	infiniband/qedr: Potential null ptr dereference of qp
	firmware: arm_sdei: fix wrong of_node_put() in init function
	firmware: arm_sdei: Fix DT platform device creation
	lib/genalloc.c: fix allocation of aligned buffer from non-aligned chunk
	lib/genalloc.c: use vzalloc_node() to allocate the bitmap
	fork: fix some -Wmissing-prototypes warnings
	drivers/base/platform.c: kmemleak ignore a known leak
	lib/genalloc.c: include vmalloc.h
	mtd: Check add_mtd_device() ret code
	tipc: fix memory leak in tipc_nl_compat_publ_dump
	net/core/neighbour: tell kmemleak about hash tables
	ata: ahci: mvebu: do Armada 38x configuration only on relevant SoCs
	PCI/MSI: Return -ENOSPC from pci_alloc_irq_vectors_affinity()
	net/core/neighbour: fix kmemleak minimal reference count for hash tables
	serial: 8250: Fix serial8250 initialization crash
	gpu: ipu-v3: pre: don't trigger update if buffer address doesn't change
	sfc: suppress duplicate nvmem partition types in efx_ef10_mtd_probe
	ip_tunnel: Make none-tunnel-dst tunnel port work with lwtunnel
	decnet: fix DN_IFREQ_SIZE
	net/smc: prevent races between smc_lgr_terminate() and smc_conn_free()
	net/smc: don't wait for send buffer space when data was already sent
	mm/hotplug: invalid PFNs from pfn_to_online_page()
	xfs: end sync buffer I/O properly on shutdown error
	net/smc: fix sender_free computation
	blktrace: Show requests without sector
	net/smc: fix byte_order for rx_curs_confirmed
	tipc: fix skb may be leaky in tipc_link_input
	ASoC: samsung: i2s: Fix prescaler setting for the secondary DAI
	sfc: initialise found bitmap in efx_ef10_mtd_probe
	geneve: change NET_UDP_TUNNEL dependency to select
	net: fix possible overflow in __sk_mem_raise_allocated()
	net: ip_gre: do not report erspan_ver for gre or gretap
	net: ip6_gre: do not report erspan_ver for ip6gre or ip6gretap
	sctp: don't compare hb_timer expire date before starting it
	bpf: decrease usercnt if bpf_map_new_fd() fails in bpf_map_get_fd_by_id()
	mmc: core: align max segment size with logical block size
	net: dev: Use unsigned integer as an argument to left-shift
	kvm: properly check debugfs dentry before using it
	bpf: drop refcount if bpf_map_new_fd() fails in map_create()
	net: hns3: Change fw error code NOT_EXEC to NOT_SUPPORTED
	net: hns3: fix PFC not setting problem for DCB module
	net: hns3: fix an issue for hclgevf_ae_get_hdev
	net: hns3: fix an issue for hns3_update_new_int_gl
	iommu/amd: Fix NULL dereference bug in match_hid_uid
	apparmor: delete the dentry in aafs_remove() to avoid a leak
	scsi: libsas: Support SATA PHY connection rate unmatch fixing during discovery
	ACPI / APEI: Don't wait to serialise with oops messages when panic()ing
	ACPI / APEI: Switch estatus pool to use vmalloc memory
	scsi: hisi_sas: shutdown axi bus to avoid exception CQ returned
	scsi: libsas: Check SMP PHY control function result
	RDMA/hns: Fix the bug with updating rq head pointer when flush cqe
	RDMA/hns: Bugfix for the scene without receiver queue
	RDMA/hns: Fix the state of rereg mr
	RDMA/hns: Use GFP_ATOMIC in hns_roce_v2_modify_qp
	ASoC: rt5645: Headphone Jack sense inverts on the LattePanda board
	powerpc/pseries/dlpar: Fix a missing check in dlpar_parse_cc_property()
	xdp: fix cpumap redirect SKB creation bug
	mtd: Remove a debug trace in mtdpart.c
	mm, gup: add missing refcount overflow checks on s390
	clk: at91: fix update bit maps on CFG_MOR write
	clk: at91: generated: set audio_pll_allowed in at91_clk_register_generated()
	usb: dwc2: use a longer core rest timeout in dwc2_core_reset()
	staging: rtl8192e: fix potential use after free
	staging: rtl8723bs: Drop ACPI device ids
	staging: rtl8723bs: Add 024c:0525 to the list of SDIO device-ids
	USB: serial: ftdi_sio: add device IDs for U-Blox C099-F9P
	mei: bus: prefix device names on bus with the bus name
	mei: me: add comet point V device id
	thunderbolt: Power cycle the router if NVM authentication fails
	xfrm: Fix memleak on xfrm state destroy
	media: v4l2-ctrl: fix flags for DO_WHITE_BALANCE
	net: macb: fix error format in dev_err()
	pwm: Clear chip_data in pwm_put()
	media: atmel: atmel-isc: fix asd memory allocation
	media: atmel: atmel-isc: fix INIT_WORK misplacement
	macvlan: schedule bc_work even if error
	net: psample: fix skb_over_panic
	openvswitch: fix flow command message size
	sctp: Fix memory leak in sctp_sf_do_5_2_4_dupcook
	slip: Fix use-after-free Read in slip_open
	openvswitch: drop unneeded BUG_ON() in ovs_flow_cmd_build_info()
	openvswitch: remove another BUG_ON()
	selftests: bpf: test_sockmap: handle file creation failures gracefully
	tipc: fix link name length check
	sctp: cache netns in sctp_ep_common
	net: sched: fix `tc -s class show` no bstats on class with nolock subqueues
	net: macb: add missed tasklet_kill
	ext4: add more paranoia checking in ext4_expand_extra_isize handling
	watchdog: sama5d4: fix WDD value to be always set to max
	net: macb: Fix SUBNS increment and increase resolution
	net: macb driver, check for SKBTX_HW_TSTAMP
	mtd: rawnand: atmel: Fix spelling mistake in error message
	mtd: rawnand: atmel: fix possible object reference leak
	mtd: spi-nor: cast to u64 to avoid uint overflows
	drm/atmel-hlcdc: revert shift by 8
	mailbox: stm32_ipcc: add spinlock to fix channels concurrent access
	tcp: exit if nothing to retransmit on RTO timeout
	HID: core: check whether Usage Page item is after Usage ID items
	crypto: stm32/hash - Fix hmac issue more than 256 bytes
	media: stm32-dcmi: fix DMA corruption when stopping streaming
	media: stm32-dcmi: fix check of pm_runtime_get_sync return value
	hwrng: stm32 - fix unbalanced pm_runtime_enable
	clk: stm32mp1: fix HSI divider flag
	clk: stm32mp1: fix mcu divider table
	clk: stm32mp1: add CLK_SET_RATE_NO_REPARENT to Kernel clocks
	clk: stm32mp1: parent clocks update
	mailbox: mailbox-test: fix null pointer if no mmio
	pinctrl: stm32: fix memory leak issue
	ASoC: stm32: i2s: fix dma configuration
	ASoC: stm32: i2s: fix 16 bit format support
	ASoC: stm32: i2s: fix IRQ clearing
	ASoC: stm32: sai: add missing put_device()
	dmaengine: stm32-dma: check whether length is aligned on FIFO threshold
	platform/x86: hp-wmi: Fix ACPI errors caused by too small buffer
	platform/x86: hp-wmi: Fix ACPI errors caused by passing 0 as input size
	net: fec: fix clock count mis-match
	Linux 4.19.88

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ifd3801a77cb551be72788031e7fcfc8a1d4fd197
2019-12-05 12:02:49 +01:00
Yi Wang
fc38279ec5 fork: fix some -Wmissing-prototypes warnings
[ Upstream commit fb5bf31722 ]

We get a warning when building kernel with W=1:

  kernel/fork.c:167:13: warning: no previous prototype for `arch_release_thread_stack' [-Wmissing-prototypes]
  kernel/fork.c:779:13: warning: no previous prototype for `fork_init' [-Wmissing-prototypes]

Add the missing declaration in head file to fix this.

Also, remove arch_release_thread_stack() completely because no arch
seems to implement it since bb9d81264 (arch: remove tile port).

Link: http://lkml.kernel.org/r/1542170087-23645-1-git-send-email-wang.yi59@zte.com.cn
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-05 09:21:04 +01:00
Greg Kroah-Hartman
ef55d5261c This is the 4.19.79 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl2grBgACgkQONu9yGCS
 aT6xRBAA0pTW2W/VvzBHBLeVlmNtwQZb8x7civVb72iZkltKR9tTPim90PULpz/P
 iO7kh8KqkgVUqdgBE0VzkHGWUSThggfSTQiqzCqOgTwV8WQWqSF8ET0HU8zbglYB
 5pXSojoRYmurGVznd4Ll6aWa5brXIKwf1mDSrFHagOyOLxQmyggHaTRSLx36BSfj
 gunE2ideB1oTaPmd/2aTI03CU3jRwXmowe8rZIDa8pJEpplZPFdk0YOPXg2t6uRI
 bjJGO8bhfR/14r/3h76IwsEiVVXIcCeEVm0fos/H6NUypedfi7jlT0Ldzg1/zZti
 mUMkbPGHcJbOWfBYPQq8xQzviCa+MFraA4Tek5h/Lf7kf3NpjE20AnH3pb9TaqQf
 mJYUGziCoOOOz8k+0eNtIjIZiCysOnf9sI5rGhMYb9qfZoZGG6RiitqyVYNa+rzJ
 wvIUQZ4vSnYmQMAXqxyayfSZvFbMxv6pAdeH0NrXVRgFF6dnKG9TSsCnIuQaJxAE
 OQRaYEJktMUBs81hS0IjnJNDFLW3r++s87xEYvCt4L7XGSrxMJ3jW6xLZlmET68G
 4UIddJ81zIuqpGY1qoWdWZAp3nfRfSX4ehOnoNmIDyC9pRhiCKc+N6j5rX8gBNO/
 SO8YOaNf9RTphhEG6Op7u4ZbU+UR4pYP+rjKveyT2HKPH6D/Tv0=
 =wt6H
 -----END PGP SIGNATURE-----

Merge 4.19.79 into android-4.19

Changes in 4.19.79
	s390/process: avoid potential reading of freed stack
	KVM: s390: Test for bad access register and size at the start of S390_MEM_OP
	s390/topology: avoid firing events before kobjs are created
	s390/cio: exclude subchannels with no parent from pseudo check
	KVM: PPC: Book3S HV: Fix race in re-enabling XIVE escalation interrupts
	KVM: PPC: Book3S HV: Check for MMU ready on piggybacked virtual cores
	KVM: PPC: Book3S HV: Don't lose pending doorbell request on migration on P9
	KVM: X86: Fix userspace set invalid CR4
	KVM: nVMX: handle page fault in vmread fix
	nbd: fix max number of supported devs
	PM / devfreq: tegra: Fix kHz to Hz conversion
	ASoC: Define a set of DAPM pre/post-up events
	ASoC: sgtl5000: Improve VAG power and mute control
	powerpc/mce: Fix MCE handling for huge pages
	powerpc/mce: Schedule work from irq_work
	powerpc/powernv: Restrict OPAL symbol map to only be readable by root
	powerpc/powernv/ioda: Fix race in TCE level allocation
	powerpc/book3s64/mm: Don't do tlbie fixup for some hardware revisions
	can: mcp251x: mcp251x_hw_reset(): allow more time after a reset
	tools lib traceevent: Fix "robust" test of do_generate_dynamic_list_file
	crypto: qat - Silence smp_processor_id() warning
	crypto: skcipher - Unmap pages after an external error
	crypto: cavium/zip - Add missing single_release()
	crypto: caam - fix concurrency issue in givencrypt descriptor
	crypto: ccree - account for TEE not ready to report
	crypto: ccree - use the full crypt length value
	MIPS: Treat Loongson Extensions as ASEs
	power: supply: sbs-battery: use correct flags field
	power: supply: sbs-battery: only return health when battery present
	tracing: Make sure variable reference alias has correct var_ref_idx
	usercopy: Avoid HIGHMEM pfn warning
	timer: Read jiffies once when forwarding base clk
	PCI: vmd: Fix shadow offsets to reflect spec changes
	PCI: Restore Resizable BAR size bits correctly for 1MB BARs
	watchdog: imx2_wdt: fix min() calculation in imx2_wdt_set_timeout
	perf stat: Fix a segmentation fault when using repeat forever
	drm/omap: fix max fclk divider for omap36xx
	drm/msm/dsi: Fix return value check for clk_get_parent
	drm/nouveau/kms/nv50-: Don't create MSTMs for eDP connectors
	drm/i915/gvt: update vgpu workload head pointer correctly
	mmc: sdhci: improve ADMA error reporting
	mmc: sdhci-of-esdhc: set DMA snooping based on DMA coherence
	Revert "locking/pvqspinlock: Don't wait if vCPU is preempted"
	xen/xenbus: fix self-deadlock after killing user process
	ieee802154: atusb: fix use-after-free at disconnect
	s390/cio: avoid calling strlen on null pointer
	cfg80211: initialize on-stack chandefs
	arm64: cpufeature: Detect SSBS and advertise to userspace
	ima: always return negative code for error
	ima: fix freeing ongoing ahash_request
	fs: nfs: Fix possible null-pointer dereferences in encode_attrs()
	9p: Transport error uninitialized
	9p: avoid attaching writeback_fid on mmap with type PRIVATE
	xen/pci: reserve MCFG areas earlier
	ceph: fix directories inode i_blkbits initialization
	ceph: reconnect connection if session hang in opening state
	watchdog: aspeed: Add support for AST2600
	netfilter: nf_tables: allow lookups in dynamic sets
	drm/amdgpu: Fix KFD-related kernel oops on Hawaii
	drm/amdgpu: Check for valid number of registers to read
	pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors
	pwm: stm32-lp: Add check in case requested period cannot be achieved
	x86/purgatory: Disable the stackleak GCC plugin for the purgatory
	ntb: point to right memory window index
	thermal: Fix use-after-free when unregistering thermal zone device
	thermal_hwmon: Sanitize thermal_zone type
	libnvdimm/region: Initialize bad block for volatile namespaces
	fuse: fix memleak in cuse_channel_open
	libnvdimm/nfit_test: Fix acpi_handle redefinition
	sched/membarrier: Call sync_core only before usermode for same mm
	sched/membarrier: Fix private expedited registration check
	sched/core: Fix migration to invalid CPU in __set_cpus_allowed_ptr()
	perf build: Add detection of java-11-openjdk-devel package
	kernel/elfcore.c: include proper prototypes
	perf unwind: Fix libunwind build failure on i386 systems
	nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs
	drm/radeon: Bail earlier when radeon.cik_/si_support=0 is passed
	KVM: PPC: Book3S HV: XIVE: Free escalation interrupts before disabling the VP
	KVM: nVMX: Fix consistency check on injected exception error code
	nbd: fix crash when the blksize is zero
	powerpc/pseries: Fix cpu_hotplug_lock acquisition in resize_hpt()
	powerpc/book3s64/radix: Rename CPU_FTR_P9_TLBIE_BUG feature flag
	tools lib traceevent: Do not free tep->cmdlines in add_new_comm() on failure
	tick: broadcast-hrtimer: Fix a race in bc_set_next
	perf tools: Fix segfault in cpu_cache_level__read()
	perf stat: Reset previous counts on repeat with interval
	riscv: Avoid interrupts being erroneously enabled in handle_exception()
	arm64: ssbd: Add support for PSTATE.SSBS rather than trapping to EL3
	KVM: arm64: Set SCTLR_EL2.DSSBS if SSBD is forcefully disabled and !vhe
	arm64: docs: Document SSBS HWCAP
	arm64: fix SSBS sanitization
	arm64: Add sysfs vulnerability show for spectre-v1
	arm64: add sysfs vulnerability show for meltdown
	arm64: enable generic CPU vulnerabilites support
	arm64: Always enable ssb vulnerability detection
	arm64: Provide a command line to disable spectre_v2 mitigation
	arm64: Advertise mitigation of Spectre-v2, or lack thereof
	arm64: Always enable spectre-v2 vulnerability detection
	arm64: add sysfs vulnerability show for spectre-v2
	arm64: add sysfs vulnerability show for speculative store bypass
	arm64: ssbs: Don't treat CPUs with SSBS as unaffected by SSB
	arm64: Force SSBS on context switch
	arm64: Use firmware to detect CPUs that are not affected by Spectre-v2
	arm64/speculation: Support 'mitigations=' cmdline option
	vfs: Fix EOVERFLOW testing in put_compat_statfs64
	coresight: etm4x: Use explicit barriers on enable/disable
	staging: erofs: fix an error handling in erofs_readdir()
	staging: erofs: some compressed cluster should be submitted for corrupted images
	staging: erofs: add two missing erofs_workgroup_put for corrupted images
	staging: erofs: detect potential multiref due to corrupted images
	cfg80211: add and use strongly typed element iteration macros
	cfg80211: Use const more consistently in for_each_element macros
	nl80211: validate beacon head
	Linux 4.19.79

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie4f85994b5f3e53658c42833d0dc712575d0902e
2019-10-11 19:13:57 +02:00
Mathieu Desnoyers
e250f2b6aa sched/membarrier: Call sync_core only before usermode for same mm
[ Upstream commit 2840cf02fa ]

When the prev and next task's mm change, switch_mm() provides the core
serializing guarantees before returning to usermode. The only case
where an explicit core serialization is needed is when the scheduler
keeps the same mm for prev and next.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190919173705.2181-4-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-11 18:21:21 +02:00
Greg Kroah-Hartman
844ecc4634 This is the 4.19.64 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl1GibIACgkQONu9yGCS
 aT7z2hAAmv8AsH9IG43m7t6zLroJVswr/9594xk7yPBQgcY3/PW2aTFBCFbsdOL4
 yXcj2PSwRiq9K6qAJULrvOvncR9fIILHqzWzyXnoaZ30lR/FxaaFmuHZX/5Ix1tB
 e5EEE/EA49UAEjEDaMLq8g2IvibsReDxmSpnXyBJWoyRAdFIElVnMJ2+zvP/wRhF
 NKzQj/bj/qecCbis2lUCaVWJFZ6+P/52UbD8lvIwqR3nk2TKsGDcLU6eY3yg4KrB
 rEHl5T8KIPrkX3KNIEB8EcFREene+rdpZLLVe4fYwf+gOqfiFXSzZZvweauMkplq
 ehlVHkykvQvlsVM2tjBD379z3C4aasZDuMVNMCbAy2FlruLeBQ7gEn77mCJB9VH5
 /n/mlc2yizdoowtARCLWOUMfASpdSbqu2SQ7A/3kwG7l6GrpzKSIU2nQgm+41sUZ
 QJVtZ3IYsPoYjnU4B3JZzgJnf3M9jcRz/3JegviqhSEbF1gaScJX0cqN8C1idN/v
 ZAGCJK9S20/EEEsp5jn+bq2grUehvmD4TVDfot4P+5yRYyBIhMFpbM2RpjydOpwy
 +x8D1Q34LYPFgZfQ0vF62vcSBhMBiJ/7j41rUeo44K+Lg00F3yCOyL6FxK6S8h6j
 wsD0xLbllMrhV5KRYFizb3QbCHoHYiROIJk76uLvB+Tqq2Jg9VQ=
 =qIi2
 -----END PGP SIGNATURE-----

Merge 4.19.64 into android-4.19

Changes in 4.19.64
	hv_sock: Add support for delayed close
	vsock: correct removal of socket from the list
	NFS: Fix dentry revalidation on NFSv4 lookup
	NFS: Refactor nfs_lookup_revalidate()
	NFSv4: Fix lookup revalidate of regular files
	usb: dwc2: Disable all EP's on disconnect
	usb: dwc2: Fix disable all EP's on disconnect
	arm64: compat: Provide definition for COMPAT_SIGMINSTKSZ
	binder: fix possible UAF when freeing buffer
	ISDN: hfcsusb: checking idx of ep configuration
	media: au0828: fix null dereference in error path
	ath10k: Change the warning message string
	media: cpia2_usb: first wake up, then free in disconnect
	media: pvrusb2: use a different format for warnings
	NFS: Cleanup if nfs_match_client is interrupted
	media: radio-raremono: change devm_k*alloc to k*alloc
	iommu/vt-d: Don't queue_iova() if there is no flush queue
	iommu/iova: Fix compilation error with !CONFIG_IOMMU_IOVA
	Bluetooth: hci_uart: check for missing tty operations
	vhost: introduce vhost_exceeds_weight()
	vhost_net: fix possible infinite loop
	vhost: vsock: add weight support
	vhost: scsi: add weight support
	sched/fair: Don't free p->numa_faults with concurrent readers
	sched/fair: Use RCU accessors consistently for ->numa_group
	/proc/<pid>/cmdline: remove all the special cases
	/proc/<pid>/cmdline: add back the setproctitle() special case
	drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl
	Fix allyesconfig output.
	ceph: hold i_ceph_lock when removing caps for freeing inode
	block, scsi: Change the preempt-only flag into a counter
	scsi: core: Avoid that a kernel warning appears during system resume
	ip_tunnel: allow not to count pkts on tstats by setting skb's dev to NULL
	Linux 4.19.64

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3e9055b677bd8ad9d5070307fae0bc765d444e9d
2019-08-04 09:37:11 +02:00
Jann Horn
48046e092a sched/fair: Don't free p->numa_faults with concurrent readers
commit 16d51a590a upstream.

When going through execve(), zero out the NUMA fault statistics instead of
freeing them.

During execve, the task is reachable through procfs and the scheduler. A
concurrent /proc/*/sched reader can read data from a freed ->numa_faults
allocation (confirmed by KASAN) and write it back to userspace.
I believe that it would also be possible for a use-after-free read to occur
through a race between a NUMA fault and execve(): task_numa_fault() can
lead to task_numa_compare(), which invokes task_weight() on the currently
running task of a different CPU.

Another way to fix this would be to make ->numa_faults RCU-managed or add
extra locking, but it seems easier to wipe the NUMA fault statistics on
execve.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Fixes: 82727018b0 ("sched/numa: Call task_numa_free() from do_execve()")
Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-04 09:30:56 +02:00
Greg Kroah-Hartman
237d383597 This is the 4.19.54 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl0Nx3oACgkQONu9yGCS
 aT7hJRAAzdu1EKr/VUiFJshvPL/+veH1MLdMgafW6gXcoQCQSdMuv1j3mv3axElC
 dz2e/nHXvIhMKMrhzgEvVwV8m8eKPwM4IZjTJ8ji8st9uy0R2UIdbPUHrLtRAQzI
 bK2BIk6697B9/9z2s8nDXhKex9EtZ2vj8DeQLT1AgQKMM0+abPA2YR9+cp9HEjQR
 Wa5NjajwWgpty1VVk+Q6GMD+GBnEILyqxtVGv30z/prWoVadFAXJEZrWsaXotUvh
 ySc2CS9Iu/muiIegTSStnNXaWaawZblA4JdDcdRJ235rPK8sbACYtDgZispYopi4
 y/9kUtuZzWEv61aFZkXd13jKZ8pJsrLZLoc+yDqew0IA6M2Hz7d7Pq/UWQ0m4qnt
 rCcU1vTFooSp/IHOToXN0oQQrTqD2oK5zO4V9S5McwQniuO/RqUPJfX/sKjtrvNT
 SI/yLVKMXImAwVXxNEfO4bE+T9F/QYptOHdpfqJkPvpKLzjxnPBPswG8poIKwQzJ
 w3p1AoQ1ISuW6b94a/nI5nSC28gVslVg+oTjRjF7kfIcOtvglX79XPvfdM5bB2DC
 qD51A8veEGCtAu57/tSyisGOomqTqqqbaiYXwuhgTI1NSszbqKDLNvjsw1GKj0rK
 mSmDzVMgQhxaHOagOeDqLYlj1zgTamx5R6+dretf8AOXyEE2RSg=
 =+MJy
 -----END PGP SIGNATURE-----

Merge 4.19.54 into android-4.19

Changes in 4.19.54
	ax25: fix inconsistent lock state in ax25_destroy_timer
	be2net: Fix number of Rx queues used for flow hashing
	hv_netvsc: Set probe mode to sync
	ipv6: flowlabel: fl6_sock_lookup() must use atomic_inc_not_zero
	lapb: fixed leak of control-blocks.
	neigh: fix use-after-free read in pneigh_get_next
	net: dsa: rtl8366: Fix up VLAN filtering
	net: openvswitch: do not free vport if register_netdevice() is failed.
	nfc: Ensure presence of required attributes in the deactivate_target handler
	sctp: Free cookie before we memdup a new one
	sunhv: Fix device naming inconsistency between sunhv_console and sunhv_reg
	tipc: purge deferredq list for each grp member in tipc_group_delete
	vsock/virtio: set SOCK_DONE on peer shutdown
	net/mlx5: Avoid reloading already removed devices
	net: mvpp2: prs: Fix parser range for VID filtering
	net: mvpp2: prs: Use the correct helpers when removing all VID filters
	Staging: vc04_services: Fix a couple error codes
	perf/x86/intel/ds: Fix EVENT vs. UEVENT PEBS constraints
	netfilter: nf_queue: fix reinject verdict handling
	ipvs: Fix use-after-free in ip_vs_in
	selftests: netfilter: missing error check when setting up veth interface
	clk: ti: clkctrl: Fix clkdm_clk handling
	powerpc/powernv: Return for invalid IMC domain
	usb: xhci: Fix a potential null pointer dereference in xhci_debugfs_create_endpoint()
	mISDN: make sure device name is NUL terminated
	x86/CPU/AMD: Don't force the CPB cap when running under a hypervisor
	perf/ring_buffer: Fix exposing a temporarily decreased data_head
	perf/ring_buffer: Add ordering to rb->nest increment
	perf/ring-buffer: Always use {READ,WRITE}_ONCE() for rb->user_page data
	gpio: fix gpio-adp5588 build errors
	net: stmmac: update rx tail pointer register to fix rx dma hang issue.
	net: tulip: de4x5: Drop redundant MODULE_DEVICE_TABLE()
	ACPI/PCI: PM: Add missing wakeup.flags.valid checks
	drm/etnaviv: lock MMU while dumping core
	net: aquantia: tx clean budget logic error
	net: aquantia: fix LRO with FCS error
	i2c: dev: fix potential memory leak in i2cdev_ioctl_rdwr
	ALSA: hda - Force polling mode on CNL for fixing codec communication
	configfs: Fix use-after-free when accessing sd->s_dentry
	perf data: Fix 'strncat may truncate' build failure with recent gcc
	perf namespace: Protect reading thread's namespace
	perf record: Fix s390 missing module symbol and warning for non-root users
	ia64: fix build errors by exporting paddr_to_nid()
	xen/pvcalls: Remove set but not used variable
	xenbus: Avoid deadlock during suspend due to open transactions
	KVM: PPC: Book3S: Use new mutex to synchronize access to rtas token list
	KVM: PPC: Book3S HV: Don't take kvm->lock around kvm_for_each_vcpu
	arm64: fix syscall_fn_t type
	arm64: use the correct function type in SYSCALL_DEFINE0
	arm64: use the correct function type for __arm64_sys_ni_syscall
	net: sh_eth: fix mdio access in sh_eth_close() for R-Car Gen2 and RZ/A1 SoCs
	net: phylink: ensure consistent phy interface mode
	net: phy: dp83867: Set up RGMII TX delay
	scsi: libcxgbi: add a check for NULL pointer in cxgbi_check_route()
	scsi: smartpqi: properly set both the DMA mask and the coherent DMA mask
	scsi: scsi_dh_alua: Fix possible null-ptr-deref
	scsi: libsas: delete sas port if expander discover failed
	mlxsw: spectrum: Prevent force of 56G
	ocfs2: fix error path kobject memory leak
	coredump: fix race condition between collapse_huge_page() and core dumping
	Abort file_remove_privs() for non-reg. files
	Linux 4.19.54

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-06-22 08:47:30 +02:00
Andrea Arcangeli
465ce9a50f coredump: fix race condition between collapse_huge_page() and core dumping
commit 59ea6d06cf upstream.

When fixing the race conditions between the coredump and the mmap_sem
holders outside the context of the process, we focused on
mmget_not_zero()/get_task_mm() callers in 04f5866e41 ("coredump: fix
race condition between mmget_not_zero()/get_task_mm() and core
dumping"), but those aren't the only cases where the mmap_sem can be
taken outside of the context of the process as Michal Hocko noticed
while backporting that commit to older -stable kernels.

If mmgrab() is called in the context of the process, but then the
mm_count reference is transferred outside the context of the process,
that can also be a problem if the mmap_sem has to be taken for writing
through that mm_count reference.

khugepaged registration calls mmgrab() in the context of the process,
but the mmap_sem for writing is taken later in the context of the
khugepaged kernel thread.

collapse_huge_page() after taking the mmap_sem for writing doesn't
modify any vma, so it's not obvious that it could cause a problem to the
coredump, but it happens to modify the pmd in a way that breaks an
invariant that pmd_trans_huge_lock() relies upon.  collapse_huge_page()
needs the mmap_sem for writing just to block concurrent page faults that
call pmd_trans_huge_lock().

Specifically the invariant that "!pmd_trans_huge()" cannot become a
"pmd_trans_huge()" doesn't hold while collapse_huge_page() runs.

The coredump will call __get_user_pages() without mmap_sem for reading,
which eventually can invoke a lockless page fault which will need a
functional pmd_trans_huge_lock().

So collapse_huge_page() needs to use mmget_still_valid() to check it's
not running concurrently with the coredump...  as long as the coredump
can invoke page faults without holding the mmap_sem for reading.

This has "Fixes: khugepaged" to facilitate backporting, but in my view
it's more a bug in the coredump code that will eventually have to be
rewritten to stop invoking page faults without the mmap_sem for reading.
So the long term plan is still to drop all mmget_still_valid().

Link: http://lkml.kernel.org/r/20190607161558.32104-1-aarcange@redhat.com
Fixes: ba76149f47 ("thp: khugepaged")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-22 08:15:21 +02:00
Greg Kroah-Hartman
dda1dd3ba3 This is the 4.19.39 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlzNPTYACgkQONu9yGCS
 aT5+LA//TEZMvKwfq8ASsp17ncB9jNrV9G5gwwTv4Aa+Zv16LN5pLhlTwqJE1JCc
 wJzisFxCGNil0LTZYlpTwGnPvjNB4MPqyMEU5B7GkLsbRCjCJCZUpCEhrsEhR6D+
 finLDnSAHh8+c/kvETIazvzTw9BP8YUDe7j933L54pXONfAHUvYDkz57ri5oEyIE
 9g+Haot6gytgnPH1iU3A+uSjMll2naBHT1ga7F6fqU6EuhIogtEepG1Y6TWT6zfQ
 IDO4HZwMhdi75ITkv06lh/1zdQaMaGWE+R7svKfQ5UoH89eIUvRjKl8Pkdd5JPha
 1LOz5HBBTxCZdzl70UF8n0ZzlNYP71rQ1CT4bo+U4kPNWNvKZb2MxJEOxB7/JqJo
 qNrv1lBE1ERBM5s8NlTSx1P+RSs3G6kLhDIgz+uB8ALl87zDkwUA7ynP13NFsP2c
 QY7E0eianWSYJxrqhkgWXQ/ToYRLsCBARvAHMZgDveVUNe83SyK1NFhZD77weDSC
 WtirUFzKpw0ZtVjs8Ox6zh4HRtAFXsyPvnlPbtrMdaU0GtEox2Skg7SKboEZfyUa
 8nu98ZdEdzqAKKUOs2o/3vVFnq4NKmw1khpo1OEW5Ob76KdDHYGMiiGOYH1vEGY1
 OxqNyNgw8/8OT6drCrAgx7+4X+BTXdiSXORUyp9AP7WpELwMy4Q=
 =35Eg
 -----END PGP SIGNATURE-----

Merge 4.19.39 into android-4.19

Changes in 4.19.39
	selinux: use kernel linux/socket.h for genheaders and mdp
	Revert "ACPICA: Clear status of GPEs before enabling them"
	mm: make page ref count overflow check tighter and more explicit
	mm: add 'try_get_page()' helper function
	mm: prevent get_user_pages() from overflowing page refcount
	fs: prevent page refcount overflow in pipe_buf_get
	ARM: dts: bcm283x: Fix hdmi hpd gpio pull
	s390: limit brk randomization to 32MB
	net: ieee802154: fix a potential NULL pointer dereference
	ieee802154: hwsim: propagate genlmsg_reply return code
	net: stmmac: don't set own bit too early for jumbo frames
	qlcnic: Avoid potential NULL pointer dereference
	xsk: fix umem memory leak on cleanup
	staging: axis-fifo: add CONFIG_OF dependency
	staging, mt7621-pci: fix build without pci support
	netfilter: nft_set_rbtree: check for inactive element after flag mismatch
	netfilter: bridge: set skb transport_header before entering NF_INET_PRE_ROUTING
	netfilter: fix NETFILTER_XT_TARGET_TEE dependencies
	netfilter: ip6t_srh: fix NULL pointer dereferences
	s390/qeth: fix race when initializing the IP address table
	ARM: imx51: fix a leaked reference by adding missing of_node_put
	sc16is7xx: missing unregister/delete driver on error in sc16is7xx_init()
	serial: ar933x_uart: Fix build failure with disabled console
	KVM: arm64: Reset the PMU in preemptible context
	KVM: arm/arm64: vgic-its: Take the srcu lock when writing to guest memory
	KVM: arm/arm64: vgic-its: Take the srcu lock when parsing the memslots
	usb: dwc3: pci: add support for Comet Lake PCH ID
	usb: gadget: net2280: Fix overrun of OUT messages
	usb: gadget: net2280: Fix net2280_dequeue()
	usb: gadget: net2272: Fix net2272_dequeue()
	ARM: dts: pfla02: increase phy reset duration
	i2c: i801: Add support for Intel Comet Lake
	net: ks8851: Dequeue RX packets explicitly
	net: ks8851: Reassert reset pin if chip ID check fails
	net: ks8851: Delay requesting IRQ until opened
	net: ks8851: Set initial carrier state to down
	staging: rtl8188eu: Fix potential NULL pointer dereference of kcalloc
	staging: rtlwifi: rtl8822b: fix to avoid potential NULL pointer dereference
	staging: rtl8712: uninitialized memory in read_bbreg_hdl()
	staging: rtlwifi: Fix potential NULL pointer dereference of kzalloc
	net: macb: Add null check for PCLK and HCLK
	net/sched: don't dereference a->goto_chain to read the chain index
	ARM: dts: imx6qdl: Fix typo in imx6qdl-icore-rqs.dtsi
	drm/tegra: hub: Fix dereference before check
	NFS: Fix a typo in nfs_init_timeout_values()
	net: xilinx: fix possible object reference leak
	net: ibm: fix possible object reference leak
	net: ethernet: ti: fix possible object reference leak
	drm: Fix drm_release() and device unplug
	gpio: aspeed: fix a potential NULL pointer dereference
	drm/meson: Fix invalid pointer in meson_drv_unbind()
	drm/meson: Uninstall IRQ handler
	ARM: davinci: fix build failure with allnoconfig
	scsi: mpt3sas: Fix kernel panic during expander reset
	scsi: aacraid: Insure we don't access PCIe space during AER/EEH
	scsi: qla4xxx: fix a potential NULL pointer dereference
	usb: usb251xb: fix to avoid potential NULL pointer dereference
	leds: trigger: netdev: fix refcnt leak on interface rename
	x86/realmode: Don't leak the trampoline kernel address
	usb: u132-hcd: fix resource leak
	ceph: fix use-after-free on symlink traversal
	scsi: zfcp: reduce flood of fcrscn1 trace records on multi-element RSCN
	x86/mm: Don't exceed the valid physical address space
	libata: fix using DMA buffers on stack
	gpio: of: Fix of_gpiochip_add() error path
	nvme-multipath: relax ANA state check
	perf machine: Update kernel map address and re-order properly
	kconfig/[mn]conf: handle backspace (^H) key
	iommu/amd: Reserve exclusion range in iova-domain
	ptrace: take into account saved_sigmask in PTRACE{GET,SET}SIGMASK
	leds: pca9532: fix a potential NULL pointer dereference
	leds: trigger: netdev: use memcpy in device_name_store
	Linux 4.19.39

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-05-04 09:28:59 +02:00
Andrei Vagin
13a6a6dd3c ptrace: take into account saved_sigmask in PTRACE{GET,SET}SIGMASK
[ Upstream commit fcfc2aa018 ]

There are a few system calls (pselect, ppoll, etc) which replace a task
sigmask while they are running in a kernel-space

When a task calls one of these syscalls, the kernel saves a current
sigmask in task->saved_sigmask and sets a syscall sigmask.

On syscall-exit-stop, ptrace traps a task before restoring the
saved_sigmask, so PTRACE_GETSIGMASK returns the syscall sigmask and
PTRACE_SETSIGMASK does nothing, because its sigmask is replaced by
saved_sigmask, when the task returns to user-space.

This patch fixes this problem.  PTRACE_GETSIGMASK returns saved_sigmask
if it's set.  PTRACE_SETSIGMASK drops the TIF_RESTORE_SIGMASK flag.

Link: http://lkml.kernel.org/r/20181120060616.6043-1-avagin@gmail.com
Fixes: 29000caecb ("ptrace: add ability to get/set signal-blocked mask")
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin (Microsoft) <sashal@kernel.org>
2019-05-04 09:20:22 +02:00
Greg Kroah-Hartman
9bf5904866 This is the 4.19.37 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlzEBokACgkQONu9yGCS
 aT7G7w/8C93URGM67H7ynkCHTo8y3hkRE2rUJPckJNdS+IJKuecmOphak4tF0h07
 qPWDPya70Q1S0cNu661TuVAGrhmE5jBx8/xfZaAOeaaU0xtZive+TfSHdAQQaHct
 tDk32O85N1aZ49rDEz9ibr7CGLVFDZtyhxV5gFMYQpjbqA7MzJC61zQg1jHyPSCz
 sKjQzW+uXMuSLru8jXHMvp41K5sFFp5gYdQbAVKlWtt79qPxWdxZPJbLbM0LBbtz
 XHt9E45Ink3ALF9P6tZ4e6gi4zzlNbh9yR92+X5NK5/8AP57yWba4W9JHWIfMBpC
 yyDYTOEAzdxqa2Jrgwr4WTdKH6U7FbQZFmWfTBB4VotbHLBWkVXj0OnF10qxP9eQ
 p5wGDTJAlWezhX1BTCfYroglDsvqhj+gHfwHzDRF1Del1dRgydRMQc0qLD1d9tul
 ovzwOkx1xyJrM2wq05I5gc0FoVyOL6/KCwqMrpVfKa3WKY7Uttjgf56bMqdIIkns
 i/6opzF+wtvwlLlCoXgYPXdm6kbWdgvS+skVHfWcHmZFMuGrFGGzJNwzXb7qnVjK
 T0hD1OestsfTyD/amnDNYkNeCkoOZqtHAi+xYOQR4kGY5cxP1lQJf85MgAy6RZSY
 h+rjys76Qf6+hTCtrowLr8SgksX4ACWxm+UarfAiiNnnDXwGfu8=
 =SrFV
 -----END PGP SIGNATURE-----

Merge 4.19.37 into android-4.19

Changes in 4.19.37
	bonding: fix event handling for stacked bonds
	failover: allow name change on IFF_UP slave interfaces
	net: atm: Fix potential Spectre v1 vulnerabilities
	net: bridge: fix per-port af_packet sockets
	net: bridge: multicast: use rcu to access port list from br_multicast_start_querier
	net: Fix missing meta data in skb with vlan packet
	net: fou: do not use guehdr after iptunnel_pull_offloads in gue_udp_recv
	tcp: tcp_grow_window() needs to respect tcp_space()
	team: set slave to promisc if team is already in promisc mode
	tipc: missing entries in name table of publications
	vhost: reject zero size iova range
	ipv4: recompile ip options in ipv4_link_failure
	ipv4: ensure rcu_read_lock() in ipv4_link_failure()
	net: thunderx: raise XDP MTU to 1508
	net: thunderx: don't allow jumbo frames with XDP
	net/mlx5: FPGA, tls, hold rcu read lock a bit longer
	net/tls: prevent bad memory access in tls_is_sk_tx_device_offloaded()
	net/mlx5: FPGA, tls, idr remove on flow delete
	route: Avoid crash from dereferencing NULL rt->from
	sch_cake: Use tc_skb_protocol() helper for getting packet protocol
	sch_cake: Make sure we can write the IP header before changing DSCP bits
	nfp: flower: replace CFI with vlan present
	nfp: flower: remove vlan CFI bit from push vlan action
	sch_cake: Simplify logic in cake_select_tin()
	net: IP defrag: encapsulate rbtree defrag code into callable functions
	net: IP6 defrag: use rbtrees for IPv6 defrag
	net: IP6 defrag: use rbtrees in nf_conntrack_reasm.c
	CIFS: keep FileInfo handle live during oplock break
	cifs: Fix use-after-free in SMB2_write
	cifs: Fix use-after-free in SMB2_read
	cifs: fix handle leak in smb2_query_symlink()
	KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU
	KVM: x86: svm: make sure NMI is injected after nmi_singlestep
	Staging: iio: meter: fixed typo
	staging: iio: ad7192: Fix ad7193 channel address
	iio: gyro: mpu3050: fix chip ID reading
	iio/gyro/bmg160: Use millidegrees for temperature scale
	iio:chemical:bme680: Fix, report temperature in millidegrees
	iio:chemical:bme680: Fix SPI read interface
	iio: cros_ec: Fix the maths for gyro scale calculation
	iio: ad_sigma_delta: select channel when reading register
	iio: dac: mcp4725: add missing powerdown bits in store eeprom
	iio: Fix scan mask selection
	iio: adc: at91: disable adc channel interrupt in timeout case
	iio: core: fix a possible circular locking dependency
	io: accel: kxcjk1013: restore the range after resume.
	staging: most: core: use device description as name
	staging: comedi: vmk80xx: Fix use of uninitialized semaphore
	staging: comedi: vmk80xx: Fix possible double-free of ->usb_rx_buf
	staging: comedi: ni_usb6501: Fix use of uninitialized mutex
	staging: comedi: ni_usb6501: Fix possible double-free of ->usb_rx_buf
	ALSA: hda/realtek - add two more pin configuration sets to quirk table
	ALSA: core: Fix card races between register and disconnect
	Input: elan_i2c - add hardware ID for multiple Lenovo laptops
	serial: sh-sci: Fix HSCIF RX sampling point adjustment
	serial: sh-sci: Fix HSCIF RX sampling point calculation
	vt: fix cursor when clearing the screen
	scsi: core: set result when the command cannot be dispatched
	Revert "scsi: fcoe: clear FC_RP_STARTED flags when receiving a LOGO"
	Revert "svm: Fix AVIC incomplete IPI emulation"
	coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping
	ipmi: fix sleep-in-atomic in free_user at cleanup SRCU user->release_barrier
	crypto: x86/poly1305 - fix overflow during partial reduction
	drm/ttm: fix out-of-bounds read in ttm_put_pages() v2
	arm64: futex: Restore oldval initialization to work around buggy compilers
	x86/kprobes: Verify stack frame on kretprobe
	kprobes: Mark ftrace mcount handler functions nokprobe
	kprobes: Fix error check when reusing optimized probes
	rt2x00: do not increment sequence number while re-transmitting
	mac80211: do not call driver wake_tx_queue op during reconfig
	drm/amdgpu/gmc9: fix VM_L2_CNTL3 programming
	perf/x86/amd: Add event map for AMD Family 17h
	x86/cpu/bugs: Use __initconst for 'const' init data
	perf/x86: Fix incorrect PEBS_REGS
	x86/speculation: Prevent deadlock on ssb_state::lock
	timers/sched_clock: Prevent generic sched_clock wrap caused by tick_freeze()
	nfit/ars: Remove ars_start_flags
	nfit/ars: Introduce scrub_flags
	nfit/ars: Allow root to busy-poll the ARS state machine
	nfit/ars: Avoid stale ARS results
	mmc: sdhci: Fix data command CRC error handling
	mmc: sdhci: Rename SDHCI_ACMD12_ERR and SDHCI_INT_ACMD12ERR
	mmc: sdhci: Handle auto-command errors
	modpost: file2alias: go back to simple devtable lookup
	modpost: file2alias: check prototype of handler
	tpm/tpm_i2c_atmel: Return -E2BIG when the transfer is incomplete
	tpm: Fix the type of the return value in calc_tpm2_event_size()
	Revert "kbuild: use -Oz instead of -Os when using clang"
	sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup
	device_cgroup: fix RCU imbalance in error case
	mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n
	ALSA: info: Fix racy addition/deletion of nodes
	percpu: stop printing kernel addresses
	tools include: Adopt linux/bits.h
	ASoC: rockchip: add missing INTERLEAVED PCM attribute
	i2c-hid: properly terminate i2c_hid_dmi_desc_override_table[] array
	Revert "locking/lockdep: Add debug_locks check in __lock_downgrade()"
	kernel/sysctl.c: fix out-of-bounds access when setting file-max
	Linux 4.19.37

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-04-30 12:53:00 +02:00
Andrea Arcangeli
6ff17bc593 coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping
commit 04f5866e41 upstream.

The core dumping code has always run without holding the mmap_sem for
writing, despite that is the only way to ensure that the entire vma
layout will not change from under it.  Only using some signal
serialization on the processes belonging to the mm is not nearly enough.
This was pointed out earlier.  For example in Hugh's post from Jul 2017:

  https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils

  "Not strictly relevant here, but a related note: I was very surprised
   to discover, only quite recently, how handle_mm_fault() may be called
   without down_read(mmap_sem) - when core dumping. That seems a
   misguided optimization to me, which would also be nice to correct"

In particular because the growsdown and growsup can move the
vm_start/vm_end the various loops the core dump does around the vma will
not be consistent if page faults can happen concurrently.

Pretty much all users calling mmget_not_zero()/get_task_mm() and then
taking the mmap_sem had the potential to introduce unexpected side
effects in the core dumping code.

Adding mmap_sem for writing around the ->core_dump invocation is a
viable long term fix, but it requires removing all copy user and page
faults and to replace them with get_dump_page() for all binary formats
which is not suitable as a short term fix.

For the time being this solution manually covers the places that can
confuse the core dump either by altering the vma layout or the vma flags
while it runs.  Once ->core_dump runs under mmap_sem for writing the
function mmget_still_valid() can be dropped.

Allowing mmap_sem protected sections to run in parallel with the
coredump provides some minor parallelism advantage to the swapoff code
(which seems to be safe enough by never mangling any vma field and can
keep doing swapins in parallel to the core dumping) and to some other
corner case.

In order to facilitate the backporting I added "Fixes: 86039bd3b4"
however the side effect of this same race condition in /proc/pid/mem
should be reproducible since before 2.6.12-rc2 so I couldn't add any
other "Fixes:" because there's no hash beyond the git genesis commit.

Because find_extend_vma() is the only location outside of the process
context that could modify the "mm" structures under mmap_sem for
reading, by adding the mmget_still_valid() check to it, all other cases
that take the mmap_sem for reading don't need the new check after
mmget_not_zero()/get_task_mm().  The expand_stack() in page fault
context also doesn't need the new check, because all tasks under core
dumping are frozen.

Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com
Fixes: 86039bd3b4 ("userfaultfd: add new syscall to provide memory externalization")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Jann Horn <jannh@google.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jann Horn <jannh@google.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-27 09:36:37 +02:00
Greg Kroah-Hartman
d885da678e This is the 4.19.34 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlynu40ACgkQONu9yGCS
 aT5X6g//Wkfm/+qSZ0GhLDQkPniiH1QkvzhOmVrrxu+KB0qsiwsEl8Srw33ZVkJK
 LT8+IPGiG9jEGu9dj+BYXTIfy9ZvfSsEL2N6GhYwDSXP0fok2rUaHbZvv1IB2g4W
 afhGdNwNAUCJ/j1UrUsi+SAFJ+xWbVxFpGstd0cqM9IbKdEV7RIukvuKckHiKOKR
 qI8FxC+G2PAr+BtnETfk5/suPDJ7B3ZicDoMhiWJGxJ6dfFTVmkSmasSoPDaMiHm
 4S3hN2lu+WTeRpRPPB17Dlk4MmIp0k+bGYBKAlaxAMCc/RZxvbT2pRYaMQbId2/L
 mNUfSnOQFGEAhlAPfb7wdbObphnyT34GhlkWfZBTrnhPO0/FomLOvU6xVdcNuakX
 Tv2JKfDzb+2ttcMZ+0T84Ru9RztoswFATSw8uFMVxW8oTS6MVWnHu96Kxfl7QO3J
 PdlIGcyqxSuWNE8OX1QVtdSruGZfwUDNs94S4nQJtkB8BViRwhGJlqaXuy4d9Wp6
 fGlI2W6qhjyosi2wBSMTjh/ytk/jq0vfs+z2XjR2gAYssvB/SOLR/AlSVguWsDnf
 WaoFBkXvCbuPvPlo0TrLpl5RW5WlOtLXHE3Vr3dKp458wLwpf/OZBGoZiknp7DrF
 PzBZs2ie5tmyqTxbAygl7WkbQPJ682pd5R4nf5CY+zvUaOMZv1g=
 =Iuup
 -----END PGP SIGNATURE-----

Merge 4.19.34 into android-4.19

Changes in 4.19.34
	arm64: debug: Don't propagate UNKNOWN FAR into si_code for debug signals
	ext4: cleanup bh release code in ext4_ind_remove_space()
	tty/serial: atmel: Add is_half_duplex helper
	tty/serial: atmel: RS485 HD w/DMA: enable RX after TX is stopped
	CIFS: fix POSIX lock leak and invalid ptr deref
	h8300: use cc-cross-prefix instead of hardcoding h8300-unknown-linux-
	f2fs: fix to adapt small inline xattr space in __find_inline_xattr()
	f2fs: fix to avoid deadlock in f2fs_read_inline_dir()
	tracing: kdb: Fix ftdump to not sleep
	net/mlx5: Avoid panic when setting vport rate
	net/mlx5: Avoid panic when setting vport mac, getting vport config
	gpio: gpio-omap: fix level interrupt idling
	include/linux/relay.h: fix percpu annotation in struct rchan
	sysctl: handle overflow for file-max
	net: stmmac: Avoid sometimes uninitialized Clang warnings
	enic: fix build warning without CONFIG_CPUMASK_OFFSTACK
	libbpf: force fixdep compilation at the start of the build
	scsi: hisi_sas: Set PHY linkrate when disconnected
	scsi: hisi_sas: Fix a timeout race of driver internal and SMP IO
	iio: adc: fix warning in Qualcomm PM8xxx HK/XOADC driver
	x86/hyperv: Fix kernel panic when kexec on HyperV
	perf c2c: Fix c2c report for empty numa node
	mm/sparse: fix a bad comparison
	mm/cma.c: cma_declare_contiguous: correct err handling
	mm/page_ext.c: fix an imbalance with kmemleak
	mm, swap: bounds check swap_info array accesses to avoid NULL derefs
	mm,oom: don't kill global init via memory.oom.group
	memcg: killed threads should not invoke memcg OOM killer
	mm, mempolicy: fix uninit memory access
	mm/vmalloc.c: fix kernel BUG at mm/vmalloc.c:512!
	mm/slab.c: kmemleak no scan alien caches
	ocfs2: fix a panic problem caused by o2cb_ctl
	f2fs: do not use mutex lock in atomic context
	fs/file.c: initialize init_files.resize_wait
	page_poison: play nicely with KASAN
	cifs: use correct format characters
	dm thin: add sanity checks to thin-pool and external snapshot creation
	f2fs: fix to check inline_xattr_size boundary correctly
	cifs: Accept validate negotiate if server return NT_STATUS_NOT_SUPPORTED
	cifs: Fix NULL pointer dereference of devname
	netfilter: nf_tables: check the result of dereferencing base_chain->stats
	netfilter: conntrack: tcp: only close if RST matches exact sequence
	jbd2: fix invalid descriptor block checksum
	fs: fix guard_bio_eod to check for real EOD errors
	tools lib traceevent: Fix buffer overflow in arg_eval
	PCI/PME: Fix hotplug/sysfs remove deadlock in pcie_pme_remove()
	wil6210: check null pointer in _wil_cfg80211_merge_extra_ies
	mt76: fix a leaked reference by adding a missing of_node_put
	crypto: crypto4xx - add missing of_node_put after of_device_is_available
	crypto: cavium/zip - fix collision with generic cra_driver_name
	usb: chipidea: Grab the (legacy) USB PHY by phandle first
	powerpc/powernv/ioda: Fix locked_vm counting for memory used by IOMMU tables
	scsi: core: replace GFP_ATOMIC with GFP_KERNEL in scsi_scan.c
	kbuild: invoke syncconfig if include/config/auto.conf.cmd is missing
	powerpc/xmon: Fix opcode being uninitialized in print_insn_powerpc
	coresight: etm4x: Add support to enable ETMv4.2
	serial: 8250_pxa: honor the port number from devicetree
	ARM: 8840/1: use a raw_spinlock_t in unwind
	iommu/io-pgtable-arm-v7s: Only kmemleak_ignore L2 tables
	powerpc/hugetlb: Handle mmap_min_addr correctly in get_unmapped_area callback
	btrfs: qgroup: Make qgroup async transaction commit more aggressive
	mmc: omap: fix the maximum timeout setting
	net: dsa: mv88e6xxx: Add lockdep classes to fix false positive splat
	e1000e: Fix -Wformat-truncation warnings
	mlxsw: spectrum: Avoid -Wformat-truncation warnings
	platform/x86: ideapad-laptop: Fix no_hw_rfkill_list for Lenovo RESCUER R720-15IKBN
	platform/mellanox: mlxreg-hotplug: Fix KASAN warning
	loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()
	IB/mlx4: Increase the timeout for CM cache
	clk: fractional-divider: check parent rate only if flag is set
	perf annotate: Fix getting source line failure
	ASoC: qcom: Fix of-node refcount unbalance in qcom_snd_parse_of()
	cpufreq: acpi-cpufreq: Report if CPU doesn't support boost technologies
	efi: cper: Fix possible out-of-bounds access
	s390/ism: ignore some errors during deregistration
	scsi: megaraid_sas: return error when create DMA pool failed
	scsi: fcoe: make use of fip_mode enum complete
	drm/amd/display: Clear stream->mode_changed after commit
	perf test: Fix failure of 'evsel-tp-sched' test on s390
	mwifiex: don't advertise IBSS features without FW support
	perf report: Don't shadow inlined symbol with different addr range
	SoC: imx-sgtl5000: add missing put_device()
	media: ov7740: fix runtime pm initialization
	media: sh_veu: Correct return type for mem2mem buffer helpers
	media: s5p-jpeg: Correct return type for mem2mem buffer helpers
	media: rockchip/rga: Correct return type for mem2mem buffer helpers
	media: s5p-g2d: Correct return type for mem2mem buffer helpers
	media: mx2_emmaprp: Correct return type for mem2mem buffer helpers
	media: mtk-jpeg: Correct return type for mem2mem buffer helpers
	mt76: usb: do not run mt76u_queues_deinit twice
	xen/gntdev: Do not destroy context while dma-bufs are in use
	vfs: fix preadv64v2 and pwritev64v2 compat syscalls with offset == -1
	HID: intel-ish-hid: avoid binding wrong ishtp_cl_device
	cgroup, rstat: Don't flush subtree root unless necessary
	jbd2: fix race when writing superblock
	leds: lp55xx: fix null deref on firmware load failure
	perf report: Add s390 diagnosic sampling descriptor size
	iwlwifi: pcie: fix emergency path
	ACPI / video: Refactor and fix dmi_is_desktop()
	selftests: skip seccomp get_metadata test if not real root
	kprobes: Prohibit probing on bsearch()
	kprobes: Prohibit probing on RCU debug routine
	netfilter: conntrack: fix cloned unconfirmed skb->_nfct race in __nf_conntrack_confirm
	ARM: 8833/1: Ensure that NEON code always compiles with Clang
	ARM: dts: meson8b: fix the Ethernet data line signals in eth_rgmii_pins
	ALSA: PCM: check if ops are defined before suspending PCM
	ath10k: fix shadow register implementation for WCN3990
	usb: f_fs: Avoid crash due to out-of-scope stack ptr access
	sched/topology: Fix percpu data types in struct sd_data & struct s_data
	bcache: fix input overflow to cache set sysfs file io_error_halflife
	bcache: fix input overflow to sequential_cutoff
	bcache: fix potential div-zero error of writeback_rate_i_term_inverse
	bcache: improve sysfs_strtoul_clamp()
	genirq: Avoid summation loops for /proc/stat
	net: marvell: mvpp2: fix stuck in-band SGMII negotiation
	iw_cxgb4: fix srqidx leak during connection abort
	net: phy: consider latched link-down status in polling mode
	fbdev: fbmem: fix memory access if logo is bigger than the screen
	cdrom: Fix race condition in cdrom_sysctl_register
	drm: rcar-du: add missing of_node_put
	drm/amd/display: Don't re-program planes for DPMS changes
	drm/amd/display: Disconnect mpcc when changing tg
	perf/aux: Make perf_event accessible to setup_aux()
	e1000e: fix cyclic resets at link up with active tx
	e1000e: Exclude device from suspend direct complete optimization
	platform/x86: intel_pmc_core: Fix PCH IP sts reading
	i2c: of: Try to find an I2C adapter matching the parent
	staging: spi: mt7621: Add return code check on device_reset()
	iwlwifi: mvm: fix RFH config command with >=10 CPUs
	ASoC: fsl-asoc-card: fix object reference leaks in fsl_asoc_card_probe
	sched/debug: Initialize sd_sysctl_cpus if !CONFIG_CPUMASK_OFFSTACK
	efi/memattr: Don't bail on zero VA if it equals the region's PA
	sched/core: Use READ_ONCE()/WRITE_ONCE() in move_queued_task()/task_rq_lock()
	drm/vkms: Bugfix extra vblank frame
	ARM: dts: lpc32xx: Remove leading 0x and 0s from bindings notation
	efi/arm/arm64: Allow SetVirtualAddressMap() to be omitted
	soc: qcom: gsbi: Fix error handling in gsbi_probe()
	mt7601u: bump supported EEPROM version
	ARM: 8830/1: NOMMU: Toggle only bits in EXC_RETURN we are really care of
	ARM: avoid Cortex-A9 livelock on tight dmb loops
	block, bfq: fix in-service-queue check for queue merging
	bpf: fix missing prototype warnings
	selftests/bpf: skip verifier tests for unsupported program types
	powerpc/64s: Clear on-stack exception marker upon exception return
	cgroup/pids: turn cgroup_subsys->free() into cgroup_subsys->release() to fix the accounting
	backlight: pwm_bl: Use gpiod_get_value_cansleep() to get initial state
	tty: increase the default flip buffer limit to 2*640K
	powerpc/pseries: Perform full re-add of CPU for topology update post-migration
	drm/amd/display: Enable vblank interrupt during CRC capture
	ALSA: dice: add support for Solid State Logic Duende Classic/Mini
	usb: dwc3: gadget: Fix OTG events when gadget driver isn't loaded
	platform/x86: intel-hid: Missing power button release on some Dell models
	perf script python: Use PyBytes for attr in trace-event-python
	perf script python: Add trace_context extension module to sys.modules
	media: mt9m111: set initial frame size other than 0x0
	hwrng: virtio - Avoid repeated init of completion
	soc/tegra: fuse: Fix illegal free of IO base address
	HID: intel-ish: ipc: handle PIMR before ish_wakeup also clear PISR busy_clear bit
	f2fs: UBSAN: set boolean value iostat_enable correctly
	hpet: Fix missing '=' character in the __setup() code of hpet_mmap_enable
	cpu/hotplug: Mute hotplug lockdep during init
	dmaengine: imx-dma: fix warning comparison of distinct pointer types
	dmaengine: qcom_hidma: assign channel cookie correctly
	dmaengine: qcom_hidma: initialize tx flags in hidma_prep_dma_*
	netfilter: physdev: relax br_netfilter dependency
	media: rcar-vin: Allow independent VIN link enablement
	media: s5p-jpeg: Check for fmt_ver_flag when doing fmt enumeration
	regulator: act8865: Fix act8600_sudcdc_voltage_ranges setting
	pinctrl: meson: meson8b: add the eth_rxd2 and eth_rxd3 pins
	drm: Auto-set allow_fb_modifiers when given modifiers at plane init
	drm/nouveau: Stop using drm_crtc_force_disable
	x86/build: Specify elf_i386 linker emulation explicitly for i386 objects
	selinux: do not override context on context mounts
	brcmfmac: Use firmware_request_nowarn for the clm_blob
	wlcore: Fix memory leak in case wl12xx_fetch_firmware failure
	x86/build: Mark per-CPU symbols as absolute explicitly for LLD
	drm/fb-helper: fix leaks in error path of drm_fb_helper_fbdev_setup
	clk: meson: clean-up clock registration
	clk: rockchip: fix frac settings of GPLL clock for rk3328
	dmaengine: tegra: avoid overflow of byte tracking
	Input: soc_button_array - fix mapping of the 5th GPIO in a PNP0C40 device
	drm/dp/mst: Configure no_stop_bit correctly for remote i2c xfers
	net: stmmac: Avoid one more sometimes uninitialized Clang warning
	ACPI / video: Extend chassis-type detection with a "Lunch Box" check
	bcache: fix potential div-zero error of writeback_rate_p_term_inverse
	kprobes/x86: Blacklist non-attachable interrupt functions
	Linux 4.19.34

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-04-05 22:43:09 +02:00
Luc Van Oostenryck
845d4849b6 sched/topology: Fix percpu data types in struct sd_data & struct s_data
[ Upstream commit 99687cdbb3 ]

The percpu members of struct sd_data and s_data are declared as:

	struct ... ** __percpu member;

So their type is:

	__percpu pointer to pointer to struct ...

But looking at how they're used, their type should be:

	pointer to __percpu pointer to struct ...

and they should thus be declared as:

	struct ... * __percpu *member;

So fix the placement of '__percpu' in the definition of these
structures.

This addresses a bunch of Sparse's warnings like:

	warning: incorrect type in initializer (different address spaces)
	  expected void const [noderef] <asn:3> *__vpp_verify
	  got struct sched_domain **

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190118144936.79158-1-luc.vanoostenryck@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-05 22:33:09 +02:00
Johannes Weiner
4c9c09affa UPSTREAM: sched: loadavg: make calc_load_n() public
It's going to be used in a later patch. Keep the churn separate.

Link: http://lkml.kernel.org/r/20180828172258.3185-6-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: Daniel Drake <drake@endlessm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 5c54f5b9ed)

Bug: 127712811
Test: lmkd in PSI mode
Change-Id: I50e0cb0dbf20ced329a484493f82ff69ca1ae97a
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-03-21 16:25:27 -07:00
Johannes Weiner
2ba18b41d3 BACKPORT: sched: loadavg: consolidate LOAD_INT, LOAD_FRAC, CALC_LOAD
There are several definitions of those functions/macros in places that
mess with fixed-point load averages.  Provide an official version.

[akpm@linux-foundation.org: fix missed conversion in block/blk-iolatency.c]
Link: http://lkml.kernel.org/r/20180828172258.3185-5-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: Daniel Drake <drake@endlessm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 8508cf3ffa)

Conflicts:
        block/blk-iolatency.c

(1. manual merge to replace stat->rqs.mean with stat.mean)

Bug: 127712811
Test: lmkd in PSI mode
Change-Id: I716b4874491cff75a2355c6d95c64cf02d05e7ee
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-03-21 16:25:26 -07:00
Greg Kroah-Hartman
c28f73fe42 This is the 4.19.20 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlxbC5gACgkQONu9yGCS
 aT4DYQ//Uqm/Q63KQuExgd7W+61FoP4NFHlYXZ31B5Rkydryyk2K5P2ONSdVd5n9
 k3wjzRrxvlPvjOwbh9PHv+pLkGxBDqpT1X8IAXPe36bYUkvXoH71BE4YSRPRUJAf
 sdzw/vs7WE/Kx41iT3SXiQih8ok0y3LoACBKmUsEXoLI1cJZCUnnSFpP++QNe1Iz
 B/y04BigL8R7OWR/jow6OPWe9uXOI8iEe9QKVX26g4oaakzly4vkp6OwROSwM31q
 0wut8jF/AtDcZpZXjJLjDCj10k5DRN8jwGcLD7iZeIKqexOabjUrsvfIHfbpUtXr
 e7pJw2aUM8BFb8Ba2lsB7gkqvdHQohqVKQE4Qy59aPyesm2G5miH4gAbncoixjCa
 u3eQV5ACpFLksUFR4RAMKq+10k7swsutyyJr5vG4qdbRpcTCNJirEwAGGqgI6IEP
 SDqtw6u8gMP8+SicwA9p71Wwntcq9RR6fx0gX/3wi2DQp6F8Txem00SqaciE7uQ1
 uIOUrhcpWzIq4m58SGhgTSQcBkm5qBD5S154/xRKIo0mvME+NwBub/x3fIsixN/u
 AzWQmQPXBajHbYXbKGC7t2jNHkU5d9FedZ4iDmJk/+ZZsWyFByY1bH1cg4Qnq89e
 tDxL114YmSujbZD/mFlbGWcqdmGNT355BmyetKDx6w0rNiU/RBU=
 =oprJ
 -----END PGP SIGNATURE-----

Merge 4.19.20 into android-4.19

Changes in 4.19.20
	Fix "net: ipv4: do not handle duplicate fragments as overlapping"
	drm/msm/gpu: fix building without debugfs
	ipv6: Consider sk_bound_dev_if when binding a socket to an address
	ipv6: sr: clear IP6CB(skb) on SRH ip4ip6 encapsulation
	ipvlan, l3mdev: fix broken l3s mode wrt local routes
	l2tp: copy 4 more bytes to linear part if necessary
	l2tp: fix reading optional fields of L2TPv3
	net: ip_gre: always reports o_key to userspace
	net: ip_gre: use erspan key field for tunnel lookup
	net/mlx4_core: Add masking for a few queries on HCA caps
	netrom: switch to sock timer API
	net/rose: fix NULL ax25_cb kernel panic
	net: set default network namespace in init_dummy_netdev()
	ravb: expand rx descriptor data to accommodate hw checksum
	sctp: improve the events for sctp stream reset
	tun: move the call to tun_set_real_num_queues
	ucc_geth: Reset BQL queue when stopping device
	vhost: fix OOB in get_rx_bufs()
	net: ip6_gre: always reports o_key to userspace
	sctp: improve the events for sctp stream adding
	net/mlx5e: Allow MAC invalidation while spoofchk is ON
	ip6mr: Fix notifiers call on mroute_clean_tables()
	Revert "net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager"
	sctp: set chunk transport correctly when it's a new asoc
	sctp: set flow sport from saddr only when it's 0
	virtio_net: Don't enable NAPI when interface is down
	virtio_net: Don't call free_old_xmit_skbs for xdp_frames
	virtio_net: Fix not restoring real_num_rx_queues
	virtio_net: Fix out of bounds access of sq
	virtio_net: Don't process redirected XDP frames when XDP is disabled
	virtio_net: Use xdp_return_frame to free xdp_frames on destroying vqs
	virtio_net: Differentiate sk_buff and xdp_frame on freeing
	CIFS: Do not count -ENODATA as failure for query directory
	CIFS: Fix trace command logging for SMB2 reads and writes
	CIFS: Do not consider -ENODATA as stat failure for reads
	fs/dcache: Fix incorrect nr_dentry_unused accounting in shrink_dcache_sb()
	iommu/vt-d: Fix memory leak in intel_iommu_put_resv_regions()
	selftests/seccomp: Enhance per-arch ptrace syscall skip tests
	NFS: Fix up return value on fatal errors in nfs_page_async_flush()
	ARM: cns3xxx: Fix writing to wrong PCI config registers after alignment
	arm64: kaslr: ensure randomized quantities are clean also when kaslr is off
	arm64: Do not issue IPIs for user executable ptes
	arm64: hyp-stub: Forbid kprobing of the hyp-stub
	arm64: hibernate: Clean the __hyp_text to PoC after resume
	gpio: altera-a10sr: Set proper output level for direction_output
	gpiolib: fix line event timestamps for nested irqs
	gpio: pcf857x: Fix interrupts on multiple instances
	gpio: sprd: Fix the incorrect data register
	gpio: sprd: Fix incorrect irq type setting for the async EIC
	gfs2: Revert "Fix loop in gfs2_rbm_find"
	mmc: bcm2835: Fix DMA channel leak on probe error
	mmc: mediatek: fix incorrect register setting of hs400_cmd_int_delay
	ALSA: usb-audio: Add Opus #3 to quirks for native DSD support
	ALSA: hda/realtek - Fixed hp_pin no value
	IB/hfi1: Remove overly conservative VM_EXEC flag check
	platform/x86: asus-nb-wmi: Map 0x35 to KEY_SCREENLOCK
	platform/x86: asus-nb-wmi: Drop mapping of 0x33 and 0x34 scan codes
	mmc: sdhci-iproc: handle mmc_of_parse() errors during probe
	Btrfs: fix deadlock when allocating tree block during leaf/node split
	btrfs: On error always free subvol_name in btrfs_mount
	kernel/exit.c: release ptraced tasks before zap_pid_ns_processes
	mm/hugetlb.c: teach follow_hugetlb_page() to handle FOLL_NOWAIT
	oom, oom_reaper: do not enqueue same task twice
	mm,memory_hotplug: fix scan_movable_pages() for gigantic hugepages
	mm, oom: fix use-after-free in oom_kill_process
	mm: hwpoison: use do_send_sig_info() instead of force_sig()
	mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
	of: Convert to using %pOFn instead of device_node.name
	of: overlay: add tests to validate kfrees from overlay removal
	of: overlay: add missing of_node_get() in __of_attach_node_sysfs
	of: overlay: use prop add changeset entry for property in new nodes
	of: overlay: do not duplicate properties from overlay for new nodes
	md/raid5: fix 'out of memory' during raid cache recovery
	cifs: Always resolve hostname before reconnecting
	Linux 4.19.20

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-02-07 08:40:17 +01:00
Tetsuo Handa
7e70ddc332 oom, oom_reaper: do not enqueue same task twice
commit 9bcdeb51bd upstream.

Arkadiusz reported that enabling memcg's group oom killing causes
strange memcg statistics where there is no task in a memcg despite the
number of tasks in that memcg is not 0.  It turned out that there is a
bug in wake_oom_reaper() which allows enqueuing same task twice which
makes impossible to decrease the number of tasks in that memcg due to a
refcount leak.

This bug existed since the OOM reaper became invokable from
task_will_free_mem(current) path in out_of_memory() in Linux 4.7,

  T1@P1     |T2@P1     |T3@P1     |OOM reaper
  ----------+----------+----------+------------
                                   # Processing an OOM victim in a different memcg domain.
                        try_charge()
                          mem_cgroup_out_of_memory()
                            mutex_lock(&oom_lock)
             try_charge()
               mem_cgroup_out_of_memory()
                 mutex_lock(&oom_lock)
  try_charge()
    mem_cgroup_out_of_memory()
      mutex_lock(&oom_lock)
                            out_of_memory()
                              oom_kill_process(P1)
                                do_send_sig_info(SIGKILL, @P1)
                                mark_oom_victim(T1@P1)
                                wake_oom_reaper(T1@P1) # T1@P1 is enqueued.
                            mutex_unlock(&oom_lock)
                 out_of_memory()
                   mark_oom_victim(T2@P1)
                   wake_oom_reaper(T2@P1) # T2@P1 is enqueued.
                 mutex_unlock(&oom_lock)
      out_of_memory()
        mark_oom_victim(T1@P1)
        wake_oom_reaper(T1@P1) # T1@P1 is enqueued again due to oom_reaper_list == T2@P1 && T1@P1->oom_reaper_list == NULL.
      mutex_unlock(&oom_lock)
                                   # Completed processing an OOM victim in a different memcg domain.
                                   spin_lock(&oom_reaper_lock)
                                   # T1P1 is dequeued.
                                   spin_unlock(&oom_reaper_lock)

but memcg's group oom killing made it easier to trigger this bug by
calling wake_oom_reaper() on the same task from one out_of_memory()
request.

Fix this bug using an approach used by commit 855b018325 ("oom,
oom_reaper: disable oom_reaper for oom_kill_allocating_task").  As a
side effect of this patch, this patch also avoids enqueuing multiple
threads sharing memory via task_will_free_mem(current) path.

Link: http://lkml.kernel.org/r/e865a044-2c10-9858-f4ef-254bc71d6cc2@i-love.sakura.ne.jp
Link: http://lkml.kernel.org/r/5ee34fc6-1485-34f8-8790-903ddabaa809@i-love.sakura.ne.jp
Fixes: af8e15cc85 ("oom, oom_reaper: do not enqueue task if it is on the oom_reaper_list head")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Tested-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Aleksa Sarai <asarai@suse.de>
Cc: Jay Kamat <jgkamat@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-06 17:30:14 +01:00
Quentin Perret
3ed3d89a25 Revert "FROMLIST: sched: Introduce a sysctl for Energy Aware Scheduling"
This reverts commit b78eec5fb7. It has not
been accepted upstream and is of no particular interest in Android since
EAS is always enabled anyway.

Bug: 120440300
Change-Id: I7d1f2ad206c05041d386fc99f18e29c4fa56cbc8
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2019-01-04 09:17:55 +00:00
Quentin Perret
81dd278ee4 ANDROID: sched: Align EAS with upstream
The core EAS patches have now been accepted upstream. The patches used
in Android are based on a slightly earlier version of the series. In
order to reduce the delta with mainline and ease backports, align the
EAS code paths with their upstream version.

This basically applies the output of git range-diff on the appropriate
commits, and fixes a conflict in schedutil regarding the integration of
schedtune.

Bug: 120440300
Change-Id: I208ebeb4207e3f4f4bbb5103c606b293a464c20f
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2019-01-04 09:17:54 +00:00
Greg Kroah-Hartman
c454ec1e21 This is the 4.19.7 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlwIG48ACgkQONu9yGCS
 aT7g6Q//RkJ8ZWaRkykcCGaWIvwI6QF1tmKalIEWmToPdndDuQdUDGzWVwfE9G7P
 yLcnp3GMlXo4F82BBwG8lFSAm9zaeqaLabnJnXbCc5mZ3xi/2aNqIGHzBY1isNZl
 0fTzzcelnAKzjp0Aa/egRLOeraSLgVt/Cp7Ha3FXMP6RNxUMzs1pbQ2IFZ3m+P4G
 CAD3Iye6geOaZTu/kXiiooUEUGFQFbV4c3AZ4VW7dZDdrG+ekwtF4YHtkEPseWJQ
 Ugtrbr6S0IxYQ91o1Pk77kg4uwUFYo12jrk8Ni4gaPZE6mQCa08tr2Alg2oZkJGw
 PdXnt2ASYGRWFYK2JAuTvKzhHrTEJYhiC323dKYCAx7BgfFaqdo5F20oNzYxXFBB
 gGA3AzDDtLUD3OOO+lxrDxXMhpwXUx92WXsoJVsaSafdqIDAueq14sH19wqm0gUJ
 D1fC2dWTsFrPZKjkU8Z6rJAyO1XZED55h7v1YlqAt2ibjCeDKpjnW3yvUt8Ivpqc
 nlnmp8v/Yl2cdY55XtlgUadpknSc2jApFMwhSWetxAaqDCvha2dLQ28YMyPRJzat
 ZHOkizM/VUntXvlUzFvVTsqLQiX0sfLG6MKcUkzWehPomNKT+B8XL1wtzytv9QXb
 jOY8nRD5PiQo2p35cqdDCskBwqzEwY+WxDe7ji0yHZysBZLxoxQ=
 =OiCf
 -----END PGP SIGNATURE-----

Merge 4.19.7 into android-4.19

Changes in 4.19.7
	mm/huge_memory: rename freeze_page() to unmap_page()
	mm/huge_memory: splitting set mapping+index before unfreeze
	mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()
	mm/khugepaged: collapse_shmem() stop if punched or truncated
	mm/khugepaged: fix crashes due to misaccounted holes
	mm/khugepaged: collapse_shmem() remember to clear holes
	mm/khugepaged: minor reorderings in collapse_shmem()
	mm/khugepaged: collapse_shmem() without freezing new_page
	mm/khugepaged: collapse_shmem() do not crash on Compound
	lan743x: Enable driver to work with LAN7431
	lan743x: fix return value for lan743x_tx_napi_poll
	net: don't keep lonely packets forever in the gro hash
	net: gemini: Fix copy/paste error
	net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queue
	packet: copy user buffers before orphan or clone
	rapidio/rionet: do not free skb before reading its length
	s390/qeth: fix length check in SNMP processing
	usbnet: ipheth: fix potential recvmsg bug and recvmsg bug 2
	net: thunderx: set xdp_prog to NULL if bpf_prog_add fails
	net: skb_scrub_packet(): Scrub offload_fwd_mark
	virtio-net: disable guest csum during XDP set
	virtio-net: fail XDP set if guest csum is negotiated
	net/dim: Update DIM start sample after each DIM iteration
	tcp: defer SACK compression after DupThresh
	net: phy: add workaround for issue where PHY driver doesn't bind to the device
	tipc: fix lockdep warning during node delete
	x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation
	x86/speculation: Apply IBPB more strictly to avoid cross-process data leak
	x86/speculation: Propagate information about RSB filling mitigation to sysfs
	x86/speculation: Add RETPOLINE_AMD support to the inline asm CALL_NOSPEC variant
	x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support
	x86/retpoline: Remove minimal retpoline support
	x86/speculation: Update the TIF_SSBD comment
	x86/speculation: Clean up spectre_v2_parse_cmdline()
	x86/speculation: Remove unnecessary ret variable in cpu_show_common()
	x86/speculation: Move STIPB/IBPB string conditionals out of cpu_show_common()
	x86/speculation: Disable STIBP when enhanced IBRS is in use
	x86/speculation: Rename SSBD update functions
	x86/speculation: Reorganize speculation control MSRs update
	sched/smt: Make sched_smt_present track topology
	x86/Kconfig: Select SCHED_SMT if SMP enabled
	sched/smt: Expose sched_smt_present static key
	x86/speculation: Rework SMT state change
	x86/l1tf: Show actual SMT state
	x86/speculation: Reorder the spec_v2 code
	x86/speculation: Mark string arrays const correctly
	x86/speculataion: Mark command line parser data __initdata
	x86/speculation: Unify conditional spectre v2 print functions
	x86/speculation: Add command line control for indirect branch speculation
	x86/speculation: Prepare for per task indirect branch speculation control
	x86/process: Consolidate and simplify switch_to_xtra() code
	x86/speculation: Avoid __switch_to_xtra() calls
	x86/speculation: Prepare for conditional IBPB in switch_mm()
	ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS
	x86/speculation: Split out TIF update
	x86/speculation: Prevent stale SPEC_CTRL msr content
	x86/speculation: Prepare arch_smt_update() for PRCTL mode
	x86/speculation: Add prctl() control for indirect branch speculation
	x86/speculation: Enable prctl mode for spectre_v2_user
	x86/speculation: Add seccomp Spectre v2 user space protection mode
	x86/speculation: Provide IBPB always command line options
	userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas
	kvm: mmu: Fix race in emulated page table writes
	kvm: svm: Ensure an IBPB on all affected CPUs when freeing a vmcb
	KVM: nVMX/nSVM: Fix bug which sets vcpu->arch.tsc_offset to L1 tsc_offset
	KVM: x86: Fix kernel info-leak in KVM_HC_CLOCK_PAIRING hypercall
	KVM: LAPIC: Fix pv ipis use-before-initialization
	KVM: X86: Fix scan ioapic use-before-initialization
	KVM: VMX: re-add ple_gap module parameter
	xtensa: enable coprocessors that are being flushed
	xtensa: fix coprocessor context offset definitions
	xtensa: fix coprocessor part of ptrace_{get,set}xregs
	udf: Allow mounting volumes with incorrect identification strings
	btrfs: Always try all copies when reading extent buffers
	Btrfs: ensure path name is null terminated at btrfs_control_ioctl
	Btrfs: fix rare chances for data loss when doing a fast fsync
	Btrfs: fix race between enabling quotas and subvolume creation
	btrfs: relocation: set trans to be NULL after ending transaction
	PCI: layerscape: Fix wrong invocation of outbound window disable accessor
	PCI: dwc: Fix MSI-X EP framework address calculation bug
	PCI: Fix incorrect value returned from pcie_get_speed_cap()
	arm64: dts: rockchip: Fix PCIe reset polarity for rk3399-puma-haikou.
	x86/MCE/AMD: Fix the thresholding machinery initialization order
	x86/fpu: Disable bottom halves while loading FPU registers
	perf/x86/intel: Move branch tracing setup to the Intel-specific source file
	perf/x86/intel: Add generic branch tracing check to intel_pmu_has_bts()
	perf/x86/intel: Disallow precise_ip on BTS events
	fs: fix lost error code in dio_complete
	ALSA: wss: Fix invalid snd_free_pages() at error path
	ALSA: ac97: Fix incorrect bit shift at AC97-SPSA control write
	ALSA: control: Fix race between adding and removing a user element
	ALSA: sparc: Fix invalid snd_free_pages() at error path
	ALSA: hda: Add ASRock N68C-S UCC the power_save blacklist
	ALSA: hda/realtek - Support ALC300
	ALSA: hda/realtek - fix headset mic detection for MSI MS-B171
	ALSA: hda/realtek - fix the pop noise on headphone for lenovo laptops
	ALSA: hda/realtek - Add auto-mute quirk for HP Spectre x360 laptop
	function_graph: Create function_graph_enter() to consolidate architecture code
	ARM: function_graph: Simplify with function_graph_enter()
	microblaze: function_graph: Simplify with function_graph_enter()
	x86/function_graph: Simplify with function_graph_enter()
	nds32: function_graph: Simplify with function_graph_enter()
	powerpc/function_graph: Simplify with function_graph_enter()
	sh/function_graph: Simplify with function_graph_enter()
	sparc/function_graph: Simplify with function_graph_enter()
	parisc: function_graph: Simplify with function_graph_enter()
	riscv/function_graph: Simplify with function_graph_enter()
	s390/function_graph: Simplify with function_graph_enter()
	arm64: function_graph: Simplify with function_graph_enter()
	MIPS: function_graph: Simplify with function_graph_enter()
	function_graph: Make ftrace_push_return_trace() static
	function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack
	function_graph: Have profiler use curr_ret_stack and not depth
	function_graph: Move return callback before update of curr_ret_stack
	function_graph: Reverse the order of pushing the ret_stack and the callback
	binder: fix race that allows malicious free of live buffer
	ext2: initialize opts.s_mount_opt as zero before using it
	ext2: fix potential use after free
	ASoC: intel: cht_bsw_max98090_ti: Add quirk for boards using pmc_plt_clk_0
	ASoC: pcm186x: Fix device reset-registers trigger value
	ARM: dts: rockchip: Remove @0 from the veyron memory node
	dmaengine: at_hdmac: fix memory leak in at_dma_xlate()
	dmaengine: at_hdmac: fix module unloading
	staging: most: use format specifier "%s" in snprintf
	staging: vchiq_arm: fix compat VCHIQ_IOC_AWAIT_COMPLETION
	staging: mt7621-dma: fix potentially dereferencing uninitialized 'tx_desc'
	staging: mt7621-pinctrl: fix uninitialized variable ngroups
	staging: rtl8723bs: Fix incorrect sense of ether_addr_equal
	staging: rtl8723bs: Add missing return for cfg80211_rtw_get_station
	USB: usb-storage: Add new IDs to ums-realtek
	usb: core: quirks: add RESET_RESUME quirk for Cherry G230 Stream series
	Revert "usb: dwc3: gadget: skip Set/Clear Halt when invalid"
	iio/hid-sensors: Fix IIO_CHAN_INFO_RAW returning wrong values for signed numbers
	iio:st_magn: Fix enable device after trigger
	lib/test_kmod.c: fix rmmod double free
	mm: cleancache: fix corruption on missed inode invalidation
	mm: use swp_offset as key in shmem_replace_page()
	Drivers: hv: vmbus: check the creation_status in vmbus_establish_gpadl()
	misc: mic/scif: fix copy-paste error in scif_create_remote_lookup
	Linux 4.19.7

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-12-06 10:34:09 +01:00
Thomas Gleixner
f55e301ec4 x86/speculation: Rework SMT state change
commit a74cfffb03 upstream

arch_smt_update() is only called when the sysfs SMT control knob is
changed. This means that when SMT is enabled in the sysfs control knob the
system is considered to have SMT active even if all siblings are offline.

To allow finegrained control of the speculation mitigations, the actual SMT
state is more interesting than the fact that siblings could be enabled.

Rework the code, so arch_smt_update() is invoked from each individual CPU
hotplug function, and simplify the update function while at it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185004.521974984@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-05 19:32:02 +01:00
Thomas Gleixner
340693ee91 sched/smt: Expose sched_smt_present static key
commit 321a874a7e upstream

Make the scheduler's 'sched_smt_present' static key globaly available, so
it can be used in the x86 speculation control code.

Provide a query function and a stub for the CONFIG_SMP=n case.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185004.430168326@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-05 19:32:02 +01:00
Jin Qian
3fd220679c ANDROID: taskstats: track fsync syscalls
This adds a counter to the taskstats extended accounting fields, which
tracks the number of times fsync is called, and then plumbs it through
to the uid_sys_stats driver.

Bug: 120442023
Change-Id: I6c138de5b2332eea70f57e098134d1d141247b3f
Signed-off-by: Jin Qian <jinqian@google.com>
[AmitP: Refactored changes to align with changes from upstream commit
        9a07000400 ("sched/headers: Move CONFIG_TASK_XACCT bits from <linux/sched.h> to <linux/sched/xacct.h>")]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
[tkjos: Needed for storaged fsync accounting ("storaged --uid" and
        "storaged --task").]
[astrachan: This is modifying a userspace interface and should probably
            be reworked]
Signed-off-by: Alistair Strachan <astrachan@google.com>
2018-12-05 09:48:12 -08:00
Brendan Jackman
88d05f9257 FROMLIST: sched/fair: Use wake_q length as a hint for wake_wide
This patch adds a parameter to select_task_rq, sibling_count_hint
allowing the caller, where it has this information, to inform the
sched_class the number of tasks that are being woken up as part of
the same event.

The wake_q mechanism is one case where this information is available.

select_task_rq_fair can then use the information to detect that it
needs to widen the search space for task placement in order to avoid
overloading the last-level cache domain's CPUs.

                               * * *

The reason I am investigating this change is the following use case
on ARM big.LITTLE (asymmetrical CPU capacity): 1 task per CPU, which
all repeatedly do X amount of work then
pthread_barrier_wait (i.e. sleep until the last task finishes its X
and hits the barrier). On big.LITTLE, the tasks which get a "big" CPU
finish faster, and then those CPUs pull over the tasks that are still
running:

     v CPU v           ->time->

                    -------------
   0  (big)         11111  /333
                    -------------
   1  (big)         22222   /444|
                    -------------
   2  (LITTLE)      333333/
                    -------------
   3  (LITTLE)      444444/
                    -------------

Now when task 4 hits the barrier (at |) and wakes the others up,
there are 4 tasks with prev_cpu=<big> and 0 tasks with
prev_cpu=<little>. want_affine therefore means that we'll only look
in CPUs 0 and 1 (sd_llc), so tasks will be unnecessarily coscheduled
on the bigs until the next load balance, something like this:

     v CPU v           ->time->

                    ------------------------
   0  (big)         11111  /333  31313\33333
                    ------------------------
   1  (big)         22222   /444|424\4444444
                    ------------------------
   2  (LITTLE)      333333/          \222222
                    ------------------------
   3  (LITTLE)      444444/            \1111
                    ------------------------
                                 ^^^
                           underutilization

So, I'm trying to get want_affine = 0 for these tasks.

I don't _think_ any incarnation of the wakee_flips mechanism can help
us here because which task is waker and which tasks are wakees
generally changes with each iteration.

However pthread_barrier_wait (or more accurately FUTEX_WAKE) has the
nice property that we know exactly how many tasks are being woken, so
we can cheat.

It might be a disadvantage that we "widen" _every_ task that's woken in
an event, while select_idle_sibling would work fine for the first
sd_llc_size - 1 tasks.

IIUC, if wake_affine() behaves correctly this trick wouldn't be
necessary on SMP systems, so it might be best guarded by the presence
of SD_ASYM_CPUCAPACITY?

                               * * *

Final note..

In order to observe "perfect" behaviour for this use case, I also had
to disable the TTWU_QUEUE sched feature. Suppose during the wakeup
above we are working through the work queue and have placed tasks 3
and 2, and are about to place task 1:

     v CPU v           ->time->

                    --------------
   0  (big)         11111  /333  3
                    --------------
   1  (big)         22222   /444|4
                    --------------
   2  (LITTLE)      333333/      2
                    --------------
   3  (LITTLE)      444444/          <- Task 1 should go here
                    --------------

If TTWU_QUEUE is enabled, we will not yet have enqueued task
2 (having instead sent a reschedule IPI) or attached its load to CPU
2. So we are likely to also place task 1 on cpu 2. Disabling
TTWU_QUEUE means that we enqueue task 2 before placing task 1,
solving this issue. TTWU_QUEUE is there to minimise rq lock
contention, and I guess that this contention is less of an issue on
big.LITTLE systems since they have relatively few CPUs, which
suggests the trade-off makes sense here.

Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
( - Applied from https://patchwork.kernel.org/patch/9895261/
  - Fixed trivial conflict in kernel/sched/core.c
  - Fixed select_task_rq_idle, now in kernel/sched/idle.c
  - Fixed trivial conflict in select_task_rq_fair )
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Change-Id: I3cfc4bf48c3d7feef969db4d22449f4fbb4f795d
2018-10-26 12:15:52 +01:00
Chris Redpath
af362094df ANDROID: sched: Unconditionally honor sync flag for energy-aware wakeups
Since we don't do energy-aware wakeups when we are overutilized, always
honoring sync wakeups in this state does not prevent wake-wide mechanics
overruling the flag as normal.

This patch is based upon previous work to build EAS for android products.

sync-hint code taken from commit 4a5e890ec60d
"sched/fair: add tunable to force selection at cpu granularity" written
by Juri Lelli <juri.lelli@arm.com>

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
(cherry-picked from commit f1ec666a62dec1083ed52fe1ddef093b84373aaf)
[ Moved the feature to find_energy_efficient_cpu() ]
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Change-Id: I4b3d79141fc8e53dc51cd63ac11096c2e3cb10f5
2018-10-26 12:14:32 +01:00
Quentin Perret
2e88529cf8 ANDROID: sched: Introduce sysctl_sched_cstate_aware
Introduce a new sysctl for this option, 'sched_cstate_aware'.
When this is enabled, the scheduler can make use of the idle state
indexes in order to break the tie between potential CPU candidates.

This patch is based on 7f6fb825d6bc ("ANDROID: sched: EAS: take cstate
into account when selecting idle core") from android-4.14. All the
credits goes to the authors.

Change-Id: Ia076cf32faff91e90905291fa6f7924dc3dd6458
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2018-10-26 11:58:00 +01:00
Quentin Perret
b78eec5fb7 FROMLIST: sched: Introduce a sysctl for Energy Aware Scheduling
In its current state, Energy Aware Scheduling (EAS) starts automatically
on asymmetric platforms having an Energy Model (EM). However, there are
users who want to have an EM (for thermal management for example), but
don't want EAS with it.

In order to let users disable EAS explicitly, introduce a new sysctl
called 'sched_energy_aware'. It is enabled by default so that EAS can
start automatically on platforms where it makes sense. Flipping it to 0
rebuilds the scheduling domains and disables EAS.

Change-Id: I55764e70bf5e90795d2269ec9135ae6e82794a2b
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Message-Id: <20181016101513.26919-11-quentin.perret@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2018-10-26 11:47:12 +01:00
Quentin Perret
1e6b1214f1 FROMLIST: sched/topology: Make Energy Aware Scheduling depend on schedutil
Energy Aware Scheduling (EAS) is designed with the assumption that
frequencies of CPUs follow their utilization value. When using a CPUFreq
governor other than schedutil, the chances of this assumption being true
are small, if any. When schedutil is being used, EAS' predictions are at
least consistent with the frequency requests. Although those requests
have no guarantees to be honored by the hardware, they should at least
guide DVFS in the right direction and provide some hope in regards to the
EAS model being accurate.

To make sure EAS is only used in a sane configuration, create a strong
dependency on schedutil being used. Since having sugov compiled-in does
not provide that guarantee, make CPUFreq call a scheduler function on
governor changes hence letting it rebuild the scheduling domains, check
the governors of the online CPUs, and enable/disable EAS accordingly.

Change-Id: I872949134f97d2772fc681b7393eaed7f0e224f2
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Message-Id: <20181016101513.26919-9-quentin.perret@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2018-10-26 11:47:11 +01:00
Quentin Perret
f3e45428b7 FROMLIST: sched/cpufreq: Prepare schedutil for Energy Aware Scheduling
Schedutil requests frequency by aggregating utilization signals from
the scheduler (CFS, RT, DL, IRQ) and applying a 25% margin on top of
them. Since Energy Aware Scheduling (EAS) needs to be able to predict
the frequency requests, it needs to forecast the decisions made by the
governor.

In order to prepare the introduction of EAS, introduce
schedutil_freq_util() to centralize the aforementioned signal
aggregation and make it available to both schedutil and EAS. Since
frequency selection and energy estimation still need to deal with RT and
DL signals slightly differently, schedutil_freq_util() is called with a
different 'type' parameter in those two contexts, and returns an
aggregated utilization signal accordingly. While at it, introduce the
map_util_freq() function which is designed to make schedutil's 25%
margin usable easily for both sugov and EAS.

As EAS will be able to predict schedutil's frequency requests more
accurately than any other governor by design, it'd be sensible to make
sure EAS cannot be used without schedutil. This will be done later, once
EAS has actually been introduced.

Change-Id: Idbeeb00926045507b73f9cba37630b38ae0816c0
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Message-Id: <20181016101513.26919-3-quentin.perret@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2018-10-26 11:47:10 +01:00
Quentin Perret
30f6e1b5c9 FROMLIST: sched: Relocate arch_scale_cpu_capacity
By default, arch_scale_cpu_capacity() is only visible from within the
kernel/sched folder. Relocate it to include/linux/sched/topology.h to
make it visible to other clients needing to know about the capacity of
CPUs, such as the Energy Model framework.

Change-Id: I144c7299e122201dbcadc431d55d0a6d24d90005
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Message-Id: <20181016101513.26919-2-quentin.perret@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2018-10-26 11:47:10 +01:00
Morten Rasmussen
f2d1d0fea1 UPSTREAM: sched/topology: Add SD_ASYM_CPUCAPACITY flag detection
The SD_ASYM_CPUCAPACITY sched_domain flag is supposed to mark the
sched_domain in the hierarchy where all CPU capacities are visible for
any CPU's point of view on asymmetric CPU capacity systems. The
scheduler can then take to take capacity asymmetry into account when
balancing at this level. It also serves as an indicator for how wide
task placement heuristics have to search to consider all available CPU
capacities as asymmetric systems might often appear symmetric at
smallest level(s) of the sched_domain hierarchy.

The flag has been around for while but so far only been set by
out-of-tree code in Android kernels. One solution is to let each
architecture provide the flag through a custom sched_domain topology
array and associated mask and flag functions. However,
SD_ASYM_CPUCAPACITY is special in the sense that it depends on the
capacity and presence of all CPUs in the system, i.e. when hotplugging
all CPUs out except those with one particular CPU capacity the flag
should disappear even if the sched_domains don't collapse. Similarly,
the flag is affected by cpusets where load-balancing is turned off.
Detecting when the flags should be set therefore depends not only on
topology information but also the cpuset configuration and hotplug
state. The arch code doesn't have easy access to the cpuset
configuration.

Instead, this patch implements the flag detection in generic code where
cpusets and hotplug state is already taken care of. All the arch is
responsible for is to implement arch_scale_cpu_capacity() and force a
full rebuild of the sched_domain hierarchy if capacities are updated,
e.g. later in the boot process when cpufreq has initialized.

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Link: http://lkml.kernel.org/r/1532093554-30504-2-git-send-email-morten.rasmussen@arm.com
[ Fixed 'CPU' capitalization. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 05484e0984)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>

Change-Id: I1d5f695a95f8d023f1ecf14ecb71a558ceb67ed6
2018-10-26 11:44:43 +01:00
Linus Torvalds
cd9b44f907 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - the rest of MM

 - procfs updates

 - various misc things

 - more y2038 fixes

 - get_maintainer updates

 - lib/ updates

 - checkpatch updates

 - various epoll updates

 - autofs updates

 - hfsplus

 - some reiserfs work

 - fatfs updates

 - signal.c cleanups

 - ipc/ updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (166 commits)
  ipc/util.c: update return value of ipc_getref from int to bool
  ipc/util.c: further variable name cleanups
  ipc: simplify ipc initialization
  ipc: get rid of ids->tables_initialized hack
  lib/rhashtable: guarantee initial hashtable allocation
  lib/rhashtable: simplify bucket_table_alloc()
  ipc: drop ipc_lock()
  ipc/util.c: correct comment in ipc_obtain_object_check
  ipc: rename ipcctl_pre_down_nolock()
  ipc/util.c: use ipc_rcu_putref() for failues in ipc_addid()
  ipc: reorganize initialization of kern_ipc_perm.seq
  ipc: compute kern_ipc_perm.id under the ipc lock
  init/Kconfig: remove EXPERT from CHECKPOINT_RESTORE
  fs/sysv/inode.c: use ktime_get_real_seconds() for superblock stamp
  adfs: use timespec64 for time conversion
  kernel/sysctl.c: fix typos in comments
  drivers/rapidio/devices/rio_mport_cdev.c: remove redundant pointer md
  fork: don't copy inconsistent signal handler state to child
  signal: make get_signal() return bool
  signal: make sigkill_pending() return bool
  ...
2018-08-22 12:34:08 -07:00
Christian Brauner
52cba1a274 signal: make force_sigsegv() void
Patch series "signal: refactor some functions", v3.

This series refactors a bunch of functions in signal.c to simplify parts
of the code.

The greatest single change is declaring the static do_sigpending() helper
as void which makes it possible to remove a bunch of unnecessary checks in
the syscalls later on.

This patch (of 17):

force_sigsegv() returned 0 unconditionally so it doesn't make sense to have
it return at all. In addition, there are no callers that check
force_sigsegv()'s return value.

Link: http://lkml.kernel.org/r/20180602103653.18181-2-christian@brauner.io
Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Morris <james.morris@microsoft.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22 10:52:50 -07:00
Dmitry Vyukov
a2e5144538 kernel/hung_task.c: allow to set checking interval separately from timeout
Currently task hung checking interval is equal to timeout, as the result
hung is detected anywhere between timeout and 2*timeout.  This is fine for
most interactive environments, but this hurts automated testing setups
(syzbot).  In an automated setup we need to strictly order CPU lockup <
RCU stall < workqueue lockup < task hung < silent loss, so that RCU stall
is not detected as task hung and task hung is not detected as silent
machine loss.  The large variance in task hung detection timeout requires
setting silent machine loss timeout to a very large value (e.g.  if task
hung is 3 mins, then silent loss need to be set to ~7 mins).  The
additional 3 minutes significantly reduce testing efficiency because
usually we crash kernel within a minute, and this can add hours to bug
localization process as it needs to do dozens of tests.

Allow setting checking interval separately from timeout.  This allows to
set timeout to, say, 3 minutes, but checking interval to 10 secs.

The interval is controlled via a new hung_task_check_interval_secs sysctl,
similar to the existing hung_task_timeout_secs sysctl.  The default value
of 0 results in the current behavior: checking interval is equal to
timeout.

[akpm@linux-foundation.org: update hung_task_timeout_max's comment]
Link: http://lkml.kernel.org/r/20180611111004.203513-1-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22 10:52:47 -07:00
Sebastian Andrzej Siewior
fc37191272 userns: use refcount_t for reference counting instead atomic_t
refcount_t type and corresponding API should be used instead of atomic_t
wh en the variable is used as a reference counter.  This avoids accidental
refcounter overflows that might lead to use-after-free situations.

Link: http://lkml.kernel.org/r/20180703200141.28415-6-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22 10:52:46 -07:00
Linus Torvalds
0214f46b3a Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull core signal handling updates from Eric Biederman:
 "It was observed that a periodic timer in combination with a
  sufficiently expensive fork could prevent fork from every completing.
  This contains the changes to remove the need for that restart.

  This set of changes is split into several parts:

   - The first part makes PIDTYPE_TGID a proper pid type instead
     something only for very special cases. The part starts using
     PIDTYPE_TGID enough so that in __send_signal where signals are
     actually delivered we know if the signal is being sent to a a group
     of processes or just a single process.

   - With that prep work out of the way the logic in fork is modified so
     that fork logically makes signals received while it is running
     appear to be received after the fork completes"

* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (22 commits)
  signal: Don't send signals to tasks that don't exist
  signal: Don't restart fork when signals come in.
  fork: Have new threads join on-going signal group stops
  fork: Skip setting TIF_SIGPENDING in ptrace_init_task
  signal: Add calculate_sigpending()
  fork: Unconditionally exit if a fatal signal is pending
  fork: Move and describe why the code examines PIDNS_ADDING
  signal: Push pid type down into complete_signal.
  signal: Push pid type down into __send_signal
  signal: Push pid type down into send_signal
  signal: Pass pid type into do_send_sig_info
  signal: Pass pid type into send_sigio_to_task & send_sigurg_to_task
  signal: Pass pid type into group_send_sig_info
  signal: Pass pid and pid type into send_sigqueue
  posix-timers: Noralize good_sigevent
  signal: Use PIDTYPE_TGID to clearly store where file signals will be sent
  pid: Implement PIDTYPE_TGID
  pids: Move the pgrp and session pid pointers from task_struct to signal_struct
  kvm: Don't open code task_pid in kvm_vcpu_ioctl
  pids: Compute task_tgid using signal->leader_pid
  ...
2018-08-21 13:47:29 -07:00
Shakeel Butt
d46eb14b73 fs: fsnotify: account fsnotify metadata to kmemcg
Patch series "Directed kmem charging", v8.

The Linux kernel's memory cgroup allows limiting the memory usage of the
jobs running on the system to provide isolation between the jobs.  All
the kernel memory allocated in the context of the job and marked with
__GFP_ACCOUNT will also be included in the memory usage and be limited
by the job's limit.

The kernel memory can only be charged to the memcg of the process in
whose context kernel memory was allocated.  However there are cases
where the allocated kernel memory should be charged to the memcg
different from the current processes's memcg.  This patch series
contains two such concrete use-cases i.e.  fsnotify and buffer_head.

The fsnotify event objects can consume a lot of system memory for large
or unlimited queues if there is either no or slow listener.  The events
are allocated in the context of the event producer.  However they should
be charged to the event consumer.  Similarly the buffer_head objects can
be allocated in a memcg different from the memcg of the page for which
buffer_head objects are being allocated.

To solve this issue, this patch series introduces mechanism to charge
kernel memory to a given memcg.  In case of fsnotify events, the memcg
of the consumer can be used for charging and for buffer_head, the memcg
of the page can be charged.  For directed charging, the caller can use
the scope API memalloc_[un]use_memcg() to specify the memcg to charge
for all the __GFP_ACCOUNT allocations within the scope.

This patch (of 2):

A lot of memory can be consumed by the events generated for the huge or
unlimited queues if there is either no or slow listener.  This can cause
system level memory pressure or OOMs.  So, it's better to account the
fsnotify kmem caches to the memcg of the listener.

However the listener can be in a different memcg than the memcg of the
producer and these allocations happen in the context of the event
producer.  This patch introduces remote memcg charging API which the
producer can use to charge the allocations to the memcg of the listener.

There are seven fsnotify kmem caches and among them allocations from
dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and
inotify_inode_mark_cachep happens in the context of syscall from the
listener.  So, SLAB_ACCOUNT is enough for these caches.

The objects from fsnotify_mark_connector_cachep are not accounted as
they are small compared to the notification mark or events and it is
unclear whom to account connector to since it is shared by all events
attached to the inode.

The allocations from the event caches happen in the context of the event
producer.  For such caches we will need to remote charge the allocations
to the listener's memcg.  Thus we save the memcg reference in the
fsnotify_group structure of the listener.

This patch has also moved the members of fsnotify_group to keep the size
same, at least for 64 bit build, even with additional member by filling
the holes.

[shakeelb@google.com: use GFP_KERNEL_ACCOUNT rather than open-coding it]
  Link: http://lkml.kernel.org/r/20180702215439.211597-1-shakeelb@google.com
Link: http://lkml.kernel.org/r/20180627191250.209150-2-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-17 16:20:30 -07:00
Eric W. Biederman
c3ad2c3b02 signal: Don't restart fork when signals come in.
Wen Yang <wen.yang99@zte.com.cn> and majiang <ma.jiang@zte.com.cn>
report that a periodic signal received during fork can cause fork to
continually restart preventing an application from making progress.

The code was being overly pessimistic.  Fork needs to guarantee that a
signal sent to multiple processes is logically delivered before the
fork and just to the forking process or logically delivered after the
fork to both the forking process and it's newly spawned child.  For
signals like periodic timers that are always delivered to a single
process fork can safely complete and let them appear to logically
delivered after the fork().

While examining this issue I also discovered that fork today will miss
signals delivered to multiple processes during the fork and handled by
another thread.  Similarly the current code will also miss blocked
signals that are delivered to multiple process, as those signals will
not appear pending during fork.

Add a list of each thread that is currently forking, and keep on that
list a signal set that records all of the signals sent to multiple
processes.  When fork completes initialize the new processes
shared_pending signal set with it.  The calculate_sigpending function
will see those signals and set TIF_SIGPENDING causing the new task to
take the slow path to userspace to handle those signals.  Making it
appear as if those signals were received immediately after the fork.

It is not possible to send real time signals to multiple processes and
exceptions don't go to multiple processes, which means that that are
no signals sent to multiple processes that require siginfo.  This
means it is safe to not bother collecting siginfo on signals sent
during fork.

The sigaction of a child of fork is initially the same as the
sigaction of the parent process.  So a signal the parent ignores the
child will also initially ignore.  Therefore it is safe to ignore
signals sent to multiple processes and ignored by the forking process.

Signals sent to only a single process or only a single thread and delivered
during fork are treated as if they are received after the fork, and generally
not dealt with.  They won't cause any problems.

V2: Added removal from the multiprocess list on failure.
V3: Use -ERESTARTNOINTR directly
V4: - Don't queue both SIGCONT and SIGSTOP
    - Initialize signal_struct.multiprocess in init_task
    - Move setting of shared_pending to before the new task
      is visible to signals.  This prevents signals from comming
      in before shared_pending.signal is set to delayed.signal
      and being lost.
V5: - rework list add and delete to account for idle threads
v6: - Use sigdelsetmask when removing stop signals

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200447
Reported-by: Wen Yang <wen.yang99@zte.com.cn> and
Reported-by: majiang <ma.jiang@zte.com.cn>
Fixes: 4a2c7a7837 ("[PATCH] make fork() atomic wrt pgrp/session signals")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-08-09 13:07:01 -05:00
Eric W. Biederman
924de3b8c9 fork: Have new threads join on-going signal group stops
There are only two signals that are delivered to every member of a
signal group: SIGSTOP and SIGKILL.  Signal delivery requires every
signal appear to be delivered either before or after a clone syscall.
SIGKILL terminates the clone so does not need to be considered.  Which
leaves only SIGSTOP that needs to be considered when creating new
threads.

Today in the event of a group stop TIF_SIGPENDING will get set and the
fork will restart ensuring the fork syscall participates in the group
stop.

A fork (especially of a process with a lot of memory) is one of the
most expensive system so we really only want to restart a fork when
necessary.

It is easy so check to see if a SIGSTOP is ongoing and have the new
thread join it immediate after the clone completes.  Making it appear
the clone completed happened just before the SIGSTOP.

The calculate_sigpending function will see the bits set in jobctl and
set TIF_SIGPENDING to ensure the new task takes the slow path to userspace.

V2: The call to task_join_group_stop was moved before the new task is
    added to the thread group list.  This should not matter as
    sighand->siglock is held over both the addition of the threads,
    the call to task_join_group_stop and do_signal_stop.  But the change
    is trivial and it is one less thing to worry about when reading
    the code.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-08-03 20:20:14 -05:00
Eric W. Biederman
088fe47ce9 signal: Add calculate_sigpending()
Add a function calculate_sigpending to test to see if any signals are
pending for a new task immediately following fork.  Signals have to
happen either before or after fork.  Today our practice is to push
all of the signals to before the fork, but that has the downside that
frequent or periodic signals can make fork take much much longer than
normal or prevent fork from completing entirely.

So we need move signals that we can after the fork to prevent that.

This updates the code to set TIF_SIGPENDING on a new task if there
are signals or other activities that have moved so that they appear
to happen after the fork.

As the code today restarts if it sees any such activity this won't
immediately have an effect, as there will be no reason for it
to set TIF_SIGPENDING immediately after the fork.

Adding calculate_sigpending means the code in fork can safely be
changed to not always restart if a signal is pending.

The new calculate_sigpending function sets sigpending if there
are pending bits in jobctl, pending signals, the freezer needs
to freeze the new task or the live kernel patching framework
need the new thread to take the slow path to userspace.

I have verified that setting TIF_SIGPENDING does make a new process
take the slow path to userspace before it executes it's first userspace
instruction.

I have looked at the callers of signal_wake_up and the code paths
setting TIF_SIGPENDING and I don't see anything else that needs to be
handled.  The code probably doesn't need to set TIF_SIGPENDING for the
kernel live patching as it uses a separate thread flag as well.  But
at this point it seems safer reuse the recalc_sigpending logic and get
the kernel live patching folks to sort out their story later.

V2: I have moved the test into schedule_tail where siglock can
    be grabbed and recalc_sigpending can be reused directly.
    Further as the last action of setting up a new task this
    guarantees that TIF_SIGPENDING will be properly set in the
    new process.

    The helper calculate_sigpending takes the siglock and
    uncontitionally sets TIF_SIGPENDING and let's recalc_sigpending
    clear TIF_SIGPENDING if it is unnecessary.  This allows reusing
    the existing code and keeps maintenance of the conditions simple.

    Oleg Nesterov <oleg@redhat.com>  suggested the movement
    and pointed out the need to take siglock if this code
    was going to be called while the new task is discoverable.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-08-03 20:10:31 -05:00
Ingo Molnar
4765096f4f Merge branch 'sched/urgent' into sched/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25 11:29:58 +02:00
Al Viro
f88a333b44 alpha: fix osf_wait4() breakage
kernel_wait4() expects a userland address for status - it's only
rusage that goes as a kernel one (and needs a copyout afterwards)

[ Also, fix the prototype of kernel_wait4() to have that __user
  annotation   - Linus ]

Fixes: 92ebce5ac5 ("osf_wait4: switch to kernel_wait4()")
Cc: stable@kernel.org # v4.13+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-22 11:51:30 -07:00
Eric W. Biederman
24122c7f49 signal: Pass pid and pid type into send_sigqueue
Make the code more maintainable by performing more of the signal
related work in send_sigqueue.

A quick inspection of do_timer_create will show that this code path
does not lookup a thread group by a thread's pid.  Making it safe
to find the task pointed to by it_pid with "pid_task(it_pid, type)";

This supports the changes needed in fork to tell if a signal was sent
to a single process or a group of processes.

Having the pid to task transition in signal.c will also make it easier
to sort out races with de_thread and and the thread group leader
exiting when it comes time to address that.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-07-21 10:43:12 -05:00
Eric W. Biederman
6883f81aac pid: Implement PIDTYPE_TGID
Everywhere except in the pid array we distinguish between a tasks pid and
a tasks tgid (thread group id).  Even in the enumeration we want that
distinction sometimes so we have added __PIDTYPE_TGID.  With leader_pid
we almost have an implementation of PIDTYPE_TGID in struct signal_struct.

Add PIDTYPE_TGID as a first class member of the pid_type enumeration and
into the pids array.  Then remove the __PIDTYPE_TGID special case and the
leader_pid in signal_struct.

The net size increase is just an extra pointer added to struct pid and
an extra pair of pointers of an hlist_node added to task_struct.

The effect on code maintenance is the removal of a number of special
cases today and the potential to remove many more special cases as
PIDTYPE_TGID gets used to it's fullest.  The long term potential
is allowing zombie thread group leaders to exit, which will remove
a lot more special cases in the code.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-07-21 10:43:12 -05:00