Tasks without a user-defined clamp value are considered not clamped
and by default their utilization can have any value in the
[0..SCHED_CAPACITY_SCALE] range.
Tasks with a user-defined clamp value are allowed to request any value
in that range, and the required clamp is unconditionally enforced.
However, "System Management Software" may want to limit the range
of clamp values allowed for all tasks.
Add a privileged interface to define a system default configuration via:
/proc/sys/kernel/sched_uclamp_util_{min,max}
which works as an unconditional clamp range restriction for all tasks.
With the default configuration, the full SCHED_CAPACITY_SCALE range of
values is allowed for each clamp index. Otherwise, the task-specific
clamp is capped by the corresponding system default value.
Do that by tracking, for each task, the "effective" clamp value and
bucket the task has been refcounted in at enqueue time. This
allows lazily aggregating "requested" and "system default" values at
enqueue time and simplifies refcounting updates at dequeue time.
The cached bucket ids are used to avoid (relatively) more expensive
integer divisions every time a task is enqueued.
An active flag is used to report when the "effective" value is valid and
thus the task is actually refcounted in the corresponding rq's bucket.
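The per-task tracking described above can be sketched in user-space C for a single clamp index. This is a minimal model under stated assumptions, not the kernel code: struct uclamp_se_sketch, uclamp_se_make() and the other *_sketch names are hypothetical, the bucket count of 5 is assumed, and the bucket width is rounded to the closest integer. The division that computes a bucket id runs only when a requested value or the system default is set, so enqueue just picks the smaller of two already-cached values.

```c
#include <assert.h>
#include <stdbool.h>

#define SCHED_CAPACITY_SCALE 1024u
#define UCLAMP_BUCKETS 5u   /* assumed bucket count */
#define UCLAMP_BUCKET_DELTA \
	((SCHED_CAPACITY_SCALE + UCLAMP_BUCKETS / 2) / UCLAMP_BUCKETS)

/* A clamp value together with its cached bucket and refcount state. */
struct uclamp_se_sketch {
	unsigned int value;
	unsigned int bucket_id;
	bool active;            /* set only while refcounted in an rq bucket */
};

struct task_sketch {
	struct uclamp_se_sketch req;  /* "requested" value */
	struct uclamp_se_sketch eff;  /* "effective" value used at enqueue */
};

struct rq_sketch {
	unsigned int tasks[UCLAMP_BUCKETS];   /* RUNNABLE tasks per bucket */
};

/* System default for this clamp index: full range, bucket 4 of 5. */
static struct uclamp_se_sketch uclamp_default_sketch = {
	SCHED_CAPACITY_SCALE, UCLAMP_BUCKETS - 1, false
};

/* Build a clamp value; the (relatively) costly division runs here only. */
static struct uclamp_se_sketch uclamp_se_make(unsigned int value)
{
	struct uclamp_se_sketch se = { value, value / UCLAMP_BUCKET_DELTA, false };

	assert(value <= SCHED_CAPACITY_SCALE);
	if (se.bucket_id >= UCLAMP_BUCKETS)
		se.bucket_id = UCLAMP_BUCKETS - 1;
	return se;
}

/* Enqueue: lazily combine request and system default; no division needed. */
static void uclamp_enqueue_sketch(struct rq_sketch *rq, struct task_sketch *p)
{
	assert(!p->eff.active);
	p->eff = p->req.value > uclamp_default_sketch.value ?
		 uclamp_default_sketch : p->req;
	p->eff.active = true;
	rq->tasks[p->eff.bucket_id]++;
}

/* Dequeue: release exactly the bucket recorded at enqueue time. */
static void uclamp_dequeue_sketch(struct rq_sketch *rq, struct task_sketch *p)
{
	assert(p->eff.active && rq->tasks[p->eff.bucket_id] > 0);
	rq->tasks[p->eff.bucket_id]--;
	p->eff.active = false;
}
```

Because dequeue releases the bucket cached at enqueue time, changing the system default while a task is RUNNABLE does not unbalance the per-bucket refcounts.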
Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190621084217.8167-5-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit e8f14172c6)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I4f014c5ec9c312aaad606518f6e205fd0cfbcaa2
Signed-off-by: Quentin Perret <qperret@google.com>
Utilization clamping allows the CPU's utilization to be clamped within a
[util_min, util_max] range, depending on the set of RUNNABLE tasks on
that CPU. Each task references two "clamp buckets" defining its minimum
and maximum (util_{min,max}) utilization "clamp values". A CPU's clamp
bucket is active if there is at least one RUNNABLE task enqueued on
that CPU and refcounting that bucket.
When a task is {en,de}queued {on,from} a rq, the set of active clamp
buckets on that CPU can change. If the set of active clamp buckets
changes for a CPU, a new "aggregated" clamp value is computed for that
CPU. This is because each clamp bucket enforces a different utilization
clamp value.
Clamp values are always MAX aggregated for both util_min and util_max.
This ensures that no task can affect the performance of other
co-scheduled tasks which are more boosted (i.e. with higher util_min
clamp) or less capped (i.e. with higher util_max clamp).
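To illustrate MAX aggregation, the following sketch computes a CPU's clamp values directly from the RUNNABLE tasks' values, without buckets. The names are hypothetical, SCHED_CAPACITY_SCALE is assumed to be 1024, and returning 0 for util_min and SCHED_CAPACITY_SCALE for util_max when no task is RUNNABLE is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define SCHED_CAPACITY_SCALE 1024u

enum uclamp_id { UCLAMP_MIN = 0, UCLAMP_MAX, UCLAMP_CNT };

struct task_clamps {
	unsigned int value[UCLAMP_CNT];   /* util_min, util_max */
};

/* Value used when no RUNNABLE task is present: no boost, no cap. */
static unsigned int uclamp_none_sketch(enum uclamp_id clamp_id)
{
	return clamp_id == UCLAMP_MIN ? 0 : SCHED_CAPACITY_SCALE;
}

/* MAX-aggregate one clamp index over the RUNNABLE tasks of a CPU. */
static unsigned int rq_clamp_sketch(const struct task_clamps *tasks,
				    size_t nr, enum uclamp_id clamp_id)
{
	unsigned int max_value = 0;
	size_t i;

	assert(clamp_id < UCLAMP_CNT);
	if (nr == 0)
		return uclamp_none_sketch(clamp_id);

	for (i = 0; i < nr; i++)
		if (tasks[i].value[clamp_id] > max_value)
			max_value = tasks[i].value[clamp_id];
	return max_value;
}
```

For two tasks with (util_min, util_max) of (100, 300) and (500, 800), the CPU gets util_min 500 and util_max 800: the more boosted task keeps its boost, and the less capped task is not capped further by its neighbour.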
A task has:
task_struct::uclamp[clamp_id]::bucket_id
to track the "bucket index" of the CPU's clamp bucket it refcounts while
enqueued, for each clamp index (clamp_id).
A runqueue has:
rq::uclamp[clamp_id]::bucket[bucket_id].tasks
to track how many RUNNABLE tasks on that CPU refcount each
clamp bucket (bucket_id) of a clamp index (clamp_id).
It also has a:
rq::uclamp[clamp_id]::bucket[bucket_id].value
to track the clamp value of each clamp bucket (bucket_id) of a clamp
index (clamp_id).
The rq::uclamp::bucket[clamp_id][] array is scanned every time it's
needed to find a new MAX aggregated clamp value for a clamp_id. This
operation is required only when the last task of a clamp bucket tracking
the current MAX aggregated clamp value is dequeued. In this case,
the CPU is either entering IDLE or going to schedule a less boosted or
more clamped task.
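The refcounting and rescan policy described above can be modelled for one clamp index as below. This is a sketch with hypothetical names: the caller supplies the bucket id, each bucket remembers the largest value enqueued into it since it was last empty, and a CPU with no active bucket simply reports 0, which glosses over how a real implementation treats idle CPUs.

```c
#include <assert.h>

#define UCLAMP_BUCKETS 5u   /* assumed build-time bucket count */

struct bucket_sketch {
	unsigned int value;   /* clamp value this bucket enforces */
	unsigned int tasks;   /* RUNNABLE tasks refcounting this bucket */
};

struct rq_clamp_sketch {
	unsigned int value;   /* current MAX-aggregated clamp value */
	struct bucket_sketch bucket[UCLAMP_BUCKETS];
};

static unsigned int scans;   /* full-array scans performed, for illustration */

/* Scan the whole (unordered) bucket array for the largest active value. */
static unsigned int rq_max_value(const struct rq_clamp_sketch *rq)
{
	unsigned int b, max_value = 0;   /* 0: no active bucket in this sketch */

	scans++;
	for (b = 0; b < UCLAMP_BUCKETS; b++)
		if (rq->bucket[b].tasks && rq->bucket[b].value > max_value)
			max_value = rq->bucket[b].value;
	return max_value;
}

/* Enqueue: refcount the bucket; a new maximum is detected in O(1). */
static void rq_inc(struct rq_clamp_sketch *rq, unsigned int bucket_id,
		   unsigned int value)
{
	struct bucket_sketch *b;

	assert(bucket_id < UCLAMP_BUCKETS);
	b = &rq->bucket[bucket_id];
	/* First task of an empty bucket resets its value; later ones raise it. */
	if (b->tasks++ == 0 || value > b->value)
		b->value = value;
	if (value > rq->value)
		rq->value = value;
}

/* Dequeue: rescan only when the last task of the max bucket leaves. */
static void rq_dec(struct rq_clamp_sketch *rq, unsigned int bucket_id)
{
	struct bucket_sketch *b;

	assert(bucket_id < UCLAMP_BUCKETS);
	b = &rq->bucket[bucket_id];
	assert(b->tasks > 0);
	if (--b->tasks)
		return;           /* bucket still active: aggregate unchanged */
	if (b->value >= rq->value)
		rq->value = rq_max_value(rq);
}
```

Dequeueing from a bucket that still has tasks, or emptying a bucket below the current maximum, costs no scan; only emptying the bucket that holds the maximum triggers one.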
The expected number of different clamp values configured at build time
is small enough to fit the full unordered array into a single cache
line, for configurations of up to 7 buckets.
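A layout sketch showing where the 7-bucket limit can come from. It assumes a 64-byte cache line, an LP64 target (8-byte unsigned long), and a bucket that packs an 11-bit clamp value (enough for 0..1024) together with a task counter into one word; the struct names are hypothetical, and unsigned long bit-fields rely on GCC/Clang behaviour.

```c
#include <assert.h>
#include <limits.h>

#define SCHED_CAPACITY_SCALE 1024ul
/* Bits needed to store 0..1024 (1024 == 1 << 10, so 11 bits). */
#define UCLAMP_VALUE_BITS 11
#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)
#define UCLAMP_BUCKETS 7        /* largest count claimed to fit */
#define CACHE_LINE_BYTES 64     /* assumed cache line size */

struct uclamp_bucket_sketch {
	unsigned long value : UCLAMP_VALUE_BITS;                        /* clamp value */
	unsigned long tasks : BITS_PER_LONG_SKETCH - UCLAMP_VALUE_BITS; /* refcount */
};

struct uclamp_rq_sketch {
	unsigned int value;                                 /* aggregated clamp */
	struct uclamp_bucket_sketch bucket[UCLAMP_BUCKETS]; /* unordered array */
};

static_assert(sizeof(struct uclamp_bucket_sketch) == sizeof(unsigned long),
	      "each bucket packs into a single word");
static_assert(sizeof(struct uclamp_rq_sketch) <= CACHE_LINE_BYTES,
	      "the whole bucket array fits in one cache line");
```

On an LP64 target the rq structure takes 4 bytes for the aggregated value, 4 bytes of padding and 7 * 8 bytes of buckets, i.e. exactly 64 bytes; with an eighth bucket it grows to 72 bytes and the second static_assert fails.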
Add to struct rq the basic data structures required to refcount the
number of RUNNABLE tasks for each clamp bucket. Also add the max
aggregation required to update the rq's clamp value at each
enqueue/dequeue event.
Use a simple linear mapping of clamp values into clamp buckets.
Pre-compute and cache bucket_id to avoid integer divisions at
enqueue/dequeue time.
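A minimal sketch of the linear mapping, assuming SCHED_CAPACITY_SCALE is 1024 and 5 buckets; the macro and function names are hypothetical. The bucket width is rounded to the closest integer, so the result is clamped to the last bucket to stay in range for bucket counts where that rounding goes down.

```c
#include <assert.h>

#define SCHED_CAPACITY_SCALE 1024u
#define UCLAMP_BUCKETS 5u   /* assumed bucket count */

/* Width of each bucket, rounded to the closest integer: 1024 / 5 -> 205. */
#define UCLAMP_BUCKET_DELTA \
	((SCHED_CAPACITY_SCALE + UCLAMP_BUCKETS / 2) / UCLAMP_BUCKETS)

/*
 * Map a clamp value to its bucket. Meant to run once, when a clamp value
 * is set, so that the result can be cached for enqueue/dequeue.
 */
static unsigned int uclamp_bucket_id_sketch(unsigned int clamp_value)
{
	unsigned int id;

	assert(clamp_value <= SCHED_CAPACITY_SCALE);
	id = clamp_value / UCLAMP_BUCKET_DELTA;
	/* E.g. with 20 buckets DELTA is 51 and 1024 / 51 == 20: out of range. */
	return id < UCLAMP_BUCKETS ? id : UCLAMP_BUCKETS - 1;
}
```

With these assumptions the width is 205, so values 0..204 map to bucket 0, 205..409 to bucket 1, and 820..1024 to bucket 4.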
Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190621084217.8167-2-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 69842cba9a)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I2c2c23572fb82e004f815cc9c783881355df6836
Signed-off-by: Quentin Perret <qperret@google.com>
Merge 4.19.92 into android-4.19
Changes in 4.19.92
af_packet: set defaule value for tmo
fjes: fix missed check in fjes_acpi_add
mod_devicetable: fix PHY module format
net: dst: Force 4-byte alignment of dst_metrics
net: gemini: Fix memory leak in gmac_setup_txqs
net: hisilicon: Fix a BUG trigered by wrong bytes_compl
net: nfc: nci: fix a possible sleep-in-atomic-context bug in nci_uart_tty_receive()
net: qlogic: Fix error paths in ql_alloc_large_buffers()
net: usb: lan78xx: Fix suspend/resume PHY register access error
qede: Disable hardware gro when xdp prog is installed
qede: Fix multicast mac configuration
sctp: fully initialize v4 addr in some functions
selftests: forwarding: Delete IPv6 address at the end
btrfs: don't double lock the subvol_sem for rename exchange
btrfs: do not call synchronize_srcu() in inode_tree_del
Btrfs: fix missing data checksums after replaying a log tree
btrfs: send: remove WARN_ON for readonly mount
btrfs: abort transaction after failed inode updates in create_subvol
btrfs: skip log replay on orphaned roots
btrfs: do not leak reloc root if we fail to read the fs root
btrfs: handle ENOENT in btrfs_uuid_tree_iterate
Btrfs: fix removal logic of the tree mod log that leads to use-after-free issues
ALSA: pcm: Avoid possible info leaks from PCM stream buffers
ALSA: hda/ca0132 - Keep power on during processing DSP response
ALSA: hda/ca0132 - Avoid endless loop
ALSA: hda/ca0132 - Fix work handling in delayed HP detection
drm: mst: Fix query_payload ack reply struct
drm/panel: Add missing drm_panel_init() in panel drivers
drm/bridge: analogix-anx78xx: silence -EPROBE_DEFER warnings
iio: light: bh1750: Resolve compiler warning and make code more readable
drm/amdgpu: grab the id mgr lock while accessing passid_mapping
spi: Add call to spi_slave_abort() function when spidev driver is released
staging: rtl8192u: fix multiple memory leaks on error path
staging: rtl8188eu: fix possible null dereference
rtlwifi: prevent memory leak in rtl_usb_probe
libertas: fix a potential NULL pointer dereference
ath10k: fix backtrace on coredump
IB/iser: bound protection_sg size by data_sg size
media: am437x-vpfe: Setting STD to current value is not an error
media: i2c: ov2659: fix s_stream return value
media: ov6650: Fix crop rectangle alignment not passed back
media: i2c: ov2659: Fix missing 720p register config
media: ov6650: Fix stored frame format not in sync with hardware
media: ov6650: Fix stored crop rectangle not in sync with hardware
tools/power/cpupower: Fix initializer override in hsw_ext_cstates
media: venus: core: Fix msm8996 frequency table
ath10k: fix offchannel tx failure when no ath10k_mac_tx_frm_has_freq
pinctrl: devicetree: Avoid taking direct reference to device name string
drm/amdkfd: fix a potential NULL pointer dereference (v2)
selftests/bpf: Correct path to include msg + path
media: venus: Fix occasionally failures to suspend
usb: renesas_usbhs: add suspend event support in gadget mode
hwrng: omap3-rom - Call clk_disable_unprepare() on exit only if not idled
regulator: max8907: Fix the usage of uninitialized variable in max8907_regulator_probe()
media: flexcop-usb: fix NULL-ptr deref in flexcop_usb_transfer_init()
media: cec-funcs.h: add status_req checks
drm/bridge: dw-hdmi: Refuse DDC/CI transfers on the internal I2C controller
samples: pktgen: fix proc_cmd command result check logic
block: Fix writeback throttling W=1 compiler warnings
mwifiex: pcie: Fix memory leak in mwifiex_pcie_init_evt_ring
drm/drm_vblank: Change EINVAL by the correct errno
media: cx88: Fix some error handling path in 'cx8800_initdev()'
media: ti-vpe: vpe: Fix Motion Vector vpdma stride
media: ti-vpe: vpe: fix a v4l2-compliance warning about invalid pixel format
media: ti-vpe: vpe: fix a v4l2-compliance failure about frame sequence number
media: ti-vpe: vpe: Make sure YUYV is set as default format
media: ti-vpe: vpe: fix a v4l2-compliance failure causing a kernel panic
media: ti-vpe: vpe: ensure buffers are cleaned up properly in abort cases
media: ti-vpe: vpe: fix a v4l2-compliance failure about invalid sizeimage
syscalls/x86: Use the correct function type in SYSCALL_DEFINE0
drm/amd/display: Fix dongle_caps containing stale information.
extcon: sm5502: Reset registers during initialization
x86/mm: Use the correct function type for native_set_fixmap()
ath10k: Correct error handling of dma_map_single()
drm/bridge: dw-hdmi: Restore audio when setting a mode
perf test: Report failure for mmap events
perf report: Add warning when libunwind not compiled in
usb: usbfs: Suppress problematic bind and unbind uevents.
iio: adc: max1027: Reset the device at probe time
Bluetooth: missed cpu_to_le16 conversion in hci_init4_req
Bluetooth: Workaround directed advertising bug in Broadcom controllers
Bluetooth: hci_core: fix init for HCI_USER_CHANNEL
bpf/stackmap: Fix deadlock with rq_lock in bpf_get_stack()
x86/mce: Lower throttling MCE messages' priority to warning
perf tests: Disable bp_signal testing for arm64
drm/gma500: fix memory disclosures due to uninitialized bytes
rtl8xxxu: fix RTL8723BU connection failure issue after warm reboot
ipmi: Don't allow device module unload when in use
x86/ioapic: Prevent inconsistent state when moving an interrupt
media: smiapp: Register sensor after enabling runtime PM on the device
md/bitmap: avoid race window between md_bitmap_resize and bitmap_file_clear_bit
arm64: psci: Reduce the waiting time for cpu_psci_cpu_kill()
i40e: initialize ITRN registers with correct values
net: phy: dp83867: enable robust auto-mdix
drm/tegra: sor: Use correct SOR index on Tegra210
spi: sprd: adi: Add missing lock protection when rebooting
ACPI: button: Add DMI quirk for Medion Akoya E2215T
RDMA/qedr: Fix memory leak in user qp and mr
gpu: host1x: Allocate gather copy for host1x
net: dsa: LAN9303: select REGMAP when LAN9303 enable
phy: qcom-usb-hs: Fix extcon double register after power cycle
s390/time: ensure get_clock_monotonic() returns monotonic values
s390/mm: add mm_pxd_folded() checks to pxd_free()
net: hns3: add struct netdev_queue debug info for TX timeout
libata: Ensure ata_port probe has completed before detach
loop: fix no-unmap write-zeroes request behavior
pinctrl: sh-pfc: sh7734: Fix duplicate TCLK1_B
iio: dln2-adc: fix iio_triggered_buffer_postenable() position
libbpf: Fix error handling in bpf_map__reuse_fd()
Bluetooth: Fix advertising duplicated flags
pinctrl: amd: fix __iomem annotation in amd_gpio_irq_handler()
ixgbe: protect TX timestamping from API misuse
media: rcar_drif: fix a memory disclosure
media: v4l2-core: fix touch support in v4l_g_fmt
nvmem: imx-ocotp: reset error status on probe
rfkill: allocate static minor
bnx2x: Fix PF-VF communication over multi-cos queues.
spi: img-spfi: fix potential double release
ALSA: timer: Limit max amount of slave instances
rtlwifi: fix memory leak in rtl92c_set_fw_rsvdpagepkt()
perf probe: Fix to find range-only function instance
perf probe: Fix to list probe event with correct line number
perf jevents: Fix resource leak in process_mapfile() and main()
perf probe: Walk function lines in lexical blocks
perf probe: Fix to probe an inline function which has no entry pc
perf probe: Fix to show ranges of variables in functions without entry_pc
perf probe: Fix to show inlined function callsite without entry_pc
libsubcmd: Use -O0 with DEBUG=1
perf probe: Fix to probe a function which has no entry pc
perf tools: Splice events onto evlist even on error
drm/amdgpu: disallow direct upload save restore list from gfx driver
drm/amdgpu: fix potential double drop fence reference
xen/gntdev: Use select for DMA_SHARED_BUFFER
perf parse: If pmu configuration fails free terms
perf probe: Skip overlapped location on searching variables
perf probe: Return a better scope DIE if there is no best scope
perf probe: Fix to show calling lines of inlined functions
perf probe: Skip end-of-sequence and non statement lines
perf probe: Filter out instances except for inlined subroutine and subprogram
ath10k: fix get invalid tx rate for Mesh metric
fsi: core: Fix small accesses and unaligned offsets via sysfs
media: pvrusb2: Fix oops on tear-down when radio support is not present
soundwire: intel: fix PDI/stream mapping for Bulk
crypto: atmel - Fix authenc support when it is set to m
ice: delay less
media: si470x-i2c: add missed operations in remove
EDAC/ghes: Fix grain calculation
spi: pxa2xx: Add missed security checks
ASoC: rt5677: Mark reg RT5677_PWR_ANLG2 as volatile
iio: dac: ad5446: Add support for new AD5600 DAC
ASoC: Intel: kbl_rt5663_rt5514_max98927: Add dmic format constraint
s390/disassembler: don't hide instruction addresses
nvme: Discard workaround for non-conformant devices
parport: load lowlevel driver if ports not found
bcache: fix static checker warning in bcache_device_free()
cpufreq: Register drivers only after CPU devices have been registered
x86/crash: Add a forward declaration of struct kimage
tracing: use kvcalloc for tgid_map array allocation
tracing/kprobe: Check whether the non-suffixed symbol is notrace
bcache: fix deadlock in bcache_allocator
iwlwifi: mvm: fix unaligned read of rx_pkt_status
ASoC: wm8904: fix regcache handling
spi: tegra20-slink: add missed clk_unprepare
tun: fix data-race in gro_normal_list()
crypto: virtio - deal with unsupported input sizes
mmc: tmio: Add MMC_CAP_ERASE to allow erase/discard/trim requests
btrfs: don't prematurely free work in end_workqueue_fn()
btrfs: don't prematurely free work in run_ordered_work()
ASoC: wm2200: add missed operations in remove and probe failure
spi: st-ssc4: add missed pm_runtime_disable
ASoC: wm5100: add missed pm_runtime_disable
ASoC: Intel: bytcr_rt5640: Update quirk for Acer Switch 10 SW5-012 2-in-1
x86/insn: Add some Intel instructions to the opcode map
brcmfmac: remove monitor interface when detaching
iwlwifi: check kasprintf() return value
fbtft: Make sure string is NULL terminated
net: ethernet: ti: ale: clean ale tbl on init and intf restart
crypto: sun4i-ss - Fix 64-bit size_t warnings
crypto: sun4i-ss - Fix 64-bit size_t warnings on sun4i-ss-hash.c
mac80211: consider QoS Null frames for STA_NULLFUNC_ACKED
crypto: vmx - Avoid weird build failures
libtraceevent: Fix memory leakage in copy_filter_type
mips: fix build when "48 bits virtual memory" is enabled
drm/amdgpu: fix bad DMA from INTERRUPT_CNTL2
net: phy: initialise phydev speed and duplex sanely
btrfs: don't prematurely free work in reada_start_machine_worker()
btrfs: don't prematurely free work in scrub_missing_raid56_worker()
Revert "mmc: sdhci: Fix incorrect switch to HS mode"
mmc: mediatek: fix CMD_TA to 2 for MT8173 HS200/HS400 mode
can: kvaser_usb: kvaser_usb_leaf: Fix some info-leaks to USB devices
usb: xhci: Fix build warning seen with CONFIG_PM=n
drm/amdgpu: fix uninitialized variable pasid_mapping_needed
s390/ftrace: fix endless recursion in function_graph tracer
btrfs: return error pointer from alloc_test_extent_buffer
usbip: Fix receive error in vhci-hcd when using scatter-gather
usbip: Fix error path of vhci_recv_ret_submit()
cpufreq: Avoid leaving stale IRQ work items during CPU offline
USB: EHCI: Do not return -EPIPE when hub is disconnected
intel_th: pci: Add Comet Lake PCH-V support
intel_th: pci: Add Elkhart Lake SOC support
platform/x86: hp-wmi: Make buffer for HPWMI_FEATURE2_QUERY 128 bytes
staging: comedi: gsc_hpdi: check dma_alloc_coherent() return value
ext4: fix ext4_empty_dir() for directories with holes
ext4: check for directory entries too close to block end
ext4: unlock on error in ext4_expand_extra_isize()
KVM: arm64: Ensure 'params' is initialised when looking up sys register
x86/MCE/AMD: Do not use rdmsr_safe_on_cpu() in smca_configure()
x86/MCE/AMD: Allow Reserved types to be overwritten in smca_banks[]
powerpc/vcpu: Assume dedicated processors as non-preempt
powerpc/irq: fix stack overflow verification
mmc: sdhci-msm: Correct the offset and value for DDR_CONFIG register
mmc: sdhci-of-esdhc: Revert "mmc: sdhci-of-esdhc: add erratum A-009204 support"
mmc: sdhci: Update the tuning failed messages to pr_debug level
mmc: sdhci-of-esdhc: fix P2020 errata handling
mmc: sdhci: Workaround broken command queuing on Intel GLK
mmc: sdhci: Add a quirk for broken command queuing
nbd: fix shutdown and recv work deadlock v2
perf probe: Fix to show function entry line as probe-able
Linux 4.19.92
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic4c7f9c713549ebb3319cd0275e88678bfa0e53d
commit 85572c2c4a upstream.
The scheduler code calling cpufreq_update_util() may run during CPU
offline on the target CPU after the IRQ work lists have been flushed
for it, so the target CPU should be prevented from running code that
may queue up an IRQ work item on it at that point.
Unfortunately, that may not be the case if dvfs_possible_from_any_cpu
is set for at least one cpufreq policy in the system, because that
allows the CPU going offline to run the utilization update callback
of the cpufreq governor on behalf of another (online) CPU in some
cases.
If that happens, the cpufreq governor callback may queue up an IRQ
work on the CPU running it, which is going offline, and the IRQ work
may not be flushed after that point. Moreover, that IRQ work cannot
be flushed until the "offlining" CPU goes back online, so if any
other CPU calls irq_work_sync() to wait for the completion of that
IRQ work, it will have to wait until the "offlining" CPU is back
online, which may never happen. In particular, a system-wide
deadlock may occur during CPU online as a result of that.
The failing scenario is as follows. CPU0 is the boot CPU, so it
creates a cpufreq policy and becomes the "leader" of it
(policy->cpu). It cannot go offline, because it is the boot CPU.
Next, other CPUs join the cpufreq policy as they go online and they
leave it when they go offline. The last CPU to go offline, say CPU3,
may queue up an IRQ work while running the governor callback on
behalf of CPU0 after leaving the cpufreq policy because of the
dvfs_possible_from_any_cpu effect described above. Then, CPU0 is
the only online CPU in the system and the stale IRQ work is still
queued on CPU3. When, say, CPU1 goes back online, it will run
irq_work_sync() to wait for that IRQ work to complete and so it
will wait for CPU3 to go back online (which may never happen even
in principle), but (worse yet) CPU0 is waiting for CPU1 at that
point too and a system-wide deadlock occurs.
To address this problem, notice that CPUs which cannot run cpufreq
utilization update code for themselves (for example, because they
have left the cpufreq policies that they belonged to) should also
be prevented from running that code on behalf of other CPUs that
belong to a cpufreq policy with dvfs_possible_from_any_cpu set. In
that case, the cpufreq_update_util_data pointer must be non-NULL not
only for the CPU that is the target of the cpufreq utilization update
in progress, but also for the CPU running the code.
Accordingly, change cpufreq_this_cpu_can_update() into a regular
function in kernel/sched/cpufreq.c (instead of a static inline in a
header file) and make it check the cpufreq_update_util_data pointer
of the local CPU if dvfs_possible_from_any_cpu is set for the target
cpufreq policy.
Also update the schedutil governor to do the
cpufreq_this_cpu_can_update() check in the non-fast-switch
case too to avoid the stale IRQ work issues.
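The modified check can be modelled in user space as below. The per-CPU pointer array, struct policy_sketch and this_cpu_can_update_sketch() are hypothetical stand-ins for cpufreq_update_util_data, struct cpufreq_policy and cpufreq_this_cpu_can_update(); the CPU id is passed explicitly instead of using smp_processor_id(), and there is no RCU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 4

/*
 * Hypothetical model of the per-CPU cpufreq_update_util_data pointer:
 * non-NULL while a governor update hook is installed for that CPU.
 */
static const void *update_util_data[NR_CPUS_SKETCH];

struct policy_sketch {
	bool cpus[NR_CPUS_SKETCH];        /* CPUs currently in the policy */
	bool dvfs_possible_from_any_cpu;
};

/* Can 'this_cpu' run the utilization update for 'policy'? */
static bool this_cpu_can_update_sketch(const struct policy_sketch *policy,
				       int this_cpu)
{
	assert(this_cpu >= 0 && this_cpu < NR_CPUS_SKETCH);
	/*
	 * Members of the policy may always update it. Other CPUs may do so
	 * only if the platform allows remote updates AND they still have
	 * their own update hook installed.
	 */
	return policy->cpus[this_cpu] ||
	       (policy->dvfs_possible_from_any_cpu &&
		update_util_data[this_cpu] != NULL);
}
```

In the failing scenario above, assuming CPU3's pointer was cleared when it left the policy, the check now returns false on CPU3 even with dvfs_possible_from_any_cpu set, so no IRQ work can be queued there on CPU0's behalf.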
Fixes: 99d14d0e16 ("cpufreq: Process remote callbacks from any CPU if the platform permits")
Link: https://lore.kernel.org/linux-pm/20191121093557.bycvdo4xyinbc5cb@vireshk-i7/
Reported-by: Anson Huang <anson.huang@nxp.com>
Tested-by: Anson Huang <anson.huang@nxp.com>
Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Peng Fan <peng.fan@nxp.com> (i.MX8QXP-MEK)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge 4.19.88 into android-4.19
Changes in 4.19.88
clk: meson: gxbb: let sar_adc_clk_div set the parent clock rate
clocksource/drivers/mediatek: Fix error handling
ASoC: msm8916-wcd-analog: Fix RX1 selection in RDAC2 MUX
ASoC: compress: fix unsigned integer overflow check
reset: Fix memory leak in reset_control_array_put()
clk: samsung: exynos5433: Fix error paths
ASoC: kirkwood: fix external clock probe defer
ASoC: kirkwood: fix device remove ordering
clk: samsung: exynos5420: Preserve PLL configuration during suspend/resume
pinctrl: cherryview: Allocate IRQ chip dynamic
ARM: dts: imx6qdl-sabreauto: Fix storm of accelerometer interrupts
reset: fix reset_control_ops kerneldoc comment
clk: at91: avoid sleeping early
clk: sunxi: Fix operator precedence in sunxi_divs_clk_setup
clk: sunxi-ng: a80: fix the zero'ing of bits 16 and 18
ARM: dts: sun8i-a83t-tbs-a711: Fix WiFi resume from suspend
samples/bpf: fix build by setting HAVE_ATTR_TEST to zero
powerpc/bpf: Fix tail call implementation
idr: Fix integer overflow in idr_for_each_entry
idr: Fix idr_alloc_u32 on 32-bit systems
x86/resctrl: Prevent NULL pointer dereference when reading mondata
clk: ti: dra7-atl-clock: Remove ti_clk_add_alias call
clk: ti: clkctrl: Fix failed to enable error with double udelay timeout
net: fec: add missed clk_disable_unprepare in remove
bridge: ebtables: don't crash when using dnat target in output chains
can: peak_usb: report bus recovery as well
can: c_can: D_CAN: c_can_chip_config(): perform a sofware reset on open
can: rx-offload: can_rx_offload_queue_tail(): fix error handling, avoid skb mem leak
can: rx-offload: can_rx_offload_offload_one(): do not increase the skb_queue beyond skb_queue_len_max
can: rx-offload: can_rx_offload_offload_one(): increment rx_fifo_errors on queue overflow or OOM
can: rx-offload: can_rx_offload_offload_one(): use ERR_PTR() to propagate error value in case of errors
can: rx-offload: can_rx_offload_irq_offload_timestamp(): continue on error
can: rx-offload: can_rx_offload_irq_offload_fifo(): continue on error
can: flexcan: increase error counters if skb enqueueing via can_rx_offload_queue_sorted() fails
can: mcp251x: mcp251x_restart_work_handler(): Fix potential force_quit race condition
watchdog: meson: Fix the wrong value of left time
ASoC: stm32: sai: add restriction on mmap support
scripts/gdb: fix debugging modules compiled with hot/cold partitioning
net: bcmgenet: use RGMII loopback for MAC reset
net: bcmgenet: reapply manual settings to the PHY
net: mscc: ocelot: fix __ocelot_rmw_ix prototype
ceph: return -EINVAL if given fsc mount option on kernel w/o support
net/fq_impl: Switch to kvmalloc() for memory allocation
mac80211: fix station inactive_time shortly after boot
block: drbd: remove a stray unlock in __drbd_send_protocol()
pwm: bcm-iproc: Prevent unloading the driver module while in use
scsi: target/tcmu: Fix queue_cmd_ring() declaration
scsi: lpfc: Fix kernel Oops due to null pring pointers
scsi: lpfc: Fix dif and first burst use in write commands
ARM: dts: Fix up SQ201 flash access
tracing: Lock event_mutex before synth_event_mutex
ARM: debug-imx: only define DEBUG_IMX_UART_PORT if needed
ARM: dts: imx51: Fix memory node duplication
ARM: dts: imx53: Fix memory node duplication
ARM: dts: imx31: Fix memory node duplication
ARM: dts: imx35: Fix memory node duplication
ARM: dts: imx7: Fix memory node duplication
ARM: dts: imx6ul: Fix memory node duplication
ARM: dts: imx6sx: Fix memory node duplication
ARM: dts: imx6sl: Fix memory node duplication
ARM: dts: imx50: Fix memory node duplication
ARM: dts: imx23: Fix memory node duplication
ARM: dts: imx1: Fix memory node duplication
ARM: dts: imx27: Fix memory node duplication
ARM: dts: imx25: Fix memory node duplication
ARM: dts: imx53-voipac-dmm-668: Fix memory node duplication
parisc: Fix serio address output
parisc: Fix HP SDC hpa address output
ARM: dts: Fix hsi gdd range for omap4
arm64: mm: Prevent mismatched 52-bit VA support
arm64: smp: Handle errors reported by the firmware
bus: ti-sysc: Check for no-reset and no-idle flags at the child level
platform/x86: mlx-platform: Fix LED configuration
ARM: OMAP1: fix USB configuration for device-only setups
RDMA/hns: Fix the bug while use multi-hop of pbl
arm64: preempt: Fix big-endian when checking preempt count in assembly
RDMA/vmw_pvrdma: Use atomic memory allocation in create AH
PM / AVS: SmartReflex: NULL check before some freeing functions is not needed
xfs: zero length symlinks are not valid
ARM: ks8695: fix section mismatch warning
ACPI / LPSS: Ignore acpi_device_fix_up_power() return value
scsi: lpfc: Enable Management features for IF_TYPE=6
scsi: qla2xxx: Fix NPIV handling for FC-NVMe
scsi: qla2xxx: Fix for FC-NVMe discovery for NPIV port
nvme: provide fallback for discard alloc failure
s390/zcrypt: make sysfs reset attribute trigger queue reset
crypto: user - support incremental algorithm dumps
arm64: dts: renesas: draak: Fix CVBS input
mwifiex: fix potential NULL dereference and use after free
mwifiex: debugfs: correct histogram spacing, formatting
brcmfmac: set F2 watermark to 256 for 4373
brcmfmac: set SDIO F1 MesBusyCtrl for CYW4373
rtl818x: fix potential use after free
bcache: do not check if debug dentry is ERR or NULL explicitly on remove
bcache: do not mark writeback_running too early
xfs: require both realtime inodes to mount
nvme: fix kernel paging oops
ubifs: Fix default compression selection in ubifs
ubi: Put MTD device after it is not used
ubi: Do not drop UBI device reference before using
microblaze: adjust the help to the real behavior
microblaze: move "... is ready" messages to arch/microblaze/Makefile
microblaze: fix multiple bugs in arch/microblaze/boot/Makefile
iwlwifi: move iwl_nvm_check_version() into dvm
iwlwifi: mvm: force TCM re-evaluation on TCM resume
iwlwifi: pcie: fix erroneous print
iwlwifi: pcie: set cmd_len in the correct place
gpio: pca953x: Fix AI overflow on PCAL6524
gpiolib: Fix return value of gpio_to_desc() stub if !GPIOLIB
kvm: vmx: Set IA32_TSC_AUX for legacy mode guests
Revert "KVM: nVMX: reset cache/shadows when switching loaded VMCS"
Revert "KVM: nVMX: move check_vmentry_postreqs() call to nested_vmx_enter_non_root_mode()"
crypto/chelsio/chtls: listen fails with multiadapt
VSOCK: bind to random port for VMADDR_PORT_ANY
mmc: meson-gx: make sure the descriptor is stopped on errors
mtd: rawnand: sunxi: Write pageprog related opcodes to WCMD_SET
usb: ehci-omap: Fix deferred probe for phy handling
btrfs: Check for missing device before bio submission in btrfs_map_bio
btrfs: fix ncopies raid_attr for RAID56
btrfs: dev-replace: set result code of cancel by status of scrub
Btrfs: allow clear_extent_dirty() to receive a cached extent state record
btrfs: only track ref_heads in delayed_ref_updates
serial: sh-sci: Fix crash in rx_timer_fn() on PIO fallback
HID: intel-ish-hid: fixes incorrect error handling
gpio: raspberrypi-exp: decrease refcount on firmware dt node
serial: 8250: Rate limit serial port rx interrupts during input overruns
kprobes/x86/xen: blacklist non-attachable xen interrupt functions
xen/pciback: Check dev_data before using it
kprobes: Blacklist symbols in arch-defined prohibited area
kprobes/x86: Show x86-64 specific blacklisted symbols correctly
vfio-mdev/samples: Use u8 instead of char for handle functions
memory: omap-gpmc: Get the header of the enum
pinctrl: xway: fix gpio-hog related boot issues
net/mlx5: Continue driver initialization despite debugfs failure
netfilter: nf_nat_sip: fix RTP/RTCP source port translations
exofs_mount(): fix leaks on failure exits
bnxt_en: Return linux standard errors in bnxt_ethtool.c
bnxt_en: Save ring statistics before reset.
bnxt_en: query force speeds before disabling autoneg mode.
KVM: s390: unregister debug feature on failing arch init
pinctrl: sh-pfc: r8a77990: Fix MOD_SEL0 SEL_I2C1 field width
pinctrl: sh-pfc: sh7264: Fix PFCR3 and PFCR0 register configuration
pinctrl: sh-pfc: sh7734: Fix shifted values in IPSR10
HID: doc: fix wrong data structure reference for UHID_OUTPUT
dm flakey: Properly corrupt multi-page bios.
gfs2: take jdata unstuff into account in do_grow
dm raid: fix false -EBUSY when handling check/repair message
xfs: Align compat attrlist_by_handle with native implementation.
xfs: Fix bulkstat compat ioctls on x32 userspace.
IB/qib: Fix an error code in qib_sdma_verbs_send()
clocksource/drivers/fttmr010: Fix invalid interrupt register access
vxlan: Fix error path in __vxlan_dev_create()
powerpc/book3s/32: fix number of bats in p/v_block_mapped()
powerpc/xmon: fix dump_segments()
drivers/regulator: fix a missing check of return value
Bluetooth: hci_bcm: Handle specific unknown packets after firmware loading
serial: max310x: Fix tx_empty() callback
openrisc: Fix broken paths to arch/or32
RDMA/srp: Propagate ib_post_send() failures to the SCSI mid-layer
scsi: qla2xxx: deadlock by configfs_depend_item
scsi: csiostor: fix incorrect dma device in case of vport
brcmfmac: Fix access point mode
ath6kl: Only use match sets when firmware supports it
ath6kl: Fix off by one error in scan completion
powerpc/perf: Fix unit_sel/cache_sel checks
powerpc/32: Avoid unsupported flags with clang
powerpc/prom: fix early DEBUG messages
powerpc/mm: Make NULL pointer deferences explicit on bad page faults.
powerpc/44x/bamboo: Fix PCI range
vfio/spapr_tce: Get rid of possible infinite loop
powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status
drbd: ignore "all zero" peer volume sizes in handshake
drbd: reject attach of unsuitable uuids even if connected
drbd: do not block when adjusting "disk-options" while IO is frozen
drbd: fix print_st_err()'s prototype to match the definition
IB/rxe: Make counters thread safe
bpf/cpumap: make sure frame_size for build_skb is aligned if headroom isn't
regulator: tps65910: fix a missing check of return value
powerpc/83xx: handle machine check caused by watchdog timer
powerpc/pseries: Fix node leak in update_lmb_associativity_index()
powerpc: Fix HMIs on big-endian with CONFIG_RELOCATABLE=y
crypto: mxc-scc - fix build warnings on ARM64
pwm: clps711x: Fix period calculation
net/netlink_compat: Fix a missing check of nla_parse_nested
net/net_namespace: Check the return value of register_pernet_subsys()
f2fs: fix block address for __check_sit_bitmap
f2fs: fix to dirty inode synchronously
um: Include sys/uio.h to have writev()
um: Make GCOV depend on !KCOV
net: (cpts) fix a missing check of clk_prepare
net: stmicro: fix a missing check of clk_prepare
net: dsa: bcm_sf2: Propagate error value from mdio_write
atl1e: checking the status of atl1e_write_phy_reg
tipc: fix a missing check of genlmsg_put
net: marvell: fix a missing check of acpi_match_device
net/wan/fsl_ucc_hdlc: Avoid double free in ucc_hdlc_probe()
ocfs2: clear journal dirty flag after shutdown journal
vmscan: return NODE_RECLAIM_NOSCAN in node_reclaim() when CONFIG_NUMA is n
mm/page_alloc.c: free order-0 pages through PCP in page_frag_free()
mm/page_alloc.c: use a single function to free page
mm/page_alloc.c: deduplicate __memblock_free_early() and memblock_free()
tools/vm/page-types.c: fix "kpagecount returned fewer pages than expected" failures
netfilter: nf_tables: fix a missing check of nla_put_failure
xprtrdma: Prevent leak of rpcrdma_rep objects
infiniband: bnxt_re: qplib: Check the return value of send_message
infiniband/qedr: Potential null ptr dereference of qp
firmware: arm_sdei: fix wrong of_node_put() in init function
firmware: arm_sdei: Fix DT platform device creation
lib/genalloc.c: fix allocation of aligned buffer from non-aligned chunk
lib/genalloc.c: use vzalloc_node() to allocate the bitmap
fork: fix some -Wmissing-prototypes warnings
drivers/base/platform.c: kmemleak ignore a known leak
lib/genalloc.c: include vmalloc.h
mtd: Check add_mtd_device() ret code
tipc: fix memory leak in tipc_nl_compat_publ_dump
net/core/neighbour: tell kmemleak about hash tables
ata: ahci: mvebu: do Armada 38x configuration only on relevant SoCs
PCI/MSI: Return -ENOSPC from pci_alloc_irq_vectors_affinity()
net/core/neighbour: fix kmemleak minimal reference count for hash tables
serial: 8250: Fix serial8250 initialization crash
gpu: ipu-v3: pre: don't trigger update if buffer address doesn't change
sfc: suppress duplicate nvmem partition types in efx_ef10_mtd_probe
ip_tunnel: Make none-tunnel-dst tunnel port work with lwtunnel
decnet: fix DN_IFREQ_SIZE
net/smc: prevent races between smc_lgr_terminate() and smc_conn_free()
net/smc: don't wait for send buffer space when data was already sent
mm/hotplug: invalid PFNs from pfn_to_online_page()
xfs: end sync buffer I/O properly on shutdown error
net/smc: fix sender_free computation
blktrace: Show requests without sector
net/smc: fix byte_order for rx_curs_confirmed
tipc: fix skb may be leaky in tipc_link_input
ASoC: samsung: i2s: Fix prescaler setting for the secondary DAI
sfc: initialise found bitmap in efx_ef10_mtd_probe
geneve: change NET_UDP_TUNNEL dependency to select
net: fix possible overflow in __sk_mem_raise_allocated()
net: ip_gre: do not report erspan_ver for gre or gretap
net: ip6_gre: do not report erspan_ver for ip6gre or ip6gretap
sctp: don't compare hb_timer expire date before starting it
bpf: decrease usercnt if bpf_map_new_fd() fails in bpf_map_get_fd_by_id()
mmc: core: align max segment size with logical block size
net: dev: Use unsigned integer as an argument to left-shift
kvm: properly check debugfs dentry before using it
bpf: drop refcount if bpf_map_new_fd() fails in map_create()
net: hns3: Change fw error code NOT_EXEC to NOT_SUPPORTED
net: hns3: fix PFC not setting problem for DCB module
net: hns3: fix an issue for hclgevf_ae_get_hdev
net: hns3: fix an issue for hns3_update_new_int_gl
iommu/amd: Fix NULL dereference bug in match_hid_uid
apparmor: delete the dentry in aafs_remove() to avoid a leak
scsi: libsas: Support SATA PHY connection rate unmatch fixing during discovery
ACPI / APEI: Don't wait to serialise with oops messages when panic()ing
ACPI / APEI: Switch estatus pool to use vmalloc memory
scsi: hisi_sas: shutdown axi bus to avoid exception CQ returned
scsi: libsas: Check SMP PHY control function result
RDMA/hns: Fix the bug with updating rq head pointer when flush cqe
RDMA/hns: Bugfix for the scene without receiver queue
RDMA/hns: Fix the state of rereg mr
RDMA/hns: Use GFP_ATOMIC in hns_roce_v2_modify_qp
ASoC: rt5645: Headphone Jack sense inverts on the LattePanda board
powerpc/pseries/dlpar: Fix a missing check in dlpar_parse_cc_property()
xdp: fix cpumap redirect SKB creation bug
mtd: Remove a debug trace in mtdpart.c
mm, gup: add missing refcount overflow checks on s390
clk: at91: fix update bit maps on CFG_MOR write
clk: at91: generated: set audio_pll_allowed in at91_clk_register_generated()
usb: dwc2: use a longer core rest timeout in dwc2_core_reset()
staging: rtl8192e: fix potential use after free
staging: rtl8723bs: Drop ACPI device ids
staging: rtl8723bs: Add 024c:0525 to the list of SDIO device-ids
USB: serial: ftdi_sio: add device IDs for U-Blox C099-F9P
mei: bus: prefix device names on bus with the bus name
mei: me: add comet point V device id
thunderbolt: Power cycle the router if NVM authentication fails
xfrm: Fix memleak on xfrm state destroy
media: v4l2-ctrl: fix flags for DO_WHITE_BALANCE
net: macb: fix error format in dev_err()
pwm: Clear chip_data in pwm_put()
media: atmel: atmel-isc: fix asd memory allocation
media: atmel: atmel-isc: fix INIT_WORK misplacement
macvlan: schedule bc_work even if error
net: psample: fix skb_over_panic
openvswitch: fix flow command message size
sctp: Fix memory leak in sctp_sf_do_5_2_4_dupcook
slip: Fix use-after-free Read in slip_open
openvswitch: drop unneeded BUG_ON() in ovs_flow_cmd_build_info()
openvswitch: remove another BUG_ON()
selftests: bpf: test_sockmap: handle file creation failures gracefully
tipc: fix link name length check
sctp: cache netns in sctp_ep_common
net: sched: fix `tc -s class show` no bstats on class with nolock subqueues
net: macb: add missed tasklet_kill
ext4: add more paranoia checking in ext4_expand_extra_isize handling
watchdog: sama5d4: fix WDD value to be always set to max
net: macb: Fix SUBNS increment and increase resolution
net: macb driver, check for SKBTX_HW_TSTAMP
mtd: rawnand: atmel: Fix spelling mistake in error message
mtd: rawnand: atmel: fix possible object reference leak
mtd: spi-nor: cast to u64 to avoid uint overflows
drm/atmel-hlcdc: revert shift by 8
mailbox: stm32_ipcc: add spinlock to fix channels concurrent access
tcp: exit if nothing to retransmit on RTO timeout
HID: core: check whether Usage Page item is after Usage ID items
crypto: stm32/hash - Fix hmac issue more than 256 bytes
media: stm32-dcmi: fix DMA corruption when stopping streaming
media: stm32-dcmi: fix check of pm_runtime_get_sync return value
hwrng: stm32 - fix unbalanced pm_runtime_enable
clk: stm32mp1: fix HSI divider flag
clk: stm32mp1: fix mcu divider table
clk: stm32mp1: add CLK_SET_RATE_NO_REPARENT to Kernel clocks
clk: stm32mp1: parent clocks update
mailbox: mailbox-test: fix null pointer if no mmio
pinctrl: stm32: fix memory leak issue
ASoC: stm32: i2s: fix dma configuration
ASoC: stm32: i2s: fix 16 bit format support
ASoC: stm32: i2s: fix IRQ clearing
ASoC: stm32: sai: add missing put_device()
dmaengine: stm32-dma: check whether length is aligned on FIFO threshold
platform/x86: hp-wmi: Fix ACPI errors caused by too small buffer
platform/x86: hp-wmi: Fix ACPI errors caused by passing 0 as input size
net: fec: fix clock count mis-match
Linux 4.19.88
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ifd3801a77cb551be72788031e7fcfc8a1d4fd197
[ Upstream commit fb5bf31722 ]
We get a warning when building the kernel with W=1:
kernel/fork.c:167:13: warning: no previous prototype for `arch_release_thread_stack' [-Wmissing-prototypes]
kernel/fork.c:779:13: warning: no previous prototype for `fork_init' [-Wmissing-prototypes]
Add the missing declaration to the header file to fix this.
Also, remove arch_release_thread_stack() completely because no arch
has implemented it since bb9d81264 ("arch: remove tile port").
Link: http://lkml.kernel.org/r/1542170087-23645-1-git-send-email-wang.yi59@zte.com.cn
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Merge 4.19.79 into android-4.19
Changes in 4.19.79
s390/process: avoid potential reading of freed stack
KVM: s390: Test for bad access register and size at the start of S390_MEM_OP
s390/topology: avoid firing events before kobjs are created
s390/cio: exclude subchannels with no parent from pseudo check
KVM: PPC: Book3S HV: Fix race in re-enabling XIVE escalation interrupts
KVM: PPC: Book3S HV: Check for MMU ready on piggybacked virtual cores
KVM: PPC: Book3S HV: Don't lose pending doorbell request on migration on P9
KVM: X86: Fix userspace set invalid CR4
KVM: nVMX: handle page fault in vmread fix
nbd: fix max number of supported devs
PM / devfreq: tegra: Fix kHz to Hz conversion
ASoC: Define a set of DAPM pre/post-up events
ASoC: sgtl5000: Improve VAG power and mute control
powerpc/mce: Fix MCE handling for huge pages
powerpc/mce: Schedule work from irq_work
powerpc/powernv: Restrict OPAL symbol map to only be readable by root
powerpc/powernv/ioda: Fix race in TCE level allocation
powerpc/book3s64/mm: Don't do tlbie fixup for some hardware revisions
can: mcp251x: mcp251x_hw_reset(): allow more time after a reset
tools lib traceevent: Fix "robust" test of do_generate_dynamic_list_file
crypto: qat - Silence smp_processor_id() warning
crypto: skcipher - Unmap pages after an external error
crypto: cavium/zip - Add missing single_release()
crypto: caam - fix concurrency issue in givencrypt descriptor
crypto: ccree - account for TEE not ready to report
crypto: ccree - use the full crypt length value
MIPS: Treat Loongson Extensions as ASEs
power: supply: sbs-battery: use correct flags field
power: supply: sbs-battery: only return health when battery present
tracing: Make sure variable reference alias has correct var_ref_idx
usercopy: Avoid HIGHMEM pfn warning
timer: Read jiffies once when forwarding base clk
PCI: vmd: Fix shadow offsets to reflect spec changes
PCI: Restore Resizable BAR size bits correctly for 1MB BARs
watchdog: imx2_wdt: fix min() calculation in imx2_wdt_set_timeout
perf stat: Fix a segmentation fault when using repeat forever
drm/omap: fix max fclk divider for omap36xx
drm/msm/dsi: Fix return value check for clk_get_parent
drm/nouveau/kms/nv50-: Don't create MSTMs for eDP connectors
drm/i915/gvt: update vgpu workload head pointer correctly
mmc: sdhci: improve ADMA error reporting
mmc: sdhci-of-esdhc: set DMA snooping based on DMA coherence
Revert "locking/pvqspinlock: Don't wait if vCPU is preempted"
xen/xenbus: fix self-deadlock after killing user process
ieee802154: atusb: fix use-after-free at disconnect
s390/cio: avoid calling strlen on null pointer
cfg80211: initialize on-stack chandefs
arm64: cpufeature: Detect SSBS and advertise to userspace
ima: always return negative code for error
ima: fix freeing ongoing ahash_request
fs: nfs: Fix possible null-pointer dereferences in encode_attrs()
9p: Transport error uninitialized
9p: avoid attaching writeback_fid on mmap with type PRIVATE
xen/pci: reserve MCFG areas earlier
ceph: fix directories inode i_blkbits initialization
ceph: reconnect connection if session hang in opening state
watchdog: aspeed: Add support for AST2600
netfilter: nf_tables: allow lookups in dynamic sets
drm/amdgpu: Fix KFD-related kernel oops on Hawaii
drm/amdgpu: Check for valid number of registers to read
pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors
pwm: stm32-lp: Add check in case requested period cannot be achieved
x86/purgatory: Disable the stackleak GCC plugin for the purgatory
ntb: point to right memory window index
thermal: Fix use-after-free when unregistering thermal zone device
thermal_hwmon: Sanitize thermal_zone type
libnvdimm/region: Initialize bad block for volatile namespaces
fuse: fix memleak in cuse_channel_open
libnvdimm/nfit_test: Fix acpi_handle redefinition
sched/membarrier: Call sync_core only before usermode for same mm
sched/membarrier: Fix private expedited registration check
sched/core: Fix migration to invalid CPU in __set_cpus_allowed_ptr()
perf build: Add detection of java-11-openjdk-devel package
kernel/elfcore.c: include proper prototypes
perf unwind: Fix libunwind build failure on i386 systems
nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs
drm/radeon: Bail earlier when radeon.cik_/si_support=0 is passed
KVM: PPC: Book3S HV: XIVE: Free escalation interrupts before disabling the VP
KVM: nVMX: Fix consistency check on injected exception error code
nbd: fix crash when the blksize is zero
powerpc/pseries: Fix cpu_hotplug_lock acquisition in resize_hpt()
powerpc/book3s64/radix: Rename CPU_FTR_P9_TLBIE_BUG feature flag
tools lib traceevent: Do not free tep->cmdlines in add_new_comm() on failure
tick: broadcast-hrtimer: Fix a race in bc_set_next
perf tools: Fix segfault in cpu_cache_level__read()
perf stat: Reset previous counts on repeat with interval
riscv: Avoid interrupts being erroneously enabled in handle_exception()
arm64: ssbd: Add support for PSTATE.SSBS rather than trapping to EL3
KVM: arm64: Set SCTLR_EL2.DSSBS if SSBD is forcefully disabled and !vhe
arm64: docs: Document SSBS HWCAP
arm64: fix SSBS sanitization
arm64: Add sysfs vulnerability show for spectre-v1
arm64: add sysfs vulnerability show for meltdown
arm64: enable generic CPU vulnerabilites support
arm64: Always enable ssb vulnerability detection
arm64: Provide a command line to disable spectre_v2 mitigation
arm64: Advertise mitigation of Spectre-v2, or lack thereof
arm64: Always enable spectre-v2 vulnerability detection
arm64: add sysfs vulnerability show for spectre-v2
arm64: add sysfs vulnerability show for speculative store bypass
arm64: ssbs: Don't treat CPUs with SSBS as unaffected by SSB
arm64: Force SSBS on context switch
arm64: Use firmware to detect CPUs that are not affected by Spectre-v2
arm64/speculation: Support 'mitigations=' cmdline option
vfs: Fix EOVERFLOW testing in put_compat_statfs64
coresight: etm4x: Use explicit barriers on enable/disable
staging: erofs: fix an error handling in erofs_readdir()
staging: erofs: some compressed cluster should be submitted for corrupted images
staging: erofs: add two missing erofs_workgroup_put for corrupted images
staging: erofs: detect potential multiref due to corrupted images
cfg80211: add and use strongly typed element iteration macros
cfg80211: Use const more consistently in for_each_element macros
nl80211: validate beacon head
Linux 4.19.79
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie4f85994b5f3e53658c42833d0dc712575d0902e
[ Upstream commit 2840cf02fa ]
When the prev and next task's mm change, switch_mm() provides the core
serializing guarantees before returning to usermode. The only case
where an explicit core serialization is needed is when the scheduler
keeps the same mm for prev and next.
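The rule above reduces to a single comparison on each context switch. The following is a minimal userspace model, not the scheduler's code: `struct mm`, `struct task` and `needs_explicit_sync_core()` are hypothetical stand-ins, and the model ignores membarrier registration state and kernel threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's mm_struct and task_struct. */
struct mm { int id; };
struct task { const struct mm *mm; };

/*
 * Returns true when an explicit core serialization (sync_core() in the
 * kernel) is needed before returning to usermode. When the mm changes,
 * switch_mm() already provides the guarantee, so nothing extra is done.
 */
static bool needs_explicit_sync_core(const struct task *prev,
                                     const struct task *next)
{
    assert(prev != NULL && next != NULL);
    return prev->mm == next->mm;
}
```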
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190919173705.2181-4-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Merge 4.19.64 into android-4.19
Changes in 4.19.64
hv_sock: Add support for delayed close
vsock: correct removal of socket from the list
NFS: Fix dentry revalidation on NFSv4 lookup
NFS: Refactor nfs_lookup_revalidate()
NFSv4: Fix lookup revalidate of regular files
usb: dwc2: Disable all EP's on disconnect
usb: dwc2: Fix disable all EP's on disconnect
arm64: compat: Provide definition for COMPAT_SIGMINSTKSZ
binder: fix possible UAF when freeing buffer
ISDN: hfcsusb: checking idx of ep configuration
media: au0828: fix null dereference in error path
ath10k: Change the warning message string
media: cpia2_usb: first wake up, then free in disconnect
media: pvrusb2: use a different format for warnings
NFS: Cleanup if nfs_match_client is interrupted
media: radio-raremono: change devm_k*alloc to k*alloc
iommu/vt-d: Don't queue_iova() if there is no flush queue
iommu/iova: Fix compilation error with !CONFIG_IOMMU_IOVA
Bluetooth: hci_uart: check for missing tty operations
vhost: introduce vhost_exceeds_weight()
vhost_net: fix possible infinite loop
vhost: vsock: add weight support
vhost: scsi: add weight support
sched/fair: Don't free p->numa_faults with concurrent readers
sched/fair: Use RCU accessors consistently for ->numa_group
/proc/<pid>/cmdline: remove all the special cases
/proc/<pid>/cmdline: add back the setproctitle() special case
drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl
Fix allyesconfig output.
ceph: hold i_ceph_lock when removing caps for freeing inode
block, scsi: Change the preempt-only flag into a counter
scsi: core: Avoid that a kernel warning appears during system resume
ip_tunnel: allow not to count pkts on tstats by setting skb's dev to NULL
Linux 4.19.64
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3e9055b677bd8ad9d5070307fae0bc765d444e9d
commit 16d51a590a upstream.
When going through execve(), zero out the NUMA fault statistics instead of
freeing them.
During execve, the task is reachable through procfs and the scheduler. A
concurrent /proc/*/sched reader can read data from a freed ->numa_faults
allocation (confirmed by KASAN) and write it back to userspace.
I believe that it would also be possible for a use-after-free read to occur
through a race between a NUMA fault and execve(): task_numa_fault() can
lead to task_numa_compare(), which invokes task_weight() on the currently
running task of a different CPU.
Another way to fix this would be to make ->numa_faults RCU-managed or add
extra locking, but it seems easier to wipe the NUMA fault statistics on
execve.
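The difference between freeing and wiping can be sketched in plain C. This is a hedged userspace model loosely patterned on a "final" flag: the struct, the slot count and `numa_faults_free()` are illustrative assumptions, and real concurrency (RCU, task lifetime) is not modeled. On execve() the buffer stays allocated and is only zeroed, so a concurrent reader sees zeros rather than freed memory; the buffer is released only when the task itself goes away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-task statistics; the kernel sizes its array per node. */
struct task_stats {
    unsigned long *numa_faults; /* may be read concurrently, e.g. via procfs */
    size_t nr;                  /* number of slots in numa_faults */
};

/*
 * final == false models execve(): wipe the counters in place.
 * final == true models task teardown: actually release the memory.
 */
static void numa_faults_free(struct task_stats *t, bool final)
{
    assert(t != NULL);
    if (!t->numa_faults)
        return;
    if (final) {
        free(t->numa_faults);
        t->numa_faults = NULL;
        t->nr = 0;
    } else {
        memset(t->numa_faults, 0, t->nr * sizeof(*t->numa_faults));
    }
}
```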
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Fixes: 82727018b0 ("sched/numa: Call task_numa_free() from do_execve()")
Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge 4.19.54 into android-4.19
Changes in 4.19.54
ax25: fix inconsistent lock state in ax25_destroy_timer
be2net: Fix number of Rx queues used for flow hashing
hv_netvsc: Set probe mode to sync
ipv6: flowlabel: fl6_sock_lookup() must use atomic_inc_not_zero
lapb: fixed leak of control-blocks.
neigh: fix use-after-free read in pneigh_get_next
net: dsa: rtl8366: Fix up VLAN filtering
net: openvswitch: do not free vport if register_netdevice() is failed.
nfc: Ensure presence of required attributes in the deactivate_target handler
sctp: Free cookie before we memdup a new one
sunhv: Fix device naming inconsistency between sunhv_console and sunhv_reg
tipc: purge deferredq list for each grp member in tipc_group_delete
vsock/virtio: set SOCK_DONE on peer shutdown
net/mlx5: Avoid reloading already removed devices
net: mvpp2: prs: Fix parser range for VID filtering
net: mvpp2: prs: Use the correct helpers when removing all VID filters
Staging: vc04_services: Fix a couple error codes
perf/x86/intel/ds: Fix EVENT vs. UEVENT PEBS constraints
netfilter: nf_queue: fix reinject verdict handling
ipvs: Fix use-after-free in ip_vs_in
selftests: netfilter: missing error check when setting up veth interface
clk: ti: clkctrl: Fix clkdm_clk handling
powerpc/powernv: Return for invalid IMC domain
usb: xhci: Fix a potential null pointer dereference in xhci_debugfs_create_endpoint()
mISDN: make sure device name is NUL terminated
x86/CPU/AMD: Don't force the CPB cap when running under a hypervisor
perf/ring_buffer: Fix exposing a temporarily decreased data_head
perf/ring_buffer: Add ordering to rb->nest increment
perf/ring-buffer: Always use {READ,WRITE}_ONCE() for rb->user_page data
gpio: fix gpio-adp5588 build errors
net: stmmac: update rx tail pointer register to fix rx dma hang issue.
net: tulip: de4x5: Drop redundant MODULE_DEVICE_TABLE()
ACPI/PCI: PM: Add missing wakeup.flags.valid checks
drm/etnaviv: lock MMU while dumping core
net: aquantia: tx clean budget logic error
net: aquantia: fix LRO with FCS error
i2c: dev: fix potential memory leak in i2cdev_ioctl_rdwr
ALSA: hda - Force polling mode on CNL for fixing codec communication
configfs: Fix use-after-free when accessing sd->s_dentry
perf data: Fix 'strncat may truncate' build failure with recent gcc
perf namespace: Protect reading thread's namespace
perf record: Fix s390 missing module symbol and warning for non-root users
ia64: fix build errors by exporting paddr_to_nid()
xen/pvcalls: Remove set but not used variable
xenbus: Avoid deadlock during suspend due to open transactions
KVM: PPC: Book3S: Use new mutex to synchronize access to rtas token list
KVM: PPC: Book3S HV: Don't take kvm->lock around kvm_for_each_vcpu
arm64: fix syscall_fn_t type
arm64: use the correct function type in SYSCALL_DEFINE0
arm64: use the correct function type for __arm64_sys_ni_syscall
net: sh_eth: fix mdio access in sh_eth_close() for R-Car Gen2 and RZ/A1 SoCs
net: phylink: ensure consistent phy interface mode
net: phy: dp83867: Set up RGMII TX delay
scsi: libcxgbi: add a check for NULL pointer in cxgbi_check_route()
scsi: smartpqi: properly set both the DMA mask and the coherent DMA mask
scsi: scsi_dh_alua: Fix possible null-ptr-deref
scsi: libsas: delete sas port if expander discover failed
mlxsw: spectrum: Prevent force of 56G
ocfs2: fix error path kobject memory leak
coredump: fix race condition between collapse_huge_page() and core dumping
Abort file_remove_privs() for non-reg. files
Linux 4.19.54
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 59ea6d06cf upstream.
When fixing the race conditions between the coredump and the mmap_sem
holders outside the context of the process, we focused on
mmget_not_zero()/get_task_mm() callers in 04f5866e41 ("coredump: fix
race condition between mmget_not_zero()/get_task_mm() and core
dumping"), but those aren't the only cases where the mmap_sem can be
taken outside of the context of the process as Michal Hocko noticed
while backporting that commit to older -stable kernels.
If mmgrab() is called in the context of the process, but then the
mm_count reference is transferred outside the context of the process,
that can also be a problem if the mmap_sem has to be taken for writing
through that mm_count reference.
khugepaged registration calls mmgrab() in the context of the process,
but the mmap_sem for writing is taken later in the context of the
khugepaged kernel thread.
After taking the mmap_sem for writing, collapse_huge_page() doesn't
modify any vma, so it's not obvious that it could cause a problem to the
coredump, but it happens to modify the pmd in a way that breaks an
invariant that pmd_trans_huge_lock() relies upon. collapse_huge_page()
needs the mmap_sem for writing just to block concurrent page faults that
call pmd_trans_huge_lock().
Specifically the invariant that "!pmd_trans_huge()" cannot become a
"pmd_trans_huge()" doesn't hold while collapse_huge_page() runs.
The coredump will call __get_user_pages() without mmap_sem for reading,
which eventually can invoke a lockless page fault which will need a
functional pmd_trans_huge_lock().
So collapse_huge_page() needs to use mmget_still_valid() to check it's
not running concurrently with the coredump... as long as the coredump
can invoke page faults without holding the mmap_sem for reading.
This has "Fixes: khugepaged" to facilitate backporting, but in my view
it's more a bug in the coredump code that will eventually have to be
rewritten to stop invoking page faults without the mmap_sem for reading.
So the long term plan is still to drop all mmget_still_valid().
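The check described above can be modeled in a few lines. In this sketch, validity means "no core dump in progress", represented by a non-NULL `core_state` pointer; the structs, `mm_still_valid()` and the `try_collapse()` caller with its -1 return value are hypothetical userspace stand-ins, and the caller is assumed to already hold the (unmodeled) mmap_sem for writing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: non-NULL in struct mm while a core dump is running. */
struct core_state { int nr_threads; };
struct mm { struct core_state *core_state; };

static bool mm_still_valid(const struct mm *mm)
{
    assert(mm != NULL);
    return mm->core_state == NULL;
}

/*
 * Models collapse_huge_page() after the write lock is taken: refuse to
 * touch the pmd while the coredump may fault pages in without the lock.
 */
static int try_collapse(const struct mm *mm)
{
    if (!mm_still_valid(mm))
        return -1;  /* skip the collapse; retry later */
    /* ... the pmd would be rewritten here ... */
    return 0;
}
```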
Link: http://lkml.kernel.org/r/20190607161558.32104-1-aarcange@redhat.com
Fixes: ba76149f47 ("thp: khugepaged")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit fcfc2aa018 ]
There are a few system calls (pselect, ppoll, etc.) which replace a task's
sigmask while they are running in kernel space.
When a task calls one of these syscalls, the kernel saves the current
sigmask in task->saved_sigmask and sets a syscall sigmask.
On syscall-exit-stop, ptrace traps a task before restoring the
saved_sigmask, so PTRACE_GETSIGMASK returns the syscall sigmask and
PTRACE_SETSIGMASK has no effect, because the new sigmask is overwritten
by saved_sigmask when the task returns to user space.
This patch fixes this problem. PTRACE_GETSIGMASK returns saved_sigmask
if TIF_RESTORE_SIGMASK is set. PTRACE_SETSIGMASK clears the
TIF_RESTORE_SIGMASK flag.
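The fixed get/set behaviour can be sketched as a small state machine. This is a userspace model, not kernel code: `sigmask_t` is a hypothetical stand-in for sigset_t, the `restore_sigmask` field models TIF_RESTORE_SIGMASK, and `get_sigmask()`, `set_sigmask()` and `return_to_user()` are illustrative names rather than kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long sigmask_t;  /* hypothetical stand-in for sigset_t */

struct task {
    sigmask_t blocked;        /* active mask; may be the syscall's mask */
    sigmask_t saved_sigmask;  /* mask to restore on return to user space */
    bool restore_sigmask;     /* models TIF_RESTORE_SIGMASK */
};

/* PTRACE_GETSIGMASK: report the mask the task will run with in user space. */
static sigmask_t get_sigmask(const struct task *t)
{
    assert(t != NULL);
    return t->restore_sigmask ? t->saved_sigmask : t->blocked;
}

/* PTRACE_SETSIGMASK: set the mask and drop the pending restore. */
static void set_sigmask(struct task *t, sigmask_t mask)
{
    assert(t != NULL);
    t->blocked = mask;
    t->restore_sigmask = false;  /* else saved_sigmask would overwrite it */
}

/* Return to user space: restore the saved mask if one is pending. */
static void return_to_user(struct task *t)
{
    assert(t != NULL);
    if (t->restore_sigmask) {
        t->blocked = t->saved_sigmask;
        t->restore_sigmask = false;
    }
}
```

Without the flag being cleared in `set_sigmask()`, `return_to_user()` would replace the tracer's mask with `saved_sigmask`, which is exactly the lost update described above.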
Link: http://lkml.kernel.org/r/20181120060616.6043-1-avagin@gmail.com
Fixes: 29000caecb ("ptrace: add ability to get/set signal-blocked mask")
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin (Microsoft) <sashal@kernel.org>
Merge 4.19.37 into android-4.19
Changes in 4.19.37
bonding: fix event handling for stacked bonds
failover: allow name change on IFF_UP slave interfaces
net: atm: Fix potential Spectre v1 vulnerabilities
net: bridge: fix per-port af_packet sockets
net: bridge: multicast: use rcu to access port list from br_multicast_start_querier
net: Fix missing meta data in skb with vlan packet
net: fou: do not use guehdr after iptunnel_pull_offloads in gue_udp_recv
tcp: tcp_grow_window() needs to respect tcp_space()
team: set slave to promisc if team is already in promisc mode
tipc: missing entries in name table of publications
vhost: reject zero size iova range
ipv4: recompile ip options in ipv4_link_failure
ipv4: ensure rcu_read_lock() in ipv4_link_failure()
net: thunderx: raise XDP MTU to 1508
net: thunderx: don't allow jumbo frames with XDP
net/mlx5: FPGA, tls, hold rcu read lock a bit longer
net/tls: prevent bad memory access in tls_is_sk_tx_device_offloaded()
net/mlx5: FPGA, tls, idr remove on flow delete
route: Avoid crash from dereferencing NULL rt->from
sch_cake: Use tc_skb_protocol() helper for getting packet protocol
sch_cake: Make sure we can write the IP header before changing DSCP bits
nfp: flower: replace CFI with vlan present
nfp: flower: remove vlan CFI bit from push vlan action
sch_cake: Simplify logic in cake_select_tin()
net: IP defrag: encapsulate rbtree defrag code into callable functions
net: IP6 defrag: use rbtrees for IPv6 defrag
net: IP6 defrag: use rbtrees in nf_conntrack_reasm.c
CIFS: keep FileInfo handle live during oplock break
cifs: Fix use-after-free in SMB2_write
cifs: Fix use-after-free in SMB2_read
cifs: fix handle leak in smb2_query_symlink()
KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU
KVM: x86: svm: make sure NMI is injected after nmi_singlestep
Staging: iio: meter: fixed typo
staging: iio: ad7192: Fix ad7193 channel address
iio: gyro: mpu3050: fix chip ID reading
iio/gyro/bmg160: Use millidegrees for temperature scale
iio:chemical:bme680: Fix, report temperature in millidegrees
iio:chemical:bme680: Fix SPI read interface
iio: cros_ec: Fix the maths for gyro scale calculation
iio: ad_sigma_delta: select channel when reading register
iio: dac: mcp4725: add missing powerdown bits in store eeprom
iio: Fix scan mask selection
iio: adc: at91: disable adc channel interrupt in timeout case
iio: core: fix a possible circular locking dependency
io: accel: kxcjk1013: restore the range after resume.
staging: most: core: use device description as name
staging: comedi: vmk80xx: Fix use of uninitialized semaphore
staging: comedi: vmk80xx: Fix possible double-free of ->usb_rx_buf
staging: comedi: ni_usb6501: Fix use of uninitialized mutex
staging: comedi: ni_usb6501: Fix possible double-free of ->usb_rx_buf
ALSA: hda/realtek - add two more pin configuration sets to quirk table
ALSA: core: Fix card races between register and disconnect
Input: elan_i2c - add hardware ID for multiple Lenovo laptops
serial: sh-sci: Fix HSCIF RX sampling point adjustment
serial: sh-sci: Fix HSCIF RX sampling point calculation
vt: fix cursor when clearing the screen
scsi: core: set result when the command cannot be dispatched
Revert "scsi: fcoe: clear FC_RP_STARTED flags when receiving a LOGO"
Revert "svm: Fix AVIC incomplete IPI emulation"
coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping
ipmi: fix sleep-in-atomic in free_user at cleanup SRCU user->release_barrier
crypto: x86/poly1305 - fix overflow during partial reduction
drm/ttm: fix out-of-bounds read in ttm_put_pages() v2
arm64: futex: Restore oldval initialization to work around buggy compilers
x86/kprobes: Verify stack frame on kretprobe
kprobes: Mark ftrace mcount handler functions nokprobe
kprobes: Fix error check when reusing optimized probes
rt2x00: do not increment sequence number while re-transmitting
mac80211: do not call driver wake_tx_queue op during reconfig
drm/amdgpu/gmc9: fix VM_L2_CNTL3 programming
perf/x86/amd: Add event map for AMD Family 17h
x86/cpu/bugs: Use __initconst for 'const' init data
perf/x86: Fix incorrect PEBS_REGS
x86/speculation: Prevent deadlock on ssb_state::lock
timers/sched_clock: Prevent generic sched_clock wrap caused by tick_freeze()
nfit/ars: Remove ars_start_flags
nfit/ars: Introduce scrub_flags
nfit/ars: Allow root to busy-poll the ARS state machine
nfit/ars: Avoid stale ARS results
mmc: sdhci: Fix data command CRC error handling
mmc: sdhci: Rename SDHCI_ACMD12_ERR and SDHCI_INT_ACMD12ERR
mmc: sdhci: Handle auto-command errors
modpost: file2alias: go back to simple devtable lookup
modpost: file2alias: check prototype of handler
tpm/tpm_i2c_atmel: Return -E2BIG when the transfer is incomplete
tpm: Fix the type of the return value in calc_tpm2_event_size()
Revert "kbuild: use -Oz instead of -Os when using clang"
sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup
device_cgroup: fix RCU imbalance in error case
mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n
ALSA: info: Fix racy addition/deletion of nodes
percpu: stop printing kernel addresses
tools include: Adopt linux/bits.h
ASoC: rockchip: add missing INTERLEAVED PCM attribute
i2c-hid: properly terminate i2c_hid_dmi_desc_override_table[] array
Revert "locking/lockdep: Add debug_locks check in __lock_downgrade()"
kernel/sysctl.c: fix out-of-bounds access when setting file-max
Linux 4.19.37
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 04f5866e41 upstream.
The core dumping code has always run without holding the mmap_sem for
writing, despite that being the only way to ensure that the entire vma
layout will not change from under it. Relying only on signal
serialization of the processes belonging to the mm is not nearly enough.
This was pointed out earlier. For example in Hugh's post from Jul 2017:
https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils
"Not strictly relevant here, but a related note: I was very surprised
to discover, only quite recently, how handle_mm_fault() may be called
without down_read(mmap_sem) - when core dumping. That seems a
misguided optimization to me, which would also be nice to correct"
In particular, because growsdown and growsup vmas can move
vm_start/vm_end, the various loops the core dump runs over the vmas will
not be consistent if page faults can happen concurrently.
Pretty much all users calling mmget_not_zero()/get_task_mm() and then
taking the mmap_sem had the potential to introduce unexpected side
effects in the core dumping code.
Taking mmap_sem for writing around the ->core_dump invocation is a
viable long-term fix, but it requires removing all copy-user operations
and page faults and replacing them with get_dump_page() for all binary
formats, which is not suitable as a short-term fix.
For the time being, this solution manually covers the places that can
confuse the core dump by altering either the vma layout or the vma flags
while it runs. Once ->core_dump runs under mmap_sem for writing, the
function mmget_still_valid() can be dropped.
Allowing mmap_sem-protected sections to run in parallel with the
coredump provides a minor parallelism advantage to the swapoff code
(which seems to be safe enough, as it never mangles any vma field and can
keep doing swapins in parallel with the core dumping) and to some other
corner cases.
To facilitate backporting I added "Fixes: 86039bd3b4", although the
side effect of this same race condition in /proc/pid/mem should be
reproducible since before 2.6.12-rc2. I couldn't add any other "Fixes:"
because there is no hash beyond the git genesis commit.
Because find_extend_vma() is the only location outside of the process
context that could modify the "mm" structures under mmap_sem for
reading, adding the mmget_still_valid() check to it means that all other
cases that take the mmap_sem for reading don't need the new check after
mmget_not_zero()/get_task_mm(). The expand_stack() in page fault
context also doesn't need the new check, because all tasks under core
dumping are frozen.
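A minimal single-threaded sketch of the check pattern described above, under stated assumptions: the sketch_* types and functions are hypothetical stand-ins, the core_state pointer models the one kept in struct mm_struct while a core dump is in progress, and mmap_sem handling is only noted in comments rather than implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the fields needed for the check are modelled. */
struct sketch_core_state {
	int nr_threads;
};

struct sketch_mm {
	/* Non-NULL while a core dump is in progress. */
	struct sketch_core_state *core_state;
	unsigned long vm_start; /* start of a single grows-down vma */
};

/* Same idea as mmget_still_valid(): the layout may only be changed if no
 * core dump is walking this mm. */
static bool sketch_mmget_still_valid(const struct sketch_mm *mm)
{
	return mm->core_state == NULL;
}

/* Hypothetical find_extend_vma()-style caller running outside the target
 * process's context. In the kernel this runs with mmap_sem held for
 * reading; locking is omitted here to keep the sketch single-threaded. */
static bool sketch_extend_vma(struct sketch_mm *mm, unsigned long addr)
{
	assert(mm != NULL);

	if (!sketch_mmget_still_valid(mm))
		return false;        /* a core dump is running: leave the layout alone */
	if (addr < mm->vm_start)
		mm->vm_start = addr; /* grow the vma downwards to cover addr */
	return true;
}
```

The check is placed after the lock would be taken, so a core dump that started between mmget_not_zero()/get_task_mm() and the lock acquisition is still detected.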
Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com
Fixes: 86039bd3b4 ("userfaultfd: add new syscall to provide memory externalization")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Jann Horn <jannh@google.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jann Horn <jannh@google.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge 4.19.34 into android-4.19
Changes in 4.19.34
arm64: debug: Don't propagate UNKNOWN FAR into si_code for debug signals
ext4: cleanup bh release code in ext4_ind_remove_space()
tty/serial: atmel: Add is_half_duplex helper
tty/serial: atmel: RS485 HD w/DMA: enable RX after TX is stopped
CIFS: fix POSIX lock leak and invalid ptr deref
h8300: use cc-cross-prefix instead of hardcoding h8300-unknown-linux-
f2fs: fix to adapt small inline xattr space in __find_inline_xattr()
f2fs: fix to avoid deadlock in f2fs_read_inline_dir()
tracing: kdb: Fix ftdump to not sleep
net/mlx5: Avoid panic when setting vport rate
net/mlx5: Avoid panic when setting vport mac, getting vport config
gpio: gpio-omap: fix level interrupt idling
include/linux/relay.h: fix percpu annotation in struct rchan
sysctl: handle overflow for file-max
net: stmmac: Avoid sometimes uninitialized Clang warnings
enic: fix build warning without CONFIG_CPUMASK_OFFSTACK
libbpf: force fixdep compilation at the start of the build
scsi: hisi_sas: Set PHY linkrate when disconnected
scsi: hisi_sas: Fix a timeout race of driver internal and SMP IO
iio: adc: fix warning in Qualcomm PM8xxx HK/XOADC driver
x86/hyperv: Fix kernel panic when kexec on HyperV
perf c2c: Fix c2c report for empty numa node
mm/sparse: fix a bad comparison
mm/cma.c: cma_declare_contiguous: correct err handling
mm/page_ext.c: fix an imbalance with kmemleak
mm, swap: bounds check swap_info array accesses to avoid NULL derefs
mm,oom: don't kill global init via memory.oom.group
memcg: killed threads should not invoke memcg OOM killer
mm, mempolicy: fix uninit memory access
mm/vmalloc.c: fix kernel BUG at mm/vmalloc.c:512!
mm/slab.c: kmemleak no scan alien caches
ocfs2: fix a panic problem caused by o2cb_ctl
f2fs: do not use mutex lock in atomic context
fs/file.c: initialize init_files.resize_wait
page_poison: play nicely with KASAN
cifs: use correct format characters
dm thin: add sanity checks to thin-pool and external snapshot creation
f2fs: fix to check inline_xattr_size boundary correctly
cifs: Accept validate negotiate if server return NT_STATUS_NOT_SUPPORTED
cifs: Fix NULL pointer dereference of devname
netfilter: nf_tables: check the result of dereferencing base_chain->stats
netfilter: conntrack: tcp: only close if RST matches exact sequence
jbd2: fix invalid descriptor block checksum
fs: fix guard_bio_eod to check for real EOD errors
tools lib traceevent: Fix buffer overflow in arg_eval
PCI/PME: Fix hotplug/sysfs remove deadlock in pcie_pme_remove()
wil6210: check null pointer in _wil_cfg80211_merge_extra_ies
mt76: fix a leaked reference by adding a missing of_node_put
crypto: crypto4xx - add missing of_node_put after of_device_is_available
crypto: cavium/zip - fix collision with generic cra_driver_name
usb: chipidea: Grab the (legacy) USB PHY by phandle first
powerpc/powernv/ioda: Fix locked_vm counting for memory used by IOMMU tables
scsi: core: replace GFP_ATOMIC with GFP_KERNEL in scsi_scan.c
kbuild: invoke syncconfig if include/config/auto.conf.cmd is missing
powerpc/xmon: Fix opcode being uninitialized in print_insn_powerpc
coresight: etm4x: Add support to enable ETMv4.2
serial: 8250_pxa: honor the port number from devicetree
ARM: 8840/1: use a raw_spinlock_t in unwind
iommu/io-pgtable-arm-v7s: Only kmemleak_ignore L2 tables
powerpc/hugetlb: Handle mmap_min_addr correctly in get_unmapped_area callback
btrfs: qgroup: Make qgroup async transaction commit more aggressive
mmc: omap: fix the maximum timeout setting
net: dsa: mv88e6xxx: Add lockdep classes to fix false positive splat
e1000e: Fix -Wformat-truncation warnings
mlxsw: spectrum: Avoid -Wformat-truncation warnings
platform/x86: ideapad-laptop: Fix no_hw_rfkill_list for Lenovo RESCUER R720-15IKBN
platform/mellanox: mlxreg-hotplug: Fix KASAN warning
loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()
IB/mlx4: Increase the timeout for CM cache
clk: fractional-divider: check parent rate only if flag is set
perf annotate: Fix getting source line failure
ASoC: qcom: Fix of-node refcount unbalance in qcom_snd_parse_of()
cpufreq: acpi-cpufreq: Report if CPU doesn't support boost technologies
efi: cper: Fix possible out-of-bounds access
s390/ism: ignore some errors during deregistration
scsi: megaraid_sas: return error when create DMA pool failed
scsi: fcoe: make use of fip_mode enum complete
drm/amd/display: Clear stream->mode_changed after commit
perf test: Fix failure of 'evsel-tp-sched' test on s390
mwifiex: don't advertise IBSS features without FW support
perf report: Don't shadow inlined symbol with different addr range
SoC: imx-sgtl5000: add missing put_device()
media: ov7740: fix runtime pm initialization
media: sh_veu: Correct return type for mem2mem buffer helpers
media: s5p-jpeg: Correct return type for mem2mem buffer helpers
media: rockchip/rga: Correct return type for mem2mem buffer helpers
media: s5p-g2d: Correct return type for mem2mem buffer helpers
media: mx2_emmaprp: Correct return type for mem2mem buffer helpers
media: mtk-jpeg: Correct return type for mem2mem buffer helpers
mt76: usb: do not run mt76u_queues_deinit twice
xen/gntdev: Do not destroy context while dma-bufs are in use
vfs: fix preadv64v2 and pwritev64v2 compat syscalls with offset == -1
HID: intel-ish-hid: avoid binding wrong ishtp_cl_device
cgroup, rstat: Don't flush subtree root unless necessary
jbd2: fix race when writing superblock
leds: lp55xx: fix null deref on firmware load failure
perf report: Add s390 diagnosic sampling descriptor size
iwlwifi: pcie: fix emergency path
ACPI / video: Refactor and fix dmi_is_desktop()
selftests: skip seccomp get_metadata test if not real root
kprobes: Prohibit probing on bsearch()
kprobes: Prohibit probing on RCU debug routine
netfilter: conntrack: fix cloned unconfirmed skb->_nfct race in __nf_conntrack_confirm
ARM: 8833/1: Ensure that NEON code always compiles with Clang
ARM: dts: meson8b: fix the Ethernet data line signals in eth_rgmii_pins
ALSA: PCM: check if ops are defined before suspending PCM
ath10k: fix shadow register implementation for WCN3990
usb: f_fs: Avoid crash due to out-of-scope stack ptr access
sched/topology: Fix percpu data types in struct sd_data & struct s_data
bcache: fix input overflow to cache set sysfs file io_error_halflife
bcache: fix input overflow to sequential_cutoff
bcache: fix potential div-zero error of writeback_rate_i_term_inverse
bcache: improve sysfs_strtoul_clamp()
genirq: Avoid summation loops for /proc/stat
net: marvell: mvpp2: fix stuck in-band SGMII negotiation
iw_cxgb4: fix srqidx leak during connection abort
net: phy: consider latched link-down status in polling mode
fbdev: fbmem: fix memory access if logo is bigger than the screen
cdrom: Fix race condition in cdrom_sysctl_register
drm: rcar-du: add missing of_node_put
drm/amd/display: Don't re-program planes for DPMS changes
drm/amd/display: Disconnect mpcc when changing tg
perf/aux: Make perf_event accessible to setup_aux()
e1000e: fix cyclic resets at link up with active tx
e1000e: Exclude device from suspend direct complete optimization
platform/x86: intel_pmc_core: Fix PCH IP sts reading
i2c: of: Try to find an I2C adapter matching the parent
staging: spi: mt7621: Add return code check on device_reset()
iwlwifi: mvm: fix RFH config command with >=10 CPUs
ASoC: fsl-asoc-card: fix object reference leaks in fsl_asoc_card_probe
sched/debug: Initialize sd_sysctl_cpus if !CONFIG_CPUMASK_OFFSTACK
efi/memattr: Don't bail on zero VA if it equals the region's PA
sched/core: Use READ_ONCE()/WRITE_ONCE() in move_queued_task()/task_rq_lock()
drm/vkms: Bugfix extra vblank frame
ARM: dts: lpc32xx: Remove leading 0x and 0s from bindings notation
efi/arm/arm64: Allow SetVirtualAddressMap() to be omitted
soc: qcom: gsbi: Fix error handling in gsbi_probe()
mt7601u: bump supported EEPROM version
ARM: 8830/1: NOMMU: Toggle only bits in EXC_RETURN we are really care of
ARM: avoid Cortex-A9 livelock on tight dmb loops
block, bfq: fix in-service-queue check for queue merging
bpf: fix missing prototype warnings
selftests/bpf: skip verifier tests for unsupported program types
powerpc/64s: Clear on-stack exception marker upon exception return
cgroup/pids: turn cgroup_subsys->free() into cgroup_subsys->release() to fix the accounting
backlight: pwm_bl: Use gpiod_get_value_cansleep() to get initial state
tty: increase the default flip buffer limit to 2*640K
powerpc/pseries: Perform full re-add of CPU for topology update post-migration
drm/amd/display: Enable vblank interrupt during CRC capture
ALSA: dice: add support for Solid State Logic Duende Classic/Mini
usb: dwc3: gadget: Fix OTG events when gadget driver isn't loaded
platform/x86: intel-hid: Missing power button release on some Dell models
perf script python: Use PyBytes for attr in trace-event-python
perf script python: Add trace_context extension module to sys.modules
media: mt9m111: set initial frame size other than 0x0
hwrng: virtio - Avoid repeated init of completion
soc/tegra: fuse: Fix illegal free of IO base address
HID: intel-ish: ipc: handle PIMR before ish_wakeup also clear PISR busy_clear bit
f2fs: UBSAN: set boolean value iostat_enable correctly
hpet: Fix missing '=' character in the __setup() code of hpet_mmap_enable
cpu/hotplug: Mute hotplug lockdep during init
dmaengine: imx-dma: fix warning comparison of distinct pointer types
dmaengine: qcom_hidma: assign channel cookie correctly
dmaengine: qcom_hidma: initialize tx flags in hidma_prep_dma_*
netfilter: physdev: relax br_netfilter dependency
media: rcar-vin: Allow independent VIN link enablement
media: s5p-jpeg: Check for fmt_ver_flag when doing fmt enumeration
regulator: act8865: Fix act8600_sudcdc_voltage_ranges setting
pinctrl: meson: meson8b: add the eth_rxd2 and eth_rxd3 pins
drm: Auto-set allow_fb_modifiers when given modifiers at plane init
drm/nouveau: Stop using drm_crtc_force_disable
x86/build: Specify elf_i386 linker emulation explicitly for i386 objects
selinux: do not override context on context mounts
brcmfmac: Use firmware_request_nowarn for the clm_blob
wlcore: Fix memory leak in case wl12xx_fetch_firmware failure
x86/build: Mark per-CPU symbols as absolute explicitly for LLD
drm/fb-helper: fix leaks in error path of drm_fb_helper_fbdev_setup
clk: meson: clean-up clock registration
clk: rockchip: fix frac settings of GPLL clock for rk3328
dmaengine: tegra: avoid overflow of byte tracking
Input: soc_button_array - fix mapping of the 5th GPIO in a PNP0C40 device
drm/dp/mst: Configure no_stop_bit correctly for remote i2c xfers
net: stmmac: Avoid one more sometimes uninitialized Clang warning
ACPI / video: Extend chassis-type detection with a "Lunch Box" check
bcache: fix potential div-zero error of writeback_rate_p_term_inverse
kprobes/x86: Blacklist non-attachable interrupt functions
Linux 4.19.34
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 99687cdbb3 ]
The percpu members of struct sd_data and s_data are declared as:
struct ... ** __percpu member;
So their type is:
__percpu pointer to pointer to struct ...
But looking at how they're used, their type should be:
pointer to __percpu pointer to struct ...
and they should thus be declared as:
struct ... * __percpu *member;
So fix the placement of '__percpu' in the definition of these
structures.
This addresses a bunch of Sparse's warnings like:
warning: incorrect type in initializer (different address spaces)
expected void const [noderef] <asn:3> *__vpp_verify
got struct sched_domain **
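As an analogy only, the placement rule can be shown with `const`, which qualifies whichever pointer level it follows in the same way the reasoning above assumes for the `__percpu` address-space qualifier. The sketch_* struct names are hypothetical; Sparse itself is not involved here.

```c
#include <assert.h>
#include <stddef.h>

struct sched_domain {
	int level;
};

/* Like 'struct ... ** __percpu member': the member itself is qualified. */
struct sketch_old_layout {
	struct sched_domain **const sd;
};

/* Like 'struct ... * __percpu *member': an ordinary pointer to a
 * qualified (here: const; in the kernel: per-CPU) pointer. */
struct sketch_new_layout {
	struct sched_domain *const *sd;
};

/* Either way the member is a single pointer; only the qualified level differs. */
static_assert(sizeof(struct sketch_new_layout) == sizeof(void *),
	      "one pointer member");
```

In the new layout the member can be reassigned freely, while the pointer it points at carries the qualifier, which matches "pointer to __percpu pointer to struct ...".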
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190118144936.79158-1-luc.vanoostenryck@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
There are several definitions of those functions/macros in code that
manipulates fixed-point load averages. Provide an official version.
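Assuming the helpers in question are the kernel's fixed-point load-average macros (FSHIFT, FIXED_1, LOAD_INT(), LOAD_FRAC()) and the calc_load() decay step, a standalone sketch of how they fit together might look as follows. The constants reproduce the commonly used kernel values; the format_load() wrapper is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define FSHIFT   11               /* bits of fractional precision */
#define FIXED_1  (1UL << FSHIFT)  /* 1.0 in fixed point */
#define EXP_1    1884UL           /* 1/exp(5s/1min) in fixed point */
#define EXP_5    2014UL           /* 1/exp(5s/5min) */
#define EXP_15   2037UL           /* 1/exp(5s/15min) */

#define LOAD_INT(x)  ((x) >> FSHIFT)
#define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1 - 1)) * 100)

/* One exponential-decay step: load = load*exp + active*(1 - exp),
 * rounded up while the load is rising. */
static unsigned long calc_load(unsigned long load, unsigned long exp,
			       unsigned long active)
{
	unsigned long newload;

	assert(exp < FIXED_1); /* decay factor must be below 1.0 */
	newload = load * exp + active * (FIXED_1 - exp);
	if (active >= load)
		newload += FIXED_1 - 1;
	return newload / FIXED_1;
}

/* Hypothetical helper: render a fixed-point load as "I.FF". */
static void format_load(char *buf, size_t len, unsigned long x)
{
	snprintf(buf, len, "%lu.%02lu", LOAD_INT(x), LOAD_FRAC(x));
}
```

For example, FIXED_1 * 3 / 2 renders as "1.50", and a single calc_load(0, EXP_1, FIXED_1) step from an idle state yields 164, which renders as "0.08".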
[akpm@linux-foundation.org: fix missed conversion in block/blk-iolatency.c]
Link: http://lkml.kernel.org/r/20180828172258.3185-5-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: Daniel Drake <drake@endlessm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 8508cf3ffa)
Conflicts:
block/blk-iolatency.c
(1. manual merge to replace stat->rqs.mean with stat.mean)
Bug: 127712811
Test: lmkd in PSI mode
Change-Id: I716b4874491cff75a2355c6d95c64cf02d05e7ee
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Merge 4.19.20 into android-4.19
Changes in 4.19.20
Fix "net: ipv4: do not handle duplicate fragments as overlapping"
drm/msm/gpu: fix building without debugfs
ipv6: Consider sk_bound_dev_if when binding a socket to an address
ipv6: sr: clear IP6CB(skb) on SRH ip4ip6 encapsulation
ipvlan, l3mdev: fix broken l3s mode wrt local routes
l2tp: copy 4 more bytes to linear part if necessary
l2tp: fix reading optional fields of L2TPv3
net: ip_gre: always reports o_key to userspace
net: ip_gre: use erspan key field for tunnel lookup
net/mlx4_core: Add masking for a few queries on HCA caps
netrom: switch to sock timer API
net/rose: fix NULL ax25_cb kernel panic
net: set default network namespace in init_dummy_netdev()
ravb: expand rx descriptor data to accommodate hw checksum
sctp: improve the events for sctp stream reset
tun: move the call to tun_set_real_num_queues
ucc_geth: Reset BQL queue when stopping device
vhost: fix OOB in get_rx_bufs()
net: ip6_gre: always reports o_key to userspace
sctp: improve the events for sctp stream adding
net/mlx5e: Allow MAC invalidation while spoofchk is ON
ip6mr: Fix notifiers call on mroute_clean_tables()
Revert "net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager"
sctp: set chunk transport correctly when it's a new asoc
sctp: set flow sport from saddr only when it's 0
virtio_net: Don't enable NAPI when interface is down
virtio_net: Don't call free_old_xmit_skbs for xdp_frames
virtio_net: Fix not restoring real_num_rx_queues
virtio_net: Fix out of bounds access of sq
virtio_net: Don't process redirected XDP frames when XDP is disabled
virtio_net: Use xdp_return_frame to free xdp_frames on destroying vqs
virtio_net: Differentiate sk_buff and xdp_frame on freeing
CIFS: Do not count -ENODATA as failure for query directory
CIFS: Fix trace command logging for SMB2 reads and writes
CIFS: Do not consider -ENODATA as stat failure for reads
fs/dcache: Fix incorrect nr_dentry_unused accounting in shrink_dcache_sb()
iommu/vt-d: Fix memory leak in intel_iommu_put_resv_regions()
selftests/seccomp: Enhance per-arch ptrace syscall skip tests
NFS: Fix up return value on fatal errors in nfs_page_async_flush()
ARM: cns3xxx: Fix writing to wrong PCI config registers after alignment
arm64: kaslr: ensure randomized quantities are clean also when kaslr is off
arm64: Do not issue IPIs for user executable ptes
arm64: hyp-stub: Forbid kprobing of the hyp-stub
arm64: hibernate: Clean the __hyp_text to PoC after resume
gpio: altera-a10sr: Set proper output level for direction_output
gpiolib: fix line event timestamps for nested irqs
gpio: pcf857x: Fix interrupts on multiple instances
gpio: sprd: Fix the incorrect data register
gpio: sprd: Fix incorrect irq type setting for the async EIC
gfs2: Revert "Fix loop in gfs2_rbm_find"
mmc: bcm2835: Fix DMA channel leak on probe error
mmc: mediatek: fix incorrect register setting of hs400_cmd_int_delay
ALSA: usb-audio: Add Opus #3 to quirks for native DSD support
ALSA: hda/realtek - Fixed hp_pin no value
IB/hfi1: Remove overly conservative VM_EXEC flag check
platform/x86: asus-nb-wmi: Map 0x35 to KEY_SCREENLOCK
platform/x86: asus-nb-wmi: Drop mapping of 0x33 and 0x34 scan codes
mmc: sdhci-iproc: handle mmc_of_parse() errors during probe
Btrfs: fix deadlock when allocating tree block during leaf/node split
btrfs: On error always free subvol_name in btrfs_mount
kernel/exit.c: release ptraced tasks before zap_pid_ns_processes
mm/hugetlb.c: teach follow_hugetlb_page() to handle FOLL_NOWAIT
oom, oom_reaper: do not enqueue same task twice
mm,memory_hotplug: fix scan_movable_pages() for gigantic hugepages
mm, oom: fix use-after-free in oom_kill_process
mm: hwpoison: use do_send_sig_info() instead of force_sig()
mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
of: Convert to using %pOFn instead of device_node.name
of: overlay: add tests to validate kfrees from overlay removal
of: overlay: add missing of_node_get() in __of_attach_node_sysfs
of: overlay: use prop add changeset entry for property in new nodes
of: overlay: do not duplicate properties from overlay for new nodes
md/raid5: fix 'out of memory' during raid cache recovery
cifs: Always resolve hostname before reconnecting
Linux 4.19.20
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 9bcdeb51bd upstream.
Arkadiusz reported that enabling memcg's group oom killing causes
strange memcg statistics, where a memcg contains no task even though
its task count is not 0. It turned out that there is a bug in
wake_oom_reaper() which allows enqueuing the same task twice, which
makes it impossible to decrease the number of tasks in that memcg due
to a refcount leak.
This bug has existed since the OOM reaper became invokable from the
task_will_free_mem(current) path in out_of_memory() in Linux 4.7,
T1@P1     |T2@P1     |T3@P1     |OOM reaper
----------+----------+----------+------------
                                 # Processing an OOM victim in a different memcg domain.
                      try_charge()
                        mem_cgroup_out_of_memory()
                          mutex_lock(&oom_lock)
           try_charge()
             mem_cgroup_out_of_memory()
               mutex_lock(&oom_lock)
try_charge()
  mem_cgroup_out_of_memory()
    mutex_lock(&oom_lock)
                          out_of_memory()
                            oom_kill_process(P1)
                              do_send_sig_info(SIGKILL, @P1)
                              mark_oom_victim(T1@P1)
                              wake_oom_reaper(T1@P1) # T1@P1 is enqueued.
                          mutex_unlock(&oom_lock)
               out_of_memory()
                 mark_oom_victim(T2@P1)
                 wake_oom_reaper(T2@P1) # T2@P1 is enqueued.
               mutex_unlock(&oom_lock)
    out_of_memory()
      mark_oom_victim(T1@P1)
      wake_oom_reaper(T1@P1) # T1@P1 is enqueued again due to oom_reaper_list == T2@P1 && T1@P1->oom_reaper_list == NULL.
    mutex_unlock(&oom_lock)
                                 # Completed processing an OOM victim in a different memcg domain.
                                 spin_lock(&oom_reaper_lock)
                                 # T1@P1 is dequeued.
                                 spin_unlock(&oom_reaper_lock)
but memcg's group oom killing made it easier to trigger this bug by
calling wake_oom_reaper() on the same task from one out_of_memory()
request.
Fix this bug using the approach used by commit 855b018325 ("oom,
oom_reaper: disable oom_reaper for oom_kill_allocating_task"). As a
side effect, this patch also avoids enqueuing multiple threads sharing
memory via the task_will_free_mem(current) path.
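A simplified single-list sketch of why the old duplicate check misses the case in the trace above, and how a per-mm "already queued" flag closes it. The sketch_* names and the reap_queued flag are hypothetical models, not the kernel's actual identifiers; the list is pushed at the head, as the trace implies, and the reaper side is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared address space with one "queued for reaping" flag. */
struct sketch_mm {
	atomic_flag reap_queued;
};

struct sketch_task {
	struct sketch_mm *mm;
	struct sketch_task *oom_reaper_list; /* next task in the reaper list */
};

static struct sketch_task *reaper_head; /* models the global oom_reaper_list */

/* Old-style check: catches a task at the list head or one with a
 * successor, but a task at the tail (next == NULL) slips through. */
static bool sketch_enqueue_buggy(struct sketch_task *tsk)
{
	if (tsk == reaper_head || tsk->oom_reaper_list)
		return false;
	tsk->oom_reaper_list = reaper_head;
	reaper_head = tsk;
	return true;
}

/* Flag-based check: only the first caller for a given mm enqueues, which
 * also skips other threads sharing that mm. */
static bool sketch_enqueue_fixed(struct sketch_task *tsk)
{
	assert(tsk->mm != NULL);
	if (atomic_flag_test_and_set(&tsk->mm->reap_queued))
		return false;
	tsk->oom_reaper_list = reaper_head;
	reaper_head = tsk;
	return true;
}
```

Enqueuing T1, then T2, then T1 again with the old check reproduces the trace: the second T1 enqueue succeeds and turns the list into a cycle. With the flag, both the repeated T1 and the memory-sharing T2 are skipped.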
Link: http://lkml.kernel.org/r/e865a044-2c10-9858-f4ef-254bc71d6cc2@i-love.sakura.ne.jp
Link: http://lkml.kernel.org/r/5ee34fc6-1485-34f8-8790-903ddabaa809@i-love.sakura.ne.jp
Fixes: af8e15cc85 ("oom, oom_reaper: do not enqueue task if it is on the oom_reaper_list head")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Tested-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Aleksa Sarai <asarai@suse.de>
Cc: Jay Kamat <jgkamat@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit b78eec5fb7. It has not
been accepted upstream and is of no particular interest in Android since
EAS is always enabled anyway.
Bug: 120440300
Change-Id: I7d1f2ad206c05041d386fc99f18e29c4fa56cbc8
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
The core EAS patches have now been accepted upstream. The patches used
in Android are based on a slightly earlier version of the series. In
order to reduce the delta with mainline and ease backports, align the
EAS code paths with their upstream version.
This basically applies the output of git range-diff on the appropriate
commits, and fixes a conflict in schedutil regarding the integration of
schedtune.
Bug: 120440300
Change-Id: I208ebeb4207e3f4f4bbb5103c606b293a464c20f
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Merge 4.19.7 into android-4.19
Changes in 4.19.7
mm/huge_memory: rename freeze_page() to unmap_page()
mm/huge_memory: splitting set mapping+index before unfreeze
mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()
mm/khugepaged: collapse_shmem() stop if punched or truncated
mm/khugepaged: fix crashes due to misaccounted holes
mm/khugepaged: collapse_shmem() remember to clear holes
mm/khugepaged: minor reorderings in collapse_shmem()
mm/khugepaged: collapse_shmem() without freezing new_page
mm/khugepaged: collapse_shmem() do not crash on Compound
lan743x: Enable driver to work with LAN7431
lan743x: fix return value for lan743x_tx_napi_poll
net: don't keep lonely packets forever in the gro hash
net: gemini: Fix copy/paste error
net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queue
packet: copy user buffers before orphan or clone
rapidio/rionet: do not free skb before reading its length
s390/qeth: fix length check in SNMP processing
usbnet: ipheth: fix potential recvmsg bug and recvmsg bug 2
net: thunderx: set xdp_prog to NULL if bpf_prog_add fails
net: skb_scrub_packet(): Scrub offload_fwd_mark
virtio-net: disable guest csum during XDP set
virtio-net: fail XDP set if guest csum is negotiated
net/dim: Update DIM start sample after each DIM iteration
tcp: defer SACK compression after DupThresh
net: phy: add workaround for issue where PHY driver doesn't bind to the device
tipc: fix lockdep warning during node delete
x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation
x86/speculation: Apply IBPB more strictly to avoid cross-process data leak
x86/speculation: Propagate information about RSB filling mitigation to sysfs
x86/speculation: Add RETPOLINE_AMD support to the inline asm CALL_NOSPEC variant
x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support
x86/retpoline: Remove minimal retpoline support
x86/speculation: Update the TIF_SSBD comment
x86/speculation: Clean up spectre_v2_parse_cmdline()
x86/speculation: Remove unnecessary ret variable in cpu_show_common()
x86/speculation: Move STIPB/IBPB string conditionals out of cpu_show_common()
x86/speculation: Disable STIBP when enhanced IBRS is in use
x86/speculation: Rename SSBD update functions
x86/speculation: Reorganize speculation control MSRs update
sched/smt: Make sched_smt_present track topology
x86/Kconfig: Select SCHED_SMT if SMP enabled
sched/smt: Expose sched_smt_present static key
x86/speculation: Rework SMT state change
x86/l1tf: Show actual SMT state
x86/speculation: Reorder the spec_v2 code
x86/speculation: Mark string arrays const correctly
x86/speculataion: Mark command line parser data __initdata
x86/speculation: Unify conditional spectre v2 print functions
x86/speculation: Add command line control for indirect branch speculation
x86/speculation: Prepare for per task indirect branch speculation control
x86/process: Consolidate and simplify switch_to_xtra() code
x86/speculation: Avoid __switch_to_xtra() calls
x86/speculation: Prepare for conditional IBPB in switch_mm()
ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS
x86/speculation: Split out TIF update
x86/speculation: Prevent stale SPEC_CTRL msr content
x86/speculation: Prepare arch_smt_update() for PRCTL mode
x86/speculation: Add prctl() control for indirect branch speculation
x86/speculation: Enable prctl mode for spectre_v2_user
x86/speculation: Add seccomp Spectre v2 user space protection mode
x86/speculation: Provide IBPB always command line options
userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas
kvm: mmu: Fix race in emulated page table writes
kvm: svm: Ensure an IBPB on all affected CPUs when freeing a vmcb
KVM: nVMX/nSVM: Fix bug which sets vcpu->arch.tsc_offset to L1 tsc_offset
KVM: x86: Fix kernel info-leak in KVM_HC_CLOCK_PAIRING hypercall
KVM: LAPIC: Fix pv ipis use-before-initialization
KVM: X86: Fix scan ioapic use-before-initialization
KVM: VMX: re-add ple_gap module parameter
xtensa: enable coprocessors that are being flushed
xtensa: fix coprocessor context offset definitions
xtensa: fix coprocessor part of ptrace_{get,set}xregs
udf: Allow mounting volumes with incorrect identification strings
btrfs: Always try all copies when reading extent buffers
Btrfs: ensure path name is null terminated at btrfs_control_ioctl
Btrfs: fix rare chances for data loss when doing a fast fsync
Btrfs: fix race between enabling quotas and subvolume creation
btrfs: relocation: set trans to be NULL after ending transaction
PCI: layerscape: Fix wrong invocation of outbound window disable accessor
PCI: dwc: Fix MSI-X EP framework address calculation bug
PCI: Fix incorrect value returned from pcie_get_speed_cap()
arm64: dts: rockchip: Fix PCIe reset polarity for rk3399-puma-haikou.
x86/MCE/AMD: Fix the thresholding machinery initialization order
x86/fpu: Disable bottom halves while loading FPU registers
perf/x86/intel: Move branch tracing setup to the Intel-specific source file
perf/x86/intel: Add generic branch tracing check to intel_pmu_has_bts()
perf/x86/intel: Disallow precise_ip on BTS events
fs: fix lost error code in dio_complete
ALSA: wss: Fix invalid snd_free_pages() at error path
ALSA: ac97: Fix incorrect bit shift at AC97-SPSA control write
ALSA: control: Fix race between adding and removing a user element
ALSA: sparc: Fix invalid snd_free_pages() at error path
ALSA: hda: Add ASRock N68C-S UCC the power_save blacklist
ALSA: hda/realtek - Support ALC300
ALSA: hda/realtek - fix headset mic detection for MSI MS-B171
ALSA: hda/realtek - fix the pop noise on headphone for lenovo laptops
ALSA: hda/realtek - Add auto-mute quirk for HP Spectre x360 laptop
function_graph: Create function_graph_enter() to consolidate architecture code
ARM: function_graph: Simplify with function_graph_enter()
microblaze: function_graph: Simplify with function_graph_enter()
x86/function_graph: Simplify with function_graph_enter()
nds32: function_graph: Simplify with function_graph_enter()
powerpc/function_graph: Simplify with function_graph_enter()
sh/function_graph: Simplify with function_graph_enter()
sparc/function_graph: Simplify with function_graph_enter()
parisc: function_graph: Simplify with function_graph_enter()
riscv/function_graph: Simplify with function_graph_enter()
s390/function_graph: Simplify with function_graph_enter()
arm64: function_graph: Simplify with function_graph_enter()
MIPS: function_graph: Simplify with function_graph_enter()
function_graph: Make ftrace_push_return_trace() static
function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack
function_graph: Have profiler use curr_ret_stack and not depth
function_graph: Move return callback before update of curr_ret_stack
function_graph: Reverse the order of pushing the ret_stack and the callback
binder: fix race that allows malicious free of live buffer
ext2: initialize opts.s_mount_opt as zero before using it
ext2: fix potential use after free
ASoC: intel: cht_bsw_max98090_ti: Add quirk for boards using pmc_plt_clk_0
ASoC: pcm186x: Fix device reset-registers trigger value
ARM: dts: rockchip: Remove @0 from the veyron memory node
dmaengine: at_hdmac: fix memory leak in at_dma_xlate()
dmaengine: at_hdmac: fix module unloading
staging: most: use format specifier "%s" in snprintf
staging: vchiq_arm: fix compat VCHIQ_IOC_AWAIT_COMPLETION
staging: mt7621-dma: fix potentially dereferencing uninitialized 'tx_desc'
staging: mt7621-pinctrl: fix uninitialized variable ngroups
staging: rtl8723bs: Fix incorrect sense of ether_addr_equal
staging: rtl8723bs: Add missing return for cfg80211_rtw_get_station
USB: usb-storage: Add new IDs to ums-realtek
usb: core: quirks: add RESET_RESUME quirk for Cherry G230 Stream series
Revert "usb: dwc3: gadget: skip Set/Clear Halt when invalid"
iio/hid-sensors: Fix IIO_CHAN_INFO_RAW returning wrong values for signed numbers
iio:st_magn: Fix enable device after trigger
lib/test_kmod.c: fix rmmod double free
mm: cleancache: fix corruption on missed inode invalidation
mm: use swp_offset as key in shmem_replace_page()
Drivers: hv: vmbus: check the creation_status in vmbus_establish_gpadl()
misc: mic/scif: fix copy-paste error in scif_create_remote_lookup
Linux 4.19.7
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit a74cfffb03 upstream
arch_smt_update() is only called when the sysfs SMT control knob is
changed. This means that when SMT is enabled in the sysfs control knob the
system is considered to have SMT active even if all siblings are offline.
To allow fine-grained control of the speculation mitigations, the actual SMT
state is more interesting than the fact that siblings could be enabled.
Rework the code so that arch_smt_update() is invoked from each individual CPU
hotplug function, and simplify the update function while at it.
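The distinction drawn above, between SMT being enabled and SMT being actually active, can be sketched as a small user-space model. This is not the kernel's implementation: `struct core`, `MAX_SIBLINGS` and `smt_active()` are hypothetical names, and the model simply treats SMT as active only when at least one core currently has two or more online siblings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SIBLINGS 4	/* hypothetical upper bound on threads per core */

/* Hypothetical model of one physical core and the hotplug state of its threads. */
struct core {
	size_t nr_siblings;		/* hardware threads in this core */
	bool online[MAX_SIBLINGS];	/* per-thread online state */
};

/* Count the online hardware threads of one core. */
static size_t online_siblings(const struct core *c)
{
	size_t n = 0;

	assert(c->nr_siblings <= MAX_SIBLINGS);
	for (size_t i = 0; i < c->nr_siblings; i++)
		if (c->online[i])
			n++;
	return n;
}

/*
 * SMT is "active" only if some core has more than one online sibling;
 * merely being allowed to bring siblings online is not enough.
 */
static bool smt_active(const struct core *cores, size_t nr_cores)
{
	for (size_t i = 0; i < nr_cores; i++)
		if (online_siblings(&cores[i]) > 1)
			return true;
	return false;
}
```

In this model, re-evaluating `smt_active()` on every CPU hotplug transition, rather than only when a global knob changes, is what keeps the answer in step with the real topology.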
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185004.521974984@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 321a874a7e upstream
Make the scheduler's 'sched_smt_present' static key globally available, so
it can be used in the x86 speculation control code.
Provide a query function and a stub for the CONFIG_SMP=n case.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185004.430168326@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This adds a counter to the taskstats extended accounting fields, which
tracks the number of times fsync is called, and then plumbs it through
to the uid_sys_stats driver.
Bug: 120442023
Change-Id: I6c138de5b2332eea70f57e098134d1d141247b3f
Signed-off-by: Jin Qian <jinqian@google.com>
[AmitP: Refactored changes to align with changes from upstream commit
9a07000400 ("sched/headers: Move CONFIG_TASK_XACCT bits from <linux/sched.h> to <linux/sched/xacct.h>")]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
[tkjos: Needed for storaged fsync accounting ("storaged --uid" and
"storaged --task").]
[astrachan: This is modifying a userspace interface and should probably
be reworked]
Signed-off-by: Alistair Strachan <astrachan@google.com>
This patch adds a parameter, sibling_count_hint, to select_task_rq,
allowing the caller, where it has this information, to inform the
sched_class of the number of tasks that are being woken up as part of
the same event.
The wake_q mechanism is one case where this information is available.
select_task_rq_fair can then use the information to detect that it
needs to widen the search space for task placement in order to avoid
overloading the last-level cache domain's CPUs.
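The decision the hint feeds into can be sketched as follows. This is a hypothetical, simplified model rather than the patch itself: `wake_wide_model()` and its parameters are illustrative, the fallback is a simplified rendering of the existing wakee-flips check in wake_wide(), and the model assumes that a hint at least as large as the LLC size forces a wide wakeup.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: should this wakeup search beyond the waker's
 * last-level-cache (LLC) domain?
 *
 * sibling_count_hint: tasks woken by the same event (0 if unknown)
 * llc_size:           number of CPUs sharing the waker's LLC
 * waker_flips, wakee_flips: per-task "wakee flip" counters used by the
 *                     fallback heuristic
 */
static bool wake_wide_model(unsigned int sibling_count_hint,
			    unsigned int llc_size,
			    unsigned int waker_flips,
			    unsigned int wakee_flips)
{
	unsigned int master = waker_flips, slave = wakee_flips;

	assert(llc_size > 0);

	/* Enough simultaneous wakees to fill the LLC: widen the search. */
	if (sibling_count_hint >= llc_size)
		return true;

	/* Fallback: simplified wakee-flips heuristic. */
	if (master < slave) {
		unsigned int tmp = master;

		master = slave;
		slave = tmp;
	}
	if (slave < llc_size || master < slave * llc_size)
		return false;
	return true;
}
```

For the barrier use case that follows, the LLC spans the two big CPUs (llc_size = 2) and four tasks are woken at once, so the model widens the search even when the wakee-flip counters carry no useful signal.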
* * *
The reason I am investigating this change is the following use case
on ARM big.LITTLE (asymmetric CPU capacity): one task per CPU, each of
which repeatedly does X amount of work and then calls
pthread_barrier_wait (i.e. sleeps until the last task finishes its X
and hits the barrier). On big.LITTLE, the tasks that get a "big" CPU
finish faster, and those CPUs then pull over the tasks that are still
running:
    v CPU v     ->time->
                -------------
    0 (big)     11111 /333
                -------------
    1 (big)     22222 /444|
                -------------
    2 (LITTLE)  333333/
                -------------
    3 (LITTLE)  444444/
                -------------
Now when task 4 hits the barrier (at |) and wakes the others up,
there are 4 tasks with prev_cpu=<big> and 0 tasks with
prev_cpu=<little>. want_affine therefore means that we'll only look
in CPUs 0 and 1 (sd_llc), so tasks will be unnecessarily coscheduled
on the bigs until the next load balance, something like this:
    v CPU v     ->time->
                ------------------------
    0 (big)     11111 /333 31313\33333
                ------------------------
    1 (big)     22222 /444|424\4444444
                ------------------------
    2 (LITTLE)  333333/       \222222
                ------------------------
    3 (LITTLE)  444444/         \1111
                ------------------------
                         ^^^
                  underutilization
So, I'm trying to get want_affine = 0 for these tasks.
I don't _think_ any incarnation of the wakee_flips mechanism can help
us here because which task is waker and which tasks are wakees
generally changes with each iteration.
However pthread_barrier_wait (or more accurately FUTEX_WAKE) has the
nice property that we know exactly how many tasks are being woken, so
we can cheat.
It might be a disadvantage that we "widen" _every_ task that's woken in
an event, while select_idle_sibling would work fine for the first
sd_llc_size - 1 tasks.
IIUC, if wake_affine() behaves correctly this trick wouldn't be
necessary on SMP systems, so it might be best guarded by the presence
of SD_ASYM_CPUCAPACITY?
* * *
Final note:
In order to observe "perfect" behaviour for this use case, I also had
to disable the TTWU_QUEUE sched feature. Suppose during the wakeup
above we are working through the work queue and have placed tasks 3
and 2, and are about to place task 1:
    v CPU v     ->time->
                --------------
    0 (big)     11111 /333 3
                --------------
    1 (big)     22222 /444|4
                --------------
    2 (LITTLE)  333333/    2
                --------------
    3 (LITTLE)  444444/    <- Task 1 should go here
                --------------
If TTWU_QUEUE is enabled, we will not yet have enqueued task
2 (having instead sent a reschedule IPI) or attached its load to CPU
2. So we are likely to also place task 1 on CPU 2. Disabling
TTWU_QUEUE means that we enqueue task 2 before placing task 1,
solving this issue. TTWU_QUEUE is there to minimise rq lock
contention, and I guess that this contention is less of an issue on
big.LITTLE systems since they have relatively few CPUs, which
suggests the trade-off makes sense here.
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
( - Applied from https://patchwork.kernel.org/patch/9895261/
- Fixed trivial conflict in kernel/sched/core.c
- Fixed select_task_rq_idle, now in kernel/sched/idle.c
- Fixed trivial conflict in select_task_rq_fair )
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Change-Id: I3cfc4bf48c3d7feef969db4d22449f4fbb4f795d
Since we don't do energy-aware wakeups when we are overutilized, always
honoring sync wakeups in this state does not prevent wake-wide mechanics
overruling the flag as normal.
This patch is based upon previous work to build EAS for android products.
sync-hint code taken from commit 4a5e890ec60d
"sched/fair: add tunable to force selection at cpu granularity" written
by Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
(cherry-picked from commit f1ec666a62dec1083ed52fe1ddef093b84373aaf)
[ Moved the feature to find_energy_efficient_cpu() ]
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Change-Id: I4b3d79141fc8e53dc51cd63ac11096c2e3cb10f5
Introduce a new sysctl for this option, 'sched_cstate_aware'.
When this is enabled, the scheduler can make use of the idle state
indexes in order to break the tie between potential CPU candidates.
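A minimal sketch of such a tie-break, assuming each idle candidate carries an estimated energy cost and an idle-state index where a larger index means a deeper state that is slower to exit. The `struct candidate` type and `pick_cpu()` helper are hypothetical; the actual find_energy_efficient_cpu() logic considers more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical idle CPU considered during task placement. */
struct candidate {
	int cpu;
	unsigned long energy;	/* estimated energy if the task is placed here */
	int idle_idx;		/* current idle state index, 0 = shallowest */
};

/*
 * Return the CPU with the lowest estimated energy. When cstate_aware is
 * set, break ties by preferring the shallowest idle state (lowest index),
 * which is cheaper and faster to wake from.
 */
static int pick_cpu(const struct candidate *c, size_t n, bool cstate_aware)
{
	size_t best = 0;

	assert(n > 0);
	for (size_t i = 1; i < n; i++) {
		if (c[i].energy < c[best].energy)
			best = i;
		else if (cstate_aware && c[i].energy == c[best].energy &&
			 c[i].idle_idx < c[best].idle_idx)
			best = i;
	}
	return c[best].cpu;
}
```

With the option disabled, the first equally cheap candidate wins regardless of how deeply it is sleeping; enabling it steers the task towards the CPU that can start running it soonest.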
This patch is based on 7f6fb825d6bc ("ANDROID: sched: EAS: take cstate
into account when selecting idle core") from android-4.14. All the
credit goes to the authors.
Change-Id: Ia076cf32faff91e90905291fa6f7924dc3dd6458
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
In its current state, Energy Aware Scheduling (EAS) starts automatically
on asymmetric platforms having an Energy Model (EM). However, there are
users who want to have an EM (for thermal management for example), but
don't want EAS with it.
In order to let users disable EAS explicitly, introduce a new sysctl
called 'sched_energy_aware'. It is enabled by default so that EAS can
start automatically on platforms where it makes sense. Flipping it to 0
rebuilds the scheduling domains and disables EAS.
Change-Id: I55764e70bf5e90795d2269ec9135ae6e82794a2b
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Message-Id: <20181016101513.26919-11-quentin.perret@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Energy Aware Scheduling (EAS) is designed with the assumption that
frequencies of CPUs follow their utilization value. When using a CPUFreq
governor other than schedutil, the chances of this assumption being true
are small, if any. When schedutil is being used, EAS' predictions are at
least consistent with the frequency requests. Although those requests
have no guarantees to be honored by the hardware, they should at least
guide DVFS in the right direction and provide some hope with regard to
the EAS model being accurate.
To make sure EAS is only used in a sane configuration, create a strong
dependency on schedutil being used. Since having sugov compiled-in does
not provide that guarantee, make CPUFreq call a scheduler function on
governor changes hence letting it rebuild the scheduling domains, check
the governors of the online CPUs, and enable/disable EAS accordingly.
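The governor check described above can be modelled as follows. This is a hypothetical user-space sketch: `struct cpu_policy`, `eas_allowed()` and the comparison against the string "schedutil" stand in for whatever the kernel actually inspects when it rebuilds the scheduling domains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical snapshot of one CPU's state at sched-domain rebuild time. */
struct cpu_policy {
	bool online;
	const char *governor;	/* name of the active CPUFreq governor */
};

/*
 * EAS is allowed only if every online CPU is driven by schedutil;
 * offline CPUs are ignored. Having schedutil compiled in is not enough.
 */
static bool eas_allowed(const struct cpu_policy *cpus, size_t nr_cpus)
{
	size_t nr_online = 0;

	for (size_t i = 0; i < nr_cpus; i++) {
		if (!cpus[i].online)
			continue;
		nr_online++;
		assert(cpus[i].governor != NULL);
		if (strcmp(cpus[i].governor, "schedutil") != 0)
			return false;
	}
	return nr_online > 0;
}
```

In this model a governor change on any online CPU simply triggers a re-evaluation, which is why CPUFreq has to notify the scheduler instead of the decision being made once from the build configuration.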
Change-Id: I872949134f97d2772fc681b7393eaed7f0e224f2
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Message-Id: <20181016101513.26919-9-quentin.perret@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Schedutil requests frequency by aggregating utilization signals from
the scheduler (CFS, RT, DL, IRQ) and applying a 25% margin on top of
them. Since Energy Aware Scheduling (EAS) needs to be able to predict
the frequency requests, it needs to forecast the decisions made by the
governor.
In order to prepare the introduction of EAS, introduce
schedutil_freq_util() to centralize the aforementioned signal
aggregation and make it available to both schedutil and EAS. Since
frequency selection and energy estimation still need to deal with RT and
DL signals slightly differently, schedutil_freq_util() is called with a
different 'type' parameter in those two contexts, and returns an
aggregated utilization signal accordingly. While at it, introduce the
map_util_freq() function, which is designed to make schedutil's 25%
margin easily usable by both sugov and EAS.
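The 25% margin amounts to requesting next_freq = 1.25 * freq * util / cap. The standalone sketch below is modelled on map_util_freq() but written as an ordinary user-space function, and assumes integer arithmetic in which the margin is computed as freq + freq / 4 via a right shift by 2.

```c
#include <assert.h>

/*
 * Map a utilization value onto a frequency request with 25% headroom:
 *   next_freq = 1.25 * freq * util / cap
 * The margin is applied as freq + (freq >> 2) to stay in integer math.
 */
static unsigned long map_util_freq(unsigned long util, unsigned long freq,
				   unsigned long cap)
{
	assert(cap > 0);
	return (freq + (freq >> 2)) * util / cap;
}
```

At half capacity (util = 512, cap = 1024) with a 2 GHz maximum expressed in kHz, the request is 1250000 kHz rather than 1000000 kHz; at full utilization the result exceeds the maximum frequency and would need to be clamped to the policy's limits.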
As EAS will be able to predict schedutil's frequency requests more
accurately than any other governor by design, it'd be sensible to make
sure EAS cannot be used without schedutil. This will be done later, once
EAS has actually been introduced.
Change-Id: Idbeeb00926045507b73f9cba37630b38ae0816c0
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Message-Id: <20181016101513.26919-3-quentin.perret@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
By default, arch_scale_cpu_capacity() is only visible from within the
kernel/sched folder. Relocate it to include/linux/sched/topology.h to
make it visible to other clients needing to know about the capacity of
CPUs, such as the Energy Model framework.
Change-Id: I144c7299e122201dbcadc431d55d0a6d24d90005
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Message-Id: <20181016101513.26919-2-quentin.perret@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
The SD_ASYM_CPUCAPACITY sched_domain flag is supposed to mark the
sched_domain in the hierarchy where all CPU capacities are visible from
any CPU's point of view on asymmetric CPU capacity systems. The
scheduler can then take capacity asymmetry into account when
balancing at this level. It also serves as an indicator for how wide
task placement heuristics have to search to consider all available CPU
capacities, as asymmetric systems might often appear symmetric at the
smallest level(s) of the sched_domain hierarchy.
The flag has been around for a while but has so far only been set by
out-of-tree code in Android kernels. One solution is to let each
architecture provide the flag through a custom sched_domain topology
array and associated mask and flag functions. However,
SD_ASYM_CPUCAPACITY is special in the sense that it depends on the
capacity and presence of all CPUs in the system, i.e. when hotplugging
all CPUs out except those with one particular CPU capacity the flag
should disappear even if the sched_domains don't collapse. Similarly,
the flag is affected by cpusets where load-balancing is turned off.
Detecting when the flag should be set therefore depends not only on
topology information but also the cpuset configuration and hotplug
state. The arch code doesn't have easy access to the cpuset
configuration.
Instead, this patch implements the flag detection in generic code where
cpusets and hotplug state are already taken care of. All the arch is
responsible for is to implement arch_scale_cpu_capacity() and force a
full rebuild of the sched_domain hierarchy if capacities are updated,
e.g. later in the boot process when cpufreq has initialized.
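A simplified sketch of the detection idea: a span of CPUs deserves the flag only if the CPUs in it that are currently usable (online and not excluded by cpusets) expose more than one distinct capacity. The `cpu_capacity[]` and `usable[]` arrays and `span_is_asymmetric()` are hypothetical; the generic code works on cpumasks and walks the whole topology.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if the usable CPUs in [first, last] do not all share the
 * same capacity. 'usable' folds together hotplug state and cpuset
 * exclusion; unusable CPUs do not count towards asymmetry.
 */
static bool span_is_asymmetric(const unsigned long *cpu_capacity,
			       const bool *usable, size_t first, size_t last)
{
	bool seen = false;
	unsigned long ref = 0;

	assert(first <= last);
	for (size_t cpu = first; cpu <= last; cpu++) {
		if (!usable[cpu])
			continue;
		if (!seen) {
			ref = cpu_capacity[cpu];
			seen = true;
		} else if (cpu_capacity[cpu] != ref) {
			return true;
		}
	}
	return false;
}
```

Taking every little CPU offline makes the span look symmetric, so in this model the flag disappears even though the domain itself still exists, which matches the hotplug case described above.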
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Link: http://lkml.kernel.org/r/1532093554-30504-2-git-send-email-morten.rasmussen@arm.com
[ Fixed 'CPU' capitalization. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 05484e0984)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Change-Id: I1d5f695a95f8d023f1ecf14ecb71a558ceb67ed6
Merge more updates from Andrew Morton:
- the rest of MM
- procfs updates
- various misc things
- more y2038 fixes
- get_maintainer updates
- lib/ updates
- checkpatch updates
- various epoll updates
- autofs updates
- hfsplus
- some reiserfs work
- fatfs updates
- signal.c cleanups
- ipc/ updates
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (166 commits)
ipc/util.c: update return value of ipc_getref from int to bool
ipc/util.c: further variable name cleanups
ipc: simplify ipc initialization
ipc: get rid of ids->tables_initialized hack
lib/rhashtable: guarantee initial hashtable allocation
lib/rhashtable: simplify bucket_table_alloc()
ipc: drop ipc_lock()
ipc/util.c: correct comment in ipc_obtain_object_check
ipc: rename ipcctl_pre_down_nolock()
ipc/util.c: use ipc_rcu_putref() for failues in ipc_addid()
ipc: reorganize initialization of kern_ipc_perm.seq
ipc: compute kern_ipc_perm.id under the ipc lock
init/Kconfig: remove EXPERT from CHECKPOINT_RESTORE
fs/sysv/inode.c: use ktime_get_real_seconds() for superblock stamp
adfs: use timespec64 for time conversion
kernel/sysctl.c: fix typos in comments
drivers/rapidio/devices/rio_mport_cdev.c: remove redundant pointer md
fork: don't copy inconsistent signal handler state to child
signal: make get_signal() return bool
signal: make sigkill_pending() return bool
...
Patch series "signal: refactor some functions", v3.
This series refactors a bunch of functions in signal.c to simplify parts
of the code.
The greatest single change is declaring the static do_sigpending() helper
as void which makes it possible to remove a bunch of unnecessary checks in
the syscalls later on.
This patch (of 17):
force_sigsegv() returned 0 unconditionally so it doesn't make sense to have
it return at all. In addition, there are no callers that check
force_sigsegv()'s return value.
Link: http://lkml.kernel.org/r/20180602103653.18181-2-christian@brauner.io
Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Morris <james.morris@microsoft.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently the hung task checking interval is equal to the timeout; as a
result, a hung task is detected anywhere between timeout and 2*timeout.
This is fine for most interactive environments, but it hurts automated
testing setups (syzbot). In an automated setup we need to strictly order
CPU lockup < RCU stall < workqueue lockup < task hung < silent loss, so
that an RCU stall is not detected as a hung task and a hung task is not
detected as silent machine loss. The large variance in hung task
detection time requires setting the silent machine loss timeout to a
very large value (e.g. if the hung task timeout is 3 mins, then silent
loss needs to be set to ~7 mins). The additional 3 minutes significantly
reduce testing efficiency because we usually crash the kernel within a
minute, and this can add hours to the bug localization process as it
needs to do dozens of tests.
Allow setting the checking interval separately from the timeout. This
allows setting the timeout to, say, 3 minutes, but the checking interval
to 10 secs.
The interval is controlled via a new hung_task_check_interval_secs sysctl,
similar to the existing hung_task_timeout_secs sysctl. The default value
of 0 results in the current behavior: checking interval is equal to
timeout.
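The detection window can be worked out with a small model. This sketch assumes the checker wakes every interval seconds and reports a task once it has gone at least timeout seconds without a context switch, measured from the check at which its last switch was observed; the rounding to whole checks and the helper names are illustrative assumptions, not the kernel's code.

```c
#include <assert.h>

/* An interval of 0 keeps the old behaviour: check every 'timeout' seconds. */
static unsigned long effective_interval(unsigned long timeout,
					unsigned long interval)
{
	return interval ? interval : timeout;
}

/*
 * Worst-case delay (seconds) between a task getting stuck and the report.
 * The last context switch is only noticed at the next check, which can
 * lag by one interval; the timeout is then rounded up to whole checks.
 */
static unsigned long worst_case_detection(unsigned long timeout,
					  unsigned long interval)
{
	unsigned long i = effective_interval(timeout, interval);

	assert(i > 0);
	return ((timeout + i - 1) / i) * i + i;
}
```

With the default interval this gives the 2*timeout worst case described above (360 s for a 180 s timeout); with a 10 s interval the worst case drops to 190 s, which is what allows the silent-loss timeout to be tightened.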
[akpm@linux-foundation.org: update hung_task_timeout_max's comment]
Link: http://lkml.kernel.org/r/20180611111004.203513-1-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
refcount_t type and corresponding API should be used instead of atomic_t
when the variable is used as a reference counter. This avoids accidental
refcounter overflows that might lead to use-after-free situations.
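The protection refcount_t adds over a bare atomic_t can be illustrated with a user-space saturating counter built on C11 atomics. This is a hypothetical sketch (`struct sat_ref` and its helpers are not the kernel's refcount_t API): once the counter reaches UINT_MAX it stays there, so an overflow leaks the object instead of wrapping to zero and freeing it while still in use.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical saturating reference counter. */
struct sat_ref {
	atomic_uint count;
};

static void sat_ref_inc(struct sat_ref *r)
{
	unsigned int old = atomic_load(&r->count);

	do {
		if (old == UINT_MAX)
			return;		/* saturated: never wrap to 0 */
		assert(old != 0);	/* increment from zero: use-after-free */
	} while (!atomic_compare_exchange_weak(&r->count, &old, old + 1));
}

/* Returns true when the caller dropped the last reference. */
static bool sat_ref_dec_and_test(struct sat_ref *r)
{
	unsigned int old = atomic_load(&r->count);

	do {
		if (old == UINT_MAX)
			return false;	/* saturated counters are never released */
		assert(old != 0);	/* underflow */
	} while (!atomic_compare_exchange_weak(&r->count, &old, old - 1));
	return old == 1;
}
```

A plain atomic increment at UINT_MAX would silently wrap to 0; the compare-and-exchange loop here checks the current value first, so the saturation test and the update happen as one atomic step.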
Link: http://lkml.kernel.org/r/20180703200141.28415-6-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull core signal handling updates from Eric Biederman:
"It was observed that a periodic timer in combination with a
sufficiently expensive fork could prevent fork from ever completing.
This contains the changes to remove the need for that restart.
This set of changes is split into several parts:
- The first part makes PIDTYPE_TGID a proper pid type instead of
something only for very special cases. The part starts using
PIDTYPE_TGID enough so that in __send_signal where signals are
actually delivered we know if the signal is being sent to a group
of processes or just a single process.
- With that prep work out of the way the logic in fork is modified so
that fork logically makes signals received while it is running
appear to be received after the fork completes"
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (22 commits)
signal: Don't send signals to tasks that don't exist
signal: Don't restart fork when signals come in.
fork: Have new threads join on-going signal group stops
fork: Skip setting TIF_SIGPENDING in ptrace_init_task
signal: Add calculate_sigpending()
fork: Unconditionally exit if a fatal signal is pending
fork: Move and describe why the code examines PIDNS_ADDING
signal: Push pid type down into complete_signal.
signal: Push pid type down into __send_signal
signal: Push pid type down into send_signal
signal: Pass pid type into do_send_sig_info
signal: Pass pid type into send_sigio_to_task & send_sigurg_to_task
signal: Pass pid type into group_send_sig_info
signal: Pass pid and pid type into send_sigqueue
posix-timers: Noralize good_sigevent
signal: Use PIDTYPE_TGID to clearly store where file signals will be sent
pid: Implement PIDTYPE_TGID
pids: Move the pgrp and session pid pointers from task_struct to signal_struct
kvm: Don't open code task_pid in kvm_vcpu_ioctl
pids: Compute task_tgid using signal->leader_pid
...
Patch series "Directed kmem charging", v8.
The Linux kernel's memory cgroup allows limiting the memory usage of the
jobs running on the system to provide isolation between the jobs. All
the kernel memory allocated in the context of the job and marked with
__GFP_ACCOUNT will also be included in the memory usage and be limited
by the job's limit.
The kernel memory can only be charged to the memcg of the process in
whose context kernel memory was allocated. However there are cases
where the allocated kernel memory should be charged to the memcg
different from the current process's memcg. This patch series
contains two such concrete use-cases i.e. fsnotify and buffer_head.
The fsnotify event objects can consume a lot of system memory for large
or unlimited queues if there is either no or slow listener. The events
are allocated in the context of the event producer. However they should
be charged to the event consumer. Similarly the buffer_head objects can
be allocated in a memcg different from the memcg of the page for which
buffer_head objects are being allocated.
To solve this issue, this patch series introduces a mechanism to charge
kernel memory to a given memcg. In case of fsnotify events, the memcg
of the consumer can be used for charging and for buffer_head, the memcg
of the page can be charged. For directed charging, the caller can use
the scope API memalloc_[un]use_memcg() to specify the memcg to charge
for all the __GFP_ACCOUNT allocations within the scope.
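The scope semantics can be illustrated with a minimal user-space model: inside the scope, every accounted allocation is charged to the supplied group; outside it, charging falls back to the allocating task's own group. All types and helpers here are hypothetical stand-ins modelled on memalloc_use_memcg()/memalloc_unuse_memcg(), and this sketch deliberately does not support nested scopes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup: just a usage counter. */
struct memcg_model {
	const char *name;
	size_t charged;		/* bytes charged to this group */
};

/* Hypothetical per-task context: the task's own group plus an override. */
struct task_model {
	struct memcg_model *own;
	struct memcg_model *active;	/* non-NULL only inside a scope */
};

/* Open a directed-charging scope. */
static void use_memcg(struct task_model *t, struct memcg_model *memcg)
{
	assert(t->active == NULL);	/* no nesting in this sketch */
	t->active = memcg;
}

/* Close the scope. */
static void unuse_memcg(struct task_model *t)
{
	t->active = NULL;
}

/* Charge an accounted allocation: the override wins over the own group. */
static void charge(struct task_model *t, size_t bytes)
{
	struct memcg_model *target = t->active ? t->active : t->own;

	target->charged += bytes;
}
```

In the fsnotify case described below, the producer would open the scope with the listener's group before allocating an event, so the event's memory counts against the listener even though the producer performs the allocation.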
This patch (of 2):
A lot of memory can be consumed by the events generated for the huge or
unlimited queues if there is either no or slow listener. This can cause
system level memory pressure or OOMs. So, it's better to account the
fsnotify kmem caches to the memcg of the listener.
However the listener can be in a different memcg than the memcg of the
producer and these allocations happen in the context of the event
producer. This patch introduces remote memcg charging API which the
producer can use to charge the allocations to the memcg of the listener.
There are seven fsnotify kmem caches, and among them the allocations from
dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and
inotify_inode_mark_cachep happen in the context of a syscall from the
listener. So SLAB_ACCOUNT is enough for these caches.
The objects from fsnotify_mark_connector_cachep are not accounted, as
they are small compared to the notification marks or events, and it is
unclear whom to account the connector to, since it is shared by all
events attached to the inode.
The allocations from the event caches happen in the context of the event
producer. For such caches we will need to remote charge the allocations
to the listener's memcg. Thus we save the memcg reference in the
fsnotify_group structure of the listener.
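Roughly, the memcg reference is pinned when the group is created in the
listener's context and used later when events are allocated in the
producer's context. The sketch below is a simplified userspace model
under that assumption; struct group, group_init(), alloc_event(),
free_event(), group_release() and the refs counter are hypothetical and
do not reproduce fsnotify_group.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the kernel structures. */
struct memcg { long charged; int refs; };
struct proc  { struct memcg *memcg; };
struct group { struct memcg *memcg; };  /* reference taken from the listener */
struct event { int mask; };

/* Listener context (e.g. its init syscall): pin the listener's memcg. */
static void group_init(struct group *g, const struct proc *listener)
{
	g->memcg = listener->memcg;
	g->memcg->refs++;
}

/* Producer context: allocate the event but charge the listener's memcg. */
static struct event *alloc_event(struct group *g, const struct proc *producer,
				 int mask)
{
	(void)producer; /* the producer's own memcg is deliberately not charged */
	struct event *ev = malloc(sizeof(*ev));

	if (!ev)
		return NULL;
	ev->mask = mask;
	g->memcg->charged += (long)sizeof(*ev);
	return ev;
}

/* Uncharge the same memcg that was charged at allocation time. */
static void free_event(struct group *g, struct event *ev)
{
	g->memcg->charged -= (long)sizeof(*ev);
	free(ev);
}

/* Group teardown: drop the pinned memcg reference. */
static void group_release(struct group *g)
{
	g->memcg->refs--;
	g->memcg = NULL;
}
```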
This patch also rearranges the members of fsnotify_group to fill the
holes, so that the structure size stays the same, at least for 64-bit
builds, even with the additional member.
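To illustrate hole filling with hypothetical structures (these are not
the real fsnotify_group members): on an LP64 target, reordering so that
two ints share one 8-byte slot frees enough room for an extra pointer
without growing the structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout with holes. */
struct padded {
	int a;        /* 4 bytes, then a 4-byte hole on LP64 */
	void *p;
	int b;        /* 4 bytes, then another 4-byte hole on LP64 */
	void *q;
};

/* Same members reordered, plus one new pointer member. */
struct reordered {
	void *p;
	void *q;
	int a;
	int b;        /* a and b share one 8-byte slot on LP64 */
	void *extra;  /* the new member fits in the space the holes used */
};
```

On a 32-bit target there are no such holes, which matches the "at least
for 64-bit builds" caveat above.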
[shakeelb@google.com: use GFP_KERNEL_ACCOUNT rather than open-coding it]
Link: http://lkml.kernel.org/r/20180702215439.211597-1-shakeelb@google.com
Link: http://lkml.kernel.org/r/20180627191250.209150-2-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wen Yang <wen.yang99@zte.com.cn> and majiang <ma.jiang@zte.com.cn>
report that a periodic signal received during fork can cause fork to
continually restart, preventing an application from making progress.
The code was being overly pessimistic. Fork needs to guarantee that a
signal sent to multiple processes is either logically delivered before
the fork, and just to the forking process, or logically delivered after
the fork to both the forking process and its newly spawned child. For
signals like periodic timers that are always delivered to a single
process, fork can safely complete and let them appear to be logically
delivered after the fork().
While examining this issue I also discovered that fork today will miss
signals delivered to multiple processes during the fork and handled by
another thread. Similarly, the current code will also miss blocked
signals that are delivered to multiple processes, as those signals will
not appear pending during fork.
Add a list of each thread that is currently forking, and keep on that
list a signal set that records all of the signals sent to multiple
processes. When fork completes, initialize the new process's
shared_pending signal set with it. The calculate_sigpending function
will see those signals and set TIF_SIGPENDING, causing the new task to
take the slow path to userspace to handle those signals. This makes it
appear as if those signals were received immediately after the fork.
It is not possible to send real time signals to multiple processes, and
exceptions don't go to multiple processes, which means that there are
no signals sent to multiple processes that require siginfo. This
means it is safe to not bother collecting siginfo on signals sent
during fork.
The sigaction of a child of fork is initially the same as the
sigaction of the parent process, so any signal the parent ignores the
child will also initially ignore. Therefore it is safe to ignore
signals sent to multiple processes and ignored by the forking process.
Signals sent to only a single process or only a single thread and delivered
during fork are treated as if they are received after the fork, and generally
not dealt with. They won't cause any problems.
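The bookkeeping described above can be modeled in userspace roughly as
follows. This is a sketch under simplifying assumptions: signal sets are
plain bitmasks, struct forking, fork_begin(), send_multiprocess_signal()
and fork_complete() are hypothetical names, the model is single-threaded
(the locking a real implementation needs is omitted), and single-process
signals are simply never recorded.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Bit (sig - 1) represents signal sig in this model. */
typedef unsigned long sigmask_t;
#define SIGBIT(sig) (1UL << ((sig) - 1))

/* One entry per thread currently inside fork (hypothetical layout). */
struct forking {
	sigmask_t ignored;     /* signals the forking process ignores */
	sigmask_t delayed;     /* multi-process signals seen during fork */
	struct forking *next;
};

static struct forking *multiprocess; /* head of the "currently forking" list */

/* Fork starts: join the list with an empty delayed set. */
static void fork_begin(struct forking *f, sigmask_t ignored)
{
	f->ignored = ignored;
	f->delayed = 0;
	f->next = multiprocess;
	multiprocess = f;
}

/* A signal sent to multiple processes: record it in every in-progress
 * fork, except where the forking process (and hence the child) ignores it. */
static void send_multiprocess_signal(int sig)
{
	for (struct forking *f = multiprocess; f; f = f->next)
		if (!(f->ignored & SIGBIT(sig)))
			f->delayed |= SIGBIT(sig);
}

/* Fork done: unlink and return the set that seeds the child's
 * shared_pending, so the child sees these signals right after fork. */
static sigmask_t fork_complete(struct forking *f)
{
	struct forking **pp = &multiprocess;

	while (*pp && *pp != f)
		pp = &(*pp)->next;
	assert(*pp == f);
	*pp = f->next;
	return f->delayed;
}
```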
V2: Added removal from the multiprocess list on failure.
V3: Use -ERESTARTNOINTR directly
V4: - Don't queue both SIGCONT and SIGSTOP
- Initialize signal_struct.multiprocess in init_task
- Move setting of shared_pending to before the new task
is visible to signals. This prevents signals from coming
in before shared_pending.signal is set to delayed.signal
and being lost.
V5: - rework list add and delete to account for idle threads
v6: - Use sigdelsetmask when removing stop signals
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200447
Reported-by: Wen Yang <wen.yang99@zte.com.cn> and
Reported-by: majiang <ma.jiang@zte.com.cn>
Fixes: 4a2c7a7837 ("[PATCH] make fork() atomic wrt pgrp/session signals")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
There are only two signals that are delivered to every member of a
signal group: SIGSTOP and SIGKILL. Signal delivery requires that every
signal appear to be delivered either before or after a clone syscall.
SIGKILL terminates the clone, so it does not need to be considered. That
leaves only SIGSTOP that needs to be considered when creating new
threads.
Today, in the event of a group stop, TIF_SIGPENDING will get set and the
fork will restart, ensuring the fork syscall participates in the group
stop.
A fork (especially of a process with a lot of memory) is one of the
most expensive system calls, so we really only want to restart a fork
when necessary.
It is easy to check whether a SIGSTOP is ongoing and have the new
thread join it immediately after the clone completes. This makes it
appear that the clone completed just before the SIGSTOP.
The calculate_sigpending function will see the bits set in jobctl and
set TIF_SIGPENDING to ensure the new task takes the slow path to userspace.
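A simplified model of a new thread joining an ongoing group stop. The
MODEL_* flag values and the structures are hypothetical stand-ins for
the kernel's jobctl bits and signal_struct; only the logic follows the
description above: copy the stop signal and pending bits to the child,
count the child in the group stop, and let the pending jobctl bits force
the slow path to userspace.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical flag layout; the real JOBCTL_* values may differ. */
#define MODEL_STOP_SIGMASK 0xffUL      /* low bits hold the stop signal */
#define MODEL_STOP_PENDING (1UL << 17)
#define MODEL_STOP_CONSUME (1UL << 18)

struct sig_group { int group_stop_count; };
struct thread {
	unsigned long jobctl;
	struct sig_group *sig;
	bool sigpending;               /* stands in for TIF_SIGPENDING */
};

/* Assumed to run with the group's signal lock held, before the child is
 * visible: if the parent has a group stop pending, the child joins it. */
static void model_join_group_stop(struct thread *parent, struct thread *child)
{
	unsigned long jobctl = parent->jobctl;

	if (jobctl & MODEL_STOP_PENDING) {
		child->jobctl |= (jobctl & MODEL_STOP_SIGMASK) |
				 MODEL_STOP_PENDING | MODEL_STOP_CONSUME;
		parent->sig->group_stop_count++;
	}
}

/* Only the jobctl part of the check: pending stop bits set sigpending. */
static void model_calc_sigpending(struct thread *t)
{
	t->sigpending = (t->jobctl & MODEL_STOP_PENDING) != 0;
}
```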
V2: The call to task_join_group_stop was moved before the new task is
added to the thread group list. This should not matter as
sighand->siglock is held over both the addition of the threads,
the call to task_join_group_stop and do_signal_stop. But the change
is trivial and it is one less thing to worry about when reading
the code.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Add a function calculate_sigpending to test whether any signals are
pending for a new task immediately following fork. Signals have to
happen either before or after fork. Today our practice is to push
all of the signals to before the fork, but that has the downside that
frequent or periodic signals can make fork take much longer than
normal or prevent fork from completing entirely.
So we need to move the signals that we can to after the fork to prevent that.
This updates the code to set TIF_SIGPENDING on a new task if there
are signals or other activities that have moved so that they appear
to happen after the fork.
As the code today restarts if it sees any such activity, this won't
immediately have an effect, as there will be no reason for it
to set TIF_SIGPENDING immediately after the fork.
Adding calculate_sigpending means the code in fork can safely be
changed to not always restart if a signal is pending.
The new calculate_sigpending function sets sigpending if there
are pending bits in jobctl, pending signals, the freezer needs
to freeze the new task, or the live kernel patching framework
needs the new thread to take the slow path to userspace.
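The conditions above, combined with the set-then-recalculate approach of
the V2 version, can be sketched in userspace as follows. struct new_task,
its fields and the model_* functions are hypothetical, and a pthread
mutex stands in for siglock; this is an illustration, not the kernel
code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the state a new task would be checked against. */
struct new_task {
	pthread_mutex_t siglock;
	unsigned long jobctl_pending;   /* pending job-control bits */
	unsigned long pending, blocked; /* signal bitmasks */
	bool freezing;                  /* the freezer wants this task */
	bool klp_patch_pending;         /* live patching wants the slow path */
	bool tif_sigpending;            /* stands in for TIF_SIGPENDING */
};

/* Clear the flag only when none of the conditions holds. */
static void model_recalc_sigpending(struct new_task *t)
{
	if (!t->jobctl_pending && !(t->pending & ~t->blocked) &&
	    !t->freezing && !t->klp_patch_pending)
		t->tif_sigpending = false;
}

/* Under the lock: set unconditionally, then let the recalc clear it. */
static void model_calculate_sigpending(struct new_task *t)
{
	pthread_mutex_lock(&t->siglock);
	t->tif_sigpending = true;
	model_recalc_sigpending(t);
	pthread_mutex_unlock(&t->siglock);
}
```

Setting first and clearing only when nothing applies means the condition
list lives in one place, which is the maintenance argument the patch
makes for reusing recalc_sigpending.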
I have verified that setting TIF_SIGPENDING does make a new process
take the slow path to userspace before it executes its first userspace
instruction.
I have looked at the callers of signal_wake_up and the code paths
setting TIF_SIGPENDING and I don't see anything else that needs to be
handled. The code probably doesn't need to set TIF_SIGPENDING for
kernel live patching, as it uses a separate thread flag as well. But
at this point it seems safer to reuse the recalc_sigpending logic and get
the kernel live patching folks to sort out their story later.
V2: I have moved the test into schedule_tail where siglock can
be grabbed and recalc_sigpending can be reused directly.
Further as the last action of setting up a new task this
guarantees that TIF_SIGPENDING will be properly set in the
new process.
The helper calculate_sigpending takes the siglock,
unconditionally sets TIF_SIGPENDING and lets recalc_sigpending
clear TIF_SIGPENDING if it is unnecessary. This allows reusing
the existing code and keeps maintenance of the conditions simple.
Oleg Nesterov <oleg@redhat.com> suggested the movement
and pointed out the need to take siglock if this code
was going to be called while the new task is discoverable.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
kernel_wait4() expects a userland address for status; it's only
rusage that is passed as a kernel address (and needs a copyout
afterwards).
[ Also, fix the prototype of kernel_wait4() to have that __user
annotation - Linus ]
Fixes: 92ebce5ac5 ("osf_wait4: switch to kernel_wait4()")
Cc: stable@kernel.org # v4.13+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make the code more maintainable by performing more of the signal
related work in send_sigqueue.
A quick inspection of do_timer_create will show that this code path
does not look up a thread group by a thread's pid. This makes it safe
to find the task pointed to by it_pid with "pid_task(it_pid, type)".
This supports the changes needed in fork to tell if a signal was sent
to a single process or a group of processes.
Having the pid to task transition in signal.c will also make it easier
to sort out races with de_thread and the thread group leader
exiting when it comes time to address that.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Everywhere except in the pid array we distinguish between a task's pid and
a task's tgid (thread group id). Even in the enumeration we want that
distinction sometimes, so we have added __PIDTYPE_TGID. With leader_pid
we almost have an implementation of PIDTYPE_TGID in struct signal_struct.
Add PIDTYPE_TGID as a first class member of the pid_type enumeration and
into the pids array. Then remove the __PIDTYPE_TGID special case and the
leader_pid in signal_struct.
The net size increase is just an extra pointer added to struct pid and
an extra pair of pointers (an hlist_node) added to task_struct.
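The size arithmetic can be checked with a simplified model, assuming the
per-type links are plain hlist_head/hlist_node arrays indexed by enum
pid_type. The enum order shown (PIDTYPE_TGID right after PIDTYPE_PID) is,
to our understanding, the one used after this change; pid_model,
task_links and their *_old variants are hypothetical structures that
contain only those arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's hash-list types. */
struct hlist_node { struct hlist_node *next, **pprev; }; /* two pointers */
struct hlist_head { struct hlist_node *first; };         /* one pointer */

/* PIDTYPE_TGID as a first-class member of the enumeration. */
enum pid_type {
	PIDTYPE_PID,
	PIDTYPE_TGID,
	PIDTYPE_PGID,
	PIDTYPE_SID,
	PIDTYPE_MAX,
};

/* Only the per-type arrays are modelled; "_old" lacks the TGID slot. */
struct pid_model      { struct hlist_head tasks[PIDTYPE_MAX]; };
struct pid_model_old  { struct hlist_head tasks[PIDTYPE_MAX - 1]; };
struct task_links     { struct hlist_node pid_links[PIDTYPE_MAX]; };
struct task_links_old { struct hlist_node pid_links[PIDTYPE_MAX - 1]; };
```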
The effect on code maintenance is the removal of a number of special
cases today and the potential to remove many more special cases as
PIDTYPE_TGID gets used to its fullest. The long-term potential
is allowing zombie thread group leaders to exit, which will remove
a lot more special cases in the code.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>