-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl4u6tsACgkQONu9yGCS
aT693A//TExeDRnNnf+2v4TJorylyRr17BMxk/Ie2L5E6d2n/RWodsrOThAPU9tx
5alNUkXCT8Jd31BUVnUoPoAQ4zSymSVi++XEf05wDeO0tQ982IESGaLmu9EC1uMF
nnM5y4IdRYmFI1Zji4h5vRJckoYUlB6Mdg4BgMr4Q1KX7RkZYfe6bjs7DwM/uyMx
jVXdFaQBD1H6F5W6A+GmgUZ36g9uNqzcBxxWwv5URj+q816NdI4bsxIJMF0v0WC+
S54fmpS07QWIYKKsQBUepeSgEF4ECESOE2VoF1ICcnfakdPnDBmNgyPJPSrLmVf+
itRUxoH1MewaOvoJrv+xsGBPmM29LcKH2oBmj5DR2Xstp7ACPs+OtXJEU9dUTDN4
NhaSts5fIp0f4Y5mMn508pDUwYDAWDt99ZJWdx6aK/TRyUsHBgpxBQDt37BE3U5W
PCBnObNe2b2KDAsVXLjX5iDYoA0+usFreveMo8uEP+ohfh0ANvJlRkzedYw7NquI
ZCcT+I1P9q8aa0528tR332VLrQeYg+kG6LVi2kAabmRA/VtEsT0w90MY/eo2vuTU
WlPmbs2yerv2HTm050e6MOgBZfPh7wP/FpbjsSXufj7EDywlfxF+1hXdwfrpPJeN
fN3g0kepeUp7+kLzO40FLam/z5ndjAUhyN2SBaPzGsXjMkZdETk=
=zvlh
-----END PGP SIGNATURE-----
Merge 4.19.99 into android-4.19
Changes in 4.19.99
Revert "efi: Fix debugobjects warning on 'efi_rts_work'"
xfs: Sanity check flags of Q_XQUOTARM call
i2c: stm32f7: rework slave_id allocation
i2c: i2c-stm32f7: fix 10-bits check in slave free id search loop
mfd: intel-lpss: Add default I2C device properties for Gemini Lake
SUNRPC: Fix svcauth_gss_proxy_init()
powerpc/pseries: Enable support for ibm,drc-info property
powerpc/archrandom: fix arch_get_random_seed_int()
tipc: update mon's self addr when node addr generated
tipc: fix wrong timeout input for tipc_wait_for_cond()
mt7601u: fix bbp version check in mt7601u_wait_bbp_ready
crypto: sun4i-ss - fix big endian issues
perf map: No need to adjust the long name of modules
soc: aspeed: Fix snoop_file_poll()'s return type
watchdog: sprd: Fix the incorrect pointer getting from driver data
ipmi: Fix memory leak in __ipmi_bmc_register
drm/sti: do not remove the drm_bridge that was never added
ARM: dts: at91: nattis: set the PRLUD and HIPOW signals low
ARM: dts: at91: nattis: make the SD-card slot work
ixgbe: don't clear IPsec sa counters on HW clearing
drm/virtio: fix bounds check in virtio_gpu_cmd_get_capset()
iio: fix position relative kernel version
apparmor: Fix network performance issue in aa_label_sk_perm
ALSA: hda: fix unused variable warning
apparmor: don't try to replace stale label in ptrace access check
ARM: qcom_defconfig: Enable MAILBOX
firmware: coreboot: Let OF core populate platform device
PCI: iproc: Remove PAXC slot check to allow VF support
bridge: br_arp_nd_proxy: set icmp6_router if neigh has NTF_ROUTER
drm/hisilicon: hibmc: Don't overwrite fb helper surface depth
signal/ia64: Use the generic force_sigsegv in setup_frame
signal/ia64: Use the force_sig(SIGSEGV,...) in ia64_rt_sigreturn
ASoC: wm9712: fix unused variable warning
mailbox: mediatek: Add check for possible failure of kzalloc
IB/rxe: replace kvfree with vfree
IB/hfi1: Add mtu check for operational data VLs
genirq/debugfs: Reinstate full OF path for domain name
usb: dwc3: add EXTCON dependency for qcom
usb: gadget: fsl_udc_core: check allocation return value and cleanup on failure
cfg80211: regulatory: make initialization more robust
mei: replace POLL* with EPOLL* for write queues.
drm/msm: fix unsigned comparison with less than zero
of: Fix property name in of_node_get_device_type
ALSA: usb-audio: update quirk for B&W PX to remove microphone
iwlwifi: nvm: get num of hw addresses from firmware
staging: comedi: ni_mio_common: protect register write overflow
netfilter: nft_osf: usage from output path is not valid
pwm: lpss: Release runtime-pm reference from the driver's remove callback
powerpc/pseries/memory-hotplug: Fix return value type of find_aa_index
rtlwifi: rtl8821ae: replace _rtl8821ae_mrate_idx_to_arfr_id with generic version
RDMA/bnxt_re: Add missing spin lock initialization
netfilter: nf_flow_table: do not remove offload when other netns's interface is down
powerpc/kgdb: add kgdb_arch_set/remove_breakpoint()
tipc: eliminate message disordering during binding table update
net: socionext: Add dummy PHY register read in phy_write()
drm/sun4i: hdmi: Fix double flag assignation
net: hns3: add error handler for hns3_nic_init_vector_data()
mlxsw: reg: QEEC: Add minimum shaper fields
mlxsw: spectrum: Set minimum shaper on MC TCs
NTB: ntb_hw_idt: replace IS_ERR_OR_NULL with regular NULL checks
ASoC: wm97xx: fix uninitialized regmap pointer problem
ARM: dts: bcm283x: Correct mailbox register sizes
pcrypt: use format specifier in kobject_add
ASoC: sun8i-codec: add missing route for ADC
pinctrl: meson-gxl: remove invalid GPIOX tsin_a pins
bus: ti-sysc: Add mcasp optional clocks flag
exportfs: fix 'passing zero to ERR_PTR()' warning
drm: rcar-du: Fix the return value in case of error in 'rcar_du_crtc_set_crc_source()'
drm: rcar-du: Fix vblank initialization
net: always initialize pagedlen
drm/dp_mst: Skip validating ports during destruction, just ref
arm64: dts: meson-gx: Add hdmi_5v regulator as hdmi tx supply
arm64: dts: renesas: r8a7795-es1: Add missing power domains to IPMMU nodes
net: phy: Fix not to call phy_resume() if PHY is not attached
IB/hfi1: Correctly process FECN and BECN in packets
OPP: Fix missing debugfs supply directory for OPPs
IB/rxe: Fix incorrect cache cleanup in error flow
mailbox: ti-msgmgr: Off by one in ti_msgmgr_of_xlate()
staging: bcm2835-camera: Abort probe if there is no camera
staging: bcm2835-camera: fix module autoloading
switchtec: Remove immediate status check after submitting MRPC command
ipv6: add missing tx timestamping on IPPROTO_RAW
pinctrl: sh-pfc: r8a7740: Add missing REF125CK pin to gether_gmii group
pinctrl: sh-pfc: r8a7740: Add missing LCD0 marks to lcd0_data24_1 group
pinctrl: sh-pfc: r8a7791: Remove bogus ctrl marks from qspi_data4_b group
pinctrl: sh-pfc: r8a7791: Remove bogus marks from vin1_b_data18 group
pinctrl: sh-pfc: sh73a0: Add missing TO pin to tpu4_to3 group
pinctrl: sh-pfc: r8a7794: Remove bogus IPSR9 field
pinctrl: sh-pfc: r8a77970: Add missing MOD_SEL0 field
pinctrl: sh-pfc: r8a77980: Add missing MOD_SEL0 field
pinctrl: sh-pfc: sh7734: Add missing IPSR11 field
pinctrl: sh-pfc: r8a77995: Remove bogus SEL_PWM[0-3]_3 configurations
pinctrl: sh-pfc: sh7269: Add missing PCIOR0 field
pinctrl: sh-pfc: sh7734: Remove bogus IPSR10 value
net: hns3: fix error handling int the hns3_get_vector_ring_chain
vxlan: changelink: Fix handling of default remotes
Input: nomadik-ske-keypad - fix a loop timeout test
fork,memcg: fix crash in free_thread_stack on memcg charge fail
clk: highbank: fix refcount leak in hb_clk_init()
clk: qoriq: fix refcount leak in clockgen_init()
clk: ti: fix refcount leak in ti_dt_clocks_register()
clk: socfpga: fix refcount leak
clk: samsung: exynos4: fix refcount leak in exynos4_get_xom()
clk: imx6q: fix refcount leak in imx6q_clocks_init()
clk: imx6sx: fix refcount leak in imx6sx_clocks_init()
clk: imx7d: fix refcount leak in imx7d_clocks_init()
clk: vf610: fix refcount leak in vf610_clocks_init()
clk: armada-370: fix refcount leak in a370_clk_init()
clk: kirkwood: fix refcount leak in kirkwood_clk_init()
clk: armada-xp: fix refcount leak in axp_clk_init()
clk: mv98dx3236: fix refcount leak in mv98dx3236_clk_init()
clk: dove: fix refcount leak in dove_clk_init()
MIPS: BCM63XX: drop unused and broken DSP platform device
arm64: defconfig: Re-enable bcm2835-thermal driver
remoteproc: qcom: q6v5-mss: Add missing clocks for MSM8996
remoteproc: qcom: q6v5-mss: Add missing regulator for MSM8996
drm: Fix error handling in drm_legacy_addctx
ARM: dts: r8a7743: Remove generic compatible string from iic3
drm/etnaviv: fix some off by one bugs
drm/fb-helper: generic: Fix setup error path
fork, memcg: fix cached_stacks case
IB/usnic: Fix out of bounds index check in query pkey
RDMA/ocrdma: Fix out of bounds index check in query pkey
RDMA/qedr: Fix out of bounds index check in query pkey
drm/shmob: Fix return value check in shmob_drm_probe
arm64: dts: apq8016-sbc: Increase load on l11 for SDCARD
spi: cadence: Correct initialisation of runtime PM
RDMA/iw_cxgb4: Fix the unchecked ep dereference
net: phy: micrel: set soft_reset callback to genphy_soft_reset for KSZ9031
memory: tegra: Don't invoke Tegra30+ specific memory timing setup on Tegra20
drm/etnaviv: NULL vs IS_ERR() buf in etnaviv_core_dump()
media: s5p-jpeg: Correct step and max values for V4L2_CID_JPEG_RESTART_INTERVAL
kbuild: mark prepare0 as PHONY to fix external module build
crypto: brcm - Fix some set-but-not-used warning
crypto: tgr192 - fix unaligned memory access
ASoC: imx-sgtl5000: put of nodes if finding codec fails
IB/iser: Pass the correct number of entries for dma mapped SGL
net: hns3: fix wrong combined count returned by ethtool -l
media: tw9910: Unregister subdevice with v4l2-async
IB/mlx5: Don't override existing ip_protocol
rtc: cmos: ignore bogus century byte
spi/topcliff_pch: Fix potential NULL dereference on allocation error
net: hns3: fix bug of ethtool_ops.get_channels for VF
ARM: dts: sun8i-a23-a33: Move NAND controller device node to sort by address
clk: sunxi-ng: sun8i-a23: Enable PLL-MIPI LDOs when ungating it
iwlwifi: mvm: avoid possible access out of array.
net/mlx5: Take lock with IRQs disabled to avoid deadlock
ip_tunnel: Fix route fl4 init in ip_md_tunnel_xmit
arm64: dts: allwinner: h6: Move GIC device node fix base address ordering
iwlwifi: mvm: fix A-MPDU reference assignment
bus: ti-sysc: Fix timer handling with drop pm_runtime_irq_safe()
tty: ipwireless: Fix potential NULL pointer dereference
driver: uio: fix possible memory leak in __uio_register_device
driver: uio: fix possible use-after-free in __uio_register_device
crypto: crypto4xx - Fix wrong ppc4xx_trng_probe()/ppc4xx_trng_remove() arguments
driver core: Fix DL_FLAG_AUTOREMOVE_SUPPLIER device link flag handling
driver core: Avoid careless re-use of existing device links
driver core: Do not resume suppliers under device_links_write_lock()
driver core: Fix handling of runtime PM flags in device_link_add()
driver core: Do not call rpm_put_suppliers() in pm_runtime_drop_link()
ARM: dts: lpc32xx: add required clocks property to keypad device node
ARM: dts: lpc32xx: reparent keypad controller to SIC1
ARM: dts: lpc32xx: fix ARM PrimeCell LCD controller variant
ARM: dts: lpc32xx: fix ARM PrimeCell LCD controller clocks property
ARM: dts: lpc32xx: phy3250: fix SD card regulator voltage
drm/xen-front: Fix mmap attributes for display buffers
iwlwifi: mvm: fix RSS config command
staging: most: cdev: add missing check for cdev_add failure
clk: ingenic: jz4740: Fix gating of UDC clock
rtc: ds1672: fix unintended sign extension
thermal: mediatek: fix register index error
arm64: dts: msm8916: remove bogus argument to the cpu clock
ath10k: fix dma unmap direction for management frames
net: phy: fixed_phy: Fix fixed_phy not checking GPIO
rtc: ds1307: rx8130: Fix alarm handling
net/smc: original socket family in inet_sock_diag
rtc: 88pm860x: fix unintended sign extension
rtc: 88pm80x: fix unintended sign extension
rtc: pm8xxx: fix unintended sign extension
fbdev: chipsfb: remove set but not used variable 'size'
iw_cxgb4: use tos when importing the endpoint
iw_cxgb4: use tos when finding ipv6 routes
ipmi: kcs_bmc: handle devm_kasprintf() failure case
xsk: add missing smp_rmb() in xsk_mmap
drm/etnaviv: potential NULL dereference
ntb_hw_switchtec: debug print 64bit aligned crosslink BAR Numbers
ntb_hw_switchtec: NT req id mapping table register entry number should be 512
pinctrl: sh-pfc: emev2: Add missing pinmux functions
pinctrl: sh-pfc: r8a7791: Fix scifb2_data_c pin group
pinctrl: sh-pfc: r8a7792: Fix vin1_data18_b pin group
pinctrl: sh-pfc: sh73a0: Fix fsic_spdif pin groups
RDMA/mlx5: Fix memory leak in case we fail to add an IB device
driver core: Fix possible supplier PM-usage counter imbalance
PCI: endpoint: functions: Use memcpy_fromio()/memcpy_toio()
usb: phy: twl6030-usb: fix possible use-after-free on remove
block: don't use bio->bi_vcnt to figure out segment number
keys: Timestamp new keys
net: dsa: b53: Fix default VLAN ID
net: dsa: b53: Properly account for VLAN filtering
net: dsa: b53: Do not program CPU port's PVID
mt76: usb: fix possible memory leak in mt76u_buf_free
media: sh: migor: Include missing dma-mapping header
vfio_pci: Enable memory accesses before calling pci_map_rom
hwmon: (pmbus/tps53679) Fix driver info initialization in probe routine
mdio_bus: Fix PTR_ERR() usage after initialization to constant
KVM: PPC: Release all hardware TCE tables attached to a group
staging: r8822be: check kzalloc return or bail
dmaengine: mv_xor: Use correct device for DMA API
cdc-wdm: pass return value of recover_from_urb_loss
brcmfmac: create debugfs files for bus-specific layer
regulator: pv88060: Fix array out-of-bounds access
regulator: pv88080: Fix array out-of-bounds access
regulator: pv88090: Fix array out-of-bounds access
net: dsa: qca8k: Enable delay for RGMII_ID mode
net/mlx5: Delete unused FPGA QPN variable
drm/nouveau/bios/ramcfg: fix missing parentheses when calculating RON
drm/nouveau/pmu: don't print reply values if exec is false
drm/nouveau: fix missing break in switch statement
driver core: Fix PM-runtime for links added during consumer probe
ASoC: qcom: Fix of-node refcount unbalance in apq8016_sbc_parse_of()
net: dsa: fix unintended change of bridge interface STP state
fs/nfs: Fix nfs_parse_devname to not modify it's argument
staging: rtlwifi: Use proper enum for return in halmac_parse_psd_data_88xx
powerpc/64s: Fix logic when handling unknown CPU features
NFS: Fix a soft lockup in the delegation recovery code
perf: Copy parent's address filter offsets on clone
perf, pt, coresight: Fix address filters for vmas with non-zero offset
clocksource/drivers/sun5i: Fail gracefully when clock rate is unavailable
clocksource/drivers/exynos_mct: Fix error path in timer resources initialization
platform/x86: wmi: fix potential null pointer dereference
NFS/pnfs: Bulk destroy of layouts needs to be safe w.r.t. umount
mmc: sdhci-brcmstb: handle mmc_of_parse() errors during probe
iommu: Fix IOMMU debugfs fallout
ARM: 8847/1: pm: fix HYP/SVC mode mismatch when MCPM is used
ARM: 8848/1: virt: Align GIC version check with arm64 counterpart
ARM: 8849/1: NOMMU: Fix encodings for PMSAv8's PRBAR4/PRLAR4
regulator: wm831x-dcdc: Fix list of wm831x_dcdc_ilim from mA to uA
ath10k: Fix length of wmi tlv command for protected mgmt frames
netfilter: nft_set_hash: fix lookups with fixed size hash on big endian
netfilter: nft_set_hash: bogus element self comparison from deactivation path
net: sched: act_csum: Fix csum calc for tagged packets
hwrng: bcm2835 - fix probe as platform device
iommu/vt-d: Fix NULL pointer reference in intel_svm_bind_mm()
NFS: Add missing encode / decode sequence_maxsz to v4.2 operations
NFSv4/flexfiles: Fix invalid deref in FF_LAYOUT_DEVID_NODE()
net: aquantia: fixed instack structure overflow
powerpc/mm: Check secondary hash page table
media: dvb/earth-pt1: fix wrong initialization for demod blocks
rbd: clear ->xferred on error from rbd_obj_issue_copyup()
PCI: Fix "try" semantics of bus and slot reset
nios2: ksyms: Add missing symbol exports
x86/mm: Remove unused variable 'cpu'
scsi: megaraid_sas: reduce module load time
nfp: fix simple vNIC mailbox length
drivers/rapidio/rio_cm.c: fix potential oops in riocm_ch_listen()
xen, cpu_hotplug: Prevent an out of bounds access
net/mlx5: Fix multiple updates of steering rules in parallel
net/mlx5e: IPoIB, Fix RX checksum statistics update
net: sh_eth: fix a missing check of of_get_phy_mode
regulator: lp87565: Fix missing register for LP87565_BUCK_0
soc: amlogic: gx-socinfo: Add mask for each SoC packages
media: ivtv: update *pos correctly in ivtv_read_pos()
media: cx18: update *pos correctly in cx18_read_pos()
media: wl128x: Fix an error code in fm_download_firmware()
media: cx23885: check allocation return
regulator: tps65086: Fix tps65086_ldoa1_ranges for selector 0xB
crypto: ccree - reduce kernel stack usage with clang
jfs: fix bogus variable self-initialization
tipc: tipc clang warning
m68k: mac: Fix VIA timer counter accesses
ARM: dts: sun8i: a33: Reintroduce default pinctrl muxing
arm64: dts: allwinner: a64: Add missing PIO clocks
ARM: dts: sun9i: optimus: Fix fixed-regulators
net: phy: don't clear BMCR in genphy_soft_reset
ARM: OMAP2+: Fix potentially uninitialized return value for _setup_reset()
net: dsa: Avoid null pointer when failing to connect to PHY
soc: qcom: cmd-db: Fix an error code in cmd_db_dev_probe()
media: davinci-isif: avoid uninitialized variable use
media: tw5864: Fix possible NULL pointer dereference in tw5864_handle_frame
spi: tegra114: clear packed bit for unpacked mode
spi: tegra114: fix for unpacked mode transfers
spi: tegra114: terminate dma and reset on transfer timeout
spi: tegra114: flush fifos
spi: tegra114: configure dma burst size to fifo trig level
bus: ti-sysc: Fix sysc_unprepare() when no clocks have been allocated
soc/fsl/qe: Fix an error code in qe_pin_request()
spi: bcm2835aux: fix driver to not allow 65535 (=-1) cs-gpios
drm/fb-helper: generic: Call drm_client_add() after setup is done
arm64/vdso: don't leak kernel addresses
rtc: Fix timestamp value for RTC_TIMESTAMP_BEGIN_1900
rtc: mt6397: Don't call irq_dispose_mapping.
ehea: Fix a copy-paste err in ehea_init_port_res
bpf: Add missed newline in verifier verbose log
drm/vmwgfx: Remove set but not used variable 'restart'
scsi: qla2xxx: Unregister chrdev if module initialization fails
of: use correct function prototype for of_overlay_fdt_apply()
net/sched: cbs: fix port_rate miscalculation
clk: qcom: Skip halt checks on gcc_pcie_0_pipe_clk for 8998
ACPI: button: reinitialize button state upon resume
firmware: arm_scmi: fix of_node leak in scmi_mailbox_check
rxrpc: Fix detection of out of order acks
scsi: target/core: Fix a race condition in the LUN lookup code
brcmfmac: fix leak of mypkt on error return path
ARM: pxa: ssp: Fix "WARNING: invalid free of devm_ allocated data"
PCI: rockchip: Fix rockchip_pcie_ep_assert_intx() bitwise operations
net: hns3: fix for vport->bw_limit overflow problem
hwmon: (w83627hf) Use request_muxed_region for Super-IO accesses
perf/core: Fix the address filtering fix
staging: android: vsoc: fix copy_from_user overrun
PCI: dwc: Fix dw_pcie_ep_find_capability() to return correct capability offset
soc: amlogic: meson-gx-pwrc-vpu: Fix power on/off register bitmask
platform/x86: alienware-wmi: fix kfree on potentially uninitialized pointer
tipc: set sysctl_tipc_rmem and named_timeout right range
usb: typec: tcpm: Notify the tcpc to start connection-detection for SRPs
selftests/ipc: Fix msgque compiler warnings
net: hns3: fix loop condition of hns3_get_tx_timeo_queue_info()
powerpc: vdso: Make vdso32 installation conditional in vdso_install
ARM: dts: ls1021: Fix SGMII PCS link remaining down after PHY disconnect
media: ov2659: fix unbalanced mutex_lock/unlock
6lowpan: Off by one handling ->nexthdr
dmaengine: axi-dmac: Don't check the number of frames for alignment
ALSA: usb-audio: Handle the error from snd_usb_mixer_apply_create_quirk()
afs: Fix AFS file locking to allow fine grained locks
afs: Further fix file locking
NFS: Don't interrupt file writeout due to fatal errors
coresight: catu: fix clang build warning
s390/kexec_file: Fix potential segment overlap in ELF loader
irqchip/gic-v3-its: fix some definitions of inner cacheability attributes
scsi: qla2xxx: Fix a format specifier
scsi: qla2xxx: Fix error handling in qlt_alloc_qfull_cmd()
scsi: qla2xxx: Avoid that qlt_send_resp_ctio() corrupts memory
KVM: PPC: Book3S HV: Fix lockdep warning when entering the guest
netfilter: nft_flow_offload: add entry to flowtable after confirmation
PCI: iproc: Enable iProc config read for PAXBv2
ARM: dts: logicpd-som-lv: Fix MMC1 card detect
packet: in recvmsg msg_name return at least sizeof sockaddr_ll
ASoC: fix valid stream condition
usb: gadget: fsl: fix link error against usb-gadget module
dwc2: gadget: Fix completed transfer size calculation in DDMA
IB/mlx5: Add missing XRC options to QP optional params mask
RDMA/rxe: Consider skb reserve space based on netdev of GID
iommu/vt-d: Make kernel parameter igfx_off work with vIOMMU
net: ena: fix swapped parameters when calling ena_com_indirect_table_fill_entry
net: ena: fix: Free napi resources when ena_up() fails
net: ena: fix incorrect test of supported hash function
net: ena: fix ena_com_fill_hash_function() implementation
dmaengine: tegra210-adma: restore channel status
watchdog: rtd119x_wdt: Fix remove function
mmc: core: fix possible use after free of host
lightnvm: pblk: fix lock order in pblk_rb_tear_down_check
ath10k: Fix encoding for protected management frames
afs: Fix the afs.cell and afs.volume xattr handlers
vfio/mdev: Avoid release parent reference during error path
vfio/mdev: Follow correct remove sequence
vfio/mdev: Fix aborting mdev child device removal if one fails
l2tp: Fix possible NULL pointer dereference
ALSA: aica: Fix a long-time build breakage
media: omap_vout: potential buffer overflow in vidioc_dqbuf()
media: davinci/vpbe: array underflow in vpbe_enum_outputs()
platform/x86: alienware-wmi: printing the wrong error code
crypto: caam - fix caam_dump_sg that iterates through scatterlist
netfilter: ebtables: CONFIG_COMPAT: reject trailing data after last rule
pwm: meson: Consider 128 a valid pre-divider
pwm: meson: Don't disable PWM when setting duty repeatedly
ARM: riscpc: fix lack of keyboard interrupts after irq conversion
nfp: bpf: fix static check error through tightening shift amount adjustment
kdb: do a sanity check on the cpu in kdb_per_cpu()
netfilter: nf_tables: correct NFT_LOGLEVEL_MAX value
backlight: lm3630a: Return 0 on success in update_status functions
thermal: rcar_gen3_thermal: fix interrupt type
thermal: cpu_cooling: Actually trace CPU load in thermal_power_cpu_get_power
EDAC/mc: Fix edac_mc_find() in case no device is found
afs: Fix key leak in afs_release() and afs_evict_inode()
afs: Don't invalidate callback if AFS_VNODE_DIR_VALID not set
afs: Fix lock-wait/callback-break double locking
afs: Fix double inc of vnode->cb_break
ARM: dts: sun8i-h3: Fix wifi in Beelink X2 DT
clk: meson: gxbb: no spread spectrum on mpll0
clk: meson: axg: spread spectrum is on mpll2
dmaengine: tegra210-adma: Fix crash during probe
arm64: dts: meson: libretech-cc: set eMMC as removable
RDMA/qedr: Fix incorrect device rate.
spi: spi-fsl-spi: call spi_finalize_current_message() at the end
crypto: ccp - fix AES CFB error exposed by new test vectors
crypto: ccp - Fix 3DES complaint from ccp-crypto module
serial: stm32: fix word length configuration
serial: stm32: fix rx error handling
serial: stm32: fix rx data length when parity enabled
serial: stm32: fix transmit_chars when tx is stopped
serial: stm32: Add support of TC bit status check
serial: stm32: fix wakeup source initialization
misc: sgi-xp: Properly initialize buf in xpc_get_rsvd_page_pa
iommu: Add missing new line for dma type
iommu: Use right function to get group for device
signal/bpfilter: Fix bpfilter_kernl to use send_sig not force_sig
signal/cifs: Fix cifs_put_tcp_session to call send_sig instead of force_sig
inet: frags: call inet_frags_fini() after unregister_pernet_subsys()
net: hns3: fix a memory leak issue for hclge_map_unmap_ring_to_vf_vector
crypto: talitos - fix AEAD processing.
netvsc: unshare skb in VF rx handler
net: core: support XDP generic on stacked devices.
RDMA/uverbs: check for allocation failure in uapi_add_elm()
net: don't clear sock->sk early to avoid trouble in strparser
phy: qcom-qusb2: fix missing assignment of ret when calling clk_prepare_enable
cpufreq: brcmstb-avs-cpufreq: Fix initial command check
cpufreq: brcmstb-avs-cpufreq: Fix types for voltage/frequency
clk: sunxi-ng: sun50i-h6-r: Fix incorrect W1 clock gate register
media: vivid: fix incorrect assignment operation when setting video mode
crypto: inside-secure - fix zeroing of the request in ahash_exit_inv
crypto: inside-secure - fix queued len computation
arm64: dts: renesas: ebisu: Remove renesas, no-ether-link property
mpls: fix warning with multi-label encap
serial: stm32: fix a recursive locking in stm32_config_rs485
arm64: dts: meson-gxm-khadas-vim2: fix gpio-keys-polled node
arm64: dts: meson-gxm-khadas-vim2: fix Bluetooth support
iommu/vt-d: Duplicate iommu_resv_region objects per device list
phy: usb: phy-brcm-usb: Remove sysfs attributes upon driver removal
firmware: arm_scmi: fix bitfield definitions for SENSOR_DESC attributes
firmware: arm_scmi: update rate_discrete in clock_describe_rates_get
ntb_hw_switchtec: potential shift wrapping bug in switchtec_ntb_init_sndev()
ASoC: meson: axg-tdmin: right_j is not supported
ASoC: meson: axg-tdmout: right_j is not supported
qed: iWARP - Use READ_ONCE and smp_store_release to access ep->state
qed: iWARP - fix uninitialized callback
powerpc/cacheinfo: add cacheinfo_teardown, cacheinfo_rebuild
powerpc/pseries/mobility: rebuild cacheinfo hierarchy post-migration
bpf: fix the check that forwarding is enabled in bpf_ipv6_fib_lookup
IB/hfi1: Handle port down properly in pio
drm/msm/mdp5: Fix mdp5_cfg_init error return
net: netem: fix backlog accounting for corrupted GSO frames
net/udp_gso: Allow TX timestamp with UDP GSO
net/af_iucv: build proper skbs for HiperTransport
net/af_iucv: always register net_device notifier
ASoC: ti: davinci-mcasp: Fix slot mask settings when using multiple AXRs
rtc: pcf8563: Fix interrupt trigger method
rtc: pcf8563: Clear event flags and disable interrupts before requesting irq
ARM: dts: iwg20d-q7-common: Fix SDHI1 VccQ regularor
net/sched: cbs: Fix error path of cbs_module_init
arm64: dts: allwinner: h6: Pine H64: Add interrupt line for RTC
drm/msm/a3xx: remove TPL1 regs from snapshot
ip6_fib: Don't discard nodes with valid routing information in fib6_locate_1()
perf/ioctl: Add check for the sample_period value
dmaengine: hsu: Revert "set HSU_CH_MTSR to memory width"
clk: qcom: Fix -Wunused-const-variable
nvmem: imx-ocotp: Ensure WAIT bits are preserved when setting timing
nvmem: imx-ocotp: Change TIMING calculation to u-boot algorithm
tools: bpftool: use correct argument in cgroup errors
backlight: pwm_bl: Fix heuristic to determine number of brightness levels
fork,memcg: alloc_thread_stack_node needs to set tsk->stack
bnxt_en: Fix ethtool selftest crash under error conditions.
bnxt_en: Suppress error messages when querying DSCP DCB capabilities.
iommu/amd: Make iommu_disable safer
mfd: intel-lpss: Release IDA resources
rxrpc: Fix uninitialized error code in rxrpc_send_data_packet()
xprtrdma: Fix use-after-free in rpcrdma_post_recvs
um: Fix IRQ controller regression on console read
PM: ACPI/PCI: Resume all devices during hibernation
ACPI: PM: Simplify and fix PM domain hibernation callbacks
ACPI: PM: Introduce "poweroff" callbacks for ACPI PM domain and LPSS
fsi/core: Fix error paths on CFAM init
devres: allow const resource arguments
fsi: sbefifo: Don't fail operations when in SBE IPL state
RDMA/hns: Fixs hw access invalid dma memory error
PCI: mobiveil: Remove the flag MSI_FLAG_MULTI_PCI_MSI
PCI: mobiveil: Fix devfn check in mobiveil_pcie_valid_device()
PCI: mobiveil: Fix the valid check for inbound and outbound windows
ceph: fix "ceph.dir.rctime" vxattr value
net: pasemi: fix an use-after-free in pasemi_mac_phy_init()
net/tls: fix socket wmem accounting on fallback with netem
x86/pgtable/32: Fix LOWMEM_PAGES constant
xdp: fix possible cq entry leak
ARM: stm32: use "depends on" instead of "if" after prompt
scsi: libfc: fix null pointer dereference on a null lport
xfrm interface: ifname may be wrong in logs
drm/panel: make drm_panel.h self-contained
clk: sunxi-ng: v3s: add the missing PLL_DDR1
PM: sleep: Fix possible overflow in pm_system_cancel_wakeup()
libertas_tf: Use correct channel range in lbtf_geo_init
qed: reduce maximum stack frame size
usb: host: xhci-hub: fix extra endianness conversion
media: rcar-vin: Clean up correct notifier in error path
mic: avoid statically declaring a 'struct device'.
x86/kgbd: Use NMI_VECTOR not APIC_DM_NMI
crypto: ccp - Reduce maximum stack usage
ALSA: aoa: onyx: always initialize register read value
arm64: dts: renesas: r8a77995: Fix register range of display node
tipc: reduce risk of wakeup queue starvation
ARM: dts: stm32: add missing vdda-supply to adc on stm32h743i-eval
net/mlx5: Fix mlx5_ifc_query_lag_out_bits
cifs: fix rmmod regression in cifs.ko caused by force_sig changes
iio: tsl2772: Use devm_add_action_or_reset for tsl2772_chip_off
net: fix bpf_xdp_adjust_head regression for generic-XDP
spi: bcm-qspi: Fix BSPI QUAD and DUAL mode support when using flex mode
cxgb4: smt: Add lock for atomic_dec_and_test
crypto: caam - free resources in case caam_rng registration failed
ext4: set error return correctly when ext4_htree_store_dirent fails
RDMA/hns: Bugfix for slab-out-of-bounds when unloading hip08 driver
RDMA/hns: bugfix for slab-out-of-bounds when loading hip08 driver
ASoC: es8328: Fix copy-paste error in es8328_right_line_controls
ASoC: cs4349: Use PM ops 'cs4349_runtime_pm'
ASoC: wm8737: Fix copy-paste error in wm8737_snd_controls
net/rds: Add a few missing rds_stat_names entries
tools: bpftool: fix arguments for p_err() in do_event_pipe()
tools: bpftool: fix format strings and arguments for jsonw_printf()
drm: rcar-du: lvds: Fix bridge_to_rcar_lvds
bnxt_en: Fix handling FRAG_ERR when NVM_INSTALL_UPDATE cmd fails
signal: Allow cifs and drbd to receive their terminating signals
powerpc/64s/radix: Fix memory hot-unplug page table split
ASoC: sun4i-i2s: RX and TX counter registers are swapped
dmaengine: dw: platform: Switch to acpi_dma_controller_register()
rtc: rv3029: revert error handling patch to rv3029_eeprom_write()
mac80211: minstrel_ht: fix per-group max throughput rate initialization
i40e: reduce stack usage in i40e_set_fc
media: atmel: atmel-isi: fix timeout value for stop streaming
ARM: 8896/1: VDSO: Don't leak kernel addresses
rtc: pcf2127: bugfix: read rtc disables watchdog
mips: avoid explicit UB in assignment of mips_io_port_base
media: em28xx: Fix exception handling in em28xx_alloc_urbs()
iommu/mediatek: Fix iova_to_phys PA start for 4GB mode
ahci: Do not export local variable ahci_em_messages
rxrpc: Fix lack of conn cleanup when local endpoint is cleaned up [ver #2]
Partially revert "kfifo: fix kfifo_alloc() and kfifo_init()"
hwmon: (lm75) Fix write operations for negative temperatures
net/sched: cbs: Set default link speed to 10 Mbps in cbs_set_port_rate
power: supply: Init device wakeup after device_add()
x86, perf: Fix the dependency of the x86 insn decoder selftest
staging: greybus: light: fix a couple double frees
irqdomain: Add the missing assignment of domain->fwnode for named fwnode
bcma: fix incorrect update of BCMA_CORE_PCI_MDIO_DATA
usb: typec: tps6598x: Fix build error without CONFIG_REGMAP_I2C
bcache: Fix an error code in bch_dump_read()
iio: dac: ad5380: fix incorrect assignment to val
netfilter: ctnetlink: honor IPS_OFFLOAD flag
ath9k: dynack: fix possible deadlock in ath_dynack_node_{de}init
wcn36xx: use dynamic allocation for large variables
tty: serial: fsl_lpuart: Use appropriate lpuart32_* I/O funcs
ARM: dts: aspeed-g5: Fixe gpio-ranges upper limit
xsk: avoid store-tearing when assigning queues
xsk: avoid store-tearing when assigning umem
led: triggers: Fix dereferencing of null pointer
net: sonic: return NETDEV_TX_OK if failed to map buffer
net: hns3: fix error VF index when setting VLAN offload
rtlwifi: Fix file release memory leak
ARM: dts: logicpd-som-lv: Fix i2c2 and i2c3 Pin mux
f2fs: fix wrong error injection path in inc_valid_block_count()
f2fs: fix error path of f2fs_convert_inline_page()
scsi: fnic: fix msix interrupt allocation
Btrfs: fix hang when loading existing inode cache off disk
Btrfs: fix inode cache waiters hanging on failure to start caching thread
Btrfs: fix inode cache waiters hanging on path allocation failure
btrfs: use correct count in btrfs_file_write_iter()
ixgbe: sync the first fragment unconditionally
hwmon: (shtc1) fix shtc1 and shtw1 id mask
net: sonic: replace dev_kfree_skb in sonic_send_packet
pinctrl: iproc-gpio: Fix incorrect pinconf configurations
gpio/aspeed: Fix incorrect number of banks
ath10k: adjust skb length in ath10k_sdio_mbox_rx_packet
RDMA/cma: Fix false error message
net/rds: Fix 'ib_evt_handler_call' element in 'rds_ib_stat_names'
um: Fix off by one error in IRQ enumeration
bnxt_en: Increase timeout for HWRM_DBG_COREDUMP_XX commands
f2fs: fix to avoid accessing uninitialized field of inode page in is_alive()
mailbox: qcom-apcs: fix max_register value
clk: actions: Fix factor clk struct member access
powerpc/mm/mce: Keep irqs disabled during lockless page table walk
bpf: fix BTF limits
crypto: hisilicon - Matching the dma address for dma_pool_free()
iommu/amd: Wait for completion of IOTLB flush in attach_device
net: aquantia: Fix aq_vec_isr_legacy() return value
cxgb4: Signedness bug in init_one()
net: hisilicon: Fix signedness bug in hix5hd2_dev_probe()
net: broadcom/bcmsysport: Fix signedness in bcm_sysport_probe()
net: netsec: Fix signedness bug in netsec_probe()
net: socionext: Fix a signedness bug in ave_probe()
net: stmmac: dwmac-meson8b: Fix signedness bug in probe
net: axienet: fix a signedness bug in probe
of: mdio: Fix a signedness bug in of_phy_get_and_connect()
net: nixge: Fix a signedness bug in nixge_probe()
net: ethernet: stmmac: Fix signedness bug in ipq806x_gmac_of_parse()
net: sched: cbs: Avoid division by zero when calculating the port rate
nvme: retain split access workaround for capability reads
net: stmmac: gmac4+: Not all Unicast addresses may be available
rxrpc: Fix trace-after-put looking at the put connection record
mac80211: accept deauth frames in IBSS mode
llc: fix another potential sk_buff leak in llc_ui_sendmsg()
llc: fix sk_buff refcounting in llc_conn_state_process()
ip6erspan: remove the incorrect mtu limit for ip6erspan
net: stmmac: fix length of PTP clock's name string
net: stmmac: fix disabling flexible PPS output
sctp: add chunks to sk_backlog when the newsk sk_socket is not set
s390/qeth: Fix error handling during VNICC initialization
s390/qeth: Fix initialization of vnicc cmd masks during set online
act_mirred: Fix mirred_init_module error handling
net: avoid possible false sharing in sk_leave_memory_pressure()
net: add {READ|WRITE}_ONCE() annotations on ->rskq_accept_head
tcp: annotate lockless access to tcp_memory_pressure
net/smc: receive returns without data
net/smc: receive pending data after RCV_SHUTDOWN
drm/msm/dsi: Implement reset correctly
vhost/test: stop device before reset
dmaengine: imx-sdma: fix size check for sdma script_number
firmware: dmi: Fix unlikely out-of-bounds read in save_mem_devices
arm64: hibernate: check pgd table allocation
net: netem: fix error path for corrupted GSO frames
net: netem: correct the parent's backlog when corrupted packet was dropped
xsk: Fix registration of Rx-only sockets
bpf, offload: Unlock on error in bpf_offload_dev_create()
afs: Fix missing timeout reset
net: qca_spi: Move reset_count to struct qcaspi
hv_netvsc: Fix offset usage in netvsc_send_table()
hv_netvsc: Fix send_table offset in case of a host bug
afs: Fix large file support
drm: panel-lvds: Potential Oops in probe error handling
hwrng: omap3-rom - Fix missing clock by probing with device tree
dpaa_eth: perform DMA unmapping before read
dpaa_eth: avoid timestamp read on error paths
MIPS: Loongson: Fix return value of loongson_hwmon_init
hv_netvsc: flag software created hash value
net: neigh: use long type to store jiffies delta
packet: fix data-race in fanout_flow_is_huge()
i2c: stm32f7: report dma error during probe
mmc: sdio: fix wl1251 vendor id
mmc: core: fix wl1251 sdio quirks
affs: fix a memory leak in affs_remount
afs: Remove set but not used variables 'before', 'after'
dmaengine: ti: edma: fix missed failure handling
drm/radeon: fix bad DMA from INTERRUPT_CNTL2
arm64: dts: juno: Fix UART frequency
samples/bpf: Fix broken xdp_rxq_info due to map order assumptions
usb: dwc3: Allow building USB_DWC3_QCOM without EXTCON
IB/iser: Fix dma_nents type definition
serial: stm32: fix clearing interrupt error flags
arm64: dts: meson-gxm-khadas-vim2: fix uart_A bluetooth node
m68k: Call timer_interrupt() with interrupts disabled
Linux 4.19.99
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ieabeab79ea5c8cb4b6b1552702fa5d6100cea5db
[ Upstream commit 1bf4580e00 ]
Commit 5eed6f1dff ("fork,memcg: fix crash in free_thread_stack on
memcg charge fail") corrected two instances, but there was a third
instance of this bug.
Without setting tsk->stack, if memcg_charge_kernel_stack fails, it'll
execute free_thread_stack() on a dangling pointer.
Enterprise kernels are compiled with VMAP_STACK=y so this isn't
critical, but custom VMAP_STACK=n builds should have some performance
advantage, with the drawback of risking to fail fork because compaction
didn't succeed. So as long as VMAP_STACK=n is a supported option it's
worth fixing it upstream.
Link: http://lkml.kernel.org/r/20190619011450.28048-1-aarcange@redhat.com
Fixes: 9b6f7e163c ("mm: rework memcg kernel stack accounting")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Rik van Riel <riel@surriel.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ba4a45746c ]
Commit 5eed6f1dff ("fork,memcg: fix crash in free_thread_stack on
memcg charge fail") fixes a crash caused due to failed memcg charge of
the kernel stack. However the fix misses the cached_stacks case which
this patch fixes. So, the same crash can happen if the memcg charge of
a cached stack is failed.
Link: http://lkml.kernel.org/r/20190102180145.57406-1-shakeelb@google.com
Fixes: 5eed6f1dff ("fork,memcg: fix crash in free_thread_stack on memcg charge fail")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5eed6f1dff ]
Commit 9b6f7e163c ("mm: rework memcg kernel stack accounting") will
result in fork failing if allocating a kernel stack for a task in
dup_task_struct exceeds the kernel memory allowance for that cgroup.
Unfortunately, it also results in a crash.
This is due to the code jumping to free_stack and calling
free_thread_stack when the memcg kernel stack charge fails, but without
tsk->stack pointing at the freshly allocated stack.
This in turn results in the vfree_atomic in free_thread_stack oopsing
with a backtrace like this:
#5 [ffffc900244efc88] die at ffffffff8101f0ab
#6 [ffffc900244efcb8] do_general_protection at ffffffff8101cb86
#7 [ffffc900244efce0] general_protection at ffffffff818ff082
[exception RIP: llist_add_batch+7]
RIP: ffffffff8150d487 RSP: ffffc900244efd98 RFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88085ef55980 RCX: 0000000000000000
RDX: ffff88085ef55980 RSI: 343834343531203a RDI: 343834343531203a
RBP: ffffc900244efd98 R8: 0000000000000001 R9: ffff8808578c3600
R10: 0000000000000000 R11: 0000000000000001 R12: ffff88029f6c21c0
R13: 0000000000000286 R14: ffff880147759b00 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffffc900244efda0] vfree_atomic at ffffffff811df2c7
#9 [ffffc900244efdb8] copy_process at ffffffff81086e37
#10 [ffffc900244efe98] _do_fork at ffffffff810884e0
#11 [ffffc900244eff10] sys_vfork at ffffffff810887ff
#12 [ffffc900244eff20] do_syscall_64 at ffffffff81002a43
RIP: 000000000049b948 RSP: 00007ffcdb307830 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000896030 RCX: 000000000049b948
RDX: 0000000000000000 RSI: 00007ffcdb307790 RDI: 00000000005d7421
RBP: 000000000067370f R8: 00007ffcdb3077b0 R9: 000000000001ed00
R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000040
R13: 000000000000000f R14: 0000000000000000 R15: 000000000088d018
ORIG_RAX: 000000000000003a CS: 0033 SS: 002b
The simplest fix is to assign tsk->stack right where it is allocated.
Link: http://lkml.kernel.org/r/20181214231726.7ee4843c@imladris.surriel.com
Fixes: 9b6f7e163c ("mm: rework memcg kernel stack accounting")
Signed-off-by: Rik van Riel <riel@surriel.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl3owgEACgkQONu9yGCS
aT43zw//SS1As83XXuHr4mdWIVDjXo6RMJ6Ib7YbRi/uhBmQuUuGVFcqGxUIA9Kl
eSXu5Kt8TNmInzHq9AMYgegrELAEwPD2XfptALGDwiUHonQuiFaqOQn/bltJOm1L
PsG15A7+/gFhuhPJDp2ZfNBmZGdpXdIwD27oUDqF1XD64dMa/HPbFUVgxWn3HHkd
sm0J6Ez0eNA+BmLnHXYDiSaEYIiwvy1nN6XpyIfOyb2Tz6kPoe0vVWU00Cmy8KAU
EIWB+TBRunspgMsShL5Cl1MSFOxf9QOmgnZxcrODAQfb1TbLMACB1FGMjK4nLm+3
wPlSnC7L49ARl/pvmN5NOUrjHi8S8qq/Od9QW+UIckRI6KzOU832h99v4gFuHjSC
KFiLi5K9+uTIMgNOETmINBiKKUcUzYXYVajvm4tuAUq3HO8wy6jeALtt34OiJZQZ
DV8wyBdL9NDUFqBymFaMFA4Us/fGIREzvPgI0E0jth2ANuLFLtScrnStuWv8buwJ
JT3V9xCxHZtZ3Ctevx/Jp6OaQtnbSnWjMjrO0UDzZ6N7+g5UKmh9/R3xL6sBpFVU
Vu49J+qWU3VmbY3EIulel+yARNe7xS4ExK185JmNzpYFyOpXum14FHhhtQ6xNSeu
dRqyITI0KYP7jWtBDKCgVAWF5jC9gHP1ksrHSZMhyGrv1dC1XZM=
=KnJW
-----END PGP SIGNATURE-----
Merge 4.19.88 into android-4.19
Changes in 4.19.88
clk: meson: gxbb: let sar_adc_clk_div set the parent clock rate
clocksource/drivers/mediatek: Fix error handling
ASoC: msm8916-wcd-analog: Fix RX1 selection in RDAC2 MUX
ASoC: compress: fix unsigned integer overflow check
reset: Fix memory leak in reset_control_array_put()
clk: samsung: exynos5433: Fix error paths
ASoC: kirkwood: fix external clock probe defer
ASoC: kirkwood: fix device remove ordering
clk: samsung: exynos5420: Preserve PLL configuration during suspend/resume
pinctrl: cherryview: Allocate IRQ chip dynamic
ARM: dts: imx6qdl-sabreauto: Fix storm of accelerometer interrupts
reset: fix reset_control_ops kerneldoc comment
clk: at91: avoid sleeping early
clk: sunxi: Fix operator precedence in sunxi_divs_clk_setup
clk: sunxi-ng: a80: fix the zero'ing of bits 16 and 18
ARM: dts: sun8i-a83t-tbs-a711: Fix WiFi resume from suspend
samples/bpf: fix build by setting HAVE_ATTR_TEST to zero
powerpc/bpf: Fix tail call implementation
idr: Fix integer overflow in idr_for_each_entry
idr: Fix idr_alloc_u32 on 32-bit systems
x86/resctrl: Prevent NULL pointer dereference when reading mondata
clk: ti: dra7-atl-clock: Remove ti_clk_add_alias call
clk: ti: clkctrl: Fix failed to enable error with double udelay timeout
net: fec: add missed clk_disable_unprepare in remove
bridge: ebtables: don't crash when using dnat target in output chains
can: peak_usb: report bus recovery as well
can: c_can: D_CAN: c_can_chip_config(): perform a sofware reset on open
can: rx-offload: can_rx_offload_queue_tail(): fix error handling, avoid skb mem leak
can: rx-offload: can_rx_offload_offload_one(): do not increase the skb_queue beyond skb_queue_len_max
can: rx-offload: can_rx_offload_offload_one(): increment rx_fifo_errors on queue overflow or OOM
can: rx-offload: can_rx_offload_offload_one(): use ERR_PTR() to propagate error value in case of errors
can: rx-offload: can_rx_offload_irq_offload_timestamp(): continue on error
can: rx-offload: can_rx_offload_irq_offload_fifo(): continue on error
can: flexcan: increase error counters if skb enqueueing via can_rx_offload_queue_sorted() fails
can: mcp251x: mcp251x_restart_work_handler(): Fix potential force_quit race condition
watchdog: meson: Fix the wrong value of left time
ASoC: stm32: sai: add restriction on mmap support
scripts/gdb: fix debugging modules compiled with hot/cold partitioning
net: bcmgenet: use RGMII loopback for MAC reset
net: bcmgenet: reapply manual settings to the PHY
net: mscc: ocelot: fix __ocelot_rmw_ix prototype
ceph: return -EINVAL if given fsc mount option on kernel w/o support
net/fq_impl: Switch to kvmalloc() for memory allocation
mac80211: fix station inactive_time shortly after boot
block: drbd: remove a stray unlock in __drbd_send_protocol()
pwm: bcm-iproc: Prevent unloading the driver module while in use
scsi: target/tcmu: Fix queue_cmd_ring() declaration
scsi: lpfc: Fix kernel Oops due to null pring pointers
scsi: lpfc: Fix dif and first burst use in write commands
ARM: dts: Fix up SQ201 flash access
tracing: Lock event_mutex before synth_event_mutex
ARM: debug-imx: only define DEBUG_IMX_UART_PORT if needed
ARM: dts: imx51: Fix memory node duplication
ARM: dts: imx53: Fix memory node duplication
ARM: dts: imx31: Fix memory node duplication
ARM: dts: imx35: Fix memory node duplication
ARM: dts: imx7: Fix memory node duplication
ARM: dts: imx6ul: Fix memory node duplication
ARM: dts: imx6sx: Fix memory node duplication
ARM: dts: imx6sl: Fix memory node duplication
ARM: dts: imx50: Fix memory node duplication
ARM: dts: imx23: Fix memory node duplication
ARM: dts: imx1: Fix memory node duplication
ARM: dts: imx27: Fix memory node duplication
ARM: dts: imx25: Fix memory node duplication
ARM: dts: imx53-voipac-dmm-668: Fix memory node duplication
parisc: Fix serio address output
parisc: Fix HP SDC hpa address output
ARM: dts: Fix hsi gdd range for omap4
arm64: mm: Prevent mismatched 52-bit VA support
arm64: smp: Handle errors reported by the firmware
bus: ti-sysc: Check for no-reset and no-idle flags at the child level
platform/x86: mlx-platform: Fix LED configuration
ARM: OMAP1: fix USB configuration for device-only setups
RDMA/hns: Fix the bug while use multi-hop of pbl
arm64: preempt: Fix big-endian when checking preempt count in assembly
RDMA/vmw_pvrdma: Use atomic memory allocation in create AH
PM / AVS: SmartReflex: NULL check before some freeing functions is not needed
xfs: zero length symlinks are not valid
ARM: ks8695: fix section mismatch warning
ACPI / LPSS: Ignore acpi_device_fix_up_power() return value
scsi: lpfc: Enable Management features for IF_TYPE=6
scsi: qla2xxx: Fix NPIV handling for FC-NVMe
scsi: qla2xxx: Fix for FC-NVMe discovery for NPIV port
nvme: provide fallback for discard alloc failure
s390/zcrypt: make sysfs reset attribute trigger queue reset
crypto: user - support incremental algorithm dumps
arm64: dts: renesas: draak: Fix CVBS input
mwifiex: fix potential NULL dereference and use after free
mwifiex: debugfs: correct histogram spacing, formatting
brcmfmac: set F2 watermark to 256 for 4373
brcmfmac: set SDIO F1 MesBusyCtrl for CYW4373
rtl818x: fix potential use after free
bcache: do not check if debug dentry is ERR or NULL explicitly on remove
bcache: do not mark writeback_running too early
xfs: require both realtime inodes to mount
nvme: fix kernel paging oops
ubifs: Fix default compression selection in ubifs
ubi: Put MTD device after it is not used
ubi: Do not drop UBI device reference before using
microblaze: adjust the help to the real behavior
microblaze: move "... is ready" messages to arch/microblaze/Makefile
microblaze: fix multiple bugs in arch/microblaze/boot/Makefile
iwlwifi: move iwl_nvm_check_version() into dvm
iwlwifi: mvm: force TCM re-evaluation on TCM resume
iwlwifi: pcie: fix erroneous print
iwlwifi: pcie: set cmd_len in the correct place
gpio: pca953x: Fix AI overflow on PCAL6524
gpiolib: Fix return value of gpio_to_desc() stub if !GPIOLIB
kvm: vmx: Set IA32_TSC_AUX for legacy mode guests
Revert "KVM: nVMX: reset cache/shadows when switching loaded VMCS"
Revert "KVM: nVMX: move check_vmentry_postreqs() call to nested_vmx_enter_non_root_mode()"
crypto/chelsio/chtls: listen fails with multiadapt
VSOCK: bind to random port for VMADDR_PORT_ANY
mmc: meson-gx: make sure the descriptor is stopped on errors
mtd: rawnand: sunxi: Write pageprog related opcodes to WCMD_SET
usb: ehci-omap: Fix deferred probe for phy handling
btrfs: Check for missing device before bio submission in btrfs_map_bio
btrfs: fix ncopies raid_attr for RAID56
btrfs: dev-replace: set result code of cancel by status of scrub
Btrfs: allow clear_extent_dirty() to receive a cached extent state record
btrfs: only track ref_heads in delayed_ref_updates
serial: sh-sci: Fix crash in rx_timer_fn() on PIO fallback
HID: intel-ish-hid: fixes incorrect error handling
gpio: raspberrypi-exp: decrease refcount on firmware dt node
serial: 8250: Rate limit serial port rx interrupts during input overruns
kprobes/x86/xen: blacklist non-attachable xen interrupt functions
xen/pciback: Check dev_data before using it
kprobes: Blacklist symbols in arch-defined prohibited area
kprobes/x86: Show x86-64 specific blacklisted symbols correctly
vfio-mdev/samples: Use u8 instead of char for handle functions
memory: omap-gpmc: Get the header of the enum
pinctrl: xway: fix gpio-hog related boot issues
net/mlx5: Continue driver initialization despite debugfs failure
netfilter: nf_nat_sip: fix RTP/RTCP source port translations
exofs_mount(): fix leaks on failure exits
bnxt_en: Return linux standard errors in bnxt_ethtool.c
bnxt_en: Save ring statistics before reset.
bnxt_en: query force speeds before disabling autoneg mode.
KVM: s390: unregister debug feature on failing arch init
pinctrl: sh-pfc: r8a77990: Fix MOD_SEL0 SEL_I2C1 field width
pinctrl: sh-pfc: sh7264: Fix PFCR3 and PFCR0 register configuration
pinctrl: sh-pfc: sh7734: Fix shifted values in IPSR10
HID: doc: fix wrong data structure reference for UHID_OUTPUT
dm flakey: Properly corrupt multi-page bios.
gfs2: take jdata unstuff into account in do_grow
dm raid: fix false -EBUSY when handling check/repair message
xfs: Align compat attrlist_by_handle with native implementation.
xfs: Fix bulkstat compat ioctls on x32 userspace.
IB/qib: Fix an error code in qib_sdma_verbs_send()
clocksource/drivers/fttmr010: Fix invalid interrupt register access
vxlan: Fix error path in __vxlan_dev_create()
powerpc/book3s/32: fix number of bats in p/v_block_mapped()
powerpc/xmon: fix dump_segments()
drivers/regulator: fix a missing check of return value
Bluetooth: hci_bcm: Handle specific unknown packets after firmware loading
serial: max310x: Fix tx_empty() callback
openrisc: Fix broken paths to arch/or32
RDMA/srp: Propagate ib_post_send() failures to the SCSI mid-layer
scsi: qla2xxx: deadlock by configfs_depend_item
scsi: csiostor: fix incorrect dma device in case of vport
brcmfmac: Fix access point mode
ath6kl: Only use match sets when firmware supports it
ath6kl: Fix off by one error in scan completion
powerpc/perf: Fix unit_sel/cache_sel checks
powerpc/32: Avoid unsupported flags with clang
powerpc/prom: fix early DEBUG messages
powerpc/mm: Make NULL pointer deferences explicit on bad page faults.
powerpc/44x/bamboo: Fix PCI range
vfio/spapr_tce: Get rid of possible infinite loop
powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status
drbd: ignore "all zero" peer volume sizes in handshake
drbd: reject attach of unsuitable uuids even if connected
drbd: do not block when adjusting "disk-options" while IO is frozen
drbd: fix print_st_err()'s prototype to match the definition
IB/rxe: Make counters thread safe
bpf/cpumap: make sure frame_size for build_skb is aligned if headroom isn't
regulator: tps65910: fix a missing check of return value
powerpc/83xx: handle machine check caused by watchdog timer
powerpc/pseries: Fix node leak in update_lmb_associativity_index()
powerpc: Fix HMIs on big-endian with CONFIG_RELOCATABLE=y
crypto: mxc-scc - fix build warnings on ARM64
pwm: clps711x: Fix period calculation
net/netlink_compat: Fix a missing check of nla_parse_nested
net/net_namespace: Check the return value of register_pernet_subsys()
f2fs: fix block address for __check_sit_bitmap
f2fs: fix to dirty inode synchronously
um: Include sys/uio.h to have writev()
um: Make GCOV depend on !KCOV
net: (cpts) fix a missing check of clk_prepare
net: stmicro: fix a missing check of clk_prepare
net: dsa: bcm_sf2: Propagate error value from mdio_write
atl1e: checking the status of atl1e_write_phy_reg
tipc: fix a missing check of genlmsg_put
net: marvell: fix a missing check of acpi_match_device
net/wan/fsl_ucc_hdlc: Avoid double free in ucc_hdlc_probe()
ocfs2: clear journal dirty flag after shutdown journal
vmscan: return NODE_RECLAIM_NOSCAN in node_reclaim() when CONFIG_NUMA is n
mm/page_alloc.c: free order-0 pages through PCP in page_frag_free()
mm/page_alloc.c: use a single function to free page
mm/page_alloc.c: deduplicate __memblock_free_early() and memblock_free()
tools/vm/page-types.c: fix "kpagecount returned fewer pages than expected" failures
netfilter: nf_tables: fix a missing check of nla_put_failure
xprtrdma: Prevent leak of rpcrdma_rep objects
infiniband: bnxt_re: qplib: Check the return value of send_message
infiniband/qedr: Potential null ptr dereference of qp
firmware: arm_sdei: fix wrong of_node_put() in init function
firmware: arm_sdei: Fix DT platform device creation
lib/genalloc.c: fix allocation of aligned buffer from non-aligned chunk
lib/genalloc.c: use vzalloc_node() to allocate the bitmap
fork: fix some -Wmissing-prototypes warnings
drivers/base/platform.c: kmemleak ignore a known leak
lib/genalloc.c: include vmalloc.h
mtd: Check add_mtd_device() ret code
tipc: fix memory leak in tipc_nl_compat_publ_dump
net/core/neighbour: tell kmemleak about hash tables
ata: ahci: mvebu: do Armada 38x configuration only on relevant SoCs
PCI/MSI: Return -ENOSPC from pci_alloc_irq_vectors_affinity()
net/core/neighbour: fix kmemleak minimal reference count for hash tables
serial: 8250: Fix serial8250 initialization crash
gpu: ipu-v3: pre: don't trigger update if buffer address doesn't change
sfc: suppress duplicate nvmem partition types in efx_ef10_mtd_probe
ip_tunnel: Make none-tunnel-dst tunnel port work with lwtunnel
decnet: fix DN_IFREQ_SIZE
net/smc: prevent races between smc_lgr_terminate() and smc_conn_free()
net/smc: don't wait for send buffer space when data was already sent
mm/hotplug: invalid PFNs from pfn_to_online_page()
xfs: end sync buffer I/O properly on shutdown error
net/smc: fix sender_free computation
blktrace: Show requests without sector
net/smc: fix byte_order for rx_curs_confirmed
tipc: fix skb may be leaky in tipc_link_input
ASoC: samsung: i2s: Fix prescaler setting for the secondary DAI
sfc: initialise found bitmap in efx_ef10_mtd_probe
geneve: change NET_UDP_TUNNEL dependency to select
net: fix possible overflow in __sk_mem_raise_allocated()
net: ip_gre: do not report erspan_ver for gre or gretap
net: ip6_gre: do not report erspan_ver for ip6gre or ip6gretap
sctp: don't compare hb_timer expire date before starting it
bpf: decrease usercnt if bpf_map_new_fd() fails in bpf_map_get_fd_by_id()
mmc: core: align max segment size with logical block size
net: dev: Use unsigned integer as an argument to left-shift
kvm: properly check debugfs dentry before using it
bpf: drop refcount if bpf_map_new_fd() fails in map_create()
net: hns3: Change fw error code NOT_EXEC to NOT_SUPPORTED
net: hns3: fix PFC not setting problem for DCB module
net: hns3: fix an issue for hclgevf_ae_get_hdev
net: hns3: fix an issue for hns3_update_new_int_gl
iommu/amd: Fix NULL dereference bug in match_hid_uid
apparmor: delete the dentry in aafs_remove() to avoid a leak
scsi: libsas: Support SATA PHY connection rate unmatch fixing during discovery
ACPI / APEI: Don't wait to serialise with oops messages when panic()ing
ACPI / APEI: Switch estatus pool to use vmalloc memory
scsi: hisi_sas: shutdown axi bus to avoid exception CQ returned
scsi: libsas: Check SMP PHY control function result
RDMA/hns: Fix the bug with updating rq head pointer when flush cqe
RDMA/hns: Bugfix for the scene without receiver queue
RDMA/hns: Fix the state of rereg mr
RDMA/hns: Use GFP_ATOMIC in hns_roce_v2_modify_qp
ASoC: rt5645: Headphone Jack sense inverts on the LattePanda board
powerpc/pseries/dlpar: Fix a missing check in dlpar_parse_cc_property()
xdp: fix cpumap redirect SKB creation bug
mtd: Remove a debug trace in mtdpart.c
mm, gup: add missing refcount overflow checks on s390
clk: at91: fix update bit maps on CFG_MOR write
clk: at91: generated: set audio_pll_allowed in at91_clk_register_generated()
usb: dwc2: use a longer core rest timeout in dwc2_core_reset()
staging: rtl8192e: fix potential use after free
staging: rtl8723bs: Drop ACPI device ids
staging: rtl8723bs: Add 024c:0525 to the list of SDIO device-ids
USB: serial: ftdi_sio: add device IDs for U-Blox C099-F9P
mei: bus: prefix device names on bus with the bus name
mei: me: add comet point V device id
thunderbolt: Power cycle the router if NVM authentication fails
xfrm: Fix memleak on xfrm state destroy
media: v4l2-ctrl: fix flags for DO_WHITE_BALANCE
net: macb: fix error format in dev_err()
pwm: Clear chip_data in pwm_put()
media: atmel: atmel-isc: fix asd memory allocation
media: atmel: atmel-isc: fix INIT_WORK misplacement
macvlan: schedule bc_work even if error
net: psample: fix skb_over_panic
openvswitch: fix flow command message size
sctp: Fix memory leak in sctp_sf_do_5_2_4_dupcook
slip: Fix use-after-free Read in slip_open
openvswitch: drop unneeded BUG_ON() in ovs_flow_cmd_build_info()
openvswitch: remove another BUG_ON()
selftests: bpf: test_sockmap: handle file creation failures gracefully
tipc: fix link name length check
sctp: cache netns in sctp_ep_common
net: sched: fix `tc -s class show` no bstats on class with nolock subqueues
net: macb: add missed tasklet_kill
ext4: add more paranoia checking in ext4_expand_extra_isize handling
watchdog: sama5d4: fix WDD value to be always set to max
net: macb: Fix SUBNS increment and increase resolution
net: macb driver, check for SKBTX_HW_TSTAMP
mtd: rawnand: atmel: Fix spelling mistake in error message
mtd: rawnand: atmel: fix possible object reference leak
mtd: spi-nor: cast to u64 to avoid uint overflows
drm/atmel-hlcdc: revert shift by 8
mailbox: stm32_ipcc: add spinlock to fix channels concurrent access
tcp: exit if nothing to retransmit on RTO timeout
HID: core: check whether Usage Page item is after Usage ID items
crypto: stm32/hash - Fix hmac issue more than 256 bytes
media: stm32-dcmi: fix DMA corruption when stopping streaming
media: stm32-dcmi: fix check of pm_runtime_get_sync return value
hwrng: stm32 - fix unbalanced pm_runtime_enable
clk: stm32mp1: fix HSI divider flag
clk: stm32mp1: fix mcu divider table
clk: stm32mp1: add CLK_SET_RATE_NO_REPARENT to Kernel clocks
clk: stm32mp1: parent clocks update
mailbox: mailbox-test: fix null pointer if no mmio
pinctrl: stm32: fix memory leak issue
ASoC: stm32: i2s: fix dma configuration
ASoC: stm32: i2s: fix 16 bit format support
ASoC: stm32: i2s: fix IRQ clearing
ASoC: stm32: sai: add missing put_device()
dmaengine: stm32-dma: check whether length is aligned on FIFO threshold
platform/x86: hp-wmi: Fix ACPI errors caused by too small buffer
platform/x86: hp-wmi: Fix ACPI errors caused by passing 0 as input size
net: fec: fix clock count mis-match
Linux 4.19.88
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ifd3801a77cb551be72788031e7fcfc8a1d4fd197
[ Upstream commit fb5bf31722 ]
We get a warning when building kernel with W=1:
kernel/fork.c:167:13: warning: no previous prototype for `arch_release_thread_stack' [-Wmissing-prototypes]
kernel/fork.c:779:13: warning: no previous prototype for `fork_init' [-Wmissing-prototypes]
Add the missing declaration in head file to fix this.
Also, remove arch_release_thread_stack() completely because no arch
seems to implement it since bb9d81264 (arch: remove tile port).
Link: http://lkml.kernel.org/r/1542170087-23645-1-git-send-email-wang.yi59@zte.com.cn
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
This change adds generic support for Clang's Shadow Call Stack,
which uses a shadow stack to protect return addresses from being
overwritten by an attacker. Details are available here:
https://clang.llvm.org/docs/ShadowCallStack.html
Note that security guarantees in the kernel differ from the
ones documented for user space. The kernel must store addresses
of shadow stacks used by other tasks and interrupt handlers in
memory, which means an attacker capable reading and writing
arbitrary memory may be able to locate them and hijack control
flow by modifying shadow stacks that are not currently in use.
Bug: 145210207
Change-Id: Ia5f1650593fa95da4efcf86f84830a20989f161c
(am from https://lore.kernel.org/patchwork/patch/1149054/)
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl2o0vkACgkQONu9yGCS
aT50rQ//So/ShGy5+D6cmsPVbd8p/9X0fM5D6VjgJ2pj+HbalxW7JhJ0tyVdfkO1
AS6p/24bmUMWI3pxJBbJjfggbyleV2UFqlHIei+SjfHd0sghZ+E/XY7mLdcKvx0a
dGdXuxF+enb18pNiUKUrJUgqksgfO40yMXyZhjGf5A+/JmYsNaJhkyCWgPfGIB5i
YuowOWC4Rkds/CQxyRSRJl+YSQWh8j7Ke0tdjhMQxnmyzFD0wcE5vUtq2vIP7yxO
ypPrz9zWTase3y1yCcPsXd0m4Ji1VpF99hlgAr0HLFNcZn2kNYFpyblfOrI/zfoK
RqUJgVVlZaWvhMOrueQO+XbebXdCWTJHiNTUDxIFakEAPNz+XkTQQf6LvzOjcFxB
oLHnp1qBCn72y6o6Q3XmauFcmYBokyfBvXDAJsXYr+X5B4akn/fpyzzBCInX0h+r
og5zpLbXDLszG3j76TPSpcjd+t4xqMy+MG+jkH/mLtMAFyaVi2sfMzsZJAQCACr2
wNHRFQQGjDmQjovkwSRYENQBUuITSj+VrBUD0M7lnfA/Fni3BAJaxY5I6WeEvbeW
ejzPVFZB8dfEy7hAKKiVEuI8/akcp8MUXfLJOfTxytLIpYllOBoVC4LtjgO5FYu3
grSbRuowjrlUCtnCU9H18avLE1ScFFEGTmPOqI6xDqKiZf59QsQ=
=sKVA
-----END PGP SIGNATURE-----
Merge 4.19.80 into android-4.19
Changes in 4.19.80
panic: ensure preemption is disabled during panic()
f2fs: use EINVAL for superblock with invalid magic
USB: rio500: Remove Rio 500 kernel driver
USB: yurex: Don't retry on unexpected errors
USB: yurex: fix NULL-derefs on disconnect
USB: usb-skeleton: fix runtime PM after driver unbind
USB: usb-skeleton: fix NULL-deref on disconnect
xhci: Fix false warning message about wrong bounce buffer write length
xhci: Prevent device initiated U1/U2 link pm if exit latency is too long
xhci: Check all endpoints for LPM timeout
xhci: Fix USB 3.1 capability detection on early xHCI 1.1 spec based hosts
usb: xhci: wait for CNR controller not ready bit in xhci resume
xhci: Prevent deadlock when xhci adapter breaks during init
xhci: Increase STS_SAVE timeout in xhci_suspend()
USB: adutux: fix use-after-free on disconnect
USB: adutux: fix NULL-derefs on disconnect
USB: adutux: fix use-after-free on release
USB: iowarrior: fix use-after-free on disconnect
USB: iowarrior: fix use-after-free on release
USB: iowarrior: fix use-after-free after driver unbind
USB: usblp: fix runtime PM after driver unbind
USB: chaoskey: fix use-after-free on release
USB: ldusb: fix NULL-derefs on driver unbind
serial: uartlite: fix exit path null pointer
USB: serial: keyspan: fix NULL-derefs on open() and write()
USB: serial: ftdi_sio: add device IDs for Sienna and Echelon PL-20
USB: serial: option: add Telit FN980 compositions
USB: serial: option: add support for Cinterion CLS8 devices
USB: serial: fix runtime PM after driver unbind
USB: usblcd: fix I/O after disconnect
USB: microtek: fix info-leak at probe
USB: dummy-hcd: fix power budget for SuperSpeed mode
usb: renesas_usbhs: gadget: Do not discard queues in usb_ep_set_{halt,wedge}()
usb: renesas_usbhs: gadget: Fix usb_ep_set_{halt,wedge}() behavior
USB: legousbtower: fix slab info leak at probe
USB: legousbtower: fix deadlock on disconnect
USB: legousbtower: fix potential NULL-deref on disconnect
USB: legousbtower: fix open after failed reset request
USB: legousbtower: fix use-after-free on release
mei: me: add comet point (lake) LP device ids
mei: avoid FW version request on Ibex Peak and earlier
gpio: eic: sprd: Fix the incorrect EIC offset when toggling
Staging: fbtft: fix memory leak in fbtft_framebuffer_alloc
staging: vt6655: Fix memory leak in vt6655_probe
iio: adc: hx711: fix bug in sampling of data
iio: adc: ad799x: fix probe error handling
iio: adc: axp288: Override TS pin bias current for some models
iio: light: opt3001: fix mutex unlock race
efivar/ssdt: Don't iterate over EFI vars if no SSDT override was specified
perf llvm: Don't access out-of-scope array
perf inject jit: Fix JIT_CODE_MOVE filename
blk-wbt: fix performance regression in wbt scale_up/scale_down
CIFS: Gracefully handle QueryInfo errors during open
CIFS: Force revalidate inode when dentry is stale
CIFS: Force reval dentry if LOOKUP_REVAL flag is set
kernel/sysctl.c: do not override max_threads provided by userspace
mm/vmpressure.c: fix a signedness bug in vmpressure_register_event()
firmware: google: increment VPD key_len properly
gpiolib: don't clear FLAG_IS_OUT when emulating open-drain/open-source
iio: adc: stm32-adc: move registers definitions
iio: adc: stm32-adc: fix a race when using several adcs with dma and irq
cifs: use cifsInodeInfo->open_file_lock while iterating to avoid a panic
btrfs: fix incorrect updating of log root tree
btrfs: fix uninitialized ret in ref-verify
NFS: Fix O_DIRECT accounting of number of bytes read/written
MIPS: Disable Loongson MMI instructions for kernel build
MIPS: elf_hwcap: Export userspace ASEs
ACPICA: ACPI 6.3: PPTT add additional fields in Processor Structure Flags
ACPI/PPTT: Add support for ACPI 6.3 thread flag
arm64: topology: Use PPTT to determine if PE is a thread
Fix the locking in dcache_readdir() and friends
media: stkwebcam: fix runtime PM after driver unbind
arm64/sve: Fix wrong free for task->thread.sve_state
tracing/hwlat: Report total time spent in all NMIs during the sample
tracing/hwlat: Don't ignore outer-loop duration when calculating max_latency
ftrace: Get a reference counter for the trace_array on filter files
tracing: Get trace_array reference for available_tracers files
hwmon: Fix HWMON_P_MIN_ALARM mask
x86/asm: Fix MWAITX C-state hint value
PCI: vmd: Fix config addressing when using bus offsets
perf/hw_breakpoint: Fix arch_hw_breakpoint use-before-initialization
Linux 4.19.80
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8ef1835585f79b752712a835852a4bc960ae3b97
commit b0f53dbc4b upstream.
Partially revert 16db3d3f11 ("kernel/sysctl.c: threads-max observe
limits") because the patch is causing a regression to any workload which
needs to override the auto-tuning of the limit provided by kernel.
set_max_threads is implementing a boot time guesstimate to provide a
sensible limit of the concurrently running threads so that runaways will
not deplete all the memory. This is a good thing in general but there
are workloads which might need to increase this limit for an application
to run (reportedly WebSpher MQ is affected) and that is simply not
possible after the mentioned change. It is also very dubious to
override an admin decision by an estimation that doesn't have any direct
relation to correctness of the kernel operation.
Fix this by dropping set_max_threads from sysctl_max_threads so any
value is accepted as long as it fits into MAX_THREADS which is important
to check because allowing more threads could break internal robust futex
restriction. While at it, do not use MIN_THREADS as the lower boundary
because it is also only a heuristic for automatic estimation and admin
might have a good reason to stop new threads to be created even when
below this limit.
This became more severe when we switched x86 from 4k to 8k kernel
stacks. Starting since 6538b8ea88 ("x86_64: expand kernel stack to
16K") (3.16) we use THREAD_SIZE_ORDER = 2 and that halved the auto-tuned
value.
In the particular case
3.12
kernel.threads-max = 515561
4.4
kernel.threads-max = 200000
Neither of the two values is really insane on 32GB machine.
I am not sure we want/need to tune the max_thread value further. If
anything the tuning should be removed altogether if proven not useful in
general. But we definitely need a way to override this auto-tuning.
Link: http://lkml.kernel.org/r/20190922065801.GB18814@dhcp22.suse.cz
Fixes: 16db3d3f11 ("kernel/sysctl.c: threads-max observe limits")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch adds polling support to pidfd.
Android low memory killer (LMK) needs to know when a process dies once
it is sent the kill signal. It does so by checking for the existence of
/proc/pid which is both racy and slow. For example, if a PID is reused
between when LMK sends a kill signal and checks for existence of the
PID, since the wrong PID is now possibly checked for existence.
Using the polling support, LMK will be able to get notified when a process
exists in race-free and fast way, and allows the LMK to do other things
(such as by polling on other fds) while awaiting the process being killed
to die.
For notification to polling processes, we follow the same existing
mechanism in the kernel used when the parent of the task group is to be
notified of a child's death (do_notify_parent). This is precisely when the
tasks waiting on a poll of pidfd are also awakened in this patch.
We have decided to include the waitqueue in struct pid for the following
reasons:
1. The wait queue has to survive for the lifetime of the poll. Including
it in task_struct would not be option in this case because the task can
be reaped and destroyed before the poll returns.
2. By including the struct pid for the waitqueue means that during
de_thread(), the new thread group leader automatically gets the new
waitqueue/pid even though its task_struct is different.
Appropriate test cases are added in the second patch to provide coverage of
all the cases the patch is handling.
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Daniel Colascione <dancol@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Jonathan Kowalski <bl0pbl33p@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: kernel-team@android.com
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Co-developed-by: Daniel Colascione <dancol@google.com>
Signed-off-by: Daniel Colascione <dancol@google.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Christian Brauner <christian@brauner.io>
(cherry picked from commit b53b0b9d9a)
Bug: 135608568
Test: test program using syscall(__NR_sys_pidfd_open,..) and poll()
Change-Id: I0391acb666b34e33ef60a778dffdf12ed3654393
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
This patchset makes it possible to retrieve pid file descriptors at
process creation time by introducing the new flag CLONE_PIDFD to the
clone() system call. Linus originally suggested to implement this as a
new flag to clone() instead of making it a separate system call. As
spotted by Linus, there is exactly one bit for clone() left.
CLONE_PIDFD creates file descriptors based on the anonymous inode
implementation in the kernel that will also be used to implement the new
mount api. They serve as a simple opaque handle on pids. Logically,
this makes it possible to interpret a pidfd differently, narrowing or
widening the scope of various operations (e.g. signal sending). Thus, a
pidfd cannot just refer to a tgid, but also a tid, or in theory - given
appropriate flag arguments in relevant syscalls - a process group or
session. A pidfd does not represent a privilege. This does not imply it
cannot ever be that way but for now this is not the case.
A pidfd comes with additional information in fdinfo if the kernel supports
procfs. The fdinfo file contains the pid of the process in the callers
pid namespace in the same format as the procfs status file, i.e. "Pid:\t%d".
As suggested by Oleg, with CLONE_PIDFD the pidfd is returned in the
parent_tidptr argument of clone. This has the advantage that we can
give back the associated pid and the pidfd at the same time.
To remove worries about missing metadata access this patchset comes with
a sample program that illustrates how a combination of CLONE_PIDFD, and
pidfd_send_signal() can be used to gain race-free access to process
metadata through /proc/<pid>. The sample program can easily be
translated into a helper that would be suitable for inclusion in libc so
that users don't have to worry about writing it themselves.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <christian@brauner.io>
Co-developed-by: Jann Horn <jannh@google.com>
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Howells <dhowells@redhat.com>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit b3e5838252)
Bug: 135608568
Test: test program using syscall(__NR_sys_pidfd_open,..) and poll()
Change-Id: I8a8f87e8fb23de0adb6d6acf2e622926b7a9f55c
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl1GibIACgkQONu9yGCS
aT7z2hAAmv8AsH9IG43m7t6zLroJVswr/9594xk7yPBQgcY3/PW2aTFBCFbsdOL4
yXcj2PSwRiq9K6qAJULrvOvncR9fIILHqzWzyXnoaZ30lR/FxaaFmuHZX/5Ix1tB
e5EEE/EA49UAEjEDaMLq8g2IvibsReDxmSpnXyBJWoyRAdFIElVnMJ2+zvP/wRhF
NKzQj/bj/qecCbis2lUCaVWJFZ6+P/52UbD8lvIwqR3nk2TKsGDcLU6eY3yg4KrB
rEHl5T8KIPrkX3KNIEB8EcFREene+rdpZLLVe4fYwf+gOqfiFXSzZZvweauMkplq
ehlVHkykvQvlsVM2tjBD379z3C4aasZDuMVNMCbAy2FlruLeBQ7gEn77mCJB9VH5
/n/mlc2yizdoowtARCLWOUMfASpdSbqu2SQ7A/3kwG7l6GrpzKSIU2nQgm+41sUZ
QJVtZ3IYsPoYjnU4B3JZzgJnf3M9jcRz/3JegviqhSEbF1gaScJX0cqN8C1idN/v
ZAGCJK9S20/EEEsp5jn+bq2grUehvmD4TVDfot4P+5yRYyBIhMFpbM2RpjydOpwy
+x8D1Q34LYPFgZfQ0vF62vcSBhMBiJ/7j41rUeo44K+Lg00F3yCOyL6FxK6S8h6j
wsD0xLbllMrhV5KRYFizb3QbCHoHYiROIJk76uLvB+Tqq2Jg9VQ=
=qIi2
-----END PGP SIGNATURE-----
Merge 4.19.64 into android-4.19
Changes in 4.19.64
hv_sock: Add support for delayed close
vsock: correct removal of socket from the list
NFS: Fix dentry revalidation on NFSv4 lookup
NFS: Refactor nfs_lookup_revalidate()
NFSv4: Fix lookup revalidate of regular files
usb: dwc2: Disable all EP's on disconnect
usb: dwc2: Fix disable all EP's on disconnect
arm64: compat: Provide definition for COMPAT_SIGMINSTKSZ
binder: fix possible UAF when freeing buffer
ISDN: hfcsusb: checking idx of ep configuration
media: au0828: fix null dereference in error path
ath10k: Change the warning message string
media: cpia2_usb: first wake up, then free in disconnect
media: pvrusb2: use a different format for warnings
NFS: Cleanup if nfs_match_client is interrupted
media: radio-raremono: change devm_k*alloc to k*alloc
iommu/vt-d: Don't queue_iova() if there is no flush queue
iommu/iova: Fix compilation error with !CONFIG_IOMMU_IOVA
Bluetooth: hci_uart: check for missing tty operations
vhost: introduce vhost_exceeds_weight()
vhost_net: fix possible infinite loop
vhost: vsock: add weight support
vhost: scsi: add weight support
sched/fair: Don't free p->numa_faults with concurrent readers
sched/fair: Use RCU accessors consistently for ->numa_group
/proc/<pid>/cmdline: remove all the special cases
/proc/<pid>/cmdline: add back the setproctitle() special case
drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl
Fix allyesconfig output.
ceph: hold i_ceph_lock when removing caps for freeing inode
block, scsi: Change the preempt-only flag into a counter
scsi: core: Avoid that a kernel warning appears during system resume
ip_tunnel: allow not to count pkts on tstats by setting skb's dev to NULL
Linux 4.19.64
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3e9055b677bd8ad9d5070307fae0bc765d444e9d
commit 16d51a590a upstream.
When going through execve(), zero out the NUMA fault statistics instead of
freeing them.
During execve, the task is reachable through procfs and the scheduler. A
concurrent /proc/*/sched reader can read data from a freed ->numa_faults
allocation (confirmed by KASAN) and write it back to userspace.
I believe that it would also be possible for a use-after-free read to occur
through a race between a NUMA fault and execve(): task_numa_fault() can
lead to task_numa_compare(), which invokes task_weight() on the currently
running task of a different CPU.
Another way to fix this would be to make ->numa_faults RCU-managed or add
extra locking, but it seems easier to wipe the NUMA fault statistics on
execve.
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Fixes: 82727018b0 ("sched/numa: Call task_numa_free() from do_execve()")
Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlzk4CsACgkQONu9yGCS
aT5Xaw//UWopx4Yqbiv+4HBgW+2ijP4utxI4lBNYITD44jvkyVJnztUtVkWepu5r
Tkl/7zytXOpxbpuhS0xqpWwG7lL5eT4NCG08KSX4lYQVjIWX4YzVkw9gLe9V2AaK
IqTzaWtbuagARbnR3UC65TI4kjRGsr9ldY0AbbGGVTM6IwPquHN9Qd9TAzRwRohn
CxY94Bwp1RcN2sSPkD3nUCUGOSNh97BXyypeM7FyceOzOpyAdQCXoUPc84cPqdNC
4GBkd5Z1IL/7zX3HDjQeGS0KK6e1enslSmsbSSUVuHI90LCr3CZPJkFF8RFnPnff
2RA7bdhp8C1JPeLDimr+SNSLEl9yywoH6d4UQAnBwoLDjiFCEITVgjDtYzzd81+1
ES6lbUAs8v/LXkaCaExq6pNNd1prg6Mj9Fe6cz+G9V/YV1tLUsoAJHdFucu8Sp7w
rwz/PZ6waCf8VRO4aYFF9b+u7PQ/RFZWQYsz22P7PhAYg0CTajV1FWGk1AYi0+wQ
5YCmthbWhDo9U5lAFyQ0pVTXv/UNgEu6MfV1/jKtCk5AzsbE77orj1xusKckHq2e
QojgmELmHMlFFajI0h/ddDo7iwz/5OrPVs9D03RysiOciMzdTKPucPyC0Ah4yEBA
sJ0cQkaVtqO2Nu3E42lfQTpVIqBgi8NGav+kRwryB1YyKeaXLsM=
=HJ7O
-----END PGP SIGNATURE-----
Merge 4.19.45 into android-4.19
Changes in 4.19.45
locking/rwsem: Prevent decrement of reader count before increment
x86/speculation/mds: Revert CPU buffer clear on double fault exit
x86/speculation/mds: Improve CPU buffer clear documentation
objtool: Fix function fallthrough detection
arm64: dts: rockchip: Disable DCMDs on RK3399's eMMC controller.
ARM: dts: exynos: Fix interrupt for shared EINTs on Exynos5260
ARM: dts: exynos: Fix audio (microphone) routing on Odroid XU3
mmc: sdhci-of-arasan: Add DTS property to disable DCMDs.
ARM: exynos: Fix a leaked reference by adding missing of_node_put
power: supply: axp288_charger: Fix unchecked return value
power: supply: axp288_fuel_gauge: Add ACEPC T8 and T11 mini PCs to the blacklist
arm64: mmap: Ensure file offset is treated as unsigned
arm64: arch_timer: Ensure counter register reads occur with seqlock held
arm64: compat: Reduce address limit
arm64: Clear OSDLR_EL1 on CPU boot
arm64: Save and restore OSDLR_EL1 across suspend/resume
sched/x86: Save [ER]FLAGS on context switch
crypto: crypto4xx - fix ctr-aes missing output IV
crypto: crypto4xx - fix cfb and ofb "overran dst buffer" issues
crypto: salsa20 - don't access already-freed walk.iv
crypto: chacha20poly1305 - set cra_name correctly
crypto: ccp - Do not free psp_master when PLATFORM_INIT fails
crypto: vmx - fix copy-paste error in CTR mode
crypto: skcipher - don't WARN on unprocessed data after slow walk step
crypto: crct10dif-generic - fix use via crypto_shash_digest()
crypto: x86/crct10dif-pcl - fix use via crypto_shash_digest()
crypto: arm64/gcm-aes-ce - fix no-NEON fallback code
crypto: gcm - fix incompatibility between "gcm" and "gcm_base"
crypto: rockchip - update IV buffer to contain the next IV
crypto: arm/aes-neonbs - don't access already-freed walk.iv
crypto: arm64/aes-neonbs - don't access already-freed walk.iv
mmc: core: Fix tag set memory leak
ALSA: line6: toneport: Fix broken usage of timer for delayed execution
ALSA: usb-audio: Fix a memory leak bug
ALSA: hda/hdmi - Read the pin sense from register when repolling
ALSA: hda/hdmi - Consider eld_valid when reporting jack event
ALSA: hda/realtek - EAPD turn on later
ALSA: hdea/realtek - Headset fixup for System76 Gazelle (gaze14)
ASoC: max98090: Fix restore of DAPM Muxes
ASoC: RT5677-SPI: Disable 16Bit SPI Transfers
ASoC: fsl_esai: Fix missing break in switch statement
ASoC: codec: hdac_hdmi add device_link to card device
bpf, arm64: remove prefetch insn in xadd mapping
crypto: ccree - remove special handling of chained sg
crypto: ccree - fix mem leak on error path
crypto: ccree - don't map MAC key on stack
crypto: ccree - use correct internal state sizes for export
crypto: ccree - don't map AEAD key and IV on stack
crypto: ccree - pm resume first enable the source clk
crypto: ccree - HOST_POWER_DOWN_EN should be the last CC access during suspend
crypto: ccree - add function to handle cryptocell tee fips error
crypto: ccree - handle tee fips error during power management resume
mm/mincore.c: make mincore() more conservative
mm/huge_memory: fix vmf_insert_pfn_{pmd, pud}() crash, handle unaligned addresses
mm/hugetlb.c: don't put_page in lock of hugetlb_lock
hugetlb: use same fault hash key for shared and private mappings
ocfs2: fix ocfs2 read inode data panic in ocfs2_iget
userfaultfd: use RCU to free the task struct when fork fails
ACPI: PM: Set enable_for_wake for wakeup GPEs during suspend-to-idle
mfd: da9063: Fix OTP control register names to match datasheets for DA9063/63L
mfd: max77620: Fix swapped FPS_PERIOD_MAX_US values
mtd: spi-nor: intel-spi: Avoid crossing 4K address boundary on read/write
tty: vt.c: Fix TIOCL_BLANKSCREEN console blanking if blankinterval == 0
tty/vt: fix write/write race in ioctl(KDSKBSENT) handler
jbd2: check superblock mapped prior to committing
ext4: make sanity check in mballoc more strict
ext4: ignore e_value_offs for xattrs with value-in-ea-inode
ext4: avoid drop reference to iloc.bh twice
ext4: fix use-after-free race with debug_want_extra_isize
ext4: actually request zeroing of inode table after grow
ext4: fix ext4_show_options for file systems w/o journal
btrfs: Check the first key and level for cached extent buffer
btrfs: Correctly free extent buffer in case btree_read_extent_buffer_pages fails
btrfs: Honour FITRIM range constraints during free space trim
Btrfs: send, flush dellaloc in order to avoid data loss
Btrfs: do not start a transaction during fiemap
Btrfs: do not start a transaction at iterate_extent_inodes()
bcache: fix a race between cache register and cacheset unregister
bcache: never set KEY_PTRS of journal key to 0 in journal_reclaim()
ipmi:ssif: compare block number correctly for multi-part return messages
crypto: ccm - fix incompatibility between "ccm" and "ccm_base"
fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount
tty: Don't force RISCV SBI console as preferred console
ext4: zero out the unused memory region in the extent tree block
ext4: fix data corruption caused by overlapping unaligned and aligned IO
ext4: fix use-after-free in dx_release()
ext4: avoid panic during forced reboot due to aborted journal
ALSA: hda/realtek - Corrected fixup for System76 Gazelle (gaze14)
ALSA: hda/realtek - Fixup headphone noise via runtime suspend
ALSA: hda/realtek - Fix for Lenovo B50-70 inverted internal microphone bug
jbd2: fix potential double free
KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes
KVM: lapic: Busy wait for timer to expire when using hv_timer
kbuild: turn auto.conf.cmd into a mandatory include file
xen/pvh: set xen_domain_type to HVM in xen_pvh_init
libnvdimm/namespace: Fix label tracking error
iov_iter: optimize page_copy_sane()
pstore: Centralize init/exit routines
pstore: Allocate compression during late_initcall()
pstore: Refactor compression initialization
ext4: fix compile error when using BUFFER_TRACE
ext4: don't update s_rev_level if not required
Linux 4.19.45
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit c3f3ce049f upstream.
The task structure is freed while get_mem_cgroup_from_mm() holds
rcu_read_lock() and dereferences mm->owner.
get_mem_cgroup_from_mm() failing fork()
---- ---
task = mm->owner
mm->owner = NULL;
free(task)
if (task) *task; /* use after free */
The fix consists in freeing the task with RCU also in the fork failure
case, exactly like it always happens for the regular exit(2) path. That
is enough to make the rcu_read_lock hold in get_mem_cgroup_from_mm()
(left side above) effective to avoid a use after free when dereferencing
the task structure.
An alternate possible fix would be to defer the delivery of the
userfaultfd contexts to the monitor until after fork() is guaranteed to
succeed. Such a change would require more changes because it would
create a strict ordering dependency where the uffd methods would need to
be called beyond the last potentially failing branch in order to be
safe. This solution as opposed only adds the dependency to common code
to set mm->owner to NULL and to free the task struct that was pointed by
mm->owner with RCU, if fork ends up failing. The userfaultfd methods
can still be called anywhere during the fork runtime and the monitor
will keep discarding orphaned "mm" coming from failed forks in userland.
This race condition couldn't trigger if CONFIG_MEMCG was set =n at build
time.
[aarcange@redhat.com: improve changelog, reduce #ifdefs per Michal]
Link: http://lkml.kernel.org/r/20190429035752.4508-1-aarcange@redhat.com
Link: http://lkml.kernel.org/r/20190325225636.11635-2-aarcange@redhat.com
Fixes: 893e26e61d ("userfaultfd: non-cooperative: Add fork() event")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Tested-by: zhong jiang <zhongjiang@huawei.com>
Reported-by: syzbot+cbb52e396df3e565ab02@syzkaller.appspotmail.com
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: zhong jiang <zhongjiang@huawei.com>
Cc: syzbot+cbb52e396df3e565ab02@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When systems are overcommitted and resources become contended, it's hard
to tell exactly the impact this has on workload productivity, or how close
the system is to lockups and OOM kills. In particular, when machines work
multiple jobs concurrently, the impact of overcommit in terms of latency
and throughput on the individual job can be enormous.
In order to maximize hardware utilization without sacrificing individual
job health or risk complete machine lockups, this patch implements a way
to quantify resource pressure in the system.
A kernel built with CONFIG_PSI=y creates files in /proc/pressure/ that
expose the percentage of time the system is stalled on CPU, memory, or IO,
respectively. Stall states are aggregate versions of the per-task delay
accounting delays:
cpu: some tasks are runnable but not executing on a CPU
memory: tasks are reclaiming, or waiting for swapin or thrashing cache
io: tasks are waiting for io completions
These percentages of walltime can be thought of as pressure percentages,
and they give a general sense of system health and productivity loss
incurred by resource overcommit. They can also indicate when the system
is approaching lockup scenarios and OOMs.
To do this, psi keeps track of the task states associated with each CPU
and samples the time they spend in stall states. Every 2 seconds, the
samples are averaged across CPUs - weighted by the CPUs' non-idle time to
eliminate artifacts from unused CPUs - and translated into percentages of
walltime. A running average of those percentages is maintained over 10s,
1m, and 5m periods (similar to the loadaverage).
[hannes@cmpxchg.org: doc fixlet, per Randy]
Link: http://lkml.kernel.org/r/20180828205625.GA14030@cmpxchg.org
[hannes@cmpxchg.org: code optimization]
Link: http://lkml.kernel.org/r/20180907175015.GA8479@cmpxchg.org
[hannes@cmpxchg.org: rename psi_clock() to psi_update_work(), per Peter]
Link: http://lkml.kernel.org/r/20180907145404.GB11088@cmpxchg.org
[hannes@cmpxchg.org: fix build]
Link: http://lkml.kernel.org/r/20180913014222.GA2370@cmpxchg.org
Link: http://lkml.kernel.org/r/20180828172258.3185-9-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Drake <drake@endlessm.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit eb414681d5)
Bug: 127712811
Test: lmkd in PSI mode
Change-Id: Id00d23c977169b0c4636d92016fc1fee0274be05
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Add time in state data to task structs, and create
/proc/<pid>/time_in_state files to show how long each individual task
has run at each frequency.
Create a CONFIG_CPU_FREQ_TIMES option to enable/disable this tracking.
Bug: 72339335
Bug: 127641090
Test: Read /proc/<pid>/time_in_state
Change-Id: Ia6456754f4cb1e83b2bc35efa8fbe9f8696febc8
Signed-off-by: Connor O'Brien <connoro@google.com>
[astrachan: Folded the following changes into this patch:
a6d3de6a7fba ("ANDROID: Reduce use of #ifdef CONFIG_CPU_FREQ_TIMES")
b89ada5d9c09 ("ANDROID: Fix massive cpufreq_times memory leaks")]
Signed-off-by: Alistair Strachan <astrachan@google.com>
commit 7b55851367 upstream.
This changes the fork(2) syscall to record the process start_time after
initializing the basic task structure but still before making the new
process visible to user-space.
Technically, we could record the start_time anytime during fork(2). But
this might lead to scenarios where a start_time is recorded long before
a process becomes visible to user-space. For instance, with
userfaultfd(2) and TLS, user-space can delay the execution of fork(2)
for an indefinite amount of time (and will, if this causes network
access, or similar).
By recording the start_time late, it much closer reflects the point in
time where the process becomes live and can be observed by other
processes.
Lastly, this makes it much harder for user-space to predict and control
the start_time they get assigned. Previously, user-space could fork a
process and stall it in copy_thread_tls() before its pid is allocated,
but after its start_time is recorded. This can be misused to later-on
cycle through PIDs and resume the stalled fork(2) yielding a process
that has the same pid and start_time as a process that existed before.
This can be used to circumvent security systems that identify processes
by their pid+start_time combination.
Even though user-space was always aware that start_time recording is
flaky (but several projects are known to still rely on start_time-based
identification), changing the start_time to be recorded late will help
mitigate existing attacks and make it much harder for user-space to
control the start_time a process gets assigned.
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit d70f2a14b7 ("include/linux/sched/mm.h: uninline mmdrop_async(),
etc") ignored the return value of arch_dup_mmap(). As a result, on x86,
a failure to duplicate the LDT (e.g. due to memory allocation error)
would leave the duplicated memory mapping in an inconsistent state.
Fix by using the return value, as it was before the change.
Link: http://lkml.kernel.org/r/20180823051229.211856-1-namit@vmware.com
Fixes: d70f2a14b7 ("include/linux/sched/mm.h: uninline mmdrop_async(), etc")
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge more updates from Andrew Morton:
- the rest of MM
- procfs updates
- various misc things
- more y2038 fixes
- get_maintainer updates
- lib/ updates
- checkpatch updates
- various epoll updates
- autofs updates
- hfsplus
- some reiserfs work
- fatfs updates
- signal.c cleanups
- ipc/ updates
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (166 commits)
ipc/util.c: update return value of ipc_getref from int to bool
ipc/util.c: further variable name cleanups
ipc: simplify ipc initialization
ipc: get rid of ids->tables_initialized hack
lib/rhashtable: guarantee initial hashtable allocation
lib/rhashtable: simplify bucket_table_alloc()
ipc: drop ipc_lock()
ipc/util.c: correct comment in ipc_obtain_object_check
ipc: rename ipcctl_pre_down_nolock()
ipc/util.c: use ipc_rcu_putref() for failues in ipc_addid()
ipc: reorganize initialization of kern_ipc_perm.seq
ipc: compute kern_ipc_perm.id under the ipc lock
init/Kconfig: remove EXPERT from CHECKPOINT_RESTORE
fs/sysv/inode.c: use ktime_get_real_seconds() for superblock stamp
adfs: use timespec64 for time conversion
kernel/sysctl.c: fix typos in comments
drivers/rapidio/devices/rio_mport_cdev.c: remove redundant pointer md
fork: don't copy inconsistent signal handler state to child
signal: make get_signal() return bool
signal: make sigkill_pending() return bool
...
Before this change, if a multithreaded process forks while one of its
threads is changing a signal handler using sigaction(), the memcpy() in
copy_sighand() can race with the struct assignment in do_sigaction(). It
isn't clear whether this can cause corruption of the userspace signal
handler pointer, but it definitely can cause inconsistency between
different fields of struct sigaction.
Take the appropriate spinlock to avoid this.
I have tested that this patch prevents inconsistency between sa_sigaction
and sa_flags, which is possible before this patch.
Link: http://lkml.kernel.org/r/20180702145108.73189-1-jannh@google.com
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently task hung checking interval is equal to timeout, as the result
hung is detected anywhere between timeout and 2*timeout. This is fine for
most interactive environments, but this hurts automated testing setups
(syzbot). In an automated setup we need to strictly order CPU lockup <
RCU stall < workqueue lockup < task hung < silent loss, so that RCU stall
is not detected as task hung and task hung is not detected as silent
machine loss. The large variance in task hung detection timeout requires
setting silent machine loss timeout to a very large value (e.g. if task
hung is 3 mins, then silent loss need to be set to ~7 mins). The
additional 3 minutes significantly reduce testing efficiency because
usually we crash kernel within a minute, and this can add hours to bug
localization process as it needs to do dozens of tests.
Allow setting checking interval separately from timeout. This allows to
set timeout to, say, 3 minutes, but checking interval to 10 secs.
The interval is controlled via a new hung_task_check_interval_secs sysctl,
similar to the existing hung_task_timeout_secs sysctl. The default value
of 0 results in the current behavior: checking interval is equal to
timeout.
[akpm@linux-foundation.org: update hung_task_timeout_max's comment]
Link: http://lkml.kernel.org/r/20180611111004.203513-1-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rather than in vm_area_alloc(). To ensure that the various oddball
stack-based vmas are in a good state. Some of the callers were zeroing
them out, others were not.
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull core signal handling updates from Eric Biederman:
"It was observed that a periodic timer in combination with a
sufficiently expensive fork could prevent fork from every completing.
This contains the changes to remove the need for that restart.
This set of changes is split into several parts:
- The first part makes PIDTYPE_TGID a proper pid type instead
something only for very special cases. The part starts using
PIDTYPE_TGID enough so that in __send_signal where signals are
actually delivered we know if the signal is being sent to a a group
of processes or just a single process.
- With that prep work out of the way the logic in fork is modified so
that fork logically makes signals received while it is running
appear to be received after the fork completes"
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (22 commits)
signal: Don't send signals to tasks that don't exist
signal: Don't restart fork when signals come in.
fork: Have new threads join on-going signal group stops
fork: Skip setting TIF_SIGPENDING in ptrace_init_task
signal: Add calculate_sigpending()
fork: Unconditionally exit if a fatal signal is pending
fork: Move and describe why the code examines PIDNS_ADDING
signal: Push pid type down into complete_signal.
signal: Push pid type down into __send_signal
signal: Push pid type down into send_signal
signal: Pass pid type into do_send_sig_info
signal: Pass pid type into send_sigio_to_task & send_sigurg_to_task
signal: Pass pid type into group_send_sig_info
signal: Pass pid and pid type into send_sigqueue
posix-timers: Noralize good_sigevent
signal: Use PIDTYPE_TGID to clearly store where file signals will be sent
pid: Implement PIDTYPE_TGID
pids: Move the pgrp and session pid pointers from task_struct to signal_struct
kvm: Don't open code task_pid in kvm_vcpu_ioctl
pids: Compute task_tgid using signal->leader_pid
...
Patch series "Directed kmem charging", v8.
The Linux kernel's memory cgroup allows limiting the memory usage of the
jobs running on the system to provide isolation between the jobs. All
the kernel memory allocated in the context of the job and marked with
__GFP_ACCOUNT will also be included in the memory usage and be limited
by the job's limit.
The kernel memory can only be charged to the memcg of the process in
whose context kernel memory was allocated. However there are cases
where the allocated kernel memory should be charged to the memcg
different from the current processes's memcg. This patch series
contains two such concrete use-cases i.e. fsnotify and buffer_head.
The fsnotify event objects can consume a lot of system memory for large
or unlimited queues if there is either no or slow listener. The events
are allocated in the context of the event producer. However they should
be charged to the event consumer. Similarly the buffer_head objects can
be allocated in a memcg different from the memcg of the page for which
buffer_head objects are being allocated.
To solve this issue, this patch series introduces mechanism to charge
kernel memory to a given memcg. In case of fsnotify events, the memcg
of the consumer can be used for charging and for buffer_head, the memcg
of the page can be charged. For directed charging, the caller can use
the scope API memalloc_[un]use_memcg() to specify the memcg to charge
for all the __GFP_ACCOUNT allocations within the scope.
This patch (of 2):
A lot of memory can be consumed by the events generated for the huge or
unlimited queues if there is either no or slow listener. This can cause
system level memory pressure or OOMs. So, it's better to account the
fsnotify kmem caches to the memcg of the listener.
However the listener can be in a different memcg than the memcg of the
producer and these allocations happen in the context of the event
producer. This patch introduces remote memcg charging API which the
producer can use to charge the allocations to the memcg of the listener.
There are seven fsnotify kmem caches and among them allocations from
dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and
inotify_inode_mark_cachep happens in the context of syscall from the
listener. So, SLAB_ACCOUNT is enough for these caches.
The objects from fsnotify_mark_connector_cachep are not accounted as
they are small compared to the notification mark or events and it is
unclear whom to account connector to since it is shared by all events
attached to the inode.
The allocations from the event caches happen in the context of the event
producer. For such caches we will need to remote charge the allocations
to the listener's memcg. Thus we save the memcg reference in the
fsnotify_group structure of the listener.
This patch has also moved the members of fsnotify_group to keep the size
same, at least for 64 bit build, even with additional member by filling
the holes.
[shakeelb@google.com: use GFP_KERNEL_ACCOUNT rather than open-coding it]
Link: http://lkml.kernel.org/r/20180702215439.211597-1-shakeelb@google.com
Link: http://lkml.kernel.org/r/20180627191250.209150-2-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAltwvasQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpv65EACTq5gSLnJBI6ZPr1RAHruVDnjfzO2Veitl
tUtjm0XfWmnEiwQ3dYvnyhy99xbyaG3900d9BClCTlH6xaUdSiQkDpcKG/R2F36J
5mZitYukQcpFAQJWF8YKsTTE7JPl4VglCIDqYiC4+C3rOSVi8lrKn2qp4J4MMCFn
thRg3jCcq7c5s9Eigsop1pXWQSasubkXfk55Krcp4oybKYpYRKXXf74Mj14QAbwJ
QHN3VisyAUWoBRg7UQZo1Npe2oPk6bbnJypnjf8M0M2EnlvddEkIlHob91sodka8
6p4APOEu5cbyXOBCAQsw/koff14mb8aEadqeQA68WvXfIdX9ZjfxCX0OoC3sBEXk
yqJhZ0C980AM13zIBD8ejv4uasGcPca8W+47mE5P8sRiI++5kBsFWDZPCtUBna0X
2Kh24NsmEya9XRR5vsB84dsIPQ3tLMkxg/IgQRVDaSnfJz0c/+zm54xDyKRaFT4l
5iERk2WSkm9+8jNfVmWG0edrv6nRAXjpGwFfOCPh6/LCSCi4xQRULYN7sVzsX8ZK
FRjt24HftBI8mJbh4BtweJvg+ppVe1gAk3IO3HvxAQhv29Hz+uvFYe9kL+3N8LJA
Qosr9n9O4+wKYizJcDnw+5iPqCHfAwOm9th4pyedR+R7SmNcP3yNC8AbbheNBiF5
Zolos5H+JA==
=b9ib
-----END PGP SIGNATURE-----
Merge tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
"First pull request for this merge window, there will also be a
followup request with some stragglers.
This pull request contains:
- Fix for a thundering heard issue in the wbt block code (Anchal
Agarwal)
- A few NVMe pull requests:
* Improved tracepoints (Keith)
* Larger inline data support for RDMA (Steve Wise)
* RDMA setup/teardown fixes (Sagi)
* Effects log suppor for NVMe target (Chaitanya Kulkarni)
* Buffered IO suppor for NVMe target (Chaitanya Kulkarni)
* TP4004 (ANA) support (Christoph)
* Various NVMe fixes
- Block io-latency controller support. Much needed support for
properly containing block devices. (Josef)
- Series improving how we handle sense information on the stack
(Kees)
- Lightnvm fixes and updates/improvements (Mathias/Javier et al)
- Zoned device support for null_blk (Matias)
- AIX partition fixes (Mauricio Faria de Oliveira)
- DIF checksum code made generic (Max Gurtovoy)
- Add support for discard in iostats (Michael Callahan / Tejun)
- Set of updates for BFQ (Paolo)
- Removal of async write support for bsg (Christoph)
- Bio page dirtying and clone fixups (Christoph)
- Set of bcache fix/changes (via Coly)
- Series improving blk-mq queue setup/teardown speed (Ming)
- Series improving merging performance on blk-mq (Ming)
- Lots of other fixes and cleanups from a slew of folks"
* tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block: (190 commits)
blkcg: Make blkg_root_lookup() work for queues in bypass mode
bcache: fix error setting writeback_rate through sysfs interface
null_blk: add lock drop/acquire annotation
Blk-throttle: reduce tail io latency when iops limit is enforced
block: paride: pd: mark expected switch fall-throughs
block: Ensure that a request queue is dissociated from the cgroup controller
block: Introduce blk_exit_queue()
blkcg: Introduce blkg_root_lookup()
block: Remove two superfluous #include directives
blk-mq: count the hctx as active before allocating tag
block: bvec_nr_vecs() returns value for wrong slab
bcache: trivial - remove tailing backslash in macro BTREE_FLAG
bcache: make the pr_err statement used for ENOENT only in sysfs_attatch section
bcache: set max writeback rate when I/O request is idle
bcache: add code comments for bset.c
bcache: fix mistaken comments in request.c
bcache: fix mistaken code comments in bcache.h
bcache: add a comment in super.c
bcache: avoid unncessary cache prefetch bch_btree_node_get()
bcache: display rate debug parameters to 0 when writeback is not running
...
Pull x86 mm updates from Thomas Gleixner:
- Make lazy TLB mode even lazier to avoid pointless switch_mm()
operations, which reduces CPU load by 1-2% for memcache workloads
- Small cleanups and improvements all over the place
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Remove redundant check for kmem_cache_create()
arm/asm/tlb.h: Fix build error implicit func declaration
x86/mm/tlb: Make clear_asid_other() static
x86/mm/tlb: Skip atomic operations for 'init_mm' in switch_mm_irqs_off()
x86/mm/tlb: Always use lazy TLB mode
x86/mm/tlb: Only send page table free TLB flush to lazy TLB CPUs
x86/mm/tlb: Make lazy TLB mode lazier
x86/mm/tlb: Restructure switch_mm_irqs_off()
x86/mm/tlb: Leave lazy TLB mode at page table free time
mm: Allocate the mm_cpumask (mm->cpu_bitmap[]) dynamically based on nr_cpu_ids
x86/mm: Add TLB purge to free pmd/pte page interfaces
ioremap: Update pgtable free interfaces with addr
x86/mm: Disable ioremap free page handling on x86-PAE
Wen Yang <wen.yang99@zte.com.cn> and majiang <ma.jiang@zte.com.cn>
report that a periodic signal received during fork can cause fork to
continually restart preventing an application from making progress.
The code was being overly pessimistic. Fork needs to guarantee that a
signal sent to multiple processes is logically delivered before the
fork and just to the forking process or logically delivered after the
fork to both the forking process and it's newly spawned child. For
signals like periodic timers that are always delivered to a single
process fork can safely complete and let them appear to logically
delivered after the fork().
While examining this issue I also discovered that fork today will miss
signals delivered to multiple processes during the fork and handled by
another thread. Similarly the current code will also miss blocked
signals that are delivered to multiple process, as those signals will
not appear pending during fork.
Add a list of each thread that is currently forking, and keep on that
list a signal set that records all of the signals sent to multiple
processes. When fork completes initialize the new processes
shared_pending signal set with it. The calculate_sigpending function
will see those signals and set TIF_SIGPENDING causing the new task to
take the slow path to userspace to handle those signals. Making it
appear as if those signals were received immediately after the fork.
It is not possible to send real time signals to multiple processes and
exceptions don't go to multiple processes, which means that that are
no signals sent to multiple processes that require siginfo. This
means it is safe to not bother collecting siginfo on signals sent
during fork.
The sigaction of a child of fork is initially the same as the
sigaction of the parent process. So a signal the parent ignores the
child will also initially ignore. Therefore it is safe to ignore
signals sent to multiple processes and ignored by the forking process.
Signals sent to only a single process or only a single thread and delivered
during fork are treated as if they are received after the fork, and generally
not dealt with. They won't cause any problems.
V2: Added removal from the multiprocess list on failure.
V3: Use -ERESTARTNOINTR directly
V4: - Don't queue both SIGCONT and SIGSTOP
- Initialize signal_struct.multiprocess in init_task
- Move setting of shared_pending to before the new task
is visible to signals. This prevents signals from comming
in before shared_pending.signal is set to delayed.signal
and being lost.
V5: - rework list add and delete to account for idle threads
v6: - Use sigdelsetmask when removing stop signals
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200447
Reported-by: Wen Yang <wen.yang99@zte.com.cn> and
Reported-by: majiang <ma.jiang@zte.com.cn>
Fixes: 4a2c7a7837 ("[PATCH] make fork() atomic wrt pgrp/session signals")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAltU8z0eHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG5X8H/2fJr7m3k242+t76
sitwvx1eoPqTgryW59dRKm9IuXAGA+AjauvHzaz1QxomeQa50JghGWefD0eiJfkA
1AphQ/24EOiAbbVk084dAI/C2p122dE4D5Fy7CrfLnuouyrbFaZI5STbnrRct7sR
9deeYW0GDHO1Uenp4WDCj0baaqJqaevZ+7GG09DnWpya2nQtSkGBjqn6GpYmrfOU
mqFuxAX8mEOW6cwK16y/vYtnVjuuMAiZ63/OJ8AQ6d6ArGLwAsdn7f8Fn4I4tEr2
L0d3CRLUyegms4++Dmlu05k64buQu46WlPhjCZc5/Ts4kjrNxBuHejj2/jeSnUSt
vJJlibI=
=42a5
-----END PGP SIGNATURE-----
Merge tag 'v4.18-rc6' into for-4.19/block2
Pull in 4.18-rc6 to get the NVMe core AEN change to avoid a
merge conflict down the line.
Signed-of-by: Jens Axboe <axboe@kernel.dk>
There are only two signals that are delivered to every member of a
signal group: SIGSTOP and SIGKILL. Signal delivery requires every
signal appear to be delivered either before or after a clone syscall.
SIGKILL terminates the clone so does not need to be considered. Which
leaves only SIGSTOP that needs to be considered when creating new
threads.
Today in the event of a group stop TIF_SIGPENDING will get set and the
fork will restart ensuring the fork syscall participates in the group
stop.
A fork (especially of a process with a lot of memory) is one of the
most expensive system so we really only want to restart a fork when
necessary.
It is easy so check to see if a SIGSTOP is ongoing and have the new
thread join it immediate after the clone completes. Making it appear
the clone completed happened just before the SIGSTOP.
The calculate_sigpending function will see the bits set in jobctl and
set TIF_SIGPENDING to ensure the new task takes the slow path to userspace.
V2: The call to task_join_group_stop was moved before the new task is
added to the thread group list. This should not matter as
sighand->siglock is held over both the addition of the threads,
the call to task_join_group_stop and do_signal_stop. But the change
is trivial and it is one less thing to worry about when reading
the code.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
We were hitting a panic in production where we put too many times on the
request queue. This is because we'd get the throttle_queue of the
parent if we fork()'ed while we needed to be throttled, but we didn't
have a reference on it. Instead just clear these flags on fork so the
child doesn't pay for the sins of its father.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Not all VMAs allocated with vm_area_alloc(). Some of them allocated on
stack or in data segment.
The new helper can be use to initialize VMA properly regardless where it
was allocated.
Link: http://lkml.kernel.org/r/20180724121139.62570-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In practice this does not change anything as testing for fatal_signal_pending
and exiting for with an error code duplicates the work of the next clause
which recalculates pending signals and then exits fork if any are pending.
In both cases the pending signal will trigger the slow path when existing
to userspace, and the fatal signal will cause do_exit to be called.
The advantage of making this a separate test is that it makes it clear
processing the fatal signal will terminate the fork, and it allows the
rest of the signal logic to be updated without fear that this important
case will be lost.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Normally this would be something that would be handled by handling
signals that are sent to a group of processes but in this case the
forking process is not a member of the group being signaled. Thus
special code is needed to prevent a race with pid namespaces exiting,
and fork adding new processes within them.
Move this test up before the signal restart just in case signals are
also pending. Fatal conditions should take presedence over restarts.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Like vm_area_dup(), it initializes the anon_vma_chain head, and the
basic mm pointer.
The rest of the fields end up being different for different users,
although the plan is to also initialize the 'vm_ops' field to a dummy
entry.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
.. and re-initialize th eanon_vma_chain head.
This removes some boiler-plate from the users, and also makes it clear
why it didn't need use the 'zalloc()' version.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The vm_area_struct is one of the most fundamental memory management
objects, but the management of it is entirely open-coded evertwhere,
ranging from allocation and freeing (using kmem_cache_[z]alloc and
kmem_cache_free) to initializing all the fields.
We want to unify this in order to end up having some unified
initialization of the vmas, and the first step to this is to at least
have basic allocation functions.
Right now those functions are literally just wrappers around the
kmem_cache_*() calls. This is a purely mechanical conversion:
# new vma:
kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL) -> vm_area_alloc()
# copy old vma
kmem_cache_alloc(vm_area_cachep, GFP_KERNEL) -> vm_area_dup(old)
# free vma
kmem_cache_free(vm_area_cachep, vma) -> vm_area_free(vma)
to the point where the old vma passed in to the vm_area_dup() function
isn't even used yet (because I've left all the old manual initialization
alone).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Everywhere except in the pid array we distinguish between a tasks pid and
a tasks tgid (thread group id). Even in the enumeration we want that
distinction sometimes so we have added __PIDTYPE_TGID. With leader_pid
we almost have an implementation of PIDTYPE_TGID in struct signal_struct.
Add PIDTYPE_TGID as a first class member of the pid_type enumeration and
into the pids array. Then remove the __PIDTYPE_TGID special case and the
leader_pid in signal_struct.
The net size increase is just an extra pointer added to struct pid and
an extra pair of pointers of an hlist_node added to task_struct.
The effect on code maintenance is the removal of a number of special
cases today and the potential to remove many more special cases as
PIDTYPE_TGID gets used to it's fullest. The long term potential
is allowing zombie thread group leaders to exit, which will remove
a lot more special cases in the code.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
To access these fields the code always has to go to group leader so
going to signal struct is no loss and is actually a fundamental simplification.
This saves a little bit of memory by only allocating the pid pointer array
once instead of once for every thread, and even better this removes a
few potential races caused by the fact that group_leader can be changed
by de_thread, while signal_struct can not.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The mm_struct always contains a cpumask bitmap, regardless of
CONFIG_CPUMASK_OFFSTACK. That means the first step can be to
simplify things, and simply have one bitmask at the end of the
mm_struct for the mm_cpumask.
This does necessitate moving everything else in mm_struct into
an anonymous sub-structure, which can be randomized when struct
randomization is enabled.
The second step is to determine the correct size for the
mm_struct slab object from the size of the mm_struct
(excluding the CPU bitmap) and the size the cpumask.
For init_mm we can simply allocate the maximum size this
kernel is compiled for, since we only have one init_mm
in the system, anyway.
Pointer magic by Mike Galbraith, to evade -Wstringop-overflow
getting confused by the dynamically sized array.
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Rik van Riel <riel@surriel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-team@fb.com
Cc: luto@kernel.org
Link: http://lkml.kernel.org/r/20180716190337.26133-2-riel@surriel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As a theoretical problem, dup_mmap() of an mm_struct with 60000+ vmas
can loop while potentially allocating memory, with mm->mmap_sem held for
write by current thread. This is bad if current thread was selected as
an OOM victim, for current thread will continue allocations using memory
reserves while OOM reaper is unable to reclaim memory.
As an actually observable problem, it is not difficult to make OOM
reaper unable to reclaim memory if the OOM victim is blocked at
i_mmap_lock_write() in this loop. Unfortunately, since nobody can
explain whether it is safe to use killable wait there, let's check for
SIGKILL before trying to allocate memory. Even without an OOM event,
there is no point with continuing the loop from the beginning if current
thread is killed.
I tested with debug printk(). This patch should be safe because we
already fail if security_vm_enough_memory_mm() or
kmem_cache_alloc(GFP_KERNEL) fails and exit_mmap() handles it.
***** Aborting dup_mmap() due to SIGKILL *****
***** Aborting dup_mmap() due to SIGKILL *****
***** Aborting dup_mmap() due to SIGKILL *****
***** Aborting dup_mmap() due to SIGKILL *****
***** Aborting exit_mmap() due to NULL mmap *****
[akpm@linux-foundation.org: add comment]
Link: http://lkml.kernel.org/r/201804071938.CDE04681.SOFVQJFtMHOOLF@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Rik van Riel <riel@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The changes to automatically test for working stack protector compiler
support in the Kconfig files removed the special STACKPROTECTOR_AUTO
option that picked the strongest stack protector that the compiler
supported.
That was all a nice cleanup - it makes no sense to have the AUTO case
now that the Kconfig phase can just determine the compiler support
directly.
HOWEVER.
It also meant that doing "make oldconfig" would now _disable_ the strong
stackprotector if you had AUTO enabled, because in a legacy config file,
the sane stack protector configuration would look like
CONFIG_HAVE_CC_STACKPROTECTOR=y
# CONFIG_CC_STACKPROTECTOR_NONE is not set
# CONFIG_CC_STACKPROTECTOR_REGULAR is not set
# CONFIG_CC_STACKPROTECTOR_STRONG is not set
CONFIG_CC_STACKPROTECTOR_AUTO=y
and when you ran this through "make oldconfig" with the Kbuild changes,
it would ask you about the regular CONFIG_CC_STACKPROTECTOR (that had
been renamed from CONFIG_CC_STACKPROTECTOR_REGULAR to just
CONFIG_CC_STACKPROTECTOR), but it would think that the STRONG version
used to be disabled (because it was really enabled by AUTO), and would
disable it in the new config, resulting in:
CONFIG_HAVE_CC_STACKPROTECTOR=y
CONFIG_CC_HAS_STACKPROTECTOR_NONE=y
CONFIG_CC_STACKPROTECTOR=y
# CONFIG_CC_STACKPROTECTOR_STRONG is not set
CONFIG_CC_HAS_SANE_STACKPROTECTOR=y
That's dangerously subtle - people could suddenly find themselves with
the weaker stack protector setup without even realizing.
The solution here is to just rename not just the old RECULAR stack
protector option, but also the strong one. This does that by just
removing the CC_ prefix entirely for the user choices, because it really
is not about the compiler support (the compiler support now instead
automatially impacts _visibility_ of the options to users).
This results in "make oldconfig" actually asking the user for their
choice, so that we don't have any silent subtle security model changes.
The end result would generally look like this:
CONFIG_HAVE_CC_STACKPROTECTOR=y
CONFIG_CC_HAS_STACKPROTECTOR_NONE=y
CONFIG_STACKPROTECTOR=y
CONFIG_STACKPROTECTOR_STRONG=y
CONFIG_CC_HAS_SANE_STACKPROTECTOR=y
where the "CC_" versions really are about internal compiler
infrastructure, not the user selections.
Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull restartable sequence support from Thomas Gleixner:
"The restartable sequences syscall (finally):
After a lot of back and forth discussion and massive delays caused by
the speculative distraction of maintainers, the core set of
restartable sequences has finally reached a consensus.
It comes with the basic non disputed core implementation along with
support for arm, powerpc and x86 and a full set of selftests
It was exposed to linux-next earlier this week, so it does not fully
comply with the merge window requirements, but there is really no
point to drag it out for yet another cycle"
* 'core-rseq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rseq/selftests: Provide Makefile, scripts, gitignore
rseq/selftests: Provide parametrized tests
rseq/selftests: Provide basic percpu ops test
rseq/selftests: Provide basic test
rseq/selftests: Provide rseq library
selftests/lib.mk: Introduce OVERRIDE_TARGETS
powerpc: Wire up restartable sequences system call
powerpc: Add syscall detection for restartable sequences
powerpc: Add support for restartable sequences
x86: Wire up restartable sequence system call
x86: Add support for restartable sequences
arm: Wire up restartable sequences system call
arm: Add syscall detection for restartable sequences
arm: Add restartable sequences support
rseq: Introduce restartable sequences system call
uapi/headers: Provide types_32_64.h
mmap_sem is on the hot path of kernel, and it very contended, but it is
abused too. It is used to protect arg_start|end and evn_start|end when
reading /proc/$PID/cmdline and /proc/$PID/environ, but it doesn't make
sense since those proc files just expect to read 4 values atomically and
not related to VM, they could be set to arbitrary values by C/R.
And, the mmap_sem contention may cause unexpected issue like below:
INFO: task ps:14018 blocked for more than 120 seconds.
Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
ps D 0 14018 1 0x00000004
Call Trace:
schedule+0x36/0x80
rwsem_down_read_failed+0xf0/0x150
call_rwsem_down_read_failed+0x18/0x30
down_read+0x20/0x40
proc_pid_cmdline_read+0xd9/0x4e0
__vfs_read+0x37/0x150
vfs_read+0x96/0x130
SyS_read+0x55/0xc0
entry_SYSCALL_64_fastpath+0x1a/0xc5
Both Alexey Dobriyan and Michal Hocko suggested to use dedicated lock
for them to mitigate the abuse of mmap_sem.
So, introduce a new spinlock in mm_struct to protect the concurrent
access to arg_start|end, env_start|end and others, as well as replace
write map_sem to read to protect the race condition between prctl and
sys_brk which might break check_data_rlimit(), and makes prctl more
friendly to other VM operations.
This patch just eliminates the abuse of mmap_sem, but it can't resolve
the above hung task warning completely since the later
access_remote_vm() call needs acquire mmap_sem. The mmap_sem
scalability issue will be solved in the future.
[yang.shi@linux.alibaba.com: add comment about mmap_sem and arg_lock]
Link: http://lkml.kernel.org/r/1524077799-80690-1-git-send-email-yang.shi@linux.alibaba.com
Link: http://lkml.kernel.org/r/1523730291-109696-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEEcQCq365ubpQNLgrWVeRaWujKfIoFAlsXFUEUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQVeRaWujKfIoomg//eRNpc6x9kxTijN670AC2uD0CBTlZ
2z6mHuJaOhG8bTxjZxQfUBoo6/eZJ2YC1yq6ornGFNzw4sfKsR/j86ujJim2HAmo
opUhziq3SILGEvjsxfPkREe/wb49jy0AA/WjZqciitB1ig8Hz7xzqi0lpNaEspFh
QJFB6XXkojWGFGrRzruAVJnPS+pDWoTQR0qafs3JWKnpeinpOdZnl1hPsysAEHt5
Ag8o4qS/P9xJM0khi7T+jWECmTyT/mtWqEtFcZ0o+JLOgt/EMvNX6DO4ETDiYRD2
mVChga9x5r78bRgNy2U8IlEWWa76WpcQAEODvhzbijX4RxMAmjsmLE+e+udZSnMZ
eCITl2f7ExxrL5SwNFC/5h7pAv0RJ+SOC19vcyeV4JDlQNNVjUy/aNKv5baV0aeg
EmkeobneMWxqHx52aERz8RF1in5pT8gLOYoYnWfNpcDEmjLrwhuZLX2asIzUEqrS
SoPJ8hxIDCxceHOWIIrz5Dqef7x28Dyi46w3QINC8bSy2RnR/H3q40DRegvXOGiS
9WcbbwbhnM4Kau413qKicGCvdqTVYdeyZqo7fVelSciD139Vk7pZotyom4MuU25p
fIyGfXa8/8gkl7fZ+HNkZbba0XWNfAZt//zT095qsp3CkhVnoybwe6OwG1xRqErq
W7OOQbS7vvN/KGo=
=10u6
-----END PGP SIGNATURE-----
Merge tag 'audit-pr-20180605' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
"Another reasonable chunk of audit changes for v4.18, thirteen patches
in total.
The thirteen patches can mostly be broken down into one of four
categories: general bug fixes, accessor functions for audit state
stored in the task_struct, negative filter matches on executable
names, and extending the (relatively) new seccomp logging knobs to the
audit subsystem.
The main driver for the accessor functions from Richard are the
changes we're working on to associate audit events with containers,
but I think they have some standalone value too so I figured it would
be good to get them in now.
The seccomp/audit patches from Tyler apply the seccomp logging
improvements from a few releases ago to audit's seccomp logging;
starting with this patchset the changes in
/proc/sys/kernel/seccomp/actions_logged should apply to both the
standard kernel logging and audit.
As usual, everything passes the audit-testsuite and it happens to
merge cleanly with your tree"
[ Heh, except it had trivial merge conflicts with the SELinux tree that
also came in from Paul - Linus ]
* tag 'audit-pr-20180605' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: Fix wrong task in comparison of session ID
audit: use existing session info function
audit: normalize loginuid read access
audit: use new audit_context access funciton for seccomp_actions_logged
audit: use inline function to set audit context
audit: use inline function to get audit context
audit: convert sessionid unset to a macro
seccomp: Don't special case audited processes when logging
seccomp: Audit attempts to modify the actions_logged sysctl
seccomp: Configurable separator for the actions_logged string
seccomp: Separate read and write code for actions_logged sysctl
audit: allow not equal op for audit by executable
audit: add syscall information to FEATURE_CHANGE records
Expose a new system call allowing each thread to register one userspace
memory area to be used as an ABI between kernel and user-space for two
purposes: user-space restartable sequences and quick access to read the
current CPU number value from user-space.
* Restartable sequences (per-cpu atomics)
Restartables sequences allow user-space to perform update operations on
per-cpu data without requiring heavy-weight atomic operations.
The restartable critical sections (percpu atomics) work has been started
by Paul Turner and Andrew Hunter. It lets the kernel handle restart of
critical sections. [1] [2] The re-implementation proposed here brings a
few simplifications to the ABI which facilitates porting to other
architectures and speeds up the user-space fast path.
Here are benchmarks of various rseq use-cases.
Test hardware:
arm32: ARMv7 Processor rev 4 (v7l) "Cubietruck", 2-core
x86-64: Intel E5-2630 v3@2.40GHz, 16-core, hyperthreading
The following benchmarks were all performed on a single thread.
* Per-CPU statistic counter increment
getcpu+atomic (ns/op) rseq (ns/op) speedup
arm32: 344.0 31.4 11.0
x86-64: 15.3 2.0 7.7
* LTTng-UST: write event 32-bit header, 32-bit payload into tracer
per-cpu buffer
getcpu+atomic (ns/op) rseq (ns/op) speedup
arm32: 2502.0 2250.0 1.1
x86-64: 117.4 98.0 1.2
* liburcu percpu: lock-unlock pair, dereference, read/compare word
getcpu+atomic (ns/op) rseq (ns/op) speedup
arm32: 751.0 128.5 5.8
x86-64: 53.4 28.6 1.9
* jemalloc memory allocator adapted to use rseq
Using rseq with per-cpu memory pools in jemalloc at Facebook (based on
rseq 2016 implementation):
The production workload response-time has 1-2% gain avg. latency, and
the P99 overall latency drops by 2-3%.
* Reading the current CPU number
Speeding up reading the current CPU number on which the caller thread is
running is done by keeping the current CPU number up do date within the
cpu_id field of the memory area registered by the thread. This is done
by making scheduler preemption set the TIF_NOTIFY_RESUME flag on the
current thread. Upon return to user-space, a notify-resume handler
updates the current CPU value within the registered user-space memory
area. User-space can then read the current CPU number directly from
memory.
Keeping the current cpu id in a memory area shared between kernel and
user-space is an improvement over current mechanisms available to read
the current CPU number, which has the following benefits over
alternative approaches:
- 35x speedup on ARM vs system call through glibc
- 20x speedup on x86 compared to calling glibc, which calls vdso
executing a "lsl" instruction,
- 14x speedup on x86 compared to inlined "lsl" instruction,
- Unlike vdso approaches, this cpu_id value can be read from an inline
assembly, which makes it a useful building block for restartable
sequences.
- The approach of reading the cpu id through memory mapping shared
between kernel and user-space is portable (e.g. ARM), which is not the
case for the lsl-based x86 vdso.
On x86, yet another possible approach would be to use the gs segment
selector to point to user-space per-cpu data. This approach performs
similarly to the cpu id cache, but it has two disadvantages: it is
not portable, and it is incompatible with existing applications already
using the gs segment selector for other purposes.
Benchmarking various approaches for reading the current CPU number:
ARMv7 Processor rev 4 (v7l)
Machine model: Cubietruck
- Baseline (empty loop): 8.4 ns
- Read CPU from rseq cpu_id: 16.7 ns
- Read CPU from rseq cpu_id (lazy register): 19.8 ns
- glibc 2.19-0ubuntu6.6 getcpu: 301.8 ns
- getcpu system call: 234.9 ns
x86-64 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz:
- Baseline (empty loop): 0.8 ns
- Read CPU from rseq cpu_id: 0.8 ns
- Read CPU from rseq cpu_id (lazy register): 0.8 ns
- Read using gs segment selector: 0.8 ns
- "lsl" inline assembly: 13.0 ns
- glibc 2.19-0ubuntu6 getcpu: 16.6 ns
- getcpu system call: 53.9 ns
- Speed (benchmark taken on v8 of patchset)
Running 10 runs of hackbench -l 100000 seems to indicate, contrary to
expectations, that enabling CONFIG_RSEQ slightly accelerates the
scheduler:
Configuration: 2 sockets * 8-core Intel(R) Xeon(R) CPU E5-2630 v3 @
2.40GHz (directly on hardware, hyperthreading disabled in BIOS, energy
saving disabled in BIOS, turboboost disabled in BIOS, cpuidle.off=1
kernel parameter), with a Linux v4.6 defconfig+localyesconfig,
restartable sequences series applied.
* CONFIG_RSEQ=n
avg.: 41.37 s
std.dev.: 0.36 s
* CONFIG_RSEQ=y
avg.: 40.46 s
std.dev.: 0.33 s
- Size
On x86-64, between CONFIG_RSEQ=n/y, the text size increase of vmlinux is
567 bytes, and the data size increase of vmlinux is 5696 bytes.
[1] https://lwn.net/Articles/650333/
[2] http://www.linuxplumbersconf.org/2013/ocw/system/presentations/1695/original/LPC%20-%20PerCpu%20Atomics.pdf
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Watson <davejwatson@fb.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Chris Lameter <cl@linux.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Andrew Hunter <ahh@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Maurer <bmaurer@fb.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-api@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20151027235635.16059.11630.stgit@pjt-glaptop.roam.corp.google.com
Link: http://lkml.kernel.org/r/20150624222609.6116.86035.stgit@kitami.mtv.corp.google.com
Link: https://lkml.kernel.org/r/20180602124408.8430-3-mathieu.desnoyers@efficios.com
Recognizing that the audit context is an internal audit value, use an
access function to set the audit context pointer for the task
rather than reaching directly into the task struct to set it.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[PM: merge fuzz in audit.h]
Signed-off-by: Paul Moore <paul@paul-moore.com>
One of the classes of kernel stack content leaks[1] is exposing the
contents of prior heap or stack contents when a new process stack is
allocated. Normally, those stacks are not zeroed, and the old contents
remain in place. In the face of stack content exposure flaws, those
contents can leak to userspace.
Fixing this will make the kernel no longer vulnerable to these flaws, as
the stack will be wiped each time a stack is assigned to a new process.
There's not a meaningful change in runtime performance; it almost looks
like it provides a benefit.
Performing back-to-back kernel builds before:
Run times: 157.86 157.09 158.90 160.94 160.80
Mean: 159.12
Std Dev: 1.54
and after:
Run times: 159.31 157.34 156.71 158.15 160.81
Mean: 158.46
Std Dev: 1.46
Instead of making this a build or runtime config, Andy Lutomirski
recommended this just be enabled by default.
[1] A noisy search for many kinds of stack content leaks can be seen here:
https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=linux+kernel+stack+leak
I did some more with perf and cycle counts on running 100,000 execs of
/bin/true.
before:
Cycles: 218858861551 218853036130 214727610969 227656844122 224980542841
Mean: 221015379122.60
Std Dev: 4662486552.47
after:
Cycles: 213868945060 213119275204 211820169456 224426673259 225489986348
Mean: 217745009865.40
Std Dev: 5935559279.99
It continues to look like it's faster, though the deviation is rather
wide, but I'm not sure what I could do that would be less noisy. I'm
open to ideas!
Link: http://lkml.kernel.org/r/20180221021659.GA37073@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KASAN splats indicate that in some cases we free a live mm, then
continue to access it, with potentially disastrous results. This is
likely due to a mismatched mmdrop() somewhere in the kernel, but so far
the culprit remains elusive.
Let's have __mmdrop() verify that the mm isn't live for the current
task, similar to the existing check for init_mm. This way, we can catch
this class of issue earlier, and without requiring KASAN.
Currently, idle_task_exit() leaves active_mm stale after it switches to
init_mm. This isn't harmful, but will trigger the new assertions, so we
must adjust idle_task_exit() to update active_mm.
Link: http://lkml.kernel.org/r/20180312140103.19235-1-mark.rutland@arm.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>