Changes in 5.10.20
vmlinux.lds.h: add DWARF v5 sections
vdpa/mlx5: fix param validation in mlx5_vdpa_get_config()
debugfs: be more robust at handling improper input in debugfs_lookup()
debugfs: do not attempt to create a new file before the filesystem is initialized
scsi: libsas: docs: Remove notify_ha_event()
scsi: qla2xxx: Fix mailbox Ch erroneous error
kdb: Make memory allocations more robust
w1: w1_therm: Fix conversion result for negative temperatures
PCI: qcom: Use PHY_REFCLK_USE_PAD only for ipq8064
PCI: Decline to resize resources if boot config must be preserved
virt: vbox: Do not use wait_event_interruptible when called from kernel context
bfq: Avoid false bfq queue merging
ALSA: usb-audio: Fix PCM buffer allocation in non-vmalloc mode
MIPS: vmlinux.lds.S: add missing PAGE_ALIGNED_DATA() section
vmlinux.lds.h: Define SANITIZER_DISCARDS with CONFIG_GCOV_KERNEL=y
random: fix the RNDRESEEDCRNG ioctl
ALSA: pcm: Call sync_stop at disconnection
ALSA: pcm: Assure sync with the pending stop operation at suspend
ALSA: pcm: Don't call sync_stop if it hasn't been stopped
drm/i915/gt: One more flush for Baytrail clear residuals
ath10k: Fix error handling in case of CE pipe init failure
Bluetooth: btqcomsmd: Fix a resource leak in error handling paths in the probe function
Bluetooth: hci_uart: Fix a race for write_work scheduling
Bluetooth: Fix initializing response id after clearing struct
arm64: dts: renesas: beacon kit: Fix choppy Bluetooth Audio
arm64: dts: renesas: beacon: Fix audio-1.8V pin enable
ARM: dts: exynos: correct PMIC interrupt trigger level on Artik 5
ARM: dts: exynos: correct PMIC interrupt trigger level on Monk
ARM: dts: exynos: correct PMIC interrupt trigger level on Rinato
ARM: dts: exynos: correct PMIC interrupt trigger level on Spring
ARM: dts: exynos: correct PMIC interrupt trigger level on Arndale Octa
ARM: dts: exynos: correct PMIC interrupt trigger level on Odroid XU3 family
arm64: dts: exynos: correct PMIC interrupt trigger level on TM2
arm64: dts: exynos: correct PMIC interrupt trigger level on Espresso
memory: mtk-smi: Fix PM usage counter unbalance in mtk_smi ops
Bluetooth: hci_qca: Fix memleak in qca_controller_memdump
staging: vchiq: Fix bulk userdata handling
staging: vchiq: Fix bulk transfers on 64-bit builds
arm64: dts: qcom: msm8916-samsung-a5u: Fix iris compatible
net: stmmac: dwmac-meson8b: fix enabling the timing-adjustment clock
bpf: Add bpf_patch_call_args prototype to include/linux/bpf.h
bpf: Avoid warning when re-casting __bpf_call_base into __bpf_call_base_args
firmware: arm_scmi: Fix call site of scmi_notification_exit
arm64: dts: allwinner: A64: properly connect USB PHY to port 0
arm64: dts: allwinner: H6: properly connect USB PHY to port 0
arm64: dts: allwinner: Drop non-removable from SoPine/LTS SD card
arm64: dts: allwinner: H6: Allow up to 150 MHz MMC bus frequency
arm64: dts: allwinner: A64: Limit MMC2 bus frequency to 150 MHz
arm64: dts: qcom: msm8916-samsung-a2015: Fix sensors
cpufreq: brcmstb-avs-cpufreq: Free resources in error path
cpufreq: brcmstb-avs-cpufreq: Fix resource leaks in ->remove()
arm64: dts: rockchip: rk3328: Add clock_in_out property to gmac2phy node
ACPICA: Fix exception code class checks
usb: gadget: u_audio: Free requests only after callback
arm64: dts: qcom: sdm845-db845c: Fix reset-pin of ov8856 node
soc: qcom: socinfo: Fix an off by one in qcom_show_pmic_model()
soc: ti: pm33xx: Fix some resource leak in the error handling paths of the probe function
staging: media: atomisp: Fix size_t format specifier in hmm_alloc() debug statement
Bluetooth: drop HCI device reference before return
Bluetooth: Put HCI device if inquiry procedure interrupts
memory: ti-aemif: Drop child node when jumping out loop
ARM: dts: Configure missing thermal interrupt for 4430
usb: dwc2: Do not update data length if it is 0 on inbound transfers
usb: dwc2: Abort transaction after errors with unknown reason
usb: dwc2: Make "trimming xfer length" a debug message
staging: rtl8723bs: wifi_regd.c: Fix incorrect number of regulatory rules
x86/MSR: Filter MSR writes through X86_IOC_WRMSR_REGS ioctl too
arm64: dts: renesas: beacon: Fix EEPROM compatible value
can: mcp251xfd: mcp251xfd_probe(): fix errata reference
ARM: dts: armada388-helios4: assign pinctrl to LEDs
ARM: dts: armada388-helios4: assign pinctrl to each fan
arm64: dts: armada-3720-turris-mox: rename u-boot mtd partition to a53-firmware
opp: Correct debug message in _opp_add_static_v2()
Bluetooth: btusb: Fix memory leak in btusb_mtk_wmt_recv
soc: qcom: ocmem: don't return NULL in of_get_ocmem
arm64: dts: msm8916: Fix reserved and rfsa nodes unit address
arm64: dts: meson: fix broken wifi node for Khadas VIM3L
iwlwifi: mvm: set enabled in the PPAG command properly
ARM: s3c: fix fiq for clang IAS
optee: simplify i2c access
staging: wfx: fix possible panic with re-queued frames
ARM: at91: use proper asm syntax in pm_suspend
ath10k: Fix suspicious RCU usage warning in ath10k_wmi_tlv_parse_peer_stats_info()
ath10k: Fix lockdep assertion warning in ath10k_sta_statistics
ath11k: fix a locking bug in ath11k_mac_op_start()
soc: aspeed: snoop: Add clock control logic
iwlwifi: mvm: fix the type we use in the PPAG table validity checks
iwlwifi: mvm: store PPAG enabled/disabled flag properly
iwlwifi: mvm: send stored PPAG command instead of local
iwlwifi: mvm: assign SAR table revision to the command later
iwlwifi: mvm: don't check if CSA event is running before removing
bpf_lru_list: Read double-checked variable once without lock
iwlwifi: pnvm: set the PNVM again if it was already loaded
iwlwifi: pnvm: increment the pointer before checking the TLV
ath9k: fix data bus crash when setting nf_override via debugfs
selftests/bpf: Convert test_xdp_redirect.sh to bash
ibmvnic: Set to CLOSED state even on error
bnxt_en: reverse order of TX disable and carrier off
bnxt_en: Fix devlink info's stored fw.psid version format.
xen/netback: fix spurious event detection for common event case
dpaa2-eth: fix memory leak in XDP_REDIRECT
net: phy: consider that suspend2ram may cut off PHY power
net/mlx5e: Don't change interrupt moderation params when DIM is enabled
net/mlx5e: Change interrupt moderation channel params also when channels are closed
net/mlx5: Fix health error state handling
net/mlx5e: Replace synchronize_rcu with synchronize_net
net/mlx5e: kTLS, Use refcounts to free kTLS RX priv context
net/mlx5: Disable devlink reload for multi port slave device
net/mlx5: Disallow RoCE on multi port slave device
net/mlx5: Disallow RoCE on lag device
net/mlx5: Disable devlink reload for lag devices
net/mlx5e: CT: manage the lifetime of the ct entry object
net/mlx5e: Check tunnel offload is required before setting SWP
mac80211: fix potential overflow when multiplying to u32 integers
libbpf: Ignore non function pointer member in struct_ops
bpf: Fix an uninitialized value in bpf_iter
bpf, devmap: Use GFP_KERNEL for xdp bulk queue allocation
bpf: Fix bpf_fib_lookup helper MTU check for SKB ctx
selftests: mptcp: fix ACKRX debug message
tcp: fix SO_RCVLOWAT related hangs under mem pressure
net: axienet: Handle deferred probe on clock properly
cxgb4/chtls/cxgbit: Keeping the max ofld immediate data size same in cxgb4 and ulds
b43: N-PHY: Fix the update of coef for the PHY revision >= 3 case
bpf: Clear subreg_def for global function return values
ibmvnic: add memory barrier to protect long term buffer
ibmvnic: skip send_request_unmap for timeout reset
net: dsa: felix: perform teardown in reverse order of setup
net: dsa: felix: don't deinitialize unused ports
net: phy: mscc: adding LCPLL reset to VSC8514
net: amd-xgbe: Reset the PHY rx data path when mailbox command timeout
net: amd-xgbe: Fix NETDEV WATCHDOG transmit queue timeout warning
net: amd-xgbe: Reset link when the link never comes back
net: amd-xgbe: Fix network fluctuations when using 1G BELFUSE SFP
net: mvneta: Remove per-cpu queue mapping for Armada 3700
net: enetc: fix destroyed phylink dereference during unbind
tty: convert tty_ldisc_ops 'read()' function to take a kernel pointer
tty: implement read_iter
fbdev: aty: SPARC64 requires FB_ATY_CT
drm/gma500: Fix error return code in psb_driver_load()
gma500: clean up error handling in init
drm/fb-helper: Add missed unlocks in setcmap_legacy()
drm/panel: mantix: Tweak init sequence
drm/vc4: hdmi: Take into account the clock doubling flag in atomic_check
crypto: sun4i-ss - linearize buffers content must be kept
crypto: sun4i-ss - fix kmap usage
crypto: arm64/aes-ce - really hide slower algos when faster ones are enabled
hwrng: ingenic - Fix a resource leak in an error handling path
media: allegro: Fix use after free on error
kcsan: Rewrite kcsan_prandom_u32_max() without prandom_u32_state()
drm: rcar-du: Fix PM reference leak in rcar_cmm_enable()
drm: rcar-du: Fix crash when using LVDS1 clock for CRTC
drm: rcar-du: Fix the return check of of_parse_phandle and of_find_device_by_node
drm/amdgpu: Fix macro name _AMDGPU_TRACE_H_ in preprocessor if condition
MIPS: c-r4k: Fix section mismatch for loongson2_sc_init
MIPS: lantiq: Explicitly compare LTQ_EBU_PCC_ISTAT against 0
drm/virtio: make sure context is created in gem open
drm/fourcc: fix Amlogic format modifier masks
media: ipu3-cio2: Build only for x86
media: i2c: ov5670: Fix PIXEL_RATE minimum value
media: imx: Unregister csc/scaler only if registered
media: imx: Fix csc/scaler unregister
media: mtk-vcodec: fix error return code in vdec_vp9_decode()
media: camss: missing error code in msm_video_register()
media: vsp1: Fix an error handling path in the probe function
media: em28xx: Fix use-after-free in em28xx_alloc_urbs
media: media/pci: Fix memleak in empress_init
media: tm6000: Fix memleak in tm6000_start_stream
media: aspeed: fix error return code in aspeed_video_setup_video()
ASoC: cs42l56: fix up error handling in probe
ASoC: qcom: qdsp6: Move frontend AIFs to q6asm-dai
evm: Fix memleak in init_desc
crypto: bcm - Rename struct device_private to bcm_device_private
sched/fair: Avoid stale CPU util_est value for schedutil in task dequeue
drm/sun4i: tcon: fix inverted DCLK polarity
media: imx7: csi: Fix regression for parallel cameras on i.MX6UL
media: imx7: csi: Fix pad link validation
media: ti-vpe: cal: fix write to unallocated memory
MIPS: properly stop .eh_frame generation
MIPS: Compare __SYNC_loongson3_war against 0
drm/tegra: Fix reference leak when pm_runtime_get_sync() fails
drm/amdgpu: toggle on DF Cstate after finishing xgmi injection
bsg: free the request before return error code
macintosh/adb-iop: Use big-endian autopoll mask
drm/amd/display: Fix 10/12 bpc setup in DCE output bit depth reduction.
drm/amd/display: Fix HDMI deep color output for DCE 6-11.
media: software_node: Fix refcounts in software_node_get_next_child()
media: lmedm04: Fix misuse of comma
media: vidtv: psi: fix missing crc for PMT
media: atomisp: Fix a buffer overflow in debug code
media: qm1d1c0042: fix error return code in qm1d1c0042_init()
media: cx25821: Fix a bug when reallocating some dma memory
media: mtk-vcodec: fix argument used when DEBUG is defined
media: pxa_camera: declare variable when DEBUG is defined
media: uvcvideo: Accept invalid bFormatIndex and bFrameIndex values
sched/eas: Don't update misfit status if the task is pinned
f2fs: compress: fix potential deadlock
ASoC: qcom: lpass-cpu: Remove bit clock state check
ASoC: SOF: Intel: hda: cancel D0i3 work during runtime suspend
perf/arm-cmn: Fix PMU instance naming
perf/arm-cmn: Move IRQs when migrating context
mtd: parser: imagetag: fix error codes in bcm963xx_parse_imagetag_partitions()
crypto: talitos - Work around SEC6 ERRATA (AES-CTR mode data size error)
crypto: talitos - Fix ctr(aes) on SEC1
drm/nouveau: bail out of nouveau_channel_new if channel init fails
mm: proc: Invalidate TLB after clearing soft-dirty page state
ata: ahci_brcm: Add back regulators management
ASoC: cpcap: fix microphone timeslot mask
ASoC: codecs: add missing max_register in regmap config
mtd: parsers: afs: Fix freeing the part name memory in failure
f2fs: fix to avoid inconsistent quota data
drm/amdgpu: Prevent shift wrapping in amdgpu_read_mask()
f2fs: fix a wrong condition in __submit_bio
ASoC: qcom: Fix typo error in HDMI regmap config callbacks
KVM: nSVM: Don't strip host's C-bit from guest's CR3 when reading PDPTRs
drm/mediatek: Check if fb is null
Drivers: hv: vmbus: Avoid use-after-free in vmbus_onoffer_rescind()
ASoC: Intel: sof_sdw: add missing TGL_HDMI quirk for Dell SKU 0A5E
ASoC: Intel: sof_sdw: add missing TGL_HDMI quirk for Dell SKU 0A3E
locking/lockdep: Avoid unmatched unlock
ASoC: qcom: lpass: Fix i2s ctl register bit map
ASoC: rt5682: Fix panic in rt5682_jack_detect_handler happening during system shutdown
ASoC: SOF: debug: Fix a potential issue on string buffer termination
btrfs: clarify error returns values in __load_free_space_cache
btrfs: fix double accounting of ordered extent for subpage case in btrfs_invalidatepage
KVM: x86: Restore all 64 bits of DR6 and DR7 during RSM on x86-64
s390/zcrypt: return EIO when msg retry limit reached
drm/vc4: hdmi: Move hdmi reset to bind
drm/vc4: hdmi: Fix register offset with longer CEC messages
drm/vc4: hdmi: Fix up CEC registers
drm/vc4: hdmi: Restore cec physical address on reconnect
drm/vc4: hdmi: Compute the CEC clock divider from the clock rate
drm/vc4: hdmi: Update the CEC clock divider on HSM rate change
drm/lima: fix reference leak in lima_pm_busy
drm/dp_mst: Don't cache EDIDs for physical ports
hwrng: timeriomem - Fix cooldown period calculation
crypto: ecdh_helper - Ensure 'len >= secret.len' in decode_key()
io_uring: fix possible deadlock in io_uring_poll
nvmet-tcp: fix receive data digest calculation for multiple h2cdata PDUs
nvmet-tcp: fix potential race of tcp socket closing accept_work
nvme-multipath: set nr_zones for zoned namespaces
nvmet: remove extra variable in identify ns
nvmet: set status to 0 in case for invalid nsid
ASoC: SOF: sof-pci-dev: add missing Up-Extreme quirk
ima: Free IMA measurement buffer on error
ima: Free IMA measurement buffer after kexec syscall
ASoC: simple-card-utils: Fix device module clock
fs/jfs: fix potential integer overflow on shift of a int
jffs2: fix use after free in jffs2_sum_write_data()
ubifs: Fix memleak in ubifs_init_authentication
ubifs: replay: Fix high stack usage, again
ubifs: Fix error return code in alloc_wbufs()
irqchip/imx: IMX_INTMUX should not default to y, unconditionally
smp: Process pending softirqs in flush_smp_call_function_from_idle()
drm/amdgpu/display: remove hdcp_srm sysfs on device removal
capabilities: Don't allow writing ambiguous v3 file capabilities
HSI: Fix PM usage counter unbalance in ssi_hw_init
power: supply: cpcap: Add missing IRQF_ONESHOT to fix regression
clk: meson: clk-pll: fix initializing the old rate (fallback) for a PLL
clk: meson: clk-pll: make "ret" a signed integer
clk: meson: clk-pll: propagate the error from meson_clk_pll_set_rate()
selftests/powerpc: Make the test check in eeh-basic.sh posix compliant
regulator: qcom-rpmh-regulator: add pm8009-1 chip revision
arm64: dts: qcom: qrb5165-rb5: fix pm8009 regulators
quota: Fix memory leak when handling corrupted quota file
i2c: iproc: handle only slave interrupts which are enabled
i2c: iproc: update slave isr mask (ISR_MASK_SLAVE)
i2c: iproc: handle master read request
spi: cadence-quadspi: Abort read if dummy cycles required are too many
clk: sunxi-ng: h6: Fix CEC clock
clk: renesas: r8a779a0: Remove non-existent S2 clock
clk: renesas: r8a779a0: Fix parent of CBFUSA clock
HID: core: detect and skip invalid inputs to snto32()
RDMA/siw: Fix handling of zero-sized Read and Receive Queues.
dmaengine: fsldma: Fix a resource leak in the remove function
dmaengine: fsldma: Fix a resource leak in an error handling path of the probe function
dmaengine: owl-dma: Fix a resource leak in the remove function
dmaengine: hsu: disable spurious interrupt
mfd: bd9571mwv: Use devm_mfd_add_devices()
power: supply: cpcap-charger: Fix missing power_supply_put()
power: supply: cpcap-battery: Fix missing power_supply_put()
power: supply: cpcap-charger: Fix power_supply_put on null battery pointer
fdt: Properly handle "no-map" field in the memory region
of/fdt: Make sure no-map does not remove already reserved regions
RDMA/rtrs: Extend ibtrs_cq_qp_create
RDMA/rtrs-srv: Release lock before call into close_sess
RDMA/rtrs-srv: Use sysfs_remove_file_self for disconnect
RDMA/rtrs-clt: Set minimum limit when create QP
RDMA/rtrs: Call kobject_put in the failure path
RDMA/rtrs-srv: Fix missing wr_cqe
RDMA/rtrs-clt: Refactor the failure cases in alloc_clt
RDMA/rtrs-srv: Init wr_cnt as 1
power: reset: at91-sama5d2_shdwc: fix wkupdbc mask
rtc: s5m: select REGMAP_I2C
dmaengine: idxd: set DMA channel to be private
power: supply: fix sbs-charger build, needs REGMAP_I2C
clocksource/drivers/ixp4xx: Select TIMER_OF when needed
clocksource/drivers/mxs_timer: Add missing semicolon when DEBUG is defined
spi: imx: Don't print error on -EPROBEDEFER
RDMA/mlx5: Use the correct obj_id upon DEVX TIR creation
IB/mlx5: Add mutex destroy call to cap_mask_mutex mutex
clk: sunxi-ng: h6: Fix clock divider range on some clocks
platform/chrome: cros_ec_proto: Use EC_HOST_EVENT_MASK not BIT
platform/chrome: cros_ec_proto: Add LID and BATTERY to default mask
regulator: axp20x: Fix reference count leak
watch_queue: Drop references to /dev/watch_queue
certs: Fix blacklist flag type confusion
regulator: s5m8767: Fix reference count leak
spi: atmel: Put allocated master before return
regulator: s5m8767: Drop regulators OF node reference
power: supply: axp20x_usb_power: Init work before enabling IRQs
power: supply: smb347-charger: Fix interrupt usage if interrupt is unavailable
regulator: core: Avoid debugfs: Directory ... already present! error
isofs: release buffer head before return
watchdog: intel-mid_wdt: Postpone IRQ handler registration till SCU is ready
auxdisplay: ht16k33: Fix refresh rate handling
objtool: Fix error handling for STD/CLD warnings
objtool: Fix retpoline detection in asm code
objtool: Fix ".cold" section suffix check for newer versions of GCC
scsi: lpfc: Fix ancient double free
iommu: Switch gather->end to the inclusive end
IB/umad: Return EIO in case of when device disassociated
IB/umad: Return EPOLLERR in case of when device disassociated
KVM: PPC: Make the VMX instruction emulation routines static
powerpc/47x: Disable 256k page size
powerpc/time: Enable sched clock for irqtime
mmc: owl-mmc: Fix a resource leak in an error handling path and in the remove function
mmc: sdhci-sprd: Fix some resource leaks in the remove function
mmc: usdhi6rol0: Fix a resource leak in the error handling path of the probe
mmc: renesas_sdhi_internal_dmac: Fix DMA buffer alignment from 8 to 128-bytes
ARM: 9046/1: decompressor: Do not clear SCTLR.nTLSMD for ARMv7+ cores
i2c: qcom-geni: Store DMA mapping data in geni_i2c_dev struct
amba: Fix resource leak for drivers without .remove
iommu: Move iotlb_sync_map out from __iommu_map
iommu: Properly pass gfp_t in _iommu_map() to avoid atomic sleeping
IB/mlx5: Return appropriate error code instead of ENOMEM
IB/cm: Avoid a loop when device has 255 ports
tracepoint: Do not fail unregistering a probe due to memory failure
rtc: zynqmp: depend on HAS_IOMEM
perf tools: Fix DSO filtering when not finding a map for a sampled address
perf vendor events arm64: Fix Ampere eMag event typo
RDMA/rxe: Fix coding error in rxe_recv.c
RDMA/rxe: Fix coding error in rxe_rcv_mcast_pkt
RDMA/rxe: Correct skb on loopback path
spi: stm32: properly handle 0 byte transfer
mfd: altera-sysmgr: Fix physical address storing more
mfd: wm831x-auxadc: Prevent use after free in wm831x_auxadc_read_irq()
powerpc/pseries/dlpar: handle ibm, configure-connector delay status
powerpc/8xx: Fix software emulation interrupt
clk: qcom: gcc-msm8998: Fix Alpha PLL type for all GPLLs
kunit: tool: fix unit test cleanup handling
kselftests: dmabuf-heaps: Fix Makefile's inclusion of the kernel's usr/include dir
RDMA/hns: Fixed wrong judgments in the goto branch
RDMA/siw: Fix calculation of tx_valid_cpus size
RDMA/hns: Fix type of sq_signal_bits
RDMA/hns: Disable RQ inline by default
clk: divider: fix initialization with parent_hw
spi: pxa2xx: Fix the controller numbering for Wildcat Point
powerpc/uaccess: Avoid might_fault() when user access is enabled
powerpc/kuap: Restore AMR after replaying soft interrupts
regulator: qcom-rpmh: fix pm8009 ldo7
clk: aspeed: Fix APLL calculate formula from ast2600-A2
selftests/ftrace: Update synthetic event syntax errors
perf symbols: Use (long) for iterator for bfd symbols
regulator: bd718x7, bd71828, Fix dvs voltage levels
spi: dw: Avoid stack content exposure
spi: Skip zero-length transfers in spi_transfer_one_message()
printk: avoid prb_first_valid_seq() where possible
perf symbols: Fix return value when loading PE DSO
nfsd: register pernet ops last, unregister first
svcrdma: Hold private mutex while invoking rdma_accept()
ceph: fix flush_snap logic after putting caps
RDMA/hns: Fixes missing error code of CMDQ
RDMA/ucma: Fix use-after-free bug in ucma_create_uevent
RDMA/rtrs-srv: Fix stack-out-of-bounds
RDMA/rtrs: Only allow addition of path to an already established session
RDMA/rtrs-srv: fix memory leak by missing kobject free
RDMA/rtrs-srv-sysfs: fix missing put_device
RDMA/rtrs-srv: Do not pass a valid pointer to PTR_ERR()
Input: sur40 - fix an error code in sur40_probe()
perf record: Fix continue profiling after draining the buffer
perf intel-pt: Fix missing CYC processing in PSB
perf intel-pt: Fix premature IPC
perf intel-pt: Fix IPC with CYC threshold
perf test: Fix unaligned access in sample parsing test
Input: elo - fix an error code in elo_connect()
sparc64: only select COMPAT_BINFMT_ELF if BINFMT_ELF is set
sparc: fix led.c driver when PROC_FS is not enabled
Input: zinitix - fix return type of zinitix_init_touch()
ARM: 9065/1: OABI compat: fix build when EPOLL is not enabled
misc: eeprom_93xx46: Fix module alias to enable module autoprobe
phy: rockchip-emmc: emmc_phy_init() always return 0
phy: cadence-torrent: Fix error code in cdns_torrent_phy_probe()
misc: eeprom_93xx46: Add module alias to avoid breaking support for non device tree users
PCI: rcar: Always allocate MSI addresses in 32bit space
soundwire: cadence: fix ACK/NAK handling
pwm: rockchip: Enable APB clock during register access while probing
pwm: rockchip: rockchip_pwm_probe(): Remove superfluous clk_unprepare()
pwm: rockchip: Eliminate potential race condition when probing
PCI: xilinx-cpm: Fix reference count leak on error path
VMCI: Use set_page_dirty_lock() when unregistering guest memory
PCI: Align checking of syscall user config accessors
mei: hbm: call mei_set_devstate() on hbm stop response
drm/msm: Fix MSM_INFO_GET_IOVA with carveout
drm/msm/dsi: Correct io_start for MSM8994 (20nm PHY)
drm/msm/mdp5: Fix wait-for-commit for cmd panels
drm/msm: Fix race of GPU init vs timestamp power management.
drm/msm: Fix races managing the OOB state for timestamp vs timestamps.
drm/msm/dp: trigger unplug event in msm_dp_display_disable
vfio/iommu_type1: Populate full dirty when detach non-pinned group
vfio/iommu_type1: Fix some sanity checks in detach group
vfio-pci/zdev: fix possible segmentation fault issue
ext4: fix potential htree index checksum corruption
phy: USB_LGM_PHY should depend on X86
coresight: etm4x: Skip accessing TRCPDCR in save/restore
nvmem: core: Fix a resource leak on error in nvmem_add_cells_from_of()
nvmem: core: skip child nodes not matching binding
soundwire: bus: use sdw_update_no_pm when initializing a device
soundwire: bus: use sdw_write_no_pm when setting the bus scale registers
soundwire: export sdw_write/read_no_pm functions
soundwire: bus: fix confusion on device used by pm_runtime
misc: fastrpc: fix incorrect usage of dma_map_sgtable
remoteproc/mediatek: acknowledge watchdog IRQ after handled
regmap: sdw: use _no_pm functions in regmap_read/write
ext: EXT4_KUNIT_TESTS should depend on EXT4_FS instead of selecting it
mailbox: sprd: correct definition of SPRD_OUTBOX_FIFO_FULL
device-dax: Fix default return code of range_parse()
PCI: pci-bridge-emul: Fix array overruns, improve safety
PCI: cadence: Fix DMA range mapping early return error
i40e: Fix flow for IPv6 next header (extension header)
i40e: Add zero-initialization of AQ command structures
i40e: Fix overwriting flow control settings during driver loading
i40e: Fix addition of RX filters after enabling FW LLDP agent
i40e: Fix VFs not created
Take mmap lock in cacheflush syscall
nios2: fixed broken sys_clone syscall
i40e: Fix add TC filter for IPv6
octeontx2-af: Fix an off by one in rvu_dbg_qsize_write()
pwm: iqs620a: Fix overflow and optimize calculations
vfio/type1: Use follow_pte()
ice: report correct max number of TCs
ice: Account for port VLAN in VF max packet size calculation
ice: Fix state bits on LLDP mode switch
ice: update the number of available RSS queues
net: stmmac: fix CBS idleslope and sendslope calculation
net/mlx4_core: Add missed mlx4_free_cmd_mailbox()
PCI: rockchip: Make 'ep-gpios' DT property optional
vxlan: move debug check after netdev unregister
wireguard: device: do not generate ICMP for non-IP packets
wireguard: kconfig: use arm chacha even with no neon
ocfs2: fix a use after free on error
mm: memcontrol: fix NR_ANON_THPS accounting in charge moving
mm: memcontrol: fix slub memory accounting
mm/memory.c: fix potential pte_unmap_unlock pte error
mm/hugetlb: fix potential double free in hugetlb_register_node() error path
mm/hugetlb: suppress wrong warning info when alloc gigantic page
mm/compaction: fix misbehaviors of fast_find_migrateblock()
r8169: fix jumbo packet handling on RTL8168e
NFSv4: Fixes for nfs4_bitmask_adjust()
KVM: SVM: Intercept INVPCID when it's disabled to inject #UD
KVM: x86/mmu: Expand collapsible SPTE zap for TDP MMU to ZONE_DEVICE and HugeTLB pages
arm64: Add missing ISB after invalidating TLB in __primary_switch
i2c: brcmstb: Fix brcmstd_send_i2c_cmd condition
i2c: exynos5: Preserve high speed master code
mm,thp,shmem: make khugepaged obey tmpfs mount flags
mm: fix memory_failure() handling of dax-namespace metadata
mm/rmap: fix potential pte_unmap on an not mapped pte
proc: use kvzalloc for our kernel buffer
csky: Fix a size determination in gpr_get()
scsi: bnx2fc: Fix Kconfig warning & CNIC build errors
scsi: sd: sd_zbc: Don't pass GFP_NOIO to kvcalloc
block: reopen the device in blkdev_reread_part
ide/falconide: Fix module unload
scsi: sd: Fix Opal support
blk-settings: align max_sectors on "logical_block_size" boundary
soundwire: intel: fix possible crash when no device is detected
ACPI: property: Fix fwnode string properties matching
ACPI: configfs: add missing check after configfs_register_default_group()
cpufreq: ACPI: Set cpuinfo.max_freq directly if max boost is known
HID: logitech-dj: add support for keyboard events in eQUAD step 4 Gaming
HID: wacom: Ignore attempts to overwrite the touch_max value from HID
Input: raydium_ts_i2c - do not send zero length
Input: xpad - add support for PowerA Enhanced Wired Controller for Xbox Series X|S
Input: joydev - prevent potential read overflow in ioctl
Input: i8042 - add ASUS Zenbook Flip to noselftest list
media: mceusb: Fix potential out-of-bounds shift
USB: serial: option: update interface mapping for ZTE P685M
usb: musb: Fix runtime PM race in musb_queue_resume_work
usb: dwc3: gadget: Fix setting of DEPCFG.bInterval_m1
usb: dwc3: gadget: Fix dep->interval for fullspeed interrupt
USB: serial: ftdi_sio: fix FTX sub-integer prescaler
USB: serial: pl2303: fix line-speed handling on newer chips
USB: serial: mos7840: fix error code in mos7840_write()
USB: serial: mos7720: fix error code in mos7720_write()
phy: lantiq: rcu-usb2: wait after clock enable
ALSA: fireface: fix to parse sync status register of latter protocol
ALSA: hda: Add another CometLake-H PCI ID
ALSA: hda/hdmi: Drop bogus check at closing a stream
ALSA: hda/realtek: modify EAPD in the ALC886
ALSA: hda/realtek: Quirk for HP Spectre x360 14 amp setup
MIPS: Ingenic: Disable HPTLB for D0 XBurst CPUs too
MIPS: Support binutils configured with --enable-mips-fix-loongson3-llsc=yes
MIPS: VDSO: Use CLANG_FLAGS instead of filtering out '--target='
Revert "MIPS: Octeon: Remove special handling of CONFIG_MIPS_ELF_APPENDED_DTB=y"
Revert "bcache: Kill btree_io_wq"
bcache: Give btree_io_wq correct semantics again
bcache: Move journal work to new flush wq
Revert "drm/amd/display: Update NV1x SR latency values"
drm/amd/display: Add FPU wrappers to dcn21_validate_bandwidth()
drm/amd/display: Remove Assert from dcn10_get_dig_frontend
drm/amd/display: Add vupdate_no_lock interrupts for DCN2.1
drm/amdkfd: Fix recursive lock warnings
drm/amdgpu: Set reference clock to 100Mhz on Renoir (v2)
drm/nouveau/kms: handle mDP connectors
drm/modes: Switch to 64bit maths to avoid integer overflow
drm/sched: Cancel and flush all outstanding jobs before finish.
drm/panel: kd35t133: allow using non-continuous dsi clock
drm/rockchip: Require the YTR modifier for AFBC
ASoC: siu: Fix build error by a wrong const prefix
selinux: fix inconsistency between inode_getxattr and inode_listsecurity
erofs: initialized fields can only be observed after bit is set
tpm_tis: Fix check_locality for correct locality acquisition
tpm_tis: Clean up locality release
KEYS: trusted: Fix incorrect handling of tpm_get_random()
KEYS: trusted: Fix migratable=1 failing
KEYS: trusted: Reserve TPM for seal and unseal operations
btrfs: do not cleanup upper nodes in btrfs_backref_cleanup_node
btrfs: do not warn if we can't find the reloc root when looking up backref
btrfs: add asserts for deleting backref cache nodes
btrfs: abort the transaction if we fail to inc ref in btrfs_copy_root
btrfs: fix reloc root leak with 0 ref reloc roots on recovery
btrfs: splice remaining dirty_bg's onto the transaction dirty bg list
btrfs: handle space_info::total_bytes_pinned inside the delayed ref itself
btrfs: account for new extents being deleted in total_bytes_pinned
btrfs: fix extent buffer leak on failure to copy root
drm/i915/gt: Flush before changing register state
drm/i915/gt: Correct surface base address for renderclear
crypto: arm64/sha - add missing module aliases
crypto: aesni - prevent misaligned buffers on the stack
crypto: michael_mic - fix broken misalignment handling
crypto: sun4i-ss - checking sg length is not sufficient
crypto: sun4i-ss - IV register does not work on A10 and A13
crypto: sun4i-ss - handle BigEndian for cipher
crypto: sun4i-ss - initialize need_fallback
soc: samsung: exynos-asv: don't defer early on not-supported SoCs
soc: samsung: exynos-asv: handle reading revision register error
seccomp: Add missing return in non-void function
arm64: ptrace: Fix seccomp of traced syscall -1 (NO_SYSCALL)
misc: rtsx: init of rts522a add OCP power off when no card is present
drivers/misc/vmw_vmci: restrict too big queue size in qp_host_alloc_queue
pstore: Fix typo in compression option name
dts64: mt7622: fix slow sd card access
arm64: dts: agilex: fix phy interface bit shift for gmac1 and gmac2
staging/mt7621-dma: mtk-hsdma.c->hsdma-mt7621.c
staging: gdm724x: Fix DMA from stack
staging: rtl8188eu: Add Edimax EW-7811UN V2 to device table
floppy: reintroduce O_NDELAY fix
media: i2c: max9286: fix access to unallocated memory
media: ir_toy: add another IR Droid device
media: ipu3-cio2: Fix mbus_code processing in cio2_subdev_set_fmt()
media: marvell-ccic: power up the device on mclk enable
media: smipcie: fix interrupt handling and IR timeout
x86/virt: Eat faults on VMXOFF in reboot flows
x86/reboot: Force all cpus to exit VMX root if VMX is supported
x86/fault: Fix AMD erratum #91 errata fixup for user code
x86/entry: Fix instrumentation annotation
powerpc/prom: Fix "ibm,arch-vec-5-platform-support" scan
rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers
rcu/nocb: Perform deferred wake up before last idle's need_resched() check
kprobes: Fix to delay the kprobes jump optimization
arm64: Extend workaround for erratum 1024718 to all versions of Cortex-A55
iommu/arm-smmu-qcom: Fix mask extraction for bootloader programmed SMRs
arm64: kexec_file: fix memory leakage in create_dtb() when fdt_open_into() fails
arm64: uprobe: Return EOPNOTSUPP for AARCH32 instruction probing
arm64 module: set plt* section addresses to 0x0
arm64: spectre: Prevent lockdep splat on v4 mitigation enable path
riscv: Disable KSAN_SANITIZE for vDSO
watchdog: qcom: Remove incorrect usage of QCOM_WDT_ENABLE_IRQ
watchdog: mei_wdt: request stop on unregister
coresight: etm4x: Handle accesses to TRCSTALLCTLR
mtd: spi-nor: sfdp: Fix last erase region marking
mtd: spi-nor: sfdp: Fix wrong erase type bitmask for overlaid region
mtd: spi-nor: core: Fix erase type discovery for overlaid region
mtd: spi-nor: core: Add erase size check for erase command initialization
mtd: spi-nor: hisi-sfc: Put child node np on error path
fs/affs: release old buffer head on error path
seq_file: document how per-entry resources are managed.
x86: fix seq_file iteration for pat/memtype.c
mm: memcontrol: fix swap undercounting in cgroup2
mm: memcontrol: fix get_active_memcg return value
hugetlb: fix update_and_free_page contig page struct assumption
hugetlb: fix copy_huge_page_from_user contig page struct assumption
mm/vmscan: restore zone_reclaim_mode ABI
mm, compaction: make fast_isolate_freepages() stay within zone
KVM: nSVM: fix running nested guests when npt=0
nvmem: qcom-spmi-sdam: Fix uninitialized pdev pointer
module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for undefined symbols
mmc: sdhci-esdhc-imx: fix kernel panic when remove module
mmc: sdhci-pci-o2micro: Bug fix for SDR104 HW tuning failure
powerpc/32: Preserve cr1 in exception prolog stack check to fix build error
powerpc/kexec_file: fix FDT size estimation for kdump kernel
powerpc/32s: Add missing call to kuep_lock on syscall entry
spmi: spmi-pmic-arb: Fix hw_irq overflow
mei: fix transfer over dma with extended header
mei: me: emmitsburg workstation DID
mei: me: add adler lake point S DID
mei: me: add adler lake point LP DID
gpio: pcf857x: Fix missing first interrupt
mfd: gateworks-gsc: Fix interrupt type
printk: fix deadlock when kernel panic
exfat: fix shift-out-of-bounds in exfat_fill_super()
zonefs: Fix file size of zones in full condition
kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTORE
thermal: cpufreq_cooling: freq_qos_update_request() returns < 0 on error
cpufreq: qcom-hw: drop devm_xxx() calls from init/exit hooks
cpufreq: intel_pstate: Change intel_pstate_get_hwp_max() argument
cpufreq: intel_pstate: Get per-CPU max freq via MSR_HWP_CAPABILITIES if available
proc: don't allow async path resolution of /proc/thread-self components
s390/vtime: fix inline assembly clobber list
virtio/s390: implement virtio-ccw revision 2 correctly
um: mm: check more comprehensively for stub changes
um: defer killing userspace on page table update failures
irqchip/loongson-pch-msi: Use bitmap_zalloc() to allocate bitmap
f2fs: fix out-of-repair __setattr_copy()
f2fs: enforce the immutable flag on open files
f2fs: flush data when enabling checkpoint back
sparc32: fix a user-triggerable oops in clear_user()
spi: fsl: invert spisel_boot signal on MPC8309
spi: spi-synquacer: fix set_cs handling
gfs2: fix glock confusion in function signal_our_withdraw
gfs2: Don't skip dlm unlock if glock has an lvb
gfs2: Lock imbalance on error path in gfs2_recover_one
gfs2: Recursive gfs2_quota_hold in gfs2_iomap_end
dm: fix deadlock when swapping to encrypted device
dm table: fix iterate_devices based device capability checks
dm table: fix DAX iterate_devices based device capability checks
dm table: fix zoned iterate_devices based device capability checks
dm writecache: fix performance degradation in ssd mode
dm writecache: return the exact table values that were set
dm writecache: fix writing beyond end of underlying device when shrinking
dm era: Recover committed writeset after crash
dm era: Update in-core bitset after committing the metadata
dm era: Verify the data block size hasn't changed
dm era: Fix bitset memory leaks
dm era: Use correct value size in equality function of writeset tree
dm era: Reinitialize bitset cache before digesting a new writeset
dm era: only resize metadata in preresume
drm/i915: Reject 446-480MHz HDMI clock on GLK
kgdb: fix to kill breakpoints on initmem after boot
ipv6: silence compilation warning for non-IPV6 builds
net: icmp: pass zeroed opts from icmp{,v6}_ndo_send before sending
wireguard: selftests: test multiple parallel streams
wireguard: queueing: get rid of per-peer ring buffers
net: sched: fix police ext initialization
net: qrtr: Fix memory leak in qrtr_tun_open
net_sched: fix RTNL deadlock again caused by request_module()
ARM: dts: aspeed: Add LCLK to lpc-snoop
Linux 5.10.20
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3fbcecd9413ce212dac68d5cc800c9457feba56a
[ Upstream commit a643bff752 ]
Add bpf_patch_call_args() prototype. This function is called from the BPF
verifier, and only if CONFIG_BPF_JIT_ALWAYS_ON is not defined. This fixes a
compiler warning about a missing prototype in some kernel configurations.
Fixes: 1ea47e01ad ("bpf: add support for bpf_call to interpreter")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210112075520.4103414-2-andrii@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
BPF dispatcher functions are patched at runtime to perform direct
instead of indirect calls. Disable CFI for the dispatcher functions
to avoid conflicts.
Bug: 145210207
Change-Id: Iea72f5a9fe09dd5adbb90b0174945707f42594b0
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Recent work in f4d0525921 ("bpf: Add map_meta_equal map ops") and 134fede4ee
("bpf: Relax max_entries check for most of the inner map types") added support
for dynamic inner max elements for most map-in-map types. Exceptions were maps
like array or prog array where the map_gen_lookup() callback uses the maps'
max_entries field as a constant when emitting instructions.
We recently implemented Maglev consistent hashing into Cilium's load balancer
which uses map-in-map with an outer map being hash and inner being array holding
the Maglev backend table for each service. This design reduces overall
memory consumption, since the outer hash map avoids preallocating a large,
flat memory area for all services. Also, the number of service mappings is
not always known a priori.
The use case for dynamic inner array map entries is to further reduce memory
overhead: some services might have just a small number of backends while
others could have a large number. Right now the Maglev backend tables for
small and large numbers of backends need to have the same number of inner
array map entries, which adds a lot of unneeded overhead.
Dynamic inner array map entries can be realized by avoiding the inlined code
generation for their lookup. The lookup will still be efficient since it will
be calling into array_map_lookup_elem() directly and thus avoiding retpoline.
The patch adds a BPF_F_INNER_MAP flag to map creation. When the flag is set,
inline code generation is skipped and the array_map_meta_equal() check is
relaxed to ignore both maps' max_entries. Lookups for map-in-map remain
inlined, and hence faster, when BPF_F_INNER_MAP is not specified and dynamic
max_entries are not needed.
Example code generation where inner map is dynamic sized array:
# bpftool p d x i 125
int handle__sys_enter(void * ctx):
; int handle__sys_enter(void *ctx)
0: (b4) w1 = 0
; int key = 0;
1: (63) *(u32 *)(r10 -4) = r1
2: (bf) r2 = r10
;
3: (07) r2 += -4
; inner_map = bpf_map_lookup_elem(&outer_arr_dyn, &key);
4: (18) r1 = map[id:468]
6: (07) r1 += 272
7: (61) r0 = *(u32 *)(r2 +0)
8: (35) if r0 >= 0x3 goto pc+5
9: (67) r0 <<= 3
10: (0f) r0 += r1
11: (79) r0 = *(u64 *)(r0 +0)
12: (15) if r0 == 0x0 goto pc+1
13: (05) goto pc+1
14: (b7) r0 = 0
15: (b4) w6 = -1
; if (!inner_map)
16: (15) if r0 == 0x0 goto pc+6
17: (bf) r2 = r10
;
18: (07) r2 += -4
; val = bpf_map_lookup_elem(inner_map, &key);
19: (bf) r1 = r0 | No inlining but instead
20: (85) call array_map_lookup_elem#149280 | call to array_map_lookup_elem()
; return val ? *val : -1; | for inner array lookup.
21: (15) if r0 == 0x0 goto pc+1
; return val ? *val : -1;
22: (61) r6 = *(u32 *)(r0 +0)
; }
23: (bc) w0 = w6
24: (95) exit
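The relaxed comparison described above can be sketched in plain userspace C.
This is only an illustration under stated assumptions: the struct, the
function names and the SKETCH_F_INNER_MAP bit are hypothetical stand-ins and
are not the kernel's definitions or UAPI values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit for this sketch; not the kernel's UAPI value. */
#define SKETCH_F_INNER_MAP (1u << 0)

/* Simplified stand-in for the metadata recorded per map. */
struct sketch_map_meta {
	uint32_t map_type;
	uint32_t key_size;
	uint32_t value_size;
	uint32_t max_entries;
	uint32_t map_flags;
};

/* Generic comparison: everything except max_entries must match. */
static bool sketch_map_meta_equal(const struct sketch_map_meta *tmpl,
				  const struct sketch_map_meta *m)
{
	return tmpl->map_type == m->map_type &&
	       tmpl->key_size == m->key_size &&
	       tmpl->value_size == m->value_size &&
	       tmpl->map_flags == m->map_flags;
}

/*
 * Array-specific comparison: max_entries only has to match when the
 * template was created without the inner-map flag, i.e. when lookups
 * may have been inlined with max_entries baked in as a constant.
 */
static bool sketch_array_meta_equal(const struct sketch_map_meta *tmpl,
				    const struct sketch_map_meta *m)
{
	assert(tmpl && m);
	if (!sketch_map_meta_equal(tmpl, m))
		return false;
	if (tmpl->map_flags & SKETCH_F_INNER_MAP)
		return true;
	return tmpl->max_entries == m->max_entries;
}
```

With the flag set on both maps, a 16-entry and a 65536-entry inner array
compare as equal in this model; without it, differing sizes are rejected.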
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201010234006.7075-4-daniel@iogearbox.net
Add bpf_this_cpu_ptr() to help access percpu var on this cpu. This
helper always returns a valid pointer, therefore no need to check
returned value for NULL. Also note that all programs run with
preemption disabled, which means that the returned pointer is stable
during all the execution of the program.
Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200929235049.2533242-6-haoluo@google.com
Add bpf_per_cpu_ptr() to help bpf programs access percpu vars.
bpf_per_cpu_ptr() has the same semantics as per_cpu_ptr() in the kernel
except that it may return NULL. This happens when the cpu parameter is
out of range, so the caller must check the returned value.
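The difference between the two helpers can be modelled in userspace C as
follows. This is a rough sketch, not the helpers themselves: the array, the
CPU count and every sketch_* name are assumptions made for illustration, and
the running CPU is passed in explicitly because the model has no real notion
of one.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4 /* hypothetical CPU count for this model */

/* One slot per CPU, standing in for a per-CPU kernel variable. */
static long sketch_counters[SKETCH_NR_CPUS];

/* Model of the per-CPU lookup: NULL when @cpu is out of range. */
static long *sketch_per_cpu_ptr(unsigned int cpu)
{
	if (cpu >= SKETCH_NR_CPUS)
		return NULL;
	return &sketch_counters[cpu];
}

/* Model of the this-CPU lookup: the running CPU is always in range. */
static long *sketch_this_cpu_ptr(unsigned int running_cpu)
{
	long *p = sketch_per_cpu_ptr(running_cpu);

	assert(p != NULL);
	return p;
}

/* Caller-side pattern: check for NULL before dereferencing. */
static int sketch_bump(unsigned int cpu)
{
	long *p = sketch_per_cpu_ptr(cpu);

	if (!p)
		return -1; /* out-of-range CPU */
	(*p)++;
	return 0;
}
```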
Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200929235049.2533242-5-haoluo@google.com
This enables support for attaching freplace programs to multiple attach
points. It does this by amending the UAPI for bpf_link_create with a target
BTF ID that can be used to supply the new attachment point along with the
target program fd. The target must be compatible with the target that was
supplied at program load time.
The implementation reuses the checks that were factored out of
check_attach_btf_id() to ensure compatibility between the BTF types of the
old and new attachment. If these match, a new bpf_tracing_link will be
created for the new attach target, allowing multiple attachments to
co-exist simultaneously.
The code could theoretically support multiple-attach of other types of
tracing programs as well, but since I don't have a use case for any of
those, there is no API support for doing so.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/160138355169.48470.17165680973640685368.stgit@toke.dk
In preparation for allowing multiple attachments of freplace programs, move
the references to the target program and trampoline into the
bpf_tracing_link structure when that is created. To do this atomically,
introduce a new mutex in prog->aux to protect writing to the two pointers
to target prog and trampoline, and rename the members to make it clear that
they are related.
With this change, it is no longer possible to attach the same tracing
program multiple times (detaching in-between), since the reference from the
tracing program to the target disappears on the first attach. However,
since the next patch will let the caller supply an attach target, that will
also make it possible to attach to the same place multiple times.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/160138355059.48470.2503076992210324984.stgit@toke.dk
A helper is added to support tracing kernel type information in BPF
using the BPF Type Format (BTF). Its signature is
long bpf_snprintf_btf(char *str, u32 str_size, struct btf_ptr *ptr,
u32 btf_ptr_size, u64 flags);
struct btf_ptr * specifies
- a pointer to the data to be traced
- the BTF id of the type of data pointed to
- a flags field is provided for future use; these flags
are not to be confused with the BTF_F_* flags
below that control how the btf_ptr is displayed; the
flags member of the struct btf_ptr may be used to
disambiguate types in kernel versus module BTF, etc;
the main distinction is the flags relate to the type
and information needed in identifying it; not how it
is displayed.
For example a BPF program with a struct sk_buff *skb
could do the following:
static struct btf_ptr b = { };
b.ptr = skb;
b.type_id = __builtin_btf_type_id(struct sk_buff, 1);
bpf_snprintf_btf(str, sizeof(str), &b, sizeof(b), 0);
Default output looks like this:
(struct sk_buff){
.transport_header = (__u16)65535,
.mac_header = (__u16)65535,
.end = (sk_buff_data_t)192,
.head = (unsigned char *)0x000000007524fd8b,
.data = (unsigned char *)0x000000007524fd8b,
.truesize = (unsigned int)768,
.users = (refcount_t){
.refs = (atomic_t){
.counter = (int)1,
},
},
}
Flags modifying display are as follows:
- BTF_F_COMPACT: no formatting around type information
- BTF_F_NONAME: no struct/union member names/types
- BTF_F_PTR_RAW: show raw (unobfuscated) pointer values;
equivalent to %px.
- BTF_F_ZERO: show zero-valued struct/union members;
they are not displayed by default
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1601292670-1616-4-git-send-email-alan.maguire@oracle.com
The check_attach_btf_id() function really does three things:
1. It performs a bunch of checks on the program to ensure that the
attachment is valid.
2. It stores a bunch of state about the attachment being requested in
the verifier environment and struct bpf_prog objects.
3. It allocates a trampoline for the attachment.
This patch splits out (1.) and (3.) into separate functions which will
perform the checks, but return the computed values instead of directly
modifying the environment. This is done in preparation for reusing the
checks when the actual attachment is happening, which will allow tracing
programs to have multiple (compatible) attachments.
This also fixes a bug where a bunch of checks were skipped if a trampoline
already existed for the tracing target.
Fixes: 6ba43b761c ("bpf: Attachment verification for BPF_MODIFY_RETURN")
Fixes: 1e6c62a882 ("bpf: Introduce sleepable BPF programs")
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In preparation for moving code around, change a bunch of references to
env->log (and the verbose() logging helper) to use bpf_log() and a direct
pointer to struct bpf_verifier_log. While we're touching the function
signature, mark the 'prog' argument to bpf_check_type_match() as const.
Also enhance the bpf_verifier_log_needed() check to handle NULL pointers
for the log struct so we can re-use the code with logging disabled.
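The NULL-tolerant logging pattern can be sketched like this in userspace C.
The struct layout and the sketch_* names are hypothetical and much simpler
than the verifier's log; the point is only that callers may pass NULL to mean
"logging disabled".

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified log object: a level plus a bounded buffer. */
struct sketch_log {
	int level;
	char buf[128];
	size_t len;
};

/* A NULL log simply means "logging disabled". */
static bool sketch_log_needed(const struct sketch_log *log)
{
	return log && log->level > 0 && log->len < sizeof(log->buf) - 1;
}

/* Append a formatted message; does nothing if no log is wanted. */
static void sketch_log_write(struct sketch_log *log, const char *fmt, ...)
{
	va_list args;
	int n;

	assert(fmt != NULL);
	if (!sketch_log_needed(log))
		return;

	va_start(args, fmt);
	n = vsnprintf(log->buf + log->len, sizeof(log->buf) - log->len,
		      fmt, args);
	va_end(args);
	if (n < 0)
		return;
	/* vsnprintf reports the untruncated length; clamp to what fits. */
	if ((size_t)n >= sizeof(log->buf) - log->len)
		log->len = sizeof(log->buf) - 1;
	else
		log->len += (size_t)n;
}
```

Code that performs checks can then take a log pointer unconditionally and be
reused from paths that have no log at all.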
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add .test_run for raw_tracepoint. Also, introduce a new feature that runs
the target program on a specific CPU. This is achieved by a new flag in
bpf_attr.test, BPF_F_TEST_RUN_ON_CPU. When this flag is set, the program
is triggered on the CPU with id bpf_attr.test.cpu. This feature is needed
for BPF programs that handle perf_event and other percpu resources, as the
program can access these resources locally.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200925205432.1777-2-songliubraving@fb.com
The meaning of PTR_TO_BTF_ID_OR_NULL differs slightly from other types
denoted with the *_OR_NULL type. For example the types PTR_TO_SOCKET
and PTR_TO_SOCKET_OR_NULL can be used for branch analysis because the
type PTR_TO_SOCKET is guaranteed to _not_ have a null value.
In contrast PTR_TO_BTF_ID and PTR_TO_BTF_ID_OR_NULL have slightly
different meanings. A PTR_TO_BTF_ID may be a pointer to a NULL value,
but it is safe to read this pointer in the program context because
the program context will handle any faults. The fallout is for
PTR_TO_BTF_ID the verifier can assume reads are safe, but can not
use the type in branch analysis. Additionally, authors need to be
extra careful when passing PTR_TO_BTF_ID into helpers. In general
helpers consuming type PTR_TO_BTF_ID will need to assume it may
be null.
Since the above is not obvious to readers without the background knowledge,
let's add a comment in the type definition.
Editorial comment: as networking and tracing programs get closer
and more tightly merged, we may need to consider a new type that we
can ensure is non-null for branch analysis and also for passing into
helpers.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
There is a constant need to add more fields into the bpf_tcp_sock
for the bpf programs running at tc, sock_ops...etc.
A current workaround could be to use bpf_probe_read_kernel(). However,
other than making another helper call for reading each field and missing
CO-RE, it is also not as intuitive to use as directly reading
"tp->lsndtime" for example. While already having perfmon cap to do
bpf_probe_read_kernel(), it will be much easier if the bpf prog can
directly read from the tcp_sock.
This patch tries to do that by using the existing casting-helpers
bpf_skc_to_*() whose func_proto returns a btf_id. For example, the
func_proto of bpf_skc_to_tcp_sock returns the btf_id of the
kernel "struct tcp_sock".
These helpers are also added to is_ptr_cast_function().
It ensures the returned reg (BPF_REG_0) also carries the ref_obj_id,
which keeps ref-tracking working properly.
The bpf_skc_to_* helpers are made available to most of the bpf prog
types in filter.c. The bpf_skc_to_* helpers will be limited by
perfmon cap.
This patch adds an ARG_PTR_TO_BTF_ID_SOCK_COMMON. A helper taking
this arg can accept a btf-id-ptr (PTR_TO_BTF_ID + &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON])
or a legacy-ctx-convert-skc-ptr (PTR_TO_SOCK_COMMON). The bpf_skc_to_*()
helpers are changed to take ARG_PTR_TO_BTF_ID_SOCK_COMMON such that
they will accept a pointer obtained from skb->sk.
Instead of specifying both arg_type and arg_btf_id in the same func_proto,
which is what the current ARG_PTR_TO_BTF_ID does, the arg_btf_id of
the new ARG_PTR_TO_BTF_ID_SOCK_COMMON is specified in the
compatible_reg_types[] in verifier.c. The reason is that the arg_btf_id is
always the same. Discussion in this thread:
https://lore.kernel.org/bpf/20200922070422.1917351-1-kafai@fb.com/
The ARG_PTR_TO_BTF_ID_ part gives a clear expectation that the helper is
expecting a PTR_TO_BTF_ID which could be NULL. This is the same
behavior as the existing helper taking ARG_PTR_TO_BTF_ID.
The _SOCK_COMMON part means the helper is also expecting the legacy
SOCK_COMMON pointer.
By excluding the _OR_NULL part, the bpf prog cannot call the helper
with a literal NULL, which doesn't make sense in most cases.
e.g. bpf_skc_to_tcp_sock(NULL) will be rejected. Any PTR_TO_*_OR_NULL
reg must be NULL-checked before being passed into the helper, or else
the bpf prog will be rejected. This behavior is nothing new and is
consistent with the current expectation during bpf-prog-load.
[ ARG_PTR_TO_BTF_ID_SOCK_COMMON will be used to replace
ARG_PTR_TO_SOCK* of other existing helpers later such that
those existing helpers can take the PTR_TO_BTF_ID returned by
the bpf_skc_to_*() helpers.
The only special case is bpf_sk_lookup_assign() which can accept a
literal NULL ptr. It has to be handled specially in another follow
up patch if there is a need (e.g. by renaming ARG_PTR_TO_SOCKET_OR_NULL
to ARG_PTR_TO_BTF_ID_SOCK_COMMON_OR_NULL). ]
[ When converting the older helpers that take ARG_PTR_TO_SOCK* in
the later patch, if the kernel does not support BTF,
ARG_PTR_TO_BTF_ID_SOCK_COMMON will behave like ARG_PTR_TO_SOCK_COMMON
because no reg->type could have PTR_TO_BTF_ID in this case.
It is not a concern for the newer-btf-only helper like the bpf_skc_to_*()
here though because these helpers must require BTF vmlinux to begin
with. ]
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200925000350.3855720-1-kafai@fb.com
The mapping between bpf_arg_type and bpf_reg_type is encoded in a big
hairy if statement that is hard to follow. The debug output also leaves
something to be desired: if a reg_type doesn't match we only print one of
the options, instead of printing all the valid ones.
Convert the if statement into a table which is then used to drive type
checking. If none of the reg_types match we print all options, e.g.:
R2 type=rdonly_buf expected=fp, pkt, pkt_meta, map_value
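A table-driven check of this shape might look like the following userspace
sketch. The enum values, the table contents and the sketch_* names are
invented for illustration; only the message format is taken from the example
line above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy register types; names follow the example message above. */
enum sketch_reg_type {
	SKETCH_NOT_INIT = 0,	/* also terminates the table rows below */
	SKETCH_FP,
	SKETCH_PKT,
	SKETCH_PKT_META,
	SKETCH_MAP_VALUE,
	SKETCH_RDONLY_BUF,
};

static const char *const sketch_reg_name[] = {
	[SKETCH_NOT_INIT]   = "?",
	[SKETCH_FP]         = "fp",
	[SKETCH_PKT]        = "pkt",
	[SKETCH_PKT_META]   = "pkt_meta",
	[SKETCH_MAP_VALUE]  = "map_value",
	[SKETCH_RDONLY_BUF] = "rdonly_buf",
};

/* Hypothetical argument types, each mapped to its acceptable reg types. */
enum sketch_arg_type { SKETCH_ARG_MEM, SKETCH_ARG_MAP_VALUE, SKETCH_ARG_MAX };

static const enum sketch_reg_type sketch_compatible[SKETCH_ARG_MAX][8] = {
	[SKETCH_ARG_MEM] = { SKETCH_FP, SKETCH_PKT, SKETCH_PKT_META,
			     SKETCH_MAP_VALUE },
	[SKETCH_ARG_MAP_VALUE] = { SKETCH_MAP_VALUE },
};

/*
 * Returns true if @reg is acceptable for @arg. On mismatch, writes a
 * message listing every acceptable type into @msg and returns false.
 */
static bool sketch_check_reg_type(int regno, enum sketch_reg_type reg,
				  enum sketch_arg_type arg,
				  char *msg, size_t len)
{
	const enum sketch_reg_type *ok;
	size_t i, off;

	assert(msg != NULL && len > 0 && arg < SKETCH_ARG_MAX);
	ok = sketch_compatible[arg];
	for (i = 0; ok[i] != SKETCH_NOT_INIT; i++)
		if (ok[i] == reg)
			return true;

	snprintf(msg, len, "R%d type=%s expected=", regno,
		 sketch_reg_name[reg]);
	for (i = 0; ok[i] != SKETCH_NOT_INIT; i++) {
		/* msg is always NUL-terminated, so len - off is at least 1 */
		off = strlen(msg);
		snprintf(msg + off, len - off, "%s%s", i ? ", " : "",
			 sketch_reg_name[ok[i]]);
	}
	return false;
}
```

Adding a new acceptable type then means adding one table entry, and the
error message picks it up without further changes.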
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200921121227.255763-12-lmb@cloudflare.com
Function prototypes using ARG_PTR_TO_BTF_ID currently use two ways to signal
which BTF IDs are acceptable. First, bpf_func_proto.btf_id is an array of
IDs, one for each argument. This array is only accessed up to the highest
numbered argument that uses ARG_PTR_TO_BTF_ID and may therefore be less than
five arguments long. It usually points at a BTF_ID_LIST. Second, check_btf_id
is a function pointer that is called by the verifier if present. It gets the
actual BTF ID of the register, and the argument number we're currently checking.
It turns out that the only user, check_arg_btf_id, ignores the argument number
and is simply used to check whether the BTF ID has a struct sock_common at its
start.
Replace both of these mechanisms with an explicit BTF ID for each argument
in a function proto. Thanks to btf_struct_ids_match this is very flexible:
check_arg_btf_id can be replaced by requiring struct sock_common.
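The idea of "has the required struct at its start" can be illustrated with a
toy type table. This is a sketch under assumptions, not btf_struct_ids_match
itself: the table, the T_* ids and the function name are hypothetical, and
only structs embedded at offset 0 are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy type table: the index is the type id, id 0 means "no type". */
struct sketch_type {
	const char *name;		/* for readability only */
	unsigned int first_member;	/* type id of the struct at offset 0 */
};

enum { T_NONE, T_SOCK_COMMON, T_SOCK, T_INET_SOCK, T_SK_BUFF, T_MAX };

static const struct sketch_type sketch_types[T_MAX] = {
	[T_NONE]        = { "(none)", T_NONE },
	[T_SOCK_COMMON] = { "sock_common", T_NONE },
	[T_SOCK]        = { "sock", T_SOCK_COMMON },
	[T_INET_SOCK]   = { "inet_sock", T_SOCK },
	[T_SK_BUFF]     = { "sk_buff", T_NONE },
};

/*
 * True if @id is @need, or if following the chain of structs embedded at
 * offset 0, starting from @id, eventually reaches @need.
 */
static bool sketch_struct_ids_match(unsigned int id, unsigned int need)
{
	assert(id < T_MAX && need < T_MAX);
	while (id != T_NONE) {
		if (id == need)
			return true;
		id = sketch_types[id].first_member;
	}
	return false;
}
```

Requiring T_SOCK_COMMON for an argument accepts sock and inet_sock in this
model and rejects sk_buff, which is the effect the dedicated callback had.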
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200921121227.255763-5-lmb@cloudflare.com
bsearch doesn't modify the contents of the array, so we can take a const pointer.
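For illustration, here is the same property with the C library's bsearch(),
which already takes a const base pointer. This is a userspace example, not
the kernel's lib/bsearch.c, and the table contents and sketch_* names are
made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sorted, read-only table of IDs (values are made up for the example). */
static const uint32_t sketch_ids[] = { 3, 17, 42, 99, 256 };

#define SKETCH_NR_IDS (sizeof(sketch_ids) / sizeof(sketch_ids[0]))

static int sketch_cmp_u32(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a;
	uint32_t y = *(const uint32_t *)b;

	return (x > y) - (x < y);
}

/* bsearch() requires sorted input; used as a debug check below. */
static bool sketch_ids_sorted(void)
{
	size_t i;

	for (i = 1; i < SKETCH_NR_IDS; i++)
		if (sketch_ids[i - 1] >= sketch_ids[i])
			return false;
	return true;
}

/* The table is never written to, so it can stay const end to end. */
static bool sketch_id_known(uint32_t id)
{
	assert(sketch_ids_sorted());
	return bsearch(&id, sketch_ids, SKETCH_NR_IDS,
		       sizeof(sketch_ids[0]), sketch_cmp_u32) != NULL;
}
```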
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200921121227.255763-2-lmb@cloudflare.com
This commit does two things:
1) it optimizes BPF prologue/epilogue generation
2) it makes it possible to have tailcalls within BPF subprograms
The two points are related, since 2) could not be achieved without 1).
In [1], Alexei says:
"The prologue will look like:
nop5
xor eax,eax // two new bytes if bpf_tail_call() is used in this
// function
push rbp
mov rbp, rsp
sub rsp, rounded_stack_depth
push rax // zero init tail_call counter
variable number of push rbx,r13,r14,r15
Then bpf_tail_call will pop variable number rbx,..
and final 'pop rax'
Then 'add rsp, size_of_current_stack_frame'
jmp to next function and skip over 'nop5; xor eax,eax; push rbp; mov
rbp, rsp'
This way new function will set its own stack size and will init tail call
counter with whatever value the parent had.
If next function doesn't use bpf_tail_call it won't have 'xor eax,eax'.
Instead it would need to have 'nop2' in there."
Implement that suggestion.
Since the layout of the stack has changed, tail call counter handling can
no longer rely on popping the counter into rbx, as was done for the constant
prologue case, and later overwriting rbx with the actual value of rbx that
was pushed to the stack. Therefore, let's use one of the registers (%rcx)
that is considered volatile/caller-saved and pop the value of the tail call
counter into it in the epilogue.
Drop the BUILD_BUG_ON in emit_prologue and in
emit_bpf_tail_call_indirect where instruction layout is not constant
anymore.
Introduce a new poke target, 'tailcall_bypass', in the poke descriptor. It
is dedicated to skipping the register pops and stack unwind that are
generated right before the actual jump to the target program.
When the target program is not present, the BPF program will skip
the pop instructions and the nop5 dedicated for jmpq $target. An example of
such a state when only R6 of the callee-saved registers is used by the program:
ffffffffc0513aa1: e9 0e 00 00 00 jmpq 0xffffffffc0513ab4
ffffffffc0513aa6: 5b pop %rbx
ffffffffc0513aa7: 58 pop %rax
ffffffffc0513aa8: 48 81 c4 00 00 00 00 add $0x0,%rsp
ffffffffc0513aaf: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
ffffffffc0513ab4: 48 89 df mov %rbx,%rdi
When the target program is inserted, the jump that was there to skip the
pops/nop5 will become a nop5, so the CPU will go over the pops and do the
actual tailcall.
One might ask why there simply can not be pushes after the nop5?
In the following example snippet:
ffffffffc037030c: 48 89 fb mov %rdi,%rbx
(...)
ffffffffc0370332: 5b pop %rbx
ffffffffc0370333: 58 pop %rax
ffffffffc0370334: 48 81 c4 00 00 00 00 add $0x0,%rsp
ffffffffc037033b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
ffffffffc0370340: 48 81 ec 00 00 00 00 sub $0x0,%rsp
ffffffffc0370347: 50 push %rax
ffffffffc0370348: 53 push %rbx
ffffffffc0370349: 48 89 df mov %rbx,%rdi
ffffffffc037034c: e8 f7 21 00 00 callq 0xffffffffc0372548
There is a bpf2bpf call (at ffffffffc037034c) right after the tailcall,
and the jump target is not present. ctx is in the %rbx register and the BPF
subprogram that we will call into at ffffffffc037034c relies on it,
e.g. it will pick ctx from there. Such a code layout is therefore broken,
as we would overwrite the content of %rbx with the value that was pushed
in the prologue. That is the reason for the 'bypass' approach.
Special care needs to be taken during the install/update/remove of
tailcall target. In case when target program is not present, the CPU
must not execute the pop instructions that precede the tailcall.
To address that, the following states can be defined:
A nop, unwind, nop
B nop, unwind, tail
C skip, unwind, nop
D skip, unwind, tail
A is forbidden (it leads to incorrectness). The state transitions between
tailcall install/update/remove work as follows:
First install tail call f: C->D->B(f)
* poke the tailcall, after that get rid of the skip
Update tail call f to f': B(f)->B(f')
* poke the tailcall (poke->tailcall_target) and do NOT touch the
poke->tailcall_bypass
Remove tail call: B(f')->C(f')
* poke->tailcall_bypass is poked back to jump, then we wait for an RCU
grace period so that other programs finish their execution and
after that we are safe to remove the poke->tailcall_target
Install new tail call (f''): C(f')->D(f'')->B(f'').
* same as first step
This way the CPU can never be exposed to the forbidden state A ("nop, unwind, nop").
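The ordering rules above can be checked with a small userspace model. It is a
sketch under assumptions: struct sketch_poke and the sketch_* functions are
hypothetical, each field write stands in for one text poke, and the RCU grace
period is only marked by a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of one poke site pair; all names here are illustrative. */
struct sketch_poke {
	bool bypass_is_jump;	/* true: "skip" over the unwind + tailcall */
	const char *target;	/* NULL: nop5, otherwise the tailcall target */
};

/* State A (nop, unwind, nop) is the one that must never be observable. */
static bool sketch_forbidden(const struct sketch_poke *p)
{
	return !p->bypass_is_jump && p->target == NULL;
}

/* Each field write models one text poke, checked right after it lands. */
static void sketch_set_target(struct sketch_poke *p, const char *target)
{
	p->target = target;
	assert(!sketch_forbidden(p));
}

static void sketch_set_bypass(struct sketch_poke *p, bool jump)
{
	p->bypass_is_jump = jump;
	assert(!sketch_forbidden(p));
}

/* First install or re-install: C -> D -> B. */
static void sketch_install(struct sketch_poke *p, const char *f)
{
	sketch_set_target(p, f);	/* poke the tailcall first ... */
	sketch_set_bypass(p, false);	/* ... then get rid of the skip */
}

/* Update: B(f) -> B(f'); the bypass is left alone. */
static void sketch_update(struct sketch_poke *p, const char *f)
{
	sketch_set_target(p, f);
}

/* Remove: restore the skip first; only then may the target go away. */
static void sketch_remove(struct sketch_poke *p)
{
	sketch_set_bypass(p, true);
	/* a real implementation waits for an RCU grace period here */
	sketch_set_target(p, NULL);
}
```

Swapping the two pokes in sketch_install() or sketch_remove() trips the
assertion in this model, which is the incorrectness the ordering avoids.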
Last but not least, when tailcalls get mixed with bpf2bpf calls, it
would be possible to encounter an endless loop due to clearing the
tailcall counter. This happens, for example, with a subprogram-based
variant of the tailcall3 program from the BPF selftests, i.e. one where
the tailcall is present within a BPF subprogram.
This test, broken down to particular steps, would do:
entry -> set tailcall counter to 0, bump it by 1, tailcall to func0
func0 -> call subprog_tail
(we are NOT skipping the first 11 bytes of prologue and this subprogram
has a tailcall, therefore we clear the counter...)
subprog -> do the same thing as entry
and then loop forever.
To address this, the idea is to go through the call chain of bpf2bpf progs
and look for the presence of a tailcall throughout the whole chain. If we
see a single tail call, then each node in this call chain needs to be marked
as a subprog that can reach the tailcall. We would later feed the JIT with
this info and:
- set eax to 0 only when tailcall is reachable and this is the entry prog
- if tailcall is reachable but there's no tailcall in insns of currently
JITed prog then push rax anyway, so that it will be possible to
propagate further down the call chain
- finally if tailcall is reachable, then we need to precede the 'call'
insn with mov rax, [rbp - (stack_depth + 8)]
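A simplified model of the marking pass and of the first two JIT decisions
listed above could look like this. It is a sketch under assumptions, not the
verifier's actual walk: the call graph is a toy array assumed to be acyclic,
index 0 is taken to be the entry program, and all sketch_* names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_CALLEES 4

/* Toy call-graph node; index 0 is the entry program. */
struct sketch_subprog {
	bool has_tail_call;	/* tail call in this subprog's own insns */
	int callees[SKETCH_MAX_CALLEES];	/* indices of called subprogs */
	size_t nr_callees;
	bool tail_call_reachable;	/* filled in by sketch_mark() */
};

/*
 * Depth-first walk: a node is marked when it, or anything it calls
 * (directly or indirectly), contains a tail call.
 */
static bool sketch_mark(struct sketch_subprog *progs, int idx)
{
	struct sketch_subprog *sp = &progs[idx];
	size_t i;

	sp->tail_call_reachable = sp->has_tail_call;
	for (i = 0; i < sp->nr_callees; i++)
		if (sketch_mark(progs, sp->callees[i]))
			sp->tail_call_reachable = true;
	return sp->tail_call_reachable;
}

/* What the prologue of a given (sub)program would emit in this model. */
struct sketch_prologue {
	bool zero_eax;	/* xor eax,eax: only in the entry program */
	bool push_rax;	/* keep the counter on the stack for callees */
};

static struct sketch_prologue
sketch_prologue_for(const struct sketch_subprog *progs, int idx)
{
	struct sketch_prologue p;

	assert(idx >= 0);
	p.zero_eax = progs[idx].tail_call_reachable && idx == 0;
	p.push_rax = progs[idx].tail_call_reachable;
	return p;
}
```

A subprogram that merely sits between the entry program and a tail call is
marked too, so it pushes rax without zeroing it and the counter survives the
bpf2bpf call.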
Tail call related cases from test_verifier kselftest are also working
fine. Sample BPF programs that utilize tail calls (sockex3, tracex5)
work properly as well.
[1]: https://lore.kernel.org/bpf/20200517043227.2gpq22ifoq37ogst@ast-mbp.dhcp.thefacebook.com/
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reflect the actual purpose of poke->ip and rename it to
poke->tailcall_target so that it will not be confused with another
poke target that will be introduced in the next commit.
While at it, do the same thing with poke->ip_stable - rename it to
poke->tailcall_target_stable.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Previously, there was no need for poke descriptors being present in
subprogram's bpf_prog_aux struct since tailcalls were simply not allowed
in them. Each subprog is JITed independently so in order to enable
JITing subprograms that use tailcalls, do the following:
- in fixup_bpf_calls() store the index of tailcall insn onto the generated
poke descriptor,
- in case when insn patching occurs, adjust the tailcall insn idx from
bpf_patch_insn_data,
- then in jit_subprogs() check whether the given poke descriptor belongs
to the current subprog by checking if that previously stored absolute
index of tail call insn is in the scope of the insns of given subprog,
- update the insn->imm with new poke descriptor slot so that while JITing
the proper poke descriptor will be grabbed
This way the main program's poke descriptors are distributed across the
subprograms' poke descriptor arrays, so the main program's descriptors can
be untracked from the prog array map.
Also add each subprog's aux struct to the BPF map poke_progs list by calling
map_poke_track() on it.
In case of any error, call map_poke_untrack() on the subprogs' aux
structs that have already been registered to the prog array map.
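The range check that assigns a poke descriptor to a subprogram can be
sketched as follows. The types and names are illustrative assumptions, not
the kernel's structures; a subprogram is modelled as a half-open range of
absolute instruction indices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative types; field names are not the kernel's. */
struct sketch_poke_desc {
	unsigned int insn_idx;	/* absolute index of the tail call insn */
};

struct sketch_range {
	unsigned int start;	/* first insn of the subprog */
	unsigned int end;	/* one past its last insn */
};

/* A descriptor belongs to the subprog whose insn range contains it. */
static bool sketch_poke_in_subprog(const struct sketch_poke_desc *d,
				   const struct sketch_range *sp)
{
	assert(sp->start <= sp->end);
	return d->insn_idx >= sp->start && d->insn_idx < sp->end;
}

/*
 * Copy the descriptors that fall into @sp from @all into @out and return
 * how many were copied (at most @out_cap).
 */
static size_t sketch_collect_pokes(const struct sketch_poke_desc *all,
				   size_t nr_all,
				   const struct sketch_range *sp,
				   struct sketch_poke_desc *out,
				   size_t out_cap)
{
	size_t i, n = 0;

	for (i = 0; i < nr_all && n < out_cap; i++)
		if (sketch_poke_in_subprog(&all[i], sp))
			out[n++] = all[i];
	return n;
}
```

The position of a descriptor in @out plays the role of the new slot that the
tail call instruction would be updated to refer to.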
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
To support modifying the used_maps array, we use a mutex to protect
the use of the counter and the array. The mutex is initialized right
after the prog aux is allocated, and destroyed right before prog
aux is freed. This way we guarantee it's initialized for both cBPF
and eBPF.
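The lifetime rule for the mutex can be shown with a userspace analogue built
on pthreads. This is a sketch only: struct sketch_aux, the cap on the array
and the function names are assumptions, and a pthread mutex stands in for the
kernel mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define SKETCH_MAX_USED_MAPS 8	/* arbitrary cap for the example */

/* Illustrative stand-in for the per-program auxiliary data. */
struct sketch_aux {
	pthread_mutex_t used_maps_mutex;	/* protects the two fields below */
	unsigned int used_map_cnt;
	int used_maps[SKETCH_MAX_USED_MAPS];
};

/* Allocate the aux data and initialize the mutex right away. */
static struct sketch_aux *sketch_aux_alloc(void)
{
	struct sketch_aux *aux = calloc(1, sizeof(*aux));

	if (!aux)
		return NULL;
	if (pthread_mutex_init(&aux->used_maps_mutex, NULL) != 0) {
		free(aux);
		return NULL;
	}
	return aux;
}

/* Destroy the mutex immediately before freeing the aux data. */
static void sketch_aux_free(struct sketch_aux *aux)
{
	if (!aux)
		return;
	pthread_mutex_destroy(&aux->used_maps_mutex);
	free(aux);
}

/* Append a map id under the mutex; returns 0 on success, -1 if full. */
static int sketch_add_used_map(struct sketch_aux *aux, int map_id)
{
	int ret = -1;

	assert(aux != NULL);
	pthread_mutex_lock(&aux->used_maps_mutex);
	if (aux->used_map_cnt < SKETCH_MAX_USED_MAPS) {
		aux->used_maps[aux->used_map_cnt++] = map_id;
		ret = 0;
	}
	pthread_mutex_unlock(&aux->used_maps_mutex);
	return ret;
}
```

Tying init and destroy to allocation and free means every user of the aux
data sees a valid mutex, regardless of which path created the program.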
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Cc: YiFei Zhu <zhuyifei1999@gmail.com>
Link: https://lore.kernel.org/bpf/20200915234543.3220146-2-sdf@google.com
Sleepable BPF programs can now use copy_from_user() to access user memory.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/bpf/20200827220114.69225-4-alexei.starovoitov@gmail.com
Introduce sleepable BPF programs that can request such property for themselves
via BPF_F_SLEEPABLE flag at program load time. In such case they will be able
to use helpers like bpf_copy_from_user() that might sleep. At present only
fentry/fexit/fmod_ret and lsm programs can request to be sleepable and only
when they are attached to kernel functions that are known to allow sleeping.
The non-sleepable programs are relying on implicit rcu_read_lock() and
migrate_disable() to protect life time of programs, maps that they use and
per-cpu kernel structures used to pass info between bpf programs and the
kernel. The sleepable programs cannot be enclosed into rcu_read_lock().
migrate_disable() maps to preempt_disable() in non-RT kernels, so the progs
should not be enclosed in migrate_disable() as well. Therefore
rcu_read_lock_trace is used to protect the life time of sleepable progs.
There are many networking and tracing program types. In many cases the
'struct bpf_prog *' pointer itself is rcu protected within some other kernel
data structure and the kernel code is using rcu_dereference() to load that
program pointer and call BPF_PROG_RUN() on it. All these cases are not touched.
Instead sleepable bpf programs are allowed with bpf trampoline only. The
program pointers are hard-coded into generated assembly of bpf trampoline and
synchronize_rcu_tasks_trace() is used to protect the life time of the program.
The same trampoline can hold both sleepable and non-sleepable progs.
When rcu_read_lock_trace is held it means that some sleepable bpf program is
running from bpf trampoline. Those programs can use bpf arrays and preallocated
hash/lru maps. These map types are waiting on programs to complete via
synchronize_rcu_tasks_trace();
Updates to the trampoline now have to call synchronize_rcu_tasks_trace() and
synchronize_rcu_tasks() to wait for sleepable progs to finish and for
trampoline assembly to finish.
This is the first step of introducing sleepable progs. Eventually dynamically
allocated hash maps can be allowed and networking program types can become
sleepable too.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/bpf/20200827220114.69225-3-alexei.starovoitov@gmail.com
Some properties of the inner map are used at verification time.
When an inner map is inserted into an outer map at runtime,
bpf_map_meta_equal() is currently used to ensure that those properties
of the inserted inner map stay the same as they were at verification
time.
In particular, the current bpf_map_meta_equal() checks max_entries which
turns out to be too restrictive for most of the maps which do not use
max_entries during the verification time. It limits the use case that
wants to replace a smaller inner map with a larger inner map. Some maps
do use max_entries during verification though. For example,
the map_gen_lookup in array_map_ops uses the max_entries to generate
the inline lookup code.
To accommodate differences between maps, the map_meta_equal is added
to bpf_map_ops. Each map-type can decide what to check when its
map is used as an inner map during runtime.
Also, some map types cannot be used as an inner map and they are
currently blacklisted in bpf_map_meta_alloc() in map_in_map.c.
It is not unusual for new map types to be unaware that such a
blacklist exists. This patch enforces an explicit opt-in
and only allows a map to be used as an inner map if it has
implemented the map_meta_equal ops. It is based on the
discussion in [1].
All maps that can be used as an inner map have their map_meta_equal
pointing to bpf_map_meta_equal in this patch. A later patch will
relax the max_entries check for most maps. bpf_types.h
counts 28 map types. This patch adds 23 ".map_meta_equal"
by using coccinelle. -5 for
BPF_MAP_TYPE_PROG_ARRAY
BPF_MAP_TYPE_(PERCPU)_CGROUP_STORAGE
BPF_MAP_TYPE_STRUCT_OPS
BPF_MAP_TYPE_ARRAY_OF_MAPS
BPF_MAP_TYPE_HASH_OF_MAPS
The "if (inner_map->inner_map_meta)" check in bpf_map_meta_alloc()
is moved such that the same error is returned.
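The opt-in rule described above can be sketched in plain C as follows. This is
only an illustration under stated assumptions: the demo_* types and functions
are hypothetical userspace stand-ins, not the kernel's struct bpf_map or
struct bpf_map_ops, and the default comparison simply checks the fields named
in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the kernel structures. */
struct demo_map;

struct demo_map_ops {
	/* NULL means the map type has not opted in as an inner map. */
	bool (*map_meta_equal)(const struct demo_map *meta,
			       const struct demo_map *candidate);
};

struct demo_map {
	const struct demo_map_ops *ops;
	unsigned int map_type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
};

/* Default check: every property seen at verification time must match. */
bool demo_map_meta_equal(const struct demo_map *meta,
			 const struct demo_map *candidate)
{
	return meta->map_type == candidate->map_type &&
	       meta->key_size == candidate->key_size &&
	       meta->value_size == candidate->value_size &&
	       meta->max_entries == candidate->max_entries;
}

/* A map type that opts in by pointing at the default check. */
const struct demo_map_ops demo_hash_ops = {
	.map_meta_equal = demo_map_meta_equal,
};

/* A map type that must never be an inner map leaves the callback unset. */
const struct demo_map_ops demo_prog_array_ops = {
	.map_meta_equal = NULL,
};

/* Returns true if 'candidate' may be inserted where 'meta' was verified. */
bool demo_inner_map_allowed(const struct demo_map *meta,
			    const struct demo_map *candidate)
{
	assert(meta != NULL && candidate != NULL);
	if (!candidate->ops || !candidate->ops->map_meta_equal)
		return false;
	return candidate->ops->map_meta_equal(meta, candidate);
}
```

With this shape, a map type that wants a looser rule (for example, ignoring
max_entries) would supply its own callback instead of the default one.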
[1]: https://lore.kernel.org/bpf/20200522022342.899756-1-kafai@fb.com/
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200828011806.1970400-1-kafai@fb.com
Adding support to define sorted set of BTF ID values.
Following defines sorted set of BTF ID values:
BTF_SET_START(btf_allowlist_d_path)
BTF_ID(func, vfs_truncate)
BTF_ID(func, vfs_fallocate)
BTF_ID(func, dentry_open)
BTF_ID(func, vfs_getattr)
BTF_ID(func, filp_close)
BTF_SET_END(btf_allowlist_d_path)
It defines following 'struct btf_id_set' variable to access
values and count:
struct btf_id_set btf_allowlist_d_path;
Adding 'allowed' callback to struct bpf_func_proto, to let the
verifier check for allowed callers.
Adding btf_id_set_contains function, which will be used by
allowed callbacks to verify the caller's BTF ID value is
within allowed set.
Also removing extra '\' in __BTF_ID_LIST macro.
Added BTF_SET_START_GLOBAL macro for global sets.
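Because the set is sorted, membership can be tested with a binary search. The
following is a minimal userspace sketch of that idea, not the kernel's
btf_id_set_contains(): struct demo_id_set, demo_id_set_contains() and
demo_helper_allowed() are hypothetical names, and the set is modelled as a
count plus a pointer to the IDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of a sorted ID set: a count plus IDs. */
struct demo_id_set {
	uint32_t cnt;
	const uint32_t *ids;	/* must be sorted in ascending order */
};

/* Binary search for 'id'; returns true if it is a member of the set. */
bool demo_id_set_contains(const struct demo_id_set *set, uint32_t id)
{
	size_t lo = 0, hi;

	assert(set != NULL);
	hi = set->cnt;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (set->ids[mid] == id)
			return true;
		if (set->ids[mid] < id)
			lo = mid + 1;
		else
			hi = mid;
	}
	return false;
}

/* Sketch of an 'allowed' callback: accept only callers found in the set. */
bool demo_helper_allowed(const struct demo_id_set *allowlist,
			 uint32_t caller_id)
{
	return demo_id_set_contains(allowlist, caller_id);
}
```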
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200825192124.710397-10-jolsa@kernel.org
Adding btf_struct_ids_match function to check if given address provided
by BTF object + offset is also address of another nested BTF object.
This allows passing to a helper an argument that is defined via a parent
BTF object + offset, like for bpf_d_path (added in following changes):
SEC("fentry/filp_close")
int BPF_PROG(prog_close, struct file *file, void *id)
{
...
ret = bpf_d_path(&file->f_path, ...
The first bpf_d_path argument is held by the verifier as a BTF file object
plus the offset of the f_path member.
The btf_struct_ids_match function will walk the struct file object and
check if there's nested struct path object on the given offset.
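The walk described above can be illustrated with a small type-descriptor
model. This is a simplified sketch, not the kernel implementation: demo_type
and demo_member are hypothetical descriptors standing in for BTF, and scalar
members are represented by a NULL type pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_type;

/* One member of a struct; 'type' is NULL for non-struct members. */
struct demo_member {
	size_t offset;
	const struct demo_type *type;
};

/* Hypothetical stand-in for a BTF struct description. */
struct demo_type {
	uint32_t id;
	size_t size;
	size_t nmembers;
	const struct demo_member *members;
};

/*
 * Returns true if an object of type 'need_id' lives exactly at byte
 * offset 'off' inside 'type', descending through nested structs.
 */
bool demo_struct_ids_match(const struct demo_type *type, size_t off,
			   uint32_t need_id)
{
	assert(type != NULL);
	while (type) {
		const struct demo_type *next = NULL;
		size_t i;

		if (off == 0 && type->id == need_id)
			return true;
		/* Find the nested struct member that covers 'off', if any. */
		for (i = 0; i < type->nmembers; i++) {
			const struct demo_member *m = &type->members[i];

			if (m->type && off >= m->offset &&
			    off - m->offset < m->type->size) {
				next = m->type;
				off -= m->offset;
				break;
			}
		}
		type = next;
	}
	return false;
}
```

In this model, asking for the path-like type at the offset of f_path inside the
file-like type succeeds, while asking for it at the offset of any other member
fails.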
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200825192124.710397-9-jolsa@kernel.org
Refactor the functionality in bpf_sk_storage.c so that concept of
storage linked to kernel objects can be extended to other objects like
inode, task_struct etc.
Each new local storage will still be a separate map and provide its own
set of helpers. This allows for future object specific extensions and
still share a lot of the underlying implementation.
This includes the changes suggested by Martin in:
https://lore.kernel.org/bpf/20200725013047.4006241-1-kafai@fb.com/
adding new map operations to support bpf_local_storage maps:
* storages for different kernel objects to optionally have different
memory charging strategy (map_local_storage_charge,
map_local_storage_uncharge)
* Functionality to extract the storage pointer from a pointer to the
owning object (map_owner_storage_ptr)
Co-developed-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200825182919.1118197-4-kpsingh@chromium.org
Don't go via map->ops to call sock_map_update_elem, since we know
what function to call in bpf_map_update_value. Since we currently
don't allow calling map_update_elem from BPF context, we can remove
ops->map_update_elem and rename the function to sock_map_update_elem_sys.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200821102948.21918-4-lmb@cloudflare.com
For bpf_map_elem and bpf_sk_local_storage bpf iterators,
additional map_id should be shown for fdinfo and
userspace query. For example, the following is for
a bpf_map_elem iterator.
$ cat /proc/1753/fdinfo/9
pos: 0
flags: 02000000
mnt_id: 14
link_type: iter
link_id: 34
prog_tag: 104be6d3fe45e6aa
prog_id: 173
target_name: bpf_map_elem
map_id: 127
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200821184419.574240-1-yhs@fb.com
This patch implemented bpf_link callback functions
show_fdinfo and fill_link_info to support link_query
interface.
The general interface for show_fdinfo and fill_link_info
will print/fill the target_name. Each target can
register show_fdinfo and fill_link_info callbacks
to print/fill more target specific information.
For example, the below is a fdinfo result for a bpf
task iterator.
$ cat /proc/1749/fdinfo/7
pos: 0
flags: 02000000
mnt_id: 14
link_type: iter
link_id: 11
prog_tag: 990e1f8152f7e54f
prog_id: 59
target_name: task
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200821184418.574122-1-yhs@fb.com
Refactor the code a bit to extract bpf_link_by_id() helper.
It's similar to existing bpf_prog_by_id().
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200819042759.51280-2-alexei.starovoitov@gmail.com
Commit a5cbe05a66 ("bpf: Implement bpf iterator for
map elements") added bpf iterator support for
map elements. The map element bpf iterator requires
info to identify a particular map. In the above
commit, the attr->link_create.target_fd is used
to carry map_fd and an enum bpf_iter_link_info
is added to uapi to specify the target_fd actually
representing a map_fd:
enum bpf_iter_link_info {
BPF_ITER_LINK_UNSPEC = 0,
BPF_ITER_LINK_MAP_FD = 1,
MAX_BPF_ITER_LINK_INFO,
};
This is an extensible approach as we can grow
enumerator for pid, cgroup_id, etc. and we can
unionize target_fd for pid, cgroup_id, etc.
But in the future, there are chances that
more complex customization may happen, e.g.,
for tasks, it could be filtered based on
both cgroup_id and user_id.
This patch changed the uapi to have fields
__aligned_u64 iter_info;
__u32 iter_info_len;
for additional iter_info for link_create.
The iter_info is defined as
union bpf_iter_link_info {
struct {
__u32 map_fd;
} map;
};
So future extension for additional customization
will be easier. The bpf_iter_link_info will be
passed to target callback to validate and generic
bpf_iter framework does not need to deal with it any
more.
Note that map_fd = 0 will be considered invalid
and -EBADF will be returned to user space.
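The map_fd rule can be sketched as below. This is an illustration only: the
union is a local copy of the one quoted above rather than the uapi header,
struct demo_link_create and demo_map_iter_check() are hypothetical, and
treating a missing iter_info as an all-zero union is an assumption of this
sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the union quoted above; not the real uapi header. */
union demo_iter_link_info {
	struct {
		uint32_t map_fd;
	} map;
};

/* Hypothetical stand-in for the link_create fields carrying iter_info. */
struct demo_link_create {
	uint64_t iter_info;	/* pointer to union demo_iter_link_info */
	uint32_t iter_info_len;
};

/* Returns 0 if a usable map_fd was supplied, -EBADF otherwise. */
int demo_map_iter_check(const struct demo_link_create *attr)
{
	union demo_iter_link_info linfo;

	assert(attr != NULL);
	memset(&linfo, 0, sizeof(linfo));
	if (attr->iter_info) {
		size_t len = attr->iter_info_len < sizeof(linfo) ?
			     attr->iter_info_len : sizeof(linfo);

		memcpy(&linfo, (const void *)(uintptr_t)attr->iter_info, len);
	}
	/* map_fd == 0 is treated as "no map given". */
	if (linfo.map.map_fd == 0)
		return -EBADF;
	return 0;
}
```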
Fixes: a5cbe05a66 ("bpf: Implement bpf iterator for map elements")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200805055056.1457463-1-yhs@fb.com
Add LINK_DETACH command to force-detach bpf_link without destroying it. It has
the same behavior as auto-detaching of bpf_link due to cgroup dying for
bpf_cgroup_link or net_device being destroyed for bpf_xdp_link. In such case,
bpf_link is still a valid kernel object, but is defunct and doesn't hold a BPF
program attached to corresponding BPF hook. This functionality allows users
with enough access rights to manually force-detach attached bpf_link without
killing respective owner process.
This patch implements LINK_DETACH for cgroup, xdp, and netns links, mostly
re-using existing link release handling code.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200731182830.286260-2-andriin@fb.com
Similarly to bpf_prog, make bpf_link and related generic API available
unconditionally to make it easier to have bpf_link support in various parts of
the kernel. Stub out init/prime/settle/cleanup and inc/put APIs.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-2-andriin@fb.com
Calling get_perf_callchain() on perf_events from PEBS entries may cause
unwinder errors. To fix this issue, the callchain is fetched early. Such
perf_events are marked with __PERF_SAMPLE_CALLCHAIN_EARLY.
Similarly, calling bpf_get_[stack|stackid] on perf_events from PEBS may
also cause unwinder errors. To fix this, add separate version of these
two helpers, bpf_get_[stack|stackid]_pe. These two helpers use callchain in
bpf_perf_event_data_kern->data->callchain.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723180648.1429892-2-songliubraving@fb.com
The bpf iterator for map elements is implemented.
The bpf program will receive four parameters:
bpf_iter_meta *meta: the meta data
bpf_map *map: the bpf_map whose elements are traversed
void *key: the key of one element
void *value: the value of the same element
Here, meta and map pointers are always valid, and
key has register type PTR_TO_RDONLY_BUF_OR_NULL and
value has register type PTR_TO_RDWR_BUF_OR_NULL.
The kernel will track the access range of key and value
during verification time. Later, these values will be compared
against the values in the actual map to ensure all accesses
are within range.
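The range comparison described above can be sketched as follows. The names
are hypothetical stand-ins, and the -EACCES return value is an assumption of
this sketch rather than something stated in this message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Largest access sizes recorded for the program during verification. */
struct demo_iter_prog {
	uint32_t max_key_access;	/* bytes read through 'key' */
	uint32_t max_value_access;	/* bytes read/written through 'value' */
};

/* Sizes of the map the iterator is being attached to. */
struct demo_map_sizes {
	uint32_t key_size;
	uint32_t value_size;
};

/* Reject the attach if the program could access past a key or value. */
int demo_iter_attach_check(const struct demo_iter_prog *prog,
			   const struct demo_map_sizes *map)
{
	assert(prog != NULL && map != NULL);
	if (prog->max_key_access > map->key_size ||
	    prog->max_value_access > map->value_size)
		return -EACCES;	/* assumed error code */
	return 0;
}
```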
A new field iter_seq_info is added to bpf_map_ops which
is used to add map type specific information, i.e., seq_ops,
init/fini seq_file func and seq_file private data size.
Subsequent patches will have actual implementation
for bpf_map_ops->iter_seq_info.
In user space, BPF_ITER_LINK_MAP_FD needs to be
specified in prog attr->link_create.flags, which indicates
that attr->link_create.target_fd is a map_fd.
The reason for such an explicit flag is for possible
future cases where one bpf iterator may allow more than
one possible customization, e.g., pid and cgroup id for
task_file.
Current kernel internal implementation only allows
the target to register at most one required bpf_iter_link_info.
To support the above case, optional bpf_iter_link_info's
are needed, the target can be extended to register such link
infos, and user provided link_info needs to match one of
target supported ones.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184112.590360-1-yhs@fb.com
Readonly and readwrite buffer register states
are introduced. In total, four states,
PTR_TO_RDONLY_BUF[_OR_NULL] and PTR_TO_RDWR_BUF[_OR_NULL]
are supported. As suggested by their respective
names, PTR_TO_RDONLY_BUF[_OR_NULL] are for
readonly buffers and PTR_TO_RDWR_BUF[_OR_NULL]
for read/write buffers.
These new register states will be used
by later bpf map element iterator.
New register states share some similarity to
PTR_TO_TP_BUFFER as it will calculate accessed buffer
size during verification time. The accessed buffer
size will be later compared to other metrics during
later attach/link_create time.
Similar to reg_state PTR_TO_BTF_ID_OR_NULL in bpf
iterator programs, PTR_TO_RDONLY_BUF_OR_NULL or
PTR_TO_RDWR_BUF_OR_NULL reg_types can be set at
prog->aux->bpf_ctx_arg_aux, and bpf verifier will
retrieve the values during btf_ctx_access().
Later bpf map element iterator implementation
will show how such information will be assigned
during target registration time.
The verifier is also enhanced such that PTR_TO_RDONLY_BUF
can be passed to ARG_PTR_TO_MEM[_OR_NULL] helper argument, and
PTR_TO_RDWR_BUF can be passed to ARG_PTR_TO_MEM[_OR_NULL] or
ARG_PTR_TO_UNINIT_MEM.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184111.590274-1-yhs@fb.com
This patch refactored target bpf_iter_init_seq_priv_t callback
function to accept additional information. This will be needed
in later patches for map element targets since a particular
map should be passed to traverse elements for that particular
map. In the future, other information may be passed to target
as well, e.g., pid, cgroup id, etc. to customize the iterator.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184110.590156-1-yhs@fb.com
There is no functionality change for this patch.
Struct bpf_iter_reg is used to register a bpf_iter target,
which includes information for both prog_load, link_create
and seq_file creation.
This patch puts fields related to seq_file creation into
a different structure. This will be useful for map
elements iterator where one iterator covers different
map types and different map types may have different
seq_ops, init/fini private_data function and
private_data size.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184109.590030-1-yhs@fb.com
It's mostly a copy paste of commit 6086d29def ("bpf: Add bpf_map iterator")
that is used to implement bpf_seq_file operations to traverse all bpf programs.
v1->v2: Tweak to use build time btf_id
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
One additional field btf_id is added to struct
bpf_ctx_arg_aux to store the precomputed btf_ids.
The btf_id is computed at build time with
BTF_ID_LIST or BTF_ID_LIST_GLOBAL macro definitions.
All existing bpf iterators are changed to use
precomputed btf_ids.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200720163403.1393551-1-yhs@fb.com
Currently, socket types (struct tcp_sock, udp_sock, etc.)
used by bpf_skc_to_*() helpers are computed when vmlinux_btf
is first built in the kernel.
Commit 5a2798ab32
("bpf: Add BTF_ID_LIST/BTF_ID/BTF_ID_UNUSED macros")
implemented a mechanism to compute btf_ids at kernel build
time which can simplify kernel implementation and reduce
runtime overhead by removing in-kernel btf_id calculation.
This patch did exactly this, removing in-kernel btf_id
computation and utilizing build-time btf_id computation.
If CONFIG_DEBUG_INFO_BTF is not defined, BTF_ID_LIST will
define an array with size of 5, which is not enough for
btf_sock_ids. So define its own static array if
CONFIG_DEBUG_INFO_BTF is not defined.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200720163358.1393023-1-yhs@fb.com
Add a new program type BPF_PROG_TYPE_SK_LOOKUP with a dedicated attach type
BPF_SK_LOOKUP. The new program kind is to be invoked by the transport layer
when looking up a listening socket for a new connection request for
connection oriented protocols, or when looking up an unconnected socket for
a packet for connection-less protocols.
When called, SK_LOOKUP BPF program can select a socket that will receive
the packet. This serves as a mechanism to overcome the limits of what
bind() API allows to express. Two use-cases driving this work are:
(1) steer packets destined to an IP range, on fixed port to a socket
192.0.2.0/24, port 80 -> NGINX socket
(2) steer packets destined to an IP address, on any port to a socket
198.51.100.1, any port -> L7 proxy socket
In its run-time context program receives information about the packet that
triggered the socket lookup. Namely IP version, L4 protocol identifier, and
address 4-tuple. Context can be further extended to include ingress
interface identifier.
To select a socket BPF program fetches it from a map holding socket
references, like SOCKMAP or SOCKHASH, and calls bpf_sk_assign(ctx, sk, ...)
helper to record the selection. Transport layer then uses the selected
socket as a result of socket lookup.
In its basic form, SK_LOOKUP acts as a filter and hence must return either
SK_PASS or SK_DROP. If the program returns with SK_PASS, transport should
look for a socket to receive the packet, or use the one selected by the
program if available, while SK_DROP informs the transport layer that the
lookup should fail.
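The verdict handling described above can be sketched in plain C as below. This
models only the decision the transport layer makes after the program has run;
the demo_* names are hypothetical and the code is not a BPF program.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the program verdicts. */
enum demo_verdict {
	DEMO_SK_DROP = 0,
	DEMO_SK_PASS = 1,
};

struct demo_sock {
	int id;
};

/*
 * 'selected' is the socket the program chose (NULL if it chose none);
 * 'regular' is what the ordinary lookup would find (NULL if nothing).
 * Returns the socket that receives the packet, or NULL if lookup fails.
 */
const struct demo_sock *demo_lookup_result(enum demo_verdict verdict,
					   const struct demo_sock *selected,
					   const struct demo_sock *regular)
{
	assert(verdict == DEMO_SK_DROP || verdict == DEMO_SK_PASS);
	if (verdict == DEMO_SK_DROP)
		return NULL;		/* lookup fails */
	if (selected)
		return selected;	/* program's choice wins */
	return regular;			/* fall back to the normal lookup */
}
```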
This patch only enables the user to attach an SK_LOOKUP program to a
network namespace. Subsequent patches hook it up to run on local delivery
path in ipv4 and ipv6 stacks.
Suggested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717103536.397595-3-jakub@cloudflare.com
Extend the BPF netns link callbacks to rebuild (grow/shrink) or update the
prog_array at given position when link gets attached/updated/released.
This lets us lift the limit of having just one link attached for the new
attach type introduced by subsequent patch.
No functional changes intended.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200717103536.397595-2-jakub@cloudflare.com
Introduce the capability to attach an eBPF program to cpumap entries.
The idea behind this feature is to add the possibility to define on
which CPU to run the eBPF program if the underlying hw does not support
RSS. Current supported verdicts are XDP_DROP and XDP_PASS.
This patch has been tested on Marvell ESPRESSObin using xdp_redirect_cpu
sample available in the kernel tree to identify possible performance
regressions. Results show there are no observable differences in
packet-per-second:
$./xdp_redirect_cpu --progname xdp_cpu_map0 --dev eth0 --cpu 1
rx: 354.8 Kpps
rx: 356.0 Kpps
rx: 356.8 Kpps
rx: 356.3 Kpps
rx: 356.6 Kpps
rx: 356.6 Kpps
rx: 356.7 Kpps
rx: 355.8 Kpps
rx: 356.8 Kpps
rx: 356.8 Kpps
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/5c9febdf903d810b3415732e5cd98491d7d9067a.1594734381.git.lorenzo@kernel.org
Introduce helper bpf_get_task_stack(), which dumps stack trace of given
task. This is different from bpf_get_stack(), which gets the stack trace of
current task. One potential use case of bpf_get_task_stack() is to call
it from bpf_iter__task and dump all /proc/<pid>/stack to a seq_file.
bpf_get_task_stack() uses stack_trace_save_tsk() instead of
get_perf_callchain() for kernel stack. The benefit of this choice is that
stack_trace_save_tsk() doesn't require changes in arch/. The downside of
using stack_trace_save_tsk() is that stack_trace_save_tsk() dumps the
stack trace to unsigned long array. For 32-bit systems, we need to
translate it to u64 array.
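The 32-bit to 64-bit translation can be done in place if the buffer is walked
from the last entry to the first. The sketch below shows that idea in
userspace C; demo_widen_in_place() is a hypothetical name, uint32_t stands in
for a 32-bit unsigned long, and memcpy() is used so the example does not
depend on pointer aliasing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * 'buf' holds 'nr' 32-bit entries packed at its start and has room for
 * 'nr' 64-bit entries. Widen them in place, walking from the last entry
 * so that no unread 32-bit value is overwritten.
 */
void demo_widen_in_place(void *buf, size_t nr)
{
	unsigned char *p = buf;
	size_t i;

	assert(buf != NULL || nr == 0);
	for (i = nr; i-- > 0; ) {
		uint32_t narrow;
		uint64_t wide;

		memcpy(&narrow, p + i * sizeof(narrow), sizeof(narrow));
		wide = narrow;
		memcpy(p + i * sizeof(wide), &wide, sizeof(wide));
	}
}
```

Walking backwards is what makes this safe: writing 64-bit slot i only touches
32-bit slots 2i and 2i+1, which have already been read by the time slot i is
written.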
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200630062846.664389-3-songliubraving@fb.com
The sockmap code currently ignores the value of attach_bpf_fd when
detaching a program. This is contrary to the usual behaviour of
checking that attach_bpf_fd represents the currently attached
program.
Ensure that attach_bpf_fd is indeed the currently attached
program. It turns out that all sockmap selftests already do this,
which indicates that this is unlikely to cause breakage.
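The check amounts to comparing the program named by the caller with the one
currently attached before clearing the slot. A minimal sketch follows, with
hypothetical names and -ENOENT as an assumed error code for a mismatch:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_prog {
	int id;
};

/*
 * Detach only if 'expected' is the program currently held in '*slot'.
 * Returns 0 on success, or a negative errno if the programs differ.
 */
int demo_detach(const struct demo_prog **slot,
		const struct demo_prog *expected)
{
	assert(slot != NULL);
	if (*slot != expected)
		return -ENOENT;	/* assumed error code */
	*slot = NULL;
	return 0;
}
```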
Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200629095630.7933-5-lmb@cloudflare.com
The helper is used in tracing programs to cast a socket
pointer to a udp6_sock pointer.
The return value could be NULL if the casting is illegal.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/bpf/20200623230815.3988481-1-yhs@fb.com