Try to mitigate potential future driver core API changes by adding
padding to several networking structures:

	struct ipv6_devconf
	struct proto_ops
	struct header_ops
	struct napi_struct
	struct netdev_queue
	struct netdev_rx_queue
	struct xfrmdev_ops
	struct net_device_ops
	struct net_device
	struct packet_type
	struct sk_buff
	struct tlsdev_ops

Based on a change made to the RHEL/CentOS 8 kernel.
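The padding idea can be sketched in plain C. This is a minimal illustration under stated assumptions, not the code of this change: the macros KABI_RESERVE/KABI_USE and the structures demo_dev_v1/demo_dev_v2 are hypothetical. The sketch only shows why spare slots let a later kernel add a member without changing the size or field offsets that prebuilt modules were compiled against.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers (not taken from this commit): reserve an 8-byte
 * slot now, and later reuse it for a new member without moving any
 * other field.
 */
#define KABI_RESERVE(n) uint64_t kabi_reserved##n
#define KABI_USE(n, new_member)        \
	union {                        \
		new_member;            \
		uint64_t kabi_reserved##n; \
	}

/* First release: the structure ships with two unused padding slots. */
struct demo_dev_v1 {
	int ifindex;
	unsigned int mtu;
	KABI_RESERVE(1);
	KABI_RESERVE(2);
};

/* Later release: a new field consumes slot 1; slot 2 stays reserved. */
struct demo_dev_v2 {
	int ifindex;
	unsigned int mtu;
	KABI_USE(1, uint32_t gso_max_segs);
	KABI_RESERVE(2);
};

/* The layout seen by already-built modules must not change. */
_Static_assert(sizeof(struct demo_dev_v1) == sizeof(struct demo_dev_v2),
	       "adding a field changed the struct size");
_Static_assert(offsetof(struct demo_dev_v1, kabi_reserved1) ==
	       offsetof(struct demo_dev_v2, gso_max_segs),
	       "new field does not occupy the reserved slot");
_Static_assert(offsetof(struct demo_dev_v1, kabi_reserved2) ==
	       offsetof(struct demo_dev_v2, kabi_reserved2),
	       "fields after the reused slot moved");
```

The compile-time assertions fail the build if a change would shift the layout. A fuller version would also check that each new member fits inside its 8-byte slot.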
Bug: 151154716
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I590f004754dbc8beafa40e71cac70a0938c38b4a

Changes in 5.10.20
vmlinux.lds.h: add DWARF v5 sections
vdpa/mlx5: fix param validation in mlx5_vdpa_get_config()
debugfs: be more robust at handling improper input in debugfs_lookup()
debugfs: do not attempt to create a new file before the filesystem is initalized
scsi: libsas: docs: Remove notify_ha_event()
scsi: qla2xxx: Fix mailbox Ch erroneous error
kdb: Make memory allocations more robust
w1: w1_therm: Fix conversion result for negative temperatures
PCI: qcom: Use PHY_REFCLK_USE_PAD only for ipq8064
PCI: Decline to resize resources if boot config must be preserved
virt: vbox: Do not use wait_event_interruptible when called from kernel context
bfq: Avoid false bfq queue merging
ALSA: usb-audio: Fix PCM buffer allocation in non-vmalloc mode
MIPS: vmlinux.lds.S: add missing PAGE_ALIGNED_DATA() section
vmlinux.lds.h: Define SANTIZER_DISCARDS with CONFIG_GCOV_KERNEL=y
random: fix the RNDRESEEDCRNG ioctl
ALSA: pcm: Call sync_stop at disconnection
ALSA: pcm: Assure sync with the pending stop operation at suspend
ALSA: pcm: Don't call sync_stop if it hasn't been stopped
drm/i915/gt: One more flush for Baytrail clear residuals
ath10k: Fix error handling in case of CE pipe init failure
Bluetooth: btqcomsmd: Fix a resource leak in error handling paths in the probe function
Bluetooth: hci_uart: Fix a race for write_work scheduling
Bluetooth: Fix initializing response id after clearing struct
arm64: dts: renesas: beacon kit: Fix choppy Bluetooth Audio
arm64: dts: renesas: beacon: Fix audio-1.8V pin enable
ARM: dts: exynos: correct PMIC interrupt trigger level on Artik 5
ARM: dts: exynos: correct PMIC interrupt trigger level on Monk
ARM: dts: exynos: correct PMIC interrupt trigger level on Rinato
ARM: dts: exynos: correct PMIC interrupt trigger level on Spring
ARM: dts: exynos: correct PMIC interrupt trigger level on Arndale Octa
ARM: dts: exynos: correct PMIC interrupt trigger level on Odroid XU3 family
arm64: dts: exynos: correct PMIC interrupt trigger level on TM2
arm64: dts: exynos: correct PMIC interrupt trigger level on Espresso
memory: mtk-smi: Fix PM usage counter unbalance in mtk_smi ops
Bluetooth: hci_qca: Fix memleak in qca_controller_memdump
staging: vchiq: Fix bulk userdata handling
staging: vchiq: Fix bulk transfers on 64-bit builds
arm64: dts: qcom: msm8916-samsung-a5u: Fix iris compatible
net: stmmac: dwmac-meson8b: fix enabling the timing-adjustment clock
bpf: Add bpf_patch_call_args prototype to include/linux/bpf.h
bpf: Avoid warning when re-casting __bpf_call_base into __bpf_call_base_args
firmware: arm_scmi: Fix call site of scmi_notification_exit
arm64: dts: allwinner: A64: properly connect USB PHY to port 0
arm64: dts: allwinner: H6: properly connect USB PHY to port 0
arm64: dts: allwinner: Drop non-removable from SoPine/LTS SD card
arm64: dts: allwinner: H6: Allow up to 150 MHz MMC bus frequency
arm64: dts: allwinner: A64: Limit MMC2 bus frequency to 150 MHz
arm64: dts: qcom: msm8916-samsung-a2015: Fix sensors
cpufreq: brcmstb-avs-cpufreq: Free resources in error path
cpufreq: brcmstb-avs-cpufreq: Fix resource leaks in ->remove()
arm64: dts: rockchip: rk3328: Add clock_in_out property to gmac2phy node
ACPICA: Fix exception code class checks
usb: gadget: u_audio: Free requests only after callback
arm64: dts: qcom: sdm845-db845c: Fix reset-pin of ov8856 node
soc: qcom: socinfo: Fix an off by one in qcom_show_pmic_model()
soc: ti: pm33xx: Fix some resource leak in the error handling paths of the probe function
staging: media: atomisp: Fix size_t format specifier in hmm_alloc() debug statemenet
Bluetooth: drop HCI device reference before return
Bluetooth: Put HCI device if inquiry procedure interrupts
memory: ti-aemif: Drop child node when jumping out loop
ARM: dts: Configure missing thermal interrupt for 4430
usb: dwc2: Do not update data length if it is 0 on inbound transfers
usb: dwc2: Abort transaction after errors with unknown reason
usb: dwc2: Make "trimming xfer length" a debug message
staging: rtl8723bs: wifi_regd.c: Fix incorrect number of regulatory rules
x86/MSR: Filter MSR writes through X86_IOC_WRMSR_REGS ioctl too
arm64: dts: renesas: beacon: Fix EEPROM compatible value
can: mcp251xfd: mcp251xfd_probe(): fix errata reference
ARM: dts: armada388-helios4: assign pinctrl to LEDs
ARM: dts: armada388-helios4: assign pinctrl to each fan
arm64: dts: armada-3720-turris-mox: rename u-boot mtd partition to a53-firmware
opp: Correct debug message in _opp_add_static_v2()
Bluetooth: btusb: Fix memory leak in btusb_mtk_wmt_recv
soc: qcom: ocmem: don't return NULL in of_get_ocmem
arm64: dts: msm8916: Fix reserved and rfsa nodes unit address
arm64: dts: meson: fix broken wifi node for Khadas VIM3L
iwlwifi: mvm: set enabled in the PPAG command properly
ARM: s3c: fix fiq for clang IAS
optee: simplify i2c access
staging: wfx: fix possible panic with re-queued frames
ARM: at91: use proper asm syntax in pm_suspend
ath10k: Fix suspicious RCU usage warning in ath10k_wmi_tlv_parse_peer_stats_info()
ath10k: Fix lockdep assertion warning in ath10k_sta_statistics
ath11k: fix a locking bug in ath11k_mac_op_start()
soc: aspeed: snoop: Add clock control logic
iwlwifi: mvm: fix the type we use in the PPAG table validity checks
iwlwifi: mvm: store PPAG enabled/disabled flag properly
iwlwifi: mvm: send stored PPAG command instead of local
iwlwifi: mvm: assign SAR table revision to the command later
iwlwifi: mvm: don't check if CSA event is running before removing
bpf_lru_list: Read double-checked variable once without lock
iwlwifi: pnvm: set the PNVM again if it was already loaded
iwlwifi: pnvm: increment the pointer before checking the TLV
ath9k: fix data bus crash when setting nf_override via debugfs
selftests/bpf: Convert test_xdp_redirect.sh to bash
ibmvnic: Set to CLOSED state even on error
bnxt_en: reverse order of TX disable and carrier off
bnxt_en: Fix devlink info's stored fw.psid version format.
xen/netback: fix spurious event detection for common event case
dpaa2-eth: fix memory leak in XDP_REDIRECT
net: phy: consider that suspend2ram may cut off PHY power
net/mlx5e: Don't change interrupt moderation params when DIM is enabled
net/mlx5e: Change interrupt moderation channel params also when channels are closed
net/mlx5: Fix health error state handling
net/mlx5e: Replace synchronize_rcu with synchronize_net
net/mlx5e: kTLS, Use refcounts to free kTLS RX priv context
net/mlx5: Disable devlink reload for multi port slave device
net/mlx5: Disallow RoCE on multi port slave device
net/mlx5: Disallow RoCE on lag device
net/mlx5: Disable devlink reload for lag devices
net/mlx5e: CT: manage the lifetime of the ct entry object
net/mlx5e: Check tunnel offload is required before setting SWP
mac80211: fix potential overflow when multiplying to u32 integers
libbpf: Ignore non function pointer member in struct_ops
bpf: Fix an unitialized value in bpf_iter
bpf, devmap: Use GFP_KERNEL for xdp bulk queue allocation
bpf: Fix bpf_fib_lookup helper MTU check for SKB ctx
selftests: mptcp: fix ACKRX debug message
tcp: fix SO_RCVLOWAT related hangs under mem pressure
net: axienet: Handle deferred probe on clock properly
cxgb4/chtls/cxgbit: Keeping the max ofld immediate data size same in cxgb4 and ulds
b43: N-PHY: Fix the update of coef for the PHY revision >= 3case
bpf: Clear subreg_def for global function return values
ibmvnic: add memory barrier to protect long term buffer
ibmvnic: skip send_request_unmap for timeout reset
net: dsa: felix: perform teardown in reverse order of setup
net: dsa: felix: don't deinitialize unused ports
net: phy: mscc: adding LCPLL reset to VSC8514
net: amd-xgbe: Reset the PHY rx data path when mailbox command timeout
net: amd-xgbe: Fix NETDEV WATCHDOG transmit queue timeout warning
net: amd-xgbe: Reset link when the link never comes back
net: amd-xgbe: Fix network fluctuations when using 1G BELFUSE SFP
net: mvneta: Remove per-cpu queue mapping for Armada 3700
net: enetc: fix destroyed phylink dereference during unbind
tty: convert tty_ldisc_ops 'read()' function to take a kernel pointer
tty: implement read_iter
fbdev: aty: SPARC64 requires FB_ATY_CT
drm/gma500: Fix error return code in psb_driver_load()
gma500: clean up error handling in init
drm/fb-helper: Add missed unlocks in setcmap_legacy()
drm/panel: mantix: Tweak init sequence
drm/vc4: hdmi: Take into account the clock doubling flag in atomic_check
crypto: sun4i-ss - linearize buffers content must be kept
crypto: sun4i-ss - fix kmap usage
crypto: arm64/aes-ce - really hide slower algos when faster ones are enabled
hwrng: ingenic - Fix a resource leak in an error handling path
media: allegro: Fix use after free on error
kcsan: Rewrite kcsan_prandom_u32_max() without prandom_u32_state()
drm: rcar-du: Fix PM reference leak in rcar_cmm_enable()
drm: rcar-du: Fix crash when using LVDS1 clock for CRTC
drm: rcar-du: Fix the return check of of_parse_phandle and of_find_device_by_node
drm/amdgpu: Fix macro name _AMDGPU_TRACE_H_ in preprocessor if condition
MIPS: c-r4k: Fix section mismatch for loongson2_sc_init
MIPS: lantiq: Explicitly compare LTQ_EBU_PCC_ISTAT against 0
drm/virtio: make sure context is created in gem open
drm/fourcc: fix Amlogic format modifier masks
media: ipu3-cio2: Build only for x86
media: i2c: ov5670: Fix PIXEL_RATE minimum value
media: imx: Unregister csc/scaler only if registered
media: imx: Fix csc/scaler unregister
media: mtk-vcodec: fix error return code in vdec_vp9_decode()
media: camss: missing error code in msm_video_register()
media: vsp1: Fix an error handling path in the probe function
media: em28xx: Fix use-after-free in em28xx_alloc_urbs
media: media/pci: Fix memleak in empress_init
media: tm6000: Fix memleak in tm6000_start_stream
media: aspeed: fix error return code in aspeed_video_setup_video()
ASoC: cs42l56: fix up error handling in probe
ASoC: qcom: qdsp6: Move frontend AIFs to q6asm-dai
evm: Fix memleak in init_desc
crypto: bcm - Rename struct device_private to bcm_device_private
sched/fair: Avoid stale CPU util_est value for schedutil in task dequeue
drm/sun4i: tcon: fix inverted DCLK polarity
media: imx7: csi: Fix regression for parallel cameras on i.MX6UL
media: imx7: csi: Fix pad link validation
media: ti-vpe: cal: fix write to unallocated memory
MIPS: properly stop .eh_frame generation
MIPS: Compare __SYNC_loongson3_war against 0
drm/tegra: Fix reference leak when pm_runtime_get_sync() fails
drm/amdgpu: toggle on DF Cstate after finishing xgmi injection
bsg: free the request before return error code
macintosh/adb-iop: Use big-endian autopoll mask
drm/amd/display: Fix 10/12 bpc setup in DCE output bit depth reduction.
drm/amd/display: Fix HDMI deep color output for DCE 6-11.
media: software_node: Fix refcounts in software_node_get_next_child()
media: lmedm04: Fix misuse of comma
media: vidtv: psi: fix missing crc for PMT
media: atomisp: Fix a buffer overflow in debug code
media: qm1d1c0042: fix error return code in qm1d1c0042_init()
media: cx25821: Fix a bug when reallocating some dma memory
media: mtk-vcodec: fix argument used when DEBUG is defined
media: pxa_camera: declare variable when DEBUG is defined
media: uvcvideo: Accept invalid bFormatIndex and bFrameIndex values
sched/eas: Don't update misfit status if the task is pinned
f2fs: compress: fix potential deadlock
ASoC: qcom: lpass-cpu: Remove bit clock state check
ASoC: SOF: Intel: hda: cancel D0i3 work during runtime suspend
perf/arm-cmn: Fix PMU instance naming
perf/arm-cmn: Move IRQs when migrating context
mtd: parser: imagetag: fix error codes in bcm963xx_parse_imagetag_partitions()
crypto: talitos - Work around SEC6 ERRATA (AES-CTR mode data size error)
crypto: talitos - Fix ctr(aes) on SEC1
drm/nouveau: bail out of nouveau_channel_new if channel init fails
mm: proc: Invalidate TLB after clearing soft-dirty page state
ata: ahci_brcm: Add back regulators management
ASoC: cpcap: fix microphone timeslot mask
ASoC: codecs: add missing max_register in regmap config
mtd: parsers: afs: Fix freeing the part name memory in failure
f2fs: fix to avoid inconsistent quota data
drm/amdgpu: Prevent shift wrapping in amdgpu_read_mask()
f2fs: fix a wrong condition in __submit_bio
ASoC: qcom: Fix typo error in HDMI regmap config callbacks
KVM: nSVM: Don't strip host's C-bit from guest's CR3 when reading PDPTRs
drm/mediatek: Check if fb is null
Drivers: hv: vmbus: Avoid use-after-free in vmbus_onoffer_rescind()
ASoC: Intel: sof_sdw: add missing TGL_HDMI quirk for Dell SKU 0A5E
ASoC: Intel: sof_sdw: add missing TGL_HDMI quirk for Dell SKU 0A3E
locking/lockdep: Avoid unmatched unlock
ASoC: qcom: lpass: Fix i2s ctl register bit map
ASoC: rt5682: Fix panic in rt5682_jack_detect_handler happening during system shutdown
ASoC: SOF: debug: Fix a potential issue on string buffer termination
btrfs: clarify error returns values in __load_free_space_cache
btrfs: fix double accounting of ordered extent for subpage case in btrfs_invalidapge
KVM: x86: Restore all 64 bits of DR6 and DR7 during RSM on x86-64
s390/zcrypt: return EIO when msg retry limit reached
drm/vc4: hdmi: Move hdmi reset to bind
drm/vc4: hdmi: Fix register offset with longer CEC messages
drm/vc4: hdmi: Fix up CEC registers
drm/vc4: hdmi: Restore cec physical address on reconnect
drm/vc4: hdmi: Compute the CEC clock divider from the clock rate
drm/vc4: hdmi: Update the CEC clock divider on HSM rate change
drm/lima: fix reference leak in lima_pm_busy
drm/dp_mst: Don't cache EDIDs for physical ports
hwrng: timeriomem - Fix cooldown period calculation
crypto: ecdh_helper - Ensure 'len >= secret.len' in decode_key()
io_uring: fix possible deadlock in io_uring_poll
nvmet-tcp: fix receive data digest calculation for multiple h2cdata PDUs
nvmet-tcp: fix potential race of tcp socket closing accept_work
nvme-multipath: set nr_zones for zoned namespaces
nvmet: remove extra variable in identify ns
nvmet: set status to 0 in case for invalid nsid
ASoC: SOF: sof-pci-dev: add missing Up-Extreme quirk
ima: Free IMA measurement buffer on error
ima: Free IMA measurement buffer after kexec syscall
ASoC: simple-card-utils: Fix device module clock
fs/jfs: fix potential integer overflow on shift of a int
jffs2: fix use after free in jffs2_sum_write_data()
ubifs: Fix memleak in ubifs_init_authentication
ubifs: replay: Fix high stack usage, again
ubifs: Fix error return code in alloc_wbufs()
irqchip/imx: IMX_INTMUX should not default to y, unconditionally
smp: Process pending softirqs in flush_smp_call_function_from_idle()
drm/amdgpu/display: remove hdcp_srm sysfs on device removal
capabilities: Don't allow writing ambiguous v3 file capabilities
HSI: Fix PM usage counter unbalance in ssi_hw_init
power: supply: cpcap: Add missing IRQF_ONESHOT to fix regression
clk: meson: clk-pll: fix initializing the old rate (fallback) for a PLL
clk: meson: clk-pll: make "ret" a signed integer
clk: meson: clk-pll: propagate the error from meson_clk_pll_set_rate()
selftests/powerpc: Make the test check in eeh-basic.sh posix compliant
regulator: qcom-rpmh-regulator: add pm8009-1 chip revision
arm64: dts: qcom: qrb5165-rb5: fix pm8009 regulators
quota: Fix memory leak when handling corrupted quota file
i2c: iproc: handle only slave interrupts which are enabled
i2c: iproc: update slave isr mask (ISR_MASK_SLAVE)
i2c: iproc: handle master read request
spi: cadence-quadspi: Abort read if dummy cycles required are too many
clk: sunxi-ng: h6: Fix CEC clock
clk: renesas: r8a779a0: Remove non-existent S2 clock
clk: renesas: r8a779a0: Fix parent of CBFUSA clock
HID: core: detect and skip invalid inputs to snto32()
RDMA/siw: Fix handling of zero-sized Read and Receive Queues.
dmaengine: fsldma: Fix a resource leak in the remove function
dmaengine: fsldma: Fix a resource leak in an error handling path of the probe function
dmaengine: owl-dma: Fix a resource leak in the remove function
dmaengine: hsu: disable spurious interrupt
mfd: bd9571mwv: Use devm_mfd_add_devices()
power: supply: cpcap-charger: Fix missing power_supply_put()
power: supply: cpcap-battery: Fix missing power_supply_put()
power: supply: cpcap-charger: Fix power_supply_put on null battery pointer
fdt: Properly handle "no-map" field in the memory region
of/fdt: Make sure no-map does not remove already reserved regions
RDMA/rtrs: Extend ibtrs_cq_qp_create
RDMA/rtrs-srv: Release lock before call into close_sess
RDMA/rtrs-srv: Use sysfs_remove_file_self for disconnect
RDMA/rtrs-clt: Set mininum limit when create QP
RDMA/rtrs: Call kobject_put in the failure path
RDMA/rtrs-srv: Fix missing wr_cqe
RDMA/rtrs-clt: Refactor the failure cases in alloc_clt
RDMA/rtrs-srv: Init wr_cnt as 1
power: reset: at91-sama5d2_shdwc: fix wkupdbc mask
rtc: s5m: select REGMAP_I2C
dmaengine: idxd: set DMA channel to be private
power: supply: fix sbs-charger build, needs REGMAP_I2C
clocksource/drivers/ixp4xx: Select TIMER_OF when needed
clocksource/drivers/mxs_timer: Add missing semicolon when DEBUG is defined
spi: imx: Don't print error on -EPROBEDEFER
RDMA/mlx5: Use the correct obj_id upon DEVX TIR creation
IB/mlx5: Add mutex destroy call to cap_mask_mutex mutex
clk: sunxi-ng: h6: Fix clock divider range on some clocks
platform/chrome: cros_ec_proto: Use EC_HOST_EVENT_MASK not BIT
platform/chrome: cros_ec_proto: Add LID and BATTERY to default mask
regulator: axp20x: Fix reference cout leak
watch_queue: Drop references to /dev/watch_queue
certs: Fix blacklist flag type confusion
regulator: s5m8767: Fix reference count leak
spi: atmel: Put allocated master before return
regulator: s5m8767: Drop regulators OF node reference
power: supply: axp20x_usb_power: Init work before enabling IRQs
power: supply: smb347-charger: Fix interrupt usage if interrupt is unavailable
regulator: core: Avoid debugfs: Directory ... already present! error
isofs: release buffer head before return
watchdog: intel-mid_wdt: Postpone IRQ handler registration till SCU is ready
auxdisplay: ht16k33: Fix refresh rate handling
objtool: Fix error handling for STD/CLD warnings
objtool: Fix retpoline detection in asm code
objtool: Fix ".cold" section suffix check for newer versions of GCC
scsi: lpfc: Fix ancient double free
iommu: Switch gather->end to the inclusive end
IB/umad: Return EIO in case of when device disassociated
IB/umad: Return EPOLLERR in case of when device disassociated
KVM: PPC: Make the VMX instruction emulation routines static
powerpc/47x: Disable 256k page size
powerpc/time: Enable sched clock for irqtime
mmc: owl-mmc: Fix a resource leak in an error handling path and in the remove function
mmc: sdhci-sprd: Fix some resource leaks in the remove function
mmc: usdhi6rol0: Fix a resource leak in the error handling path of the probe
mmc: renesas_sdhi_internal_dmac: Fix DMA buffer alignment from 8 to 128-bytes
ARM: 9046/1: decompressor: Do not clear SCTLR.nTLSMD for ARMv7+ cores
i2c: qcom-geni: Store DMA mapping data in geni_i2c_dev struct
amba: Fix resource leak for drivers without .remove
iommu: Move iotlb_sync_map out from __iommu_map
iommu: Properly pass gfp_t in _iommu_map() to avoid atomic sleeping
IB/mlx5: Return appropriate error code instead of ENOMEM
IB/cm: Avoid a loop when device has 255 ports
tracepoint: Do not fail unregistering a probe due to memory failure
rtc: zynqmp: depend on HAS_IOMEM
perf tools: Fix DSO filtering when not finding a map for a sampled address
perf vendor events arm64: Fix Ampere eMag event typo
RDMA/rxe: Fix coding error in rxe_recv.c
RDMA/rxe: Fix coding error in rxe_rcv_mcast_pkt
RDMA/rxe: Correct skb on loopback path
spi: stm32: properly handle 0 byte transfer
mfd: altera-sysmgr: Fix physical address storing more
mfd: wm831x-auxadc: Prevent use after free in wm831x_auxadc_read_irq()
powerpc/pseries/dlpar: handle ibm, configure-connector delay status
powerpc/8xx: Fix software emulation interrupt
clk: qcom: gcc-msm8998: Fix Alpha PLL type for all GPLLs
kunit: tool: fix unit test cleanup handling
kselftests: dmabuf-heaps: Fix Makefile's inclusion of the kernel's usr/include dir
RDMA/hns: Fixed wrong judgments in the goto branch
RDMA/siw: Fix calculation of tx_valid_cpus size
RDMA/hns: Fix type of sq_signal_bits
RDMA/hns: Disable RQ inline by default
clk: divider: fix initialization with parent_hw
spi: pxa2xx: Fix the controller numbering for Wildcat Point
powerpc/uaccess: Avoid might_fault() when user access is enabled
powerpc/kuap: Restore AMR after replaying soft interrupts
regulator: qcom-rpmh: fix pm8009 ldo7
clk: aspeed: Fix APLL calculate formula from ast2600-A2
selftests/ftrace: Update synthetic event syntax errors
perf symbols: Use (long) for iterator for bfd symbols
regulator: bd718x7, bd71828, Fix dvs voltage levels
spi: dw: Avoid stack content exposure
spi: Skip zero-length transfers in spi_transfer_one_message()
printk: avoid prb_first_valid_seq() where possible
perf symbols: Fix return value when loading PE DSO
nfsd: register pernet ops last, unregister first
svcrdma: Hold private mutex while invoking rdma_accept()
ceph: fix flush_snap logic after putting caps
RDMA/hns: Fixes missing error code of CMDQ
RDMA/ucma: Fix use-after-free bug in ucma_create_uevent
RDMA/rtrs-srv: Fix stack-out-of-bounds
RDMA/rtrs: Only allow addition of path to an already established session
RDMA/rtrs-srv: fix memory leak by missing kobject free
RDMA/rtrs-srv-sysfs: fix missing put_device
RDMA/rtrs-srv: Do not pass a valid pointer to PTR_ERR()
Input: sur40 - fix an error code in sur40_probe()
perf record: Fix continue profiling after draining the buffer
perf intel-pt: Fix missing CYC processing in PSB
perf intel-pt: Fix premature IPC
perf intel-pt: Fix IPC with CYC threshold
perf test: Fix unaligned access in sample parsing test
Input: elo - fix an error code in elo_connect()
sparc64: only select COMPAT_BINFMT_ELF if BINFMT_ELF is set
sparc: fix led.c driver when PROC_FS is not enabled
Input: zinitix - fix return type of zinitix_init_touch()
ARM: 9065/1: OABI compat: fix build when EPOLL is not enabled
misc: eeprom_93xx46: Fix module alias to enable module autoprobe
phy: rockchip-emmc: emmc_phy_init() always return 0
phy: cadence-torrent: Fix error code in cdns_torrent_phy_probe()
misc: eeprom_93xx46: Add module alias to avoid breaking support for non device tree users
PCI: rcar: Always allocate MSI addresses in 32bit space
soundwire: cadence: fix ACK/NAK handling
pwm: rockchip: Enable APB clock during register access while probing
pwm: rockchip: rockchip_pwm_probe(): Remove superfluous clk_unprepare()
pwm: rockchip: Eliminate potential race condition when probing
PCI: xilinx-cpm: Fix reference count leak on error path
VMCI: Use set_page_dirty_lock() when unregistering guest memory
PCI: Align checking of syscall user config accessors
mei: hbm: call mei_set_devstate() on hbm stop response
drm/msm: Fix MSM_INFO_GET_IOVA with carveout
drm/msm/dsi: Correct io_start for MSM8994 (20nm PHY)
drm/msm/mdp5: Fix wait-for-commit for cmd panels
drm/msm: Fix race of GPU init vs timestamp power management.
drm/msm: Fix races managing the OOB state for timestamp vs timestamps.
drm/msm/dp: trigger unplug event in msm_dp_display_disable
vfio/iommu_type1: Populate full dirty when detach non-pinned group
vfio/iommu_type1: Fix some sanity checks in detach group
vfio-pci/zdev: fix possible segmentation fault issue
ext4: fix potential htree index checksum corruption
phy: USB_LGM_PHY should depend on X86
coresight: etm4x: Skip accessing TRCPDCR in save/restore
nvmem: core: Fix a resource leak on error in nvmem_add_cells_from_of()
nvmem: core: skip child nodes not matching binding
soundwire: bus: use sdw_update_no_pm when initializing a device
soundwire: bus: use sdw_write_no_pm when setting the bus scale registers
soundwire: export sdw_write/read_no_pm functions
soundwire: bus: fix confusion on device used by pm_runtime
misc: fastrpc: fix incorrect usage of dma_map_sgtable
remoteproc/mediatek: acknowledge watchdog IRQ after handled
regmap: sdw: use _no_pm functions in regmap_read/write
ext: EXT4_KUNIT_TESTS should depend on EXT4_FS instead of selecting it
mailbox: sprd: correct definition of SPRD_OUTBOX_FIFO_FULL
device-dax: Fix default return code of range_parse()
PCI: pci-bridge-emul: Fix array overruns, improve safety
PCI: cadence: Fix DMA range mapping early return error
i40e: Fix flow for IPv6 next header (extension header)
i40e: Add zero-initialization of AQ command structures
i40e: Fix overwriting flow control settings during driver loading
i40e: Fix addition of RX filters after enabling FW LLDP agent
i40e: Fix VFs not created
Take mmap lock in cacheflush syscall
nios2: fixed broken sys_clone syscall
i40e: Fix add TC filter for IPv6
octeontx2-af: Fix an off by one in rvu_dbg_qsize_write()
pwm: iqs620a: Fix overflow and optimize calculations
vfio/type1: Use follow_pte()
ice: report correct max number of TCs
ice: Account for port VLAN in VF max packet size calculation
ice: Fix state bits on LLDP mode switch
ice: update the number of available RSS queues
net: stmmac: fix CBS idleslope and sendslope calculation
net/mlx4_core: Add missed mlx4_free_cmd_mailbox()
PCI: rockchip: Make 'ep-gpios' DT property optional
vxlan: move debug check after netdev unregister
wireguard: device: do not generate ICMP for non-IP packets
wireguard: kconfig: use arm chacha even with no neon
ocfs2: fix a use after free on error
mm: memcontrol: fix NR_ANON_THPS accounting in charge moving
mm: memcontrol: fix slub memory accounting
mm/memory.c: fix potential pte_unmap_unlock pte error
mm/hugetlb: fix potential double free in hugetlb_register_node() error path
mm/hugetlb: suppress wrong warning info when alloc gigantic page
mm/compaction: fix misbehaviors of fast_find_migrateblock()
r8169: fix jumbo packet handling on RTL8168e
NFSv4: Fixes for nfs4_bitmask_adjust()
KVM: SVM: Intercept INVPCID when it's disabled to inject #UD
KVM: x86/mmu: Expand collapsible SPTE zap for TDP MMU to ZONE_DEVICE and HugeTLB pages
arm64: Add missing ISB after invalidating TLB in __primary_switch
i2c: brcmstb: Fix brcmstd_send_i2c_cmd condition
i2c: exynos5: Preserve high speed master code
mm,thp,shmem: make khugepaged obey tmpfs mount flags
mm: fix memory_failure() handling of dax-namespace metadata
mm/rmap: fix potential pte_unmap on an not mapped pte
proc: use kvzalloc for our kernel buffer
csky: Fix a size determination in gpr_get()
scsi: bnx2fc: Fix Kconfig warning & CNIC build errors
scsi: sd: sd_zbc: Don't pass GFP_NOIO to kvcalloc
block: reopen the device in blkdev_reread_part
ide/falconide: Fix module unload
scsi: sd: Fix Opal support
blk-settings: align max_sectors on "logical_block_size" boundary
soundwire: intel: fix possible crash when no device is detected
ACPI: property: Fix fwnode string properties matching
ACPI: configfs: add missing check after configfs_register_default_group()
cpufreq: ACPI: Set cpuinfo.max_freq directly if max boost is known
HID: logitech-dj: add support for keyboard events in eQUAD step 4 Gaming
HID: wacom: Ignore attempts to overwrite the touch_max value from HID
Input: raydium_ts_i2c - do not send zero length
Input: xpad - add support for PowerA Enhanced Wired Controller for Xbox Series X|S
Input: joydev - prevent potential read overflow in ioctl
Input: i8042 - add ASUS Zenbook Flip to noselftest list
media: mceusb: Fix potential out-of-bounds shift
USB: serial: option: update interface mapping for ZTE P685M
usb: musb: Fix runtime PM race in musb_queue_resume_work
usb: dwc3: gadget: Fix setting of DEPCFG.bInterval_m1
usb: dwc3: gadget: Fix dep->interval for fullspeed interrupt
USB: serial: ftdi_sio: fix FTX sub-integer prescaler
USB: serial: pl2303: fix line-speed handling on newer chips
USB: serial: mos7840: fix error code in mos7840_write()
USB: serial: mos7720: fix error code in mos7720_write()
phy: lantiq: rcu-usb2: wait after clock enable
ALSA: fireface: fix to parse sync status register of latter protocol
ALSA: hda: Add another CometLake-H PCI ID
ALSA: hda/hdmi: Drop bogus check at closing a stream
ALSA: hda/realtek: modify EAPD in the ALC886
ALSA: hda/realtek: Quirk for HP Spectre x360 14 amp setup
MIPS: Ingenic: Disable HPTLB for D0 XBurst CPUs too
MIPS: Support binutils configured with --enable-mips-fix-loongson3-llsc=yes
MIPS: VDSO: Use CLANG_FLAGS instead of filtering out '--target='
Revert "MIPS: Octeon: Remove special handling of CONFIG_MIPS_ELF_APPENDED_DTB=y"
Revert "bcache: Kill btree_io_wq"
bcache: Give btree_io_wq correct semantics again
bcache: Move journal work to new flush wq
Revert "drm/amd/display: Update NV1x SR latency values"
drm/amd/display: Add FPU wrappers to dcn21_validate_bandwidth()
drm/amd/display: Remove Assert from dcn10_get_dig_frontend
drm/amd/display: Add vupdate_no_lock interrupts for DCN2.1
drm/amdkfd: Fix recursive lock warnings
drm/amdgpu: Set reference clock to 100Mhz on Renoir (v2)
drm/nouveau/kms: handle mDP connectors
drm/modes: Switch to 64bit maths to avoid integer overflow
drm/sched: Cancel and flush all outstanding jobs before finish.
drm/panel: kd35t133: allow using non-continuous dsi clock
drm/rockchip: Require the YTR modifier for AFBC
ASoC: siu: Fix build error by a wrong const prefix
selinux: fix inconsistency between inode_getxattr and inode_listsecurity
erofs: initialized fields can only be observed after bit is set
tpm_tis: Fix check_locality for correct locality acquisition
tpm_tis: Clean up locality release
KEYS: trusted: Fix incorrect handling of tpm_get_random()
KEYS: trusted: Fix migratable=1 failing
KEYS: trusted: Reserve TPM for seal and unseal operations
btrfs: do not cleanup upper nodes in btrfs_backref_cleanup_node
btrfs: do not warn if we can't find the reloc root when looking up backref
btrfs: add asserts for deleting backref cache nodes
btrfs: abort the transaction if we fail to inc ref in btrfs_copy_root
btrfs: fix reloc root leak with 0 ref reloc roots on recovery
btrfs: splice remaining dirty_bg's onto the transaction dirty bg list
btrfs: handle space_info::total_bytes_pinned inside the delayed ref itself
btrfs: account for new extents being deleted in total_bytes_pinned
btrfs: fix extent buffer leak on failure to copy root
drm/i915/gt: Flush before changing register state
drm/i915/gt: Correct surface base address for renderclear
crypto: arm64/sha - add missing module aliases
crypto: aesni - prevent misaligned buffers on the stack
crypto: michael_mic - fix broken misalignment handling
crypto: sun4i-ss - checking sg length is not sufficient
crypto: sun4i-ss - IV register does not work on A10 and A13
crypto: sun4i-ss - handle BigEndian for cipher
crypto: sun4i-ss - initialize need_fallback
soc: samsung: exynos-asv: don't defer early on not-supported SoCs
soc: samsung: exynos-asv: handle reading revision register error
seccomp: Add missing return in non-void function
arm64: ptrace: Fix seccomp of traced syscall -1 (NO_SYSCALL)
misc: rtsx: init of rts522a add OCP power off when no card is present
drivers/misc/vmw_vmci: restrict too big queue size in qp_host_alloc_queue
pstore: Fix typo in compression option name
dts64: mt7622: fix slow sd card access
arm64: dts: agilex: fix phy interface bit shift for gmac1 and gmac2
staging/mt7621-dma: mtk-hsdma.c->hsdma-mt7621.c
staging: gdm724x: Fix DMA from stack
staging: rtl8188eu: Add Edimax EW-7811UN V2 to device table
floppy: reintroduce O_NDELAY fix
media: i2c: max9286: fix access to unallocated memory
media: ir_toy: add another IR Droid device
media: ipu3-cio2: Fix mbus_code processing in cio2_subdev_set_fmt()
media: marvell-ccic: power up the device on mclk enable
media: smipcie: fix interrupt handling and IR timeout
x86/virt: Eat faults on VMXOFF in reboot flows
x86/reboot: Force all cpus to exit VMX root if VMX is supported
x86/fault: Fix AMD erratum #91 errata fixup for user code
x86/entry: Fix instrumentation annotation
powerpc/prom: Fix "ibm,arch-vec-5-platform-support" scan
rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers
rcu/nocb: Perform deferred wake up before last idle's need_resched() check
kprobes: Fix to delay the kprobes jump optimization
arm64: Extend workaround for erratum 1024718 to all versions of Cortex-A55
iommu/arm-smmu-qcom: Fix mask extraction for bootloader programmed SMRs
arm64: kexec_file: fix memory leakage in create_dtb() when fdt_open_into() fails
arm64: uprobe: Return EOPNOTSUPP for AARCH32 instruction probing
arm64 module: set plt* section addresses to 0x0
arm64: spectre: Prevent lockdep splat on v4 mitigation enable path
riscv: Disable KSAN_SANITIZE for vDSO
watchdog: qcom: Remove incorrect usage of QCOM_WDT_ENABLE_IRQ
watchdog: mei_wdt: request stop on unregister
coresight: etm4x: Handle accesses to TRCSTALLCTLR
mtd: spi-nor: sfdp: Fix last erase region marking
mtd: spi-nor: sfdp: Fix wrong erase type bitmask for overlaid region
mtd: spi-nor: core: Fix erase type discovery for overlaid region
mtd: spi-nor: core: Add erase size check for erase command initialization
mtd: spi-nor: hisi-sfc: Put child node np on error path
fs/affs: release old buffer head on error path
seq_file: document how per-entry resources are managed.
x86: fix seq_file iteration for pat/memtype.c
mm: memcontrol: fix swap undercounting in cgroup2
mm: memcontrol: fix get_active_memcg return value
hugetlb: fix update_and_free_page contig page struct assumption
hugetlb: fix copy_huge_page_from_user contig page struct assumption
mm/vmscan: restore zone_reclaim_mode ABI
mm, compaction: make fast_isolate_freepages() stay within zone
KVM: nSVM: fix running nested guests when npt=0
nvmem: qcom-spmi-sdam: Fix uninitialized pdev pointer
module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for undefined symbols
mmc: sdhci-esdhc-imx: fix kernel panic when remove module
mmc: sdhci-pci-o2micro: Bug fix for SDR104 HW tuning failure
powerpc/32: Preserve cr1 in exception prolog stack check to fix build error
powerpc/kexec_file: fix FDT size estimation for kdump kernel
powerpc/32s: Add missing call to kuep_lock on syscall entry
spmi: spmi-pmic-arb: Fix hw_irq overflow
mei: fix transfer over dma with extended header
mei: me: emmitsburg workstation DID
mei: me: add adler lake point S DID
mei: me: add adler lake point LP DID
gpio: pcf857x: Fix missing first interrupt
mfd: gateworks-gsc: Fix interrupt type
printk: fix deadlock when kernel panic
exfat: fix shift-out-of-bounds in exfat_fill_super()
zonefs: Fix file size of zones in full condition
kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTORE
thermal: cpufreq_cooling: freq_qos_update_request() returns < 0 on error
cpufreq: qcom-hw: drop devm_xxx() calls from init/exit hooks
cpufreq: intel_pstate: Change intel_pstate_get_hwp_max() argument
cpufreq: intel_pstate: Get per-CPU max freq via MSR_HWP_CAPABILITIES if available
proc: don't allow async path resolution of /proc/thread-self components
s390/vtime: fix inline assembly clobber list
virtio/s390: implement virtio-ccw revision 2 correctly
um: mm: check more comprehensively for stub changes
um: defer killing userspace on page table update failures
irqchip/loongson-pch-msi: Use bitmap_zalloc() to allocate bitmap
f2fs: fix out-of-repair __setattr_copy()
f2fs: enforce the immutable flag on open files
f2fs: flush data when enabling checkpoint back
sparc32: fix a user-triggerable oops in clear_user()
spi: fsl: invert spisel_boot signal on MPC8309
spi: spi-synquacer: fix set_cs handling
gfs2: fix glock confusion in function signal_our_withdraw
gfs2: Don't skip dlm unlock if glock has an lvb
gfs2: Lock imbalance on error path in gfs2_recover_one
gfs2: Recursive gfs2_quota_hold in gfs2_iomap_end
dm: fix deadlock when swapping to encrypted device
dm table: fix iterate_devices based device capability checks
dm table: fix DAX iterate_devices based device capability checks
dm table: fix zoned iterate_devices based device capability checks
dm writecache: fix performance degradation in ssd mode
dm writecache: return the exact table values that were set
dm writecache: fix writing beyond end of underlying device when shrinking
dm era: Recover committed writeset after crash
dm era: Update in-core bitset after committing the metadata
dm era: Verify the data block size hasn't changed
dm era: Fix bitset memory leaks
dm era: Use correct value size in equality function of writeset tree
dm era: Reinitialize bitset cache before digesting a new writeset
dm era: only resize metadata in preresume
drm/i915: Reject 446-480MHz HDMI clock on GLK
kgdb: fix to kill breakpoints on initmem after boot
ipv6: silence compilation warning for non-IPV6 builds
net: icmp: pass zeroed opts from icmp{,v6}_ndo_send before sending
wireguard: selftests: test multiple parallel streams
wireguard: queueing: get rid of per-peer ring buffers
net: sched: fix police ext initialization
net: qrtr: Fix memory leak in qrtr_tun_open
net_sched: fix RTNL deadlock again caused by request_module()
ARM: dts: aspeed: Add LCLK to lpc-snoop
Linux 5.10.20
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3fbcecd9413ce212dac68d5cc800c9457feba56a
commit ee576c47db upstream.
The icmp{,v6}_send functions make all sorts of use of skb->cb, casting
it with IPCB or IP6CB, assuming the skb to have come directly from the
inet layer. But when the packet comes from the ndo layer, especially
when forwarded, there's no telling what might be in skb->cb at that
point. As a result, the icmp sending code risks reading bogus memory
contents, which can result in nasty stack overflows such as this one
reported by a user:
panic+0x108/0x2ea
__stack_chk_fail+0x14/0x20
__icmp_send+0x5bd/0x5c0
icmp_ndo_send+0x148/0x160
In icmp_send, skb->cb is cast with IPCB and an ip_options struct is read
from it. The optlen parameter there is of particular note, as it can
induce writes beyond bounds. There are quite a few ways that can happen
in __ip_options_echo. For example:
// sptr/skb are attacker-controlled skb bytes
sptr = skb_network_header(skb);
// dptr/dopt points to stack memory allocated by __icmp_send
dptr = dopt->__data;
// sopt is the corrupt skb->cb in question
if (sopt->rr) {
optlen = sptr[sopt->rr+1]; // corrupt skb->cb + skb->data
soffset = sptr[sopt->rr+2]; // corrupt skb->cb + skb->data
// this now writes potentially attacker-controlled data,
// overflowing the stack:
memcpy(dptr, sptr+sopt->rr, optlen);
}
In the icmpv6_send case, the story is similar, but not as dire, as only
IP6CB(skb)->iif and IP6CB(skb)->dsthao are used. The dsthao case is
worse than the iif case, but it is passed to ipv6_find_tlv, which does
a bit of bounds checking on the value.
This is easy to simulate by doing a `memset(skb->cb, 0x41,
sizeof(skb->cb));` before calling icmp{,v6}_ndo_send, and it's only by
good fortune and the rarity of icmp sending from that context that we've
avoided reports like this until now. For example, in KASAN:
BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0xa0e/0x12b0
Write of size 38 at addr ffff888006f1f80e by task ping/89
CPU: 2 PID: 89 Comm: ping Not tainted 5.10.0-rc7-debug+ #5
Call Trace:
dump_stack+0x9a/0xcc
print_address_description.constprop.0+0x1a/0x160
__kasan_report.cold+0x20/0x38
kasan_report+0x32/0x40
check_memory_region+0x145/0x1a0
memcpy+0x39/0x60
__ip_options_echo+0xa0e/0x12b0
__icmp_send+0x744/0x1700
Actually, out of the 4 drivers that do this, only gtp zeroed the cb for
the v4 case, while the rest did not. So this commit actually removes the
gtp-specific zeroing, while putting the code where it belongs in the
shared infrastructure of icmp{,v6}_ndo_send.
This commit fixes the issue by passing an empty IPCB or IP6CB along to
the functions that actually do the work. For icmp_send, this was
already trivial, thanks to __icmp_send providing the plumbing function.
For icmpv6_send, this required a tiny bit of refactoring to make it
behave like the v4 case, after which it was straightforward.
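The essence of the fix, handing the echo code a zeroed options struct instead of reinterpreting whatever the previous layer left in skb->cb, can be shown with a small userspace model. The types and helpers below (model_skb, model_ip_options, bytes_to_echo, ndo_send_echo_len) are hypothetical stand-ins for struct sk_buff, struct ip_options, __ip_options_echo and icmp_ndo_send; this is a sketch of the idea, not the kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for struct ip_options. */
struct model_ip_options {
	uint8_t optlen; /* total length of IP options */
	uint8_t rr;     /* offset of the Record Route option, 0 if absent */
};

/* Hypothetical stand-in for struct sk_buff: only the control buffer. */
struct model_skb {
	uint8_t cb[48]; /* contents depend on whichever layer used it last */
};

/* Buggy pattern: reinterpret cb as IP options (models the IPCB() cast). */
static struct model_ip_options opts_from_cb(const struct model_skb *skb)
{
	struct model_ip_options opt;

	memcpy(&opt, skb->cb, sizeof(opt));
	return opt;
}

/* Number of option bytes an echo routine would copy onto the stack. */
static unsigned int bytes_to_echo(const struct model_ip_options *opt)
{
	return opt->rr ? opt->optlen : 0;
}

/* Fixed pattern: the ndo_send path passes its own zeroed options. */
static unsigned int ndo_send_echo_len(const struct model_skb *skb)
{
	struct model_ip_options opts = { 0 };

	(void)skb; /* cb is deliberately not consulted */
	return bytes_to_echo(&opts);
}
```

With cb filled with 0x41 bytes, as in the memset simulation described earlier, the buggy path would echo 0x41 option bytes while the fixed path echoes none.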
Fixes: a2b78e9b2c ("sunvnet: generate ICMP PTMUD messages for smaller port MTUs")
Reported-by: SinYu <liuxyon@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/netdev/CAF=yD-LOF116aHub6RMe8vB8ZpnrrnoTdqhobEx+bvoA8AsP0w@mail.gmail.com/T/
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20210223131858.72082-1-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Steps on the way to 5.10-rc1
Resolves merge issues in:
drivers/net/virtio_net.c
net/xfrm/xfrm_state.c
net/xfrm/xfrm_user.c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3132e7802f25cb775eb02d0b3a03068da39a6fe2
The arg exact_dif is not used anymore, remove it. inet6_exact_dif_match()
is no longer needed after the above is removed, remove it too.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is no longer used, SCTP now uses a private helper.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steps on the way to 5.9-rc1
Resolves conflicts in:
drivers/irqchip/qcom-pdc.c
include/linux/device.h
net/xfrm/xfrm_state.c
security/lsm_audit.c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4aeb3d04f4717714a421721eb3ce690c099bb30a
Extend the rfc 4884 read interface introduced for ipv4 in
commit eba75c587e ("icmp: support rfc 4884") to ipv6.
Add socket option SOL_IPV6/IPV6_RECVERR_RFC4884.
Changes v1->v2:
- make ipv6_icmp_error_rfc4884 static (file scope)
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Drop the doubled word "by" in a comment.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In a quest to divide up the 5.7-rc1 merge chunks into reviewable pieces.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2e5960415348c06e8f10e10cbefb3ee5c3745e73
This patch adds RPL source routing receive handling. Everything works
only if the sysctl "rpl_seg_enabled" and source routing are enabled. The
behaviour is mostly the same as for IPv6 segment routing. To handle
compression and decompression, a rpl.c file is created which contains the
necessary functionality. The receive handling also takes care of IPv6
encapsulation, as far as RFC 6554 specifies it as a possible nexthdr.
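Decompression in RFC 6554 works by restoring the leading octets that a segment shares with the IPv6 Destination Address: CmprI (or CmprE for the last segment) gives how many of those octets were elided. The helper below is a hypothetical userspace sketch of that single step; it is not the rpl.c API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rebuild one full 128-bit address from a compressed RPL segment.
 * cmpr: CmprI (or CmprE for the last entry), the number of leading
 * octets that were elided because they equal the IPv6 Destination
 * Address. seg points to the remaining 16 - cmpr octets. */
static void rpl_expand_segment(uint8_t out[16], const uint8_t daddr[16],
			       const uint8_t *seg, unsigned int cmpr)
{
	assert(cmpr <= 15);                 /* 4-bit field in the SRH */
	memcpy(out, daddr, cmpr);           /* shared prefix from daddr */
	memcpy(out + cmpr, seg, 16 - cmpr); /* octets carried in the header */
}
```

For example, with daddr fe80::1 and CmprI = 8, an 8-octet segment ending in 0x02 expands to fe80::2.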
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, IPv6 router discovery always puts routes into
RT6_TABLE_MAIN. This causes problems for connection managers
that want to support multiple simultaneous network connections
and want control over which one is used by default (e.g., wifi
and wired).
To work around this, connection managers typically take the routes
they prefer and copy them to static routes with low metrics in
the main table. This puts the burden on the connection manager
to watch netlink to see if the routes have changed, delete the
routes when their lifetime expires, etc.
Instead, this patch adds a per-interface sysctl to have the
kernel put autoconf routes into different tables. This allows
each interface to have its own autoconf table, and choosing the
default interface (or using different interfaces at the same
time for different types of traffic) can be done using
appropriate ip rules.
The sysctl behaves as follows:
- = 0: default. Put routes into RT6_TABLE_MAIN as before.
- > 0: manual. Put routes into the specified table.
- < 0: automatic. Add the absolute value of the sysctl to the
device's ifindex, and use that table.
The automatic mode is most useful in conjunction with
net.ipv6.conf.default.accept_ra_rt_table. A connection manager
or distribution could set it to, say, -100 on boot, and
thereafter just use IP rules.
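The three modes of the sysctl can be summarised as a small table-selection function. The name autoconf_table is hypothetical and this is only a sketch of the rules listed above; 254 is the value of RT_TABLE_MAIN.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_RT6_TABLE_MAIN 254u /* same value as RT_TABLE_MAIN */

/* Pick the routing table for RA-learned routes on one interface. */
static uint32_t autoconf_table(int accept_ra_rt_table, int ifindex)
{
	if (accept_ra_rt_table == 0)
		return MODEL_RT6_TABLE_MAIN;         /* default: main table */
	if (accept_ra_rt_table > 0)
		return (uint32_t)accept_ra_rt_table; /* manual: fixed table */
	/* automatic: ifindex plus the absolute value of the sysctl */
	return (uint32_t)ifindex + (uint32_t)(-(int64_t)accept_ra_rt_table);
}
```

With the suggested boot-time value of -100, interface 3 would use table 103, so an ip rule per interface is all a connection manager needs.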
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
[AmitP: Refactored original changes to align with
the changes introduced by upstream commits
830218c1ad ("net: ipv6: Fix processing of RAs in presence of VRF"),
8d1c802b28 ("net/ipv6: Flip FIB entries to fib6_info").
Also folded following android-4.9 commit changes into this patch
be65fb01da4d ("ANDROID: net: ipv6: remove unused variable ifindex in")]
Bug: 120445791
Change-Id: I82d16e3737d9cdfa6489e649e247894d0d60cbb1
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
By default, an IPv6 socket with the IPV6_ROUTER_ALERT socket option set
will receive all IPv6 Router Alert packets from all namespaces.
IPV6_ROUTER_ALERT_ISOLATE socket option restricts packets received by
the socket to be only from the socket's namespace.
Signed-off-by: Maxim Martynov <maxim@arista.com>
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch refactors ip_mc_check_igmp(), ipv6_mc_check_mld() and
their callers (more precisely, the Linux bridge) to not rely on
the skb_trimmed parameter anymore.
An skb with its tail trimmed to the IP packet length was initially
introduced for the following three reasons:
1) To be able to verify the ICMPv6 checksum.
2) To be able to distinguish the version of an IGMP or MLD query.
They are distinguishable only by their size.
3) To avoid parsing data for an IGMPv3 or MLDv2 report that is
beyond the IP packet but still within the skb.
The first case still uses a cloned and potentially trimmed skb to
verify. However, there is no need to propagate it to the caller.
For the second and third case explicit IP packet length checks were
added.
This hopefully makes ip_mc_check_igmp() and ipv6_mc_check_mld() easier
to read and verify, as well as easier to use.
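Reason 2 above, telling query versions apart purely by size, can be sketched as follows. The constants are the sizes of struct igmphdr (8), struct igmpv3_query (12), struct mld_msg (24) and struct mld2_query (28); the function names are hypothetical, and transport_len is assumed to be already bounded by the IP packet length rather than skb->len.

```c
#include <assert.h>
#include <stddef.h>

enum {
	IGMP_HDR_LEN = 8,      /* sizeof(struct igmphdr) */
	IGMPV3_QUERY_LEN = 12, /* sizeof(struct igmpv3_query) */
	MLD_HDR_LEN = 24,      /* sizeof(struct mld_msg) */
	MLDV2_QUERY_LEN = 28,  /* sizeof(struct mld2_query) */
};

/* 2: IGMPv1/v2 query (same size; the Max Resp Code separates them),
 * 3: IGMPv3 query, -1: malformed length. */
static int igmp_query_version(size_t transport_len)
{
	if (transport_len == IGMP_HDR_LEN)
		return 2;
	if (transport_len >= IGMPV3_QUERY_LEN)
		return 3;
	return -1;
}

/* 1: MLDv1 query, 2: MLDv2 query, -1: malformed length. */
static int mld_query_version(size_t transport_len)
{
	if (transport_len == MLD_HDR_LEN)
		return 1;
	if (transport_len >= MLDV2_QUERY_LEN)
		return 2;
	return -1;
}
```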
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
The socket option will be enabled by default to ensure current behaviour
is not changed. This is the same for the IPv4 version.
A socket bound to in6addr_any and a specific port will receive all traffic
on that port. Analogous to IP_MULTICAST_ALL, disabling the option changes
this: if one or more multicast groups were joined using said socket, only
multicast traffic from groups which were explicitly joined via this
socket is passed on.
Without disabling this option, a socket (or even the whole system) joined
to multiple multicast groups is very hard to get right: filtering by
destination address has to take place in user space to avoid receiving
multicast traffic from other multicast groups, which might have traffic
on the same port.
IP_MULTICAST_ALL is deliberately not extended to also apply to IPv6, to
avoid changing the behaviour of current applications.
Signed-off-by: Andre Naujoks <nautsch2@gmail.com>
Acked-By: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
sysctl.ipv6.auto_flowlabels is 1 by default. In our hosts, we set it to 2.
If the sockopt doesn't set autoflowlabel, outgoing packets from the hosts
are supposed to not include a flowlabel. This is true for normal packets,
but not for reset packets.
The reason is that ipv6_pinfo.autoflowlabel is set at sock creation. If
we later change sysctl.ipv6.auto_flowlabels, ipv6_pinfo.autoflowlabel
isn't changed, so the sock keeps the old behavior in terms of auto
flowlabel. Reset packets suffer from this problem because they are sent
from a special control socket, which is created at boot time. Since
sysctl.ipv6.auto_flowlabels is 1 by default, the control socket will
always have its ipv6_pinfo.autoflowlabel set, even after the user sets
sysctl.ipv6.auto_flowlabels to 2, so reset packets will always have a
flowlabel. Normal socks created before the sysctl setting suffer from
the same issue. We can't even turn off autoflowlabel unless we kill all
socks in the hosts.
To fix this, if IPV6_AUTOFLOWLABEL sockopt is used, we use the
autoflowlabel setting from user, otherwise we always call
ip6_default_np_autolabel() which has the new settings of sysctl.
Note, this changes behavior a little bit. Before commit 42240901f7
(ipv6: Implement different admin modes for automatic flow labels), the
autoflowlabel behavior of a sock isn't sticky, e.g., if the sysctl changes,
existing connections will change their autoflowlabel behavior. After that
commit, autoflowlabel behavior is sticky in the whole life of the sock.
With this patch, the behavior isn't sticky again.
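The new decision can be modelled in a few lines: an explicit IPV6_AUTOFLOWLABEL setting wins, otherwise the default is derived from the sysctl at the time the packet is built. The mode values 0 to 3 (off, opt-out, opt-in, forced) follow the documented meaning of auto_flowlabels; the struct and function names are illustrative, and forced mode is shown only as a socket default even though the kernel also enforces it elsewhere.

```c
#include <assert.h>
#include <stdbool.h>

/* Documented values of net.ipv6.auto_flowlabels. */
enum {
	AUTO_FLOW_LABEL_OFF = 0,
	AUTO_FLOW_LABEL_OPTOUT = 1,
	AUTO_FLOW_LABEL_OPTIN = 2,
	AUTO_FLOW_LABEL_FORCED = 3,
};

/* Illustrative per-socket state. */
struct sock_model {
	bool autoflowlabel_set; /* IPV6_AUTOFLOWLABEL was used */
	bool autoflowlabel;     /* value passed by the user */
};

/* Default taken from the sysctl at call time, not at socket creation. */
static bool default_autolabel(int sysctl_auto_flowlabels)
{
	switch (sysctl_auto_flowlabels) {
	case AUTO_FLOW_LABEL_OPTOUT:
	case AUTO_FLOW_LABEL_FORCED:
		return true;
	default:
		return false;
	}
}

static bool use_autoflowlabel(const struct sock_model *sk,
			      int sysctl_auto_flowlabels)
{
	if (sk->autoflowlabel_set)
		return sk->autoflowlabel; /* explicit sockopt is sticky */
	return default_autolabel(sysctl_auto_flowlabels);
}
```

Under this model the boot-time control socket stops labelling reset packets as soon as the sysctl is changed from 1 to 2.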
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Tom Herbert <tom@quantonium.net>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a per-device sysctl to specify the default traffic class to use for
kernel originated IPv6 Neighbour Discovery packets.
Currently this includes:
- Router Solicitation (ICMPv6 type 133)
ndisc_send_rs() -> ndisc_send_skb() -> ip6_nd_hdr()
- Neighbour Solicitation (ICMPv6 type 135)
ndisc_send_ns() -> ndisc_send_skb() -> ip6_nd_hdr()
- Neighbour Advertisement (ICMPv6 type 136)
ndisc_send_na() -> ndisc_send_skb() -> ip6_nd_hdr()
- Redirect (ICMPv6 type 137)
ndisc_send_redirect() -> ndisc_send_skb() -> ip6_nd_hdr()
and if the kernel ever gets around to generating RAs,
it would presumably also include:
- Router Advertisement (ICMPv6 type 134)
(radvd daemon could pick up on the kernel setting and use it)
Interface drivers may examine the Traffic Class value and translate
the DiffServ Code Point into a link-layer appropriate traffic
prioritization scheme. An example of mapping IETF DSCP values to
IEEE 802.11 User Priority values can be found here:
https://tools.ietf.org/html/draft-ietf-tsvwg-ieee-802-11
The expected primary use case is to properly prioritize ND over wifi.
Testing:
jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
0
jzem22:~# echo -1 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
-bash: echo: write error: Invalid argument
jzem22:~# echo 256 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
-bash: echo: write error: Invalid argument
jzem22:~# echo 0 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
jzem22:~# echo 255 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
255
jzem22:~# echo 34 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
34
jzem22:~# echo $[0xDC] > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
jzem22:~# tcpdump -v -i eth0 icmp6 and src host jzem22.pgc and dst host fe80::1
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
IP6 (class 0xdc, hlim 255, next-header ICMPv6 (58) payload length: 24)
jzem22.pgc > fe80::1: [icmp6 sum ok] ICMP6, neighbor advertisement,
length 24, tgt is jzem22.pgc, Flags [solicited]
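The accepted range shown in the test session and the "class 0xdc" reported by tcpdump can be tied together with a short sketch: the traffic class occupies bits 20 to 27 of the first 32-bit word of the IPv6 header, between the 4-bit version and the 20-bit flow label. The helper names are hypothetical; the word is computed in host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Same bounds the sysctl enforces in the session above: 0..255. */
static int ndisc_tclass_valid(long v)
{
	return v >= 0 && v <= 255;
}

/* version (4 bits) | traffic class (8 bits) | flow label (20 bits) */
static uint32_t ip6_first_word(uint8_t tclass, uint32_t flowlabel)
{
	return (6u << 28) | ((uint32_t)tclass << 20) | (flowlabel & 0xFFFFFu);
}
```

With tclass 0xdc and no flow label the first header word is 0x6dc00000.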
(based on original change written by Erik Kline, with minor changes)
v2: fix 'suspicious rcu_dereference_check() usage'
by explicitly grabbing the rcu_read_lock.
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Erik Kline <ek@google.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few thousand files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0                                               11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier                             # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note                         270
GPL-2.0+ WITH Linux-syscall-note                        169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)      21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)      17
LGPL-2.1+ WITH Linux-syscall-note                        15
GPL-1.0+ WITH Linux-syscall-note                         14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)      5
LGPL-2.0+ WITH Linux-syscall-note                         4
LGPL-2.1 WITH Linux-syscall-note                          3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)                3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)               1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were any new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add a second device index, sdif, to udp socket lookups. sdif is the
index for ingress devices enslaved to an l3mdev. It allows the lookups
to consider the enslaved device as well as the L3 domain when searching
for a socket.
Early demux lookups are handled in the next patch as part of INET_MATCH
changes.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
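In a lookup that knows both indexes, a device-bound socket matches if it is bound either to dif (for an enslaved device this is the l3mdev, i.e. the L3 domain) or to sdif (the enslaved ingress device itself). The function below is a hypothetical sketch of that test only; scoring and the handling of unbound sockets under l3mdev policy are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* bound_dev_if: device the socket is bound to, 0 if unbound.
 * dif: ingress index used by the lookup (the l3mdev for enslaved devices).
 * sdif: index of the enslaved ingress device, 0 if none. */
static bool bound_dev_matches(int bound_dev_if, int dif, int sdif)
{
	if (!bound_dev_if)
		return true; /* unbound socket: l3mdev policy decided elsewhere */
	return bound_dev_if == dif || bound_dev_if == sdif;
}
```

A socket bound to the enslaved device 5 thus matches a packet that the lookup sees with dif 10 (the VRF) and sdif 5.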
Since commit 67a51780ae ("ipv6: udp: leverage scratch area
helpers") udp6_recvmsg() reads the skb len from the scratch area,
to avoid a cache miss.
But the UDP6 rx path supports RFC 2675 UDPv6 jumbograms, and their
length exceeds the 16 bits available in the scratch area. As a side
effect, the length returned by recvmsg() is:
<ingress datagram len> % (1<<16)
This commit addresses the issue by allocating one more bit in the
IP6CB flags field and setting it for incoming jumbograms.
That field is still in the first cacheline, so at recvmsg()
time we can check it and fall back to accessing skb->len if
required, without a measurable overhead.
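The truncation and the flag-based fallback can be illustrated with a userspace model. The struct, the helpers and the flag value are assumptions for this sketch; in the kernel the flag is set when the jumbo payload option is parsed, not by comparing the length.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_JUMBOGRAM_FLAG 0x80u /* illustrative flag bit */

struct rx_model {
	uint32_t skb_len;    /* full datagram length, may exceed 65535 */
	uint16_t cached_len; /* 16-bit copy kept in the scratch area */
	uint16_t flags;      /* IP6CB-like flags */
};

static void rx_model_init(struct rx_model *m, uint32_t len)
{
	m->skb_len = len;
	m->cached_len = (uint16_t)len; /* silently truncated modulo 2^16 */
	m->flags = len > 0xFFFFu ? MODEL_JUMBOGRAM_FLAG : 0;
}

/* Length reported to recvmsg(): cached copy unless it is a jumbogram. */
static uint32_t recv_len(const struct rx_model *m)
{
	return (m->flags & MODEL_JUMBOGRAM_FLAG) ? m->skb_len : m->cached_len;
}
```

For a 70000-byte jumbogram the cached copy holds 4464 (70000 % 65536), while the flag check returns the full 70000.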
Fixes: 67a51780ae ("ipv6: udp: leverage scratch area helpers")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit adds a new sysctl accept_ra_rt_info_min_plen that
defines the minimum acceptable prefix length of Route Information
Options. The new sysctl is intended to be used together with
accept_ra_rt_info_max_plen to configure a range of acceptable
prefix lengths. It is useful to prevent misconfigurations from
unintentionally blackholing too much of the IPv6 address space
(e.g., home routers announcing RIOs for fc00::/7, which is
incorrect).
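Used together, the two sysctls define an inclusive window of acceptable Route Information prefix lengths. A minimal sketch of the check, with a hypothetical function name:

```c
#include <assert.h>
#include <stdbool.h>

/* Accept a Route Information Option only if
 * min_plen <= prefix_len <= max_plen. */
static bool rio_plen_acceptable(int prefix_len, int min_plen, int max_plen)
{
	return prefix_len >= min_plen && prefix_len <= max_plen;
}
```

With a window of 48 to 64, a bogus fc00::/7 announcement (prefix length 7) is ignored while a /56 or /64 is still accepted.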
Signed-off-by: Joel Scherpelz <jscherpelz@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This provides equivalent functionality to the existing ipv4
"disable_policy" sysctl, i.e., it allows IPsec processing to be skipped
for terminating packets on a per-interface basis.
Signed-off-by: David Forster <dforster@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The address generation mode for IPv6 link-local can only be configured
by netlink messages. This patch adds the ability to change the address
generation mode via sysctl.
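For reference, the default EUI-64 generation mode derives the link-local interface identifier from the MAC address using the modified EUI-64 construction of RFC 4291, Appendix A. The sketch below shows only that construction; the function name is hypothetical and the other modes selectable through the sysctl are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Modified EUI-64 interface identifier from a 48-bit MAC address. */
static void eui64_from_mac(uint8_t iid[8], const uint8_t mac[6])
{
	iid[0] = mac[0] ^ 0x02; /* invert the universal/local bit */
	iid[1] = mac[1];
	iid[2] = mac[2];
	iid[3] = 0xff;          /* insert FF:FE between OUI and NIC part */
	iid[4] = 0xfe;
	iid[5] = mac[3];
	iid[6] = mac[4];
	iid[7] = mac[5];
}
```

MAC 00:11:22:33:44:55 yields the identifier 0211:22ff:fe33:4455, i.e. the link-local address fe80::211:22ff:fe33:4455.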
v1 -> v2
Removed the rtnl lock and switch to use RCU lock to iterate through
the netdev list.
v2 -> v3
Removed the addrgenmode variable from the idev structure and use the
sysctl storage for the flag.
Simplified the logic for sysctl handling by removing support
for the "all" operation.
Added support for more types of tunnel interfaces for link-local
address generation.
Based the patches on net-next.
v3 -> v4
Removed unnecessary whitespace changes.
Signed-off-by: Felix Jia <felix.jia@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implemented RFC7527 Enhanced DAD.
IPv6 duplicate address detection can fail if there is some temporary
loopback of Ethernet frames. RFC7527 solves this by including a random
nonce in the NS messages used for DAD, and if an NS is received with the
same nonce it is assumed to be a looped back DAD probe and is ignored.
RFC7527 is enabled by default. Can be disabled by setting both of
conf/{all,interface}/enhanced_dad to zero.
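The looped-back check reduces to comparing the nonce in a received DAD Neighbour Solicitation with the one we sent. The sketch below assumes a 6-byte nonce (the minimum Nonce option payload from RFC 3971, which RFC 7527 reuses); all names are hypothetical and the surrounding DAD state machine is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DAD_NONCE_LEN 6

struct dad_state {
	bool in_progress;
	uint8_t nonce[DAD_NONCE_LEN]; /* random value in our DAD NS */
};

enum ns_verdict { NS_NOT_DAD, NS_LOOPED_BACK, NS_DUPLICATE };

/* Classify an NS received for our tentative address. */
static enum ns_verdict classify_dad_ns(const struct dad_state *st,
				       const uint8_t *rx_nonce, size_t rx_len)
{
	if (!st->in_progress)
		return NS_NOT_DAD;
	if (rx_nonce && rx_len == DAD_NONCE_LEN &&
	    memcmp(rx_nonce, st->nonce, DAD_NONCE_LEN) == 0)
		return NS_LOOPED_BACK; /* our own probe came back: ignore */
	return NS_DUPLICATE;           /* another node probes this address */
}
```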
Signed-off-by: Erik Nordmark <nordmark@arista.com>
Signed-off-by: Bob Gilligan <gilligan@arista.com>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary functions to compute and check the HMAC signature
of an SR-enabled packet. Two HMAC algorithms are supported: hmac(sha1) and
hmac(sha256).
In order to avoid dynamic memory allocation for each HMAC computation,
a per-cpu ring buffer is allocated for this purpose.
A new per-interface sysctl called seg6_require_hmac is added, allowing a
user-defined policy for processing HMAC-signed SR-enabled packets.
A value of -1 means that the HMAC field will always be ignored.
A value of 0 means that if an HMAC field is present, its validity will
be enforced (the packet is dropped if the signature is incorrect).
Finally, a value of 1 means that any SR-enabled packet that does not
contain an HMAC signature or whose signature is incorrect will be dropped.
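The three seg6_require_hmac policies translate into a small accept/drop decision. In this sketch the HMAC computation itself is abstracted into the hmac_ok flag, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* require: value of seg6_require_hmac; has_hmac: the SRH carries an
 * HMAC TLV; hmac_ok: that HMAC verified. Returns true to accept. */
static bool seg6_hmac_accept(int require, bool has_hmac, bool hmac_ok)
{
	if (require < 0)
		return true;                 /* -1: HMAC field ignored */
	if (require == 0)
		return !has_hmac || hmac_ok; /* 0: checked only if present */
	return has_hmac && hmac_ok;          /* 1: mandatory and valid */
}
```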
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement minimal support for processing of SR-enabled packets
as described in
https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-02.
This patch implements the following operations:
- Intermediate segment endpoint: incrementation of active segment and rerouting.
- Egress for SR-encapsulated packets: decapsulation of outer IPv6 header + SRH
and routing of inner packet.
- Cleanup flag support for SR-inlined packets: removal of SRH if we are the
penultimate segment endpoint.
A per-interface sysctl seg6_enabled is provided, to accept/deny SR-enabled
packets. Default is deny.
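The intermediate-endpoint operation can be sketched on a simplified SRH. The segment list is stored in reverse order, so advancing the active segment means decrementing segments_left and copying segments[segments_left] into the destination address. The fixed array size, struct names and drop rule (segments_left greater than first_segment + 1) are assumptions of this sketch; cleanup-flag handling is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define SRH_MAX_SEGS 4 /* fixed size only for this sketch */

struct in6_model { uint8_t s6[16]; };

struct srh_model {
	uint8_t segments_left;
	uint8_t first_segment; /* index of the last entry in segments[] */
	struct in6_model segments[SRH_MAX_SEGS];
};

enum srh_action { SRH_FORWARD, SRH_DECAP_OR_DELIVER, SRH_DROP };

static enum srh_action srh_endpoint(struct srh_model *srh,
				    struct in6_model *daddr)
{
	if (srh->first_segment >= SRH_MAX_SEGS ||
	    srh->segments_left > srh->first_segment + 1)
		return SRH_DROP;             /* malformed header */
	if (srh->segments_left == 0)
		return SRH_DECAP_OR_DELIVER; /* we are the final segment */
	srh->segments_left--;
	*daddr = srh->segments[srh->segments_left]; /* next hop in path */
	return SRH_FORWARD;
}
```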
This patch does not provide support for HMAC-signed packets.
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
When reading a datagram or raw packet that arrived fragmented, expose
the maximum fragment size if recorded to allow applications to
estimate receive path MTU.
At this point, the field is only recorded when ipv6 connection
tracking is enabled. A follow-up patch will record this field also
in the ipv6 input path.
Tested using the test for IP_RECVFRAGSIZE plus
ip netns exec to ip addr add dev veth1 fc07::1/64
ip netns exec from ip addr add dev veth0 fc07::2/64
ip netns exec to ./recv_cmsg_recvfragsize -6 -u -p 6000 &
ip netns exec from nc -q 1 -u fc07::1 6000 < payload
Both with and without enabling connection tracking
ip6tables -A INPUT -m state --state NEW -p udp -j LOG
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, socket lookups for l3mdev (vrf) use cases can match a socket
that is bound to a port but not a device (i.e., a global socket). If the
sysctl tcp_l3mdev_accept is not set, this leads to ACK packets going out
based on the main table even though the packet came in from an L3 domain.
The end result is that the connection is never established, creating
confusion for users since the service is running and a socket shows in
ss output. Fix by requiring an exact dif to sk_bound_dev_if match if the
skb came through an interface enslaved to an l3mdev device and the
tcp_l3mdev_accept is not set.
skbs that come through an l3mdev interface are marked by setting a flag in
inet{6}_skb_parm. The IPv6 variant is already set; this patch adds the
flag for IPv4. Using an skb flag avoids a device lookup on the dif. The
flag is set in the VRF driver using the IP{6}CB macros. For IPv4, the
inet_skb_parm struct is moved in the cb per commit 971f10eca1, so the
match function in the TCP stack needs to use TCP_SKB_CB. For IPv6, the
move is done after the socket lookup, so IP6CB is used.
The flags field in inet_skb_parm struct needs to be increased to add
another flag. There is currently a 1-byte hole following the flags,
so it can be expanded to u16 without increasing the size of the struct.
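The matching rule can be sketched as two small predicates. This is a simplified illustration, not the kernel's socket lookup (which also scores candidate sockets). `SKB_FLAG_L3SLAVE`, `exact_dif_match()` and `sk_dev_match()` are hypothetical names standing in for the flag and checks described above.

```c
#include <stdbool.h>

/* Hypothetical flag bit marking an skb received via an l3mdev slave. */
#define SKB_FLAG_L3SLAVE 0x1u

/* An exact device match is required when the skb came through an
 * l3mdev (VRF) slave and tcp_l3mdev_accept is not set. */
bool exact_dif_match(unsigned int skb_flags, bool tcp_l3mdev_accept)
{
	return !tcp_l3mdev_accept && (skb_flags & SKB_FLAG_L3SLAVE);
}

/* Can a socket bound to sk_bound_dev_if (0 = not bound, i.e. global)
 * be matched for a packet received on device index dif? */
bool sk_dev_match(int sk_bound_dev_if, int dif, bool exact_dif)
{
	if (sk_bound_dev_if || exact_dif)
		return sk_bound_dev_if == dif;
	return true; /* global socket, no exact match required */
}
```

With the flag set and tcp_l3mdev_accept off, a global socket (bound device 0) no longer matches, so the SYN is not answered from the main table.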
Fixes: 193125dbd8 ("net: Introduce VRF device driver")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This implements:
https://tools.ietf.org/html/rfc7559
Backoff is performed according to RFC3315 section 14:
https://tools.ietf.org/html/rfc3315#section-14
We allow setting /proc/sys/net/ipv6/conf/*/router_solicitations
to a negative value meaning an unlimited number of retransmits,
and we make this the new default (in line with the RFC).
We also add a new setting:
/proc/sys/net/ipv6/conf/*/router_solicitation_max_interval
defaulting to 1 hour (per RFC recommendation).
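RFC 3315 section 14 computes each retransmission timeout (RT) as RT = IRT + RAND*IRT for the first message, then RT = 2*RTprev + RAND*RTprev, and RT = MRT + RAND*MRT once that exceeds the maximum (MRT), with RAND uniform in [-0.1, +0.1]. A minimal integer sketch of those formulas, assuming millisecond units and a caller-supplied RAND scaled to per-mille (so [-100, 100]). The function names are hypothetical and this is not the kernel's implementation.

```c
/* RAND values are passed in scaled by 1000, i.e. in [-100, 100]. */

/* First retransmission timeout: RT = IRT + RAND*IRT. */
long rs_backoff_init(long irt_ms, int rand_permille)
{
	return irt_ms + irt_ms * rand_permille / 1000;
}

/* Subsequent timeouts: RT = 2*RTprev + RAND*RTprev, capped at
 * MRT + RAND*MRT. The RFC draws a fresh RAND for the cap, hence the
 * second argument. */
long rs_backoff_update(long rt_prev_ms, long mrt_ms,
		       int rand1_permille, int rand2_permille)
{
	long rt = 2 * rt_prev_ms + rt_prev_ms * rand1_permille / 1000;

	if (rt > mrt_ms)
		rt = mrt_ms + mrt_ms * rand2_permille / 1000;
	return rt;
}
```

With router_solicitation_max_interval at its 1 hour default, `mrt_ms` would be 3600000, so the interval roughly doubles per solicitation until it settles around one hour.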
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Acked-by: Erik Kline <ek@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Frank Kellermann reported a kernel crash with 4.5.0 when IPv6 is
disabled at boot using the kernel option ipv6.disable=1. Using
current net-next with the boot option:
$ ip link add red type vrf table 1001
Generates:
[12210.919584] BUG: unable to handle kernel NULL pointer dereference at 0000000000000748
[12210.921341] IP: [<ffffffff814b30e3>] fib6_get_table+0x2c/0x5a
[12210.922537] PGD b79e3067 PUD bb32b067 PMD 0
[12210.923479] Oops: 0000 [#1] SMP
[12210.924001] Modules linked in: ipvlan 8021q garp mrp stp llc
[12210.925130] CPU: 3 PID: 1177 Comm: ip Not tainted 4.7.0-rc1+ #235
[12210.926168] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[12210.928065] task: ffff8800b9ac4640 ti: ffff8800bacac000 task.ti: ffff8800bacac000
[12210.929328] RIP: 0010:[<ffffffff814b30e3>] [<ffffffff814b30e3>] fib6_get_table+0x2c/0x5a
[12210.930697] RSP: 0018:ffff8800bacaf888 EFLAGS: 00010202
[12210.931563] RAX: 0000000000000748 RBX: ffffffff81a9e280 RCX: ffff8800b9ac4e28
[12210.932688] RDX: 00000000000000e9 RSI: 0000000000000002 RDI: 0000000000000286
[12210.933820] RBP: ffff8800bacaf898 R08: ffff8800b9ac4df0 R09: 000000000052001b
[12210.934941] R10: 00000000657c0000 R11: 000000000000c649 R12: 00000000000003e9
[12210.936032] R13: 00000000000003e9 R14: ffff8800bace7800 R15: ffff8800bb3ec000
[12210.937103] FS: 00007faa1766c700(0000) GS:ffff88013ac00000(0000) knlGS:0000000000000000
[12210.938321] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12210.939166] CR2: 0000000000000748 CR3: 00000000b79d6000 CR4: 00000000000406e0
[12210.940278] Stack:
[12210.940603] ffff8800bb3ec000 ffffffff81a9e280 ffff8800bacaf8c8 ffffffff814b3135
[12210.941818] ffff8800bb3ec000 ffffffff81a9e280 ffffffff81a9e280 ffff8800bace7800
[12210.943040] ffff8800bacaf8f0 ffffffff81397c88 ffff8800bb3ec000 ffffffff81a9e280
[12210.944288] Call Trace:
[12210.944688] [<ffffffff814b3135>] fib6_new_table+0x24/0x8a
[12210.945516] [<ffffffff81397c88>] vrf_dev_init+0xd4/0x162
[12210.946328] [<ffffffff814091e1>] register_netdevice+0x100/0x396
[12210.947209] [<ffffffff8139823d>] vrf_newlink+0x40/0xb3
[12210.948001] [<ffffffff814187f0>] rtnl_newlink+0x5d3/0x6d5
...
The problem above is due to the fact that the fib hash table is not
allocated when IPv6 is disabled at boot.
The VRF driver should not do any IPv6 initialization if IPv6
is disabled, so it needs to know if IPv6 is disabled at boot. The disable
parameter is private to the IPv6 module, so provide an accessor for
modules to determine if IPv6 was disabled at boot time.
Fixes: 35402e3136 ("net: Add IPv6 support to VRF device")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the VRF driver uses the rx_handler to switch the skb device
to the VRF device. Switching the dev prior to the ip / ipv6 layer
means the VRF driver has to duplicate IP/IPv6 processing which adds
overhead and makes features such as retaining the ingress device index
more complicated than necessary.
This patch moves the hook to the L3 layer just after the first NF_HOOK
for PRE_ROUTING. This location makes exposing the original ingress device
trivial (next patch) and allows adding other NF_HOOKs to the VRF driver
in the future.
dev_queue_xmit_nit is exported so that the VRF driver can cycle the skb
with the switched device through the packet taps to maintain current
behavior (tcpdump can be used on either the vrf device or the enslaved
devices).
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Struct ctl_table_header holds a pointer to the sysctl table, which can be
used for freeing it after unregistration. IPv4 sysctls already do that.
Remove the redundant NULL assignment: ndev is allocated using kzalloc.
This also saves some bytes: the sysctl table can be shorter than
DEVCONF_MAX+1 if some options are disabled in the config.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, all ipv6 addresses are flushed when the interface is configured
down, including global, static addresses:
$ ip -6 addr show dev eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
inet6 2100:1::2/120 scope global
valid_lft forever preferred_lft forever
inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
valid_lft forever preferred_lft forever
$ ip link set dev eth1 down
$ ip -6 addr show dev eth1
<< nothing; all addresses have been flushed>>
Add a new sysctl to make this behavior optional. The new setting defaults to
flushing all addresses to maintain backwards compatibility. When it is set,
global addresses with no expiry time are not flushed on an admin down. The
sysctl is per-interface or system-wide for all interfaces:
$ sysctl -w net.ipv6.conf.eth1.keep_addr_on_down=1
or
$ sysctl -w net.ipv6.conf.all.keep_addr_on_down=1
Will keep addresses on eth1 on an admin down.
$ ip -6 addr show dev eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
inet6 2100:1::2/120 scope global
valid_lft forever preferred_lft forever
inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
valid_lft forever preferred_lft forever
$ ip link set dev eth1 down
$ ip -6 addr show dev eth1
3: eth1: <BROADCAST,MULTICAST> mtu 1500 state DOWN qlen 1000
inet6 2100:1::2/120 scope global tentative
valid_lft forever preferred_lft forever
inet6 fe80::e0:f9ff:fe79:34bd/64 scope link tentative
valid_lft forever preferred_lft forever
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In certain 802.11 wireless deployments, there will be NA proxies
that use knowledge of the network to correctly answer requests.
To prevent unsolicited advertisements on the shared medium from
being a problem, on such deployments wireless needs to drop them.
Enable this by providing an option called "drop_unsolicited_na".
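The check amounts to looking at the Solicited (S) flag of an incoming Neighbor Advertisement (ICMPv6 type 136). In that message the R, S and O flags occupy the first byte after the checksum. A minimal sketch over the raw ICMPv6 bytes. The function name is hypothetical, and the kernel works on parsed headers rather than a byte buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ICMPV6_NA_TYPE    136
#define NA_FLAG_SOLICITED 0x40 /* S bit in byte 4 of the NA message */
#define NA_MIN_LEN        24   /* 8-byte header + 16-byte target address */

/* Should this ICMPv6 message be dropped under drop_unsolicited_na?
 * Non-NA or truncated messages are left to other validation. */
bool ndisc_drop_na(const uint8_t *icmp6, size_t len, bool drop_unsolicited_na)
{
	if (len < NA_MIN_LEN || icmp6[0] != ICMPV6_NA_TYPE)
		return false;
	return drop_unsolicited_na && !(icmp6[4] & NA_FLAG_SOLICITED);
}
```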
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to solve a problem with 802.11, the so-called hole-196 attack,
add an option (sysctl) called "drop_unicast_in_l2_multicast" which, if
enabled, causes the stack to drop IPv6 unicast packets encapsulated in
link-layer multi- or broadcast frames. Such frames can (as an attack)
be created by any member of the same wireless network and transmitted
as valid encrypted frames since the symmetric key for broadcast frames
is shared between all stations.
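The condition being filtered is "link-layer group destination, IPv6 unicast destination". Below is a minimal sketch, assuming an Ethernet-style link layer and inspecting the raw destination MAC and IPv6 destination address. The function names are hypothetical, and this is not how the kernel locates these fields.

```c
#include <stdbool.h>
#include <stdint.h>

/* Ethernet group (multicast or broadcast) addresses have the least
 * significant bit of the first octet set. */
static bool eth_is_group(const uint8_t dst_mac[6])
{
	return dst_mac[0] & 0x01;
}

/* IPv6 multicast addresses are in ff00::/8. */
static bool ipv6_is_multicast(const uint8_t daddr[16])
{
	return daddr[0] == 0xff;
}

/* Drop IPv6 unicast packets carried in L2 multicast/broadcast frames
 * when drop_unicast_in_l2_multicast is enabled. */
bool drop_unicast_in_l2_mcast(const uint8_t dst_mac[6],
			      const uint8_t daddr[16], bool enabled)
{
	return enabled && eth_is_group(dst_mac) && !ipv6_is_multicast(daddr);
}
```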
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch addresses multiple problems :
UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions
while socket is not locked : Other threads can change np->opt
concurrently. Dmitry posted a syzkaller
(http://github.com/google/syzkaller) program demonstrating a
use-after-free.
Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
and dccp_v6_request_recv_sock() also need to use RCU protection
to dereference np->opt once (before calling ipv6_dup_options()).
This patch adds full RCU protection to np->opt.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
SYN_RECV & TIMEWAIT sockets are not full-blown sockets; they do not have a pinet6
pointer.
Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Like the ipv4 patch with a similar title, this adds a sysctl to allow
the user to change routing behavior based on whether or not the
interface associated with the nexthop was an up or down link. The
default setting preserves the current behavior, but anyone that enables
it will notice that nexthops on down interfaces will no longer be
selected:
net.ipv6.conf.all.ignore_routes_with_linkdown = 0
net.ipv6.conf.default.ignore_routes_with_linkdown = 0
net.ipv6.conf.lo.ignore_routes_with_linkdown = 0
...
When the above sysctls are set, not only will link status be reported to
userspace, but an indication that a nexthop is dead and will not be used
is also reported.
1000::/8 via 7000::2 dev p7p1 metric 1024 dead linkdown pref medium
1000::/8 via 8000::2 dev p8p1 metric 1024 pref medium
7000::/8 dev p7p1 proto kernel metric 256 dead linkdown pref medium
8000::/8 dev p8p1 proto kernel metric 256 pref medium
9000::/8 via 8000::2 dev p8p1 metric 2048 pref medium
9000::/8 via 7000::2 dev p7p1 metric 1024 dead linkdown pref medium
fe80::/64 dev p7p1 proto kernel metric 256 dead linkdown pref medium
fe80::/64 dev p8p1 proto kernel metric 256 pref medium
This also adds devconf support and notification when sysctl values
change.
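The effect on route selection can be illustrated with a deliberately simplified chooser: the lowest metric wins, and when ignore_routes_with_linkdown is set, nexthops whose link is down are treated as dead. This is a sketch under that assumption only. The real IPv6 lookup also weighs router preference and neighbour reachability, and the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified route candidate. */
struct nh_candidate {
	unsigned int metric;
	bool linkdown; /* carrier is down on the nexthop's interface */
};

/* Index of the preferred candidate, or -1 if none is usable. */
int select_nexthop(const struct nh_candidate *c, size_t n, bool ignore_linkdown)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (ignore_linkdown && c[i].linkdown)
			continue; /* reported as "dead linkdown", never chosen */
		if (best < 0 || c[i].metric < c[best].metric)
			best = (int)i;
	}
	return best;
}
```

For the 9000::/8 routes listed above, the metric 1024 nexthop via p7p1 wins when the sysctl is 0. It is skipped in favour of the metric 2048 nexthop via p8p1 when the sysctl is 1.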
v2: drop use of rt6i_nhflags since it is not needed right now
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 6fd99094de ("ipv6: Don't reduce hop limit for an interface")
stopped accepting a hop limit from an RA if it is smaller than the current
hop limit, for security reasons. However, this behavior breaks the RFC definition:
RFC 4861, 6.3.4. Processing Received Router Advertisements
A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
and Retrans Timer) may contain a value denoting that it is
unspecified. In such cases, the parameter should be ignored and the
host should continue using whatever value it is already using.
If the received Cur Hop Limit value is non-zero, the host SHOULD set
its CurHopLimit variable to the received value.
So add a sysctl option accept_ra_min_hop_limit to let the user choose the
minimum hop limit value that will be accepted from an RA, and set the default
to 1 to conform to the RFC.
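The resulting rule is sketched below: a Cur Hop Limit of 0 means "unspecified" and is ignored, a value below accept_ra_min_hop_limit is rejected, and anything else replaces the current value. The function name is hypothetical, and the sketch leaves out logging and the other RA fields.

```c
#include <stdint.h>

/* Hop limit to use after processing an RA's Cur Hop Limit field. */
uint8_t ra_update_hop_limit(uint8_t current, uint8_t ra_cur_hop_limit,
			    uint8_t accept_ra_min_hop_limit)
{
	if (ra_cur_hop_limit == 0)
		return current;          /* unspecified: keep current value */
	if (ra_cur_hop_limit < accept_ra_min_hop_limit)
		return current;          /* below the configured floor */
	return ra_cur_hop_limit;
}
```

With the default of 1, any non-zero value is accepted, which matches RFC 4861.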
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Per RFC 6724, section 4, "Candidate Source Addresses":
It is RECOMMENDED that the candidate source addresses be the set
of unicast addresses assigned to the interface that will be used
to send to the destination (the "outgoing" interface).
Add a sysctl to enable this behaviour.
Signed-off-by: Erik Kline <ek@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements the procfs logic for the stable_address knob:
The secret is formatted as an ipv6 address and will be stored per
interface and per namespace. We track an initialized flag and return EIO
errors until the secret is set.
Newly created namespaces do not inherit the secret.
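The read/write semantics can be modelled in user space with `inet_pton()`/`inet_ntop()`. This is a minimal sketch, not the procfs handler itself. The struct and function names are hypothetical, and returning -EINVAL for a malformed secret is an assumption of this sketch.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-interface storage for the secret. */
struct stable_secret {
	bool initialized;
	struct in6_addr secret;
};

/* Write: the secret must be given in IPv6 address notation. */
int stable_secret_write(struct stable_secret *s, const char *text)
{
	struct in6_addr addr;

	if (inet_pton(AF_INET6, text, &addr) != 1)
		return -EINVAL; /* assumption: reject malformed input */
	s->secret = addr;
	s->initialized = true;
	return 0;
}

/* Read: fails with -EIO until a secret has been set. */
int stable_secret_read(const struct stable_secret *s, char *buf, size_t len)
{
	if (!s->initialized)
		return -EIO;
	if (inet_ntop(AF_INET6, &s->secret, buf, len) == NULL)
		return -ENOSPC; /* buffer too small */
	return 0;
}
```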
Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull IPv6 cork initialization into its own function that
can be re-used. IPv6 specific cork data did not have an
explicit data structure. This patch creates one so that
just the IPv6 cork data can be passed as arguments. Also, since
IPv6 tries to save the flow label into the inet_cork_full
structure, pass the full cork.
Adjust ip6_cork_release() to take cork data structures.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The kernel forcefully applies MTU values received in router
advertisements provided the new MTU is less than the current. This
behavior is undesirable when the user space is managing the MTU. Instead
a sysctl flag 'accept_ra_mtu' is introduced such that the user space
can control whether or not RA provided MTU updates should be applied. The
default behavior is unchanged; user space must explicitly set this flag
to 0 for RA MTUs to be ignored.
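The resulting decision is sketched below. As described above, an RA-provided MTU is only applied when it lowers the current MTU, and only if accept_ra_mtu is set. The 1280-byte lower bound (the IPv6 minimum link MTU) is an added assumption of this sketch. The names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define IPV6_MIN_LINK_MTU 1280u /* assumption: ignore RA MTUs below this */

/* MTU to use after receiving an RA carrying an MTU option. */
uint32_t ra_apply_mtu(uint32_t current_mtu, uint32_t ra_mtu, bool accept_ra_mtu)
{
	if (!accept_ra_mtu)
		return current_mtu;     /* user space manages the MTU */
	if (ra_mtu < IPV6_MIN_LINK_MTU || ra_mtu >= current_mtu)
		return current_mtu;     /* only reductions are applied */
	return ra_mtu;
}
```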
Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is only used in net/ipv6/inet6_hashtables.c.
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a sysctl that causes an interface's optimistic addresses
to be considered equivalent to other non-deprecated addresses
for source address selection purposes. Preferred addresses
will still take precedence over optimistic addresses, subject
to other ranking in the source address selection algorithm.
This is useful where different interfaces are connected to
different networks from different ISPs (e.g., a cell network
and a home wifi network).
The current behaviour complies with RFC 3484/6724, and it
makes sense if the host has only one interface, or has
multiple interfaces on the same network (same or cooperating
administrative domain(s)), but not in the multiple distinct
networks case.
For example, if a mobile device has an IPv6 address on an LTE
network and then connects to IPv6-enabled wifi, while the wifi
IPv6 address is undergoing DAD, IPv6 connections will try to use
the wifi default route with the LTE IPv6 address, and will get
stuck until they time out.
Also, because optimistic nodes can receive frames, issue
an RTM_NEWADDR as soon as DAD starts (with the IFA_F_OPTIMISTIC
flag appropriately set). A second RTM_NEWADDR is sent if DAD
completes (the address flags have changed), otherwise an
RTM_DELADDR is sent.
Also: add an entry in ip-sysctl.txt for optimistic_dad.
Signed-off-by: Erik Kline <ek@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Automatically generate flow labels for IPv6 packets on transmit.
The flow label is computed based on skb_get_hash. It is only set
automatically when it would otherwise be zero (i.e. the flow label
manager hasn't set one). This supports the transmit side functionality
of RFC 6438.
Added an IPv6 sysctl auto_flowlabels to enable/disable this behavior
system wide, and added IPV6_AUTOFLOWLABEL socket option to enable this
functionality per socket.
By default, auto flowlabels are disabled to avoid possible conflicts
with flow label manager, however if this feature proves useful we
may want to enable it by default.
It should also be noted that FreeBSD has already implemented automatic
flow labels (including the sysctl and socket option). In FreeBSD,
automatic flow labels default to enabled.
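The labelling rule reduces to "keep an explicit label; otherwise derive 20 bits from the flow hash". A minimal sketch of that rule follows. The `flow_hash` argument stands in for the skb_get_hash() result. The `>> 12` fold, which lets the upper hash bits influence the 20-bit label, is one plausible mixing step and an assumption here, not necessarily the kernel's exact code. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLOWLABEL_MASK 0x000fffffu /* the IPv6 flow label is 20 bits */

/* Keep an explicitly set label; otherwise, when auto labels are enabled
 * (sysctl or socket option), derive one from the flow hash. */
uint32_t auto_flowlabel(uint32_t flowlabel, uint32_t flow_hash, bool autolabel)
{
	if (flowlabel != 0 || !autolabel)
		return flowlabel;
	flow_hash ^= flow_hash >> 12; /* assumption: fold high bits down */
	return flow_hash & FLOWLABEL_MASK;
}
```

Because the hash is per flow, all packets of a connection get the same label, which is what RFC 6438 relies on for ECMP/LAG load balancing.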
Performance impact:
Running super_netperf with 200 flows for TCP_RR and UDP_RR for
IPv6. Note that in the UDP case, __skb_get_hash will be called for
every packet, which explains the slight regression. In the TCP case
the hash is saved in the socket so there is no regression.
Automatic flow labels disabled:
TCP_RR:
86.53% CPU utilization
127/195/322 90/95/99% latencies
1.40498e+06 tps
UDP_RR:
90.70% CPU utilization
118/168/243 90/95/99% latencies
1.50309e+06 tps
Automatic flow labels enabled:
TCP_RR:
85.90% CPU utilization
128/199/337 90/95/99% latencies
1.40051e+06 tps
UDP_RR:
92.61% CPU utilization
115/164/236 90/95/99% latencies
1.4687e+06 tps
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>