linux-uconsole/drivers
David Hildenbrand d98d053efa mm/memory_hotplug: fix try_offline_node()
commit 2c91f8fc6c upstream.

-- snip --

Only contextual issues:
- Unrelated check_and_unmap_cpu_on_node() changes are missing.
- Unrelated walk_memory_blocks() has not been moved/refactored yet.

-- snip --

try_offline_node() is pretty much broken right now:

 - The node span is updated when onlining memory, not when adding it. We
   ignore memory that was never onlined. Bad.

 - We touch possible garbage memmaps. The pfn_to_nid(pfn) can easily
   trigger a kernel panic. Bad for memory that is offline but also bad
   for subsection hotadd with ZONE_DEVICE, whereby the memmap of the
   first PFN of a section might contain garbage.

 - Sections belonging to mixed nodes are not properly considered.

As memory blocks might belong to multiple nodes, we would have to walk
all pageblocks (or at least subsections) within present sections.
However, we don't have a way to identify whether a memmap that is not
online was initialized (relevant for ZONE_DEVICE).  This makes things
more complicated.

Luckily, we can piggyback on the node span and the nid stored in memory
blocks.  Currently, the node span is grown when calling
move_pfn_range_to_zone() - e.g., when onlining memory, and shrunk when
removing memory, before calling try_offline_node().  Sysfs links are
created via link_mem_sections(), e.g., during boot or when adding
memory.

If the node still spans memory or if any memory block belongs to the
nid, we don't set the node offline.  As memory blocks that span multiple
nodes cannot get offlined, the nid stored in memory blocks is reliable
enough (for such online memory blocks, the node still spans the memory).

Introduce for_each_memory_block() to efficiently walk all memory blocks.
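
The check described above can be sketched in plain userspace C.  This is
only an illustration under stated assumptions, not the kernel code: the
types mem_block and node_info and the helpers for_each_block(),
block_belongs_to_nid() and node_may_go_offline() are hypothetical
stand-ins, and a real implementation also has to deal with locking and
the kernel's own data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a memory block device. */
struct mem_block {
	int nid;			/* node id recorded when the block was added */
};

/* Hypothetical, simplified stand-in for per-node data. */
struct node_info {
	int nid;
	unsigned long spanned_pages;	/* non-zero while the node spans memory */
};

/* Walker callback: a non-zero return value stops the walk early. */
typedef int (*walk_fn)(const struct mem_block *mem, void *arg);

/* Walk all blocks and propagate the first non-zero callback result. */
static int for_each_block(const struct mem_block *blocks, size_t n,
			  walk_fn fn, void *arg)
{
	for (size_t i = 0; i < n; i++) {
		int ret = fn(&blocks[i], arg);

		if (ret)
			return ret;
	}
	return 0;
}

/* Returns 1 (stop the walk) if the block belongs to the nid in *arg. */
static int block_belongs_to_nid(const struct mem_block *mem, void *arg)
{
	int nid = *(const int *)arg;

	return mem->nid == nid;
}

/*
 * A node may go offline only if it no longer spans any memory and no
 * memory block still records its nid.
 */
static bool node_may_go_offline(const struct node_info *node,
				const struct mem_block *blocks, size_t n)
{
	int nid = node->nid;

	if (node->spanned_pages)
		return false;
	if (for_each_block(blocks, n, block_belongs_to_nid, &nid))
		return false;
	return true;
}
```

In this sketch the span check comes first because it is cheap; the walk
over all blocks only runs when the span is already empty, and it stops
at the first block that still belongs to the node.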

Note: We will soon stop shrinking the ZONE_DEVICE zone and the node span
when removing ZONE_DEVICE memory to fix similar issues (access of
garbage memmaps) - until we have a reliable way to identify whether
these memmaps were properly initialized.  This implies later, that once
a node had ZONE_DEVICE memory, we won't be able to set a node offline -
which should be acceptable.

Since commit f1dd2cd13c ("mm, memory_hotplug: do not associate
hotadded memory to zones until online") memory that is added is not
associated with a zone/node (memmap not initialized).  The introducing
commit 60a5a19e74 ("memory-hotplug: remove sysfs file of node")
already missed that we could have multiple nodes for a section and that
the zone/node span is updated when onlining pages, not when adding them.

I tested this by hotplugging two DIMMs to a memory-less and cpu-less
NUMA node.  The node is properly onlined when adding the DIMMs.  When
removing the DIMMs, the node is properly offlined.

Masayoshi Mizuma reported:

: Without this patch, memory hotplug fails as panic:
:
:  BUG: kernel NULL pointer dereference, address: 0000000000000000
:  ...
:  Call Trace:
:   remove_memory_block_devices+0x81/0xc0
:   try_remove_memory+0xb4/0x130
:   __remove_memory+0xa/0x20
:   acpi_memory_device_remove+0x84/0x100
:   acpi_bus_trim+0x57/0x90
:   acpi_bus_trim+0x2e/0x90
:   acpi_device_hotplug+0x2b2/0x4d0
:   acpi_hotplug_work_fn+0x1a/0x30
:   process_one_work+0x171/0x380
:   worker_thread+0x49/0x3f0
:   kthread+0xf8/0x130
:   ret_from_fork+0x35/0x40

[david@redhat.com: v3]
  Link: http://lkml.kernel.org/r/20191102120221.7553-1-david@redhat.com
Link: http://lkml.kernel.org/r/20191028105458.28320-1-david@redhat.com
Fixes: 60a5a19e74 ("memory-hotplug: remove sysfs file of node")
Fixes: f1dd2cd13c ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after d0dc12e86b
Signed-off-by: David Hildenbrand <david@redhat.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Nayna Jain <nayna@linux.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29 16:43:27 +01:00
accessibility
acpi mm/memory_hotplug: make remove_memory() take the device_hotplug_lock 2020-01-29 16:43:24 +01:00
amba
android binder: Handle start==NULL in binder_update_page_range() 2019-12-13 08:52:52 +01:00
ata ahci: Do not export local variable ahci_em_messages 2020-01-27 14:51:07 +01:00
atm firestream: fix memory leaks 2020-01-29 16:43:14 +01:00
auxdisplay
base mm/memory_hotplug: fix try_offline_node() 2020-01-29 16:43:27 +01:00
bcma bcma: fix incorrect update of BCMA_CORE_PCI_MDIO_DATA 2020-01-27 14:51:09 +01:00
block signal: Allow cifs and drbd to receive their terminating signals 2020-01-27 14:51:05 +01:00
bluetooth Bluetooth: btusb: fix PM leak in error case of setup 2020-01-09 10:19:04 +01:00
bus bus: ti-sysc: Fix sysc_unprepare() when no clocks have been allocated 2020-01-27 14:50:36 +01:00
cdrom cdrom: respect device capabilities during opening action 2020-01-04 19:13:12 +01:00
char hwrng: omap3-rom - Fix missing clock by probing with device tree 2020-01-27 14:51:20 +01:00
clk clk: actions: Fix factor clk struct member access 2020-01-27 14:51:14 +01:00
clocksource clocksource/drivers/exynos_mct: Fix error path in timer resources initialization 2020-01-27 14:50:27 +01:00
connector
cpufreq cpufreq: brcmstb-avs-cpufreq: Fix types for voltage/frequency 2020-01-27 14:50:53 +01:00
cpuidle cpuidle: Do not unset the driver if it is there already 2019-12-17 20:35:00 +01:00
crypto crypto: geode-aes - switch to skcipher for cbc(aes) fallback 2020-01-29 16:43:23 +01:00
dax
dca
devfreq PM / devfreq: Check NULL governor in available_governors_show 2020-01-09 10:19:03 +01:00
dio
dma dmaengine: ti: edma: fix missed failure handling 2020-01-27 14:51:22 +01:00
dma-buf dma-buf: Fix memory leak in sync_file_merge() 2019-12-21 10:57:38 +01:00
edac EDAC/mc: Fix edac_mc_find() in case no device is found 2020-01-27 14:50:48 +01:00
eisa
extcon extcon: sm5502: Reset registers during initialization 2019-12-31 16:35:11 +01:00
firewire net: add annotations on hh->hh_len lockless accesses 2020-01-09 10:19:09 +01:00
firmware firmware: dmi: Fix unlikely out-of-bounds read in save_mem_devices 2020-01-27 14:51:19 +01:00
fmc
fpga
fsi fsi: sbefifo: Don't fail operations when in SBE IPL state 2020-01-27 14:51:00 +01:00
gnss
gpio gpio/aspeed: Fix incorrect number of banks 2020-01-27 14:51:13 +01:00
gpu drm/radeon: fix bad DMA from INTERRUPT_CNTL2 2020-01-27 14:51:22 +01:00
hid HID: hidraw, uhid: Always report EPOLLOUT 2020-01-17 19:46:55 +01:00
hsi
hv vmbus: keep pointer to ring buffer page 2019-11-20 18:47:31 +01:00
hwmon hwmon: (nct7802) Fix voltage limits to wrong registers 2020-01-29 16:43:21 +01:00
hwspinlock
hwtracing coresight: tmc-etf: Do not call smp_processor_id from preemptible 2020-01-29 16:43:23 +01:00
i2c i2c: stm32f7: report dma error during probe 2020-01-27 14:51:21 +01:00
ide
idle
iio iio: dac: ad5380: fix incorrect assignment to val 2020-01-27 14:51:09 +01:00
infiniband scsi: RDMA/isert: Fix a recently introduced regression related to logout 2020-01-29 16:43:21 +01:00
input Input: sun4i-ts - add a check for devm_thermal_zone_of_sensor_register 2020-01-29 16:43:20 +01:00
iommu iommu/amd: Wait for completion of IOTLB flush in attach_device 2020-01-27 14:51:14 +01:00
ipack
irqchip irqchip: Place CONFIG_SIFIVE_PLIC into the menu 2020-01-23 08:21:36 +01:00
isdn staging: gigaset: add endpoint-type sanity check 2019-12-17 20:34:33 +01:00
leds led: triggers: Fix dereferencing of null pointer 2020-01-27 14:51:10 +01:00
lightnvm lightnvm: pblk: fix lock order in pblk_rb_tear_down_check 2020-01-27 14:50:45 +01:00
macintosh macintosh/windfarm_smu_sat: Fix debug output 2019-12-01 09:16:37 +01:00
mailbox mailbox: qcom-apcs: fix max_register value 2020-01-27 14:51:14 +01:00
mcb
md bcache: Fix an error code in bch_dump_read() 2020-01-27 14:51:09 +01:00
media media: v4l2-ioctl.c: zero reserved fields for S/TRY_FMT 2020-01-29 16:43:24 +01:00
memory memory: tegra: Don't invoke Tegra30+ specific memory timing setup on Tegra20 2020-01-27 14:50:13 +01:00
memstick
message scsi: mptfusion: Fix double fetch bug in ioctl 2020-01-23 08:21:28 +01:00
mfd mfd: intel-lpss: Release IDA resources 2020-01-27 14:50:59 +01:00
misc mic: avoid statically declaring a 'struct device'. 2020-01-27 14:51:02 +01:00
mmc mmc: sdhci: fix minimum clock rate for v3 controller 2020-01-29 16:43:19 +01:00
mtd mtd: devices: fix mchp23k256 read and write 2020-01-23 08:21:37 +01:00
mux
net libertas: Fix two buffer overflows at parsing bss descriptor 2020-01-29 16:43:24 +01:00
nfc NFC: pn533: fix bulk-message timeout 2020-01-23 08:21:34 +01:00
ntb ntb_hw_switchtec: potential shift wrapping bug in switchtec_ntb_init_sndev() 2020-01-27 14:50:55 +01:00
nubus
nvdimm libnvdimm/btt: fix variable 'rc' set but not used 2020-01-04 19:13:00 +01:00
nvme nvme: retain split access workaround for capability reads 2020-01-27 14:51:16 +01:00
nvmem nvmem: imx-ocotp: Change TIMING calculation to u-boot algorithm 2020-01-27 14:50:58 +01:00
of of: mdio: Fix a signedness bug in of_phy_get_and_connect() 2020-01-27 14:51:15 +01:00
opp OPP: Fix missing debugfs supply directory for OPPs 2020-01-27 14:50:04 +01:00
oprofile
parisc
parport parport: load lowlevel driver if ports not found 2019-12-31 16:36:01 +01:00
pci PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken 2020-01-29 16:43:18 +01:00
pcmcia
perf
phy phy: usb: phy-brcm-usb: Remove sysfs attributes upon driver removal 2020-01-27 14:50:54 +01:00
pinctrl pinctrl: iproc-gpio: Fix incorrect pinconf configurations 2020-01-27 14:51:12 +01:00
platform MIPS: Loongson: Fix return value of loongson_hwmon_init 2020-01-27 14:51:21 +01:00
pnp
power power: supply: Init device wakeup after device_add() 2020-01-27 14:51:08 +01:00
powercap
pps
ps3
ptp ptp: free ptp device pin descriptors properly 2020-01-23 08:21:35 +01:00
pwm pwm: meson: Don't disable PWM when setting duty repeatedly 2020-01-27 14:50:47 +01:00
rapidio drivers/rapidio/rio_cm.c: fix potential oops in riocm_ch_listen() 2020-01-27 14:50:31 +01:00
ras
regulator regulator: tps65086: Fix tps65086_ldoa1_ranges for selector 0xB 2020-01-27 14:50:33 +01:00
remoteproc remoteproc: qcom: q6v5-mss: Add missing regulator for MSM8996 2020-01-27 14:50:10 +01:00
reset reset: Fix memory leak in reset_control_array_put() 2019-12-05 09:19:36 +01:00
rpmsg rpmsg: glink: Free pending deferred work on remove 2019-12-21 10:57:30 +01:00
rtc rtc: pcf2127: bugfix: read rtc disables watchdog 2020-01-27 14:51:07 +01:00
s390 s390/qeth: Fix initialization of vnicc cmd masks during set online 2020-01-27 14:51:18 +01:00
sbus
scsi scsi: iscsi: Avoid potential deadlock in iscsi_if_rx func 2020-01-29 16:43:24 +01:00
sfi
sh
siox
slimbus slimbus: ngd: Fix build error on x86 2019-12-13 08:51:54 +01:00
sn
soc soc: amlogic: meson-gx-pwrc-vpu: Fix power on/off register bitmask 2020-01-27 14:50:39 +01:00
soundwire soundwire: intel: fix PDI/stream mapping for Bulk 2019-12-31 16:35:55 +01:00
spi spi: bcm-qspi: Fix BSPI QUAD and DUAL mode support when using flex mode 2020-01-27 14:51:03 +01:00
spmi
ssb
staging staging: greybus: light: fix a couple double frees 2020-01-27 14:51:08 +01:00
target scsi: RDMA/isert: Fix a recently introduced regression related to logout 2020-01-29 16:43:21 +01:00
tc
tee tee: optee: add missing of_node_put after of_device_is_available 2019-11-24 08:19:08 +01:00
thermal thermal: cpu_cooling: Actually trace CPU load in thermal_power_cpu_get_power 2020-01-27 14:50:48 +01:00
thunderbolt thunderbolt: Power cycle the router if NVM authentication fails 2019-12-05 09:21:27 +01:00
tty serial: stm32: fix clearing interrupt error flags 2020-01-27 14:51:22 +01:00
uio driver: uio: fix possible use-after-free in __uio_register_device 2020-01-27 14:50:17 +01:00
usb usb: dwc3: Allow building USB_DWC3_QCOM without EXTCON 2020-01-27 14:51:22 +01:00
uwb
vfio vfio/mdev: Fix aborting mdev child device removal if one fails 2020-01-27 14:50:46 +01:00
vhost vhost/test: stop device before reset 2020-01-27 14:51:19 +01:00
video backlight: pwm_bl: Fix heuristic to determine number of brightness levels 2020-01-27 14:50:58 +01:00
virt
virtio virtio-balloon: fix managed page counts when migrating pages between zones 2019-12-17 20:34:43 +01:00
visorbus
vlynq
vme
w1 w1: IAD Register is yet readable trough iad sys file. Fix snprintf (%u for unsigned, count for max size). 2019-12-01 09:16:22 +01:00
watchdog watchdog: rtd119x_wdt: Fix remove function 2020-01-27 14:50:45 +01:00
xen net: add {READ|WRITE}_ONCE() annotations on ->rskq_accept_head 2020-01-27 14:51:18 +01:00
zorro
Kconfig
Makefile