linux-uconsole/mm
David Hildenbrand 86834898d5 mm/memory_hotplug: shrink zones when offlining memory
commit feee6b2989 upstream.

-- snip --

- Missing arm64 hot(un)plug support
- Missing some vmem_altmap_offset() cleanups
- Missing sub-section hotadd support
- Missing unification of mm/hmm.c and kernel/memremap.c

-- snip --

We currently try to shrink a single zone when removing memory.  We use
the zone of the first page of the memory we are removing.  If that
memmap was never initialized (e.g., memory was never onlined), we will
read garbage and can trigger kernel BUGs (due to a stale pointer):

    BUG: unable to handle page fault for address: 000000000000353d
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x0002) - not-present page
    PGD 0 P4D 0
    Oops: 0002 [#1] SMP PTI
    CPU: 1 PID: 7 Comm: kworker/u8:0 Not tainted 5.3.0-rc5-next-20190820+ #317
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
    Workqueue: kacpi_hotplug acpi_hotplug_work_fn
    RIP: 0010:clear_zone_contiguous+0x5/0x10
    Code: 48 89 c6 48 89 c3 e8 2a fe ff ff 48 85 c0 75 cf 5b 5d c3 c6 85 fd 05 00 00 01 5b 5d c3 0f 1f 840
    RSP: 0018:ffffad2400043c98 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: 0000000200000000 RCX: 0000000000000000
    RDX: 0000000000200000 RSI: 0000000000140000 RDI: 0000000000002f40
    RBP: 0000000140000000 R08: 0000000000000000 R09: 0000000000000001
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000
    R13: 0000000000140000 R14: 0000000000002f40 R15: ffff9e3e7aff3680
    FS:  0000000000000000(0000) GS:ffff9e3e7bb00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000000000000353d CR3: 0000000058610000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     __remove_pages+0x4b/0x640
     arch_remove_memory+0x63/0x8d
     try_remove_memory+0xdb/0x130
     __remove_memory+0xa/0x11
     acpi_memory_device_remove+0x70/0x100
     acpi_bus_trim+0x55/0x90
     acpi_device_hotplug+0x227/0x3a0
     acpi_hotplug_work_fn+0x1a/0x30
     process_one_work+0x221/0x550
     worker_thread+0x50/0x3b0
     kthread+0x105/0x140
     ret_from_fork+0x3a/0x50
    Modules linked in:
    CR2: 000000000000353d

Instead, shrink the zones when offlining memory or when onlining failed.
Introduce and use remove_pfn_range_from_zone(() for that.  We now
properly shrink the zones, even if we have DIMMs whereby

 - Some memory blocks fall into no zone (never onlined)

 - Some memory blocks fall into multiple zones (offlined+re-onlined)

 - Multiple memory blocks that fall into different zones

Drop the zone parameter (with a potential dubious value) from
__remove_pages() and __remove_section().

Link: http://lkml.kernel.org/r/20191006085646.5768-6-david@redhat.com
Fixes: f1dd2cd13c ("mm, memory_hotplug: do not associate hotadded memory to zones until online")	[visible after d0dc12e86b]
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: <stable@vger.kernel.org>	[5.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29 16:43:27 +01:00
..
kasan kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN 2018-08-17 16:20:30 -07:00
backing-dev.c writeback: synchronize sync(2) against cgroup writeback membership switches 2019-03-05 17:58:50 +01:00
balloon_compaction.c virtio_balloon: fix deadlock on OOM 2017-11-14 23:57:38 +02:00
bootmem.c docs/mm: bootmem: add overview documentation 2018-08-02 12:17:27 -06:00
cleancache.c mm: use octal not symbolic permissions 2018-06-15 07:55:25 +09:00
cma.c mm/cma.c: fail if fixed declaration can't be honored 2019-08-06 19:06:51 +02:00
cma.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
cma_debug.c mm/cma_debug.c: fix the break condition in cma_maxchunk_get() 2019-06-15 11:54:01 +02:00
compaction.c mm/compaction.c: clear total_{migrate,free}_scanned before scanning a new zone 2019-10-05 13:10:13 +02:00
debug.c mm: get rid of vmacache_flush_all() entirely 2018-09-13 15:18:04 -10:00
debug_page_ref.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dmapool.c mm: use octal not symbolic permissions 2018-06-15 07:55:25 +09:00
early_ioremap.c mm/early_ioremap: Fix boot hang with earlyprintk=efi,keep 2017-12-11 14:54:44 +01:00
fadvise.c vfs: implement readahead(2) using POSIX_FADV_WILLNEED 2018-08-30 20:01:32 +02:00
failslab.c mm: use octal not symbolic permissions 2018-06-15 07:55:25 +09:00
filemap.c mm/filemap.c: don't initiate writeback if mapping has no dirty pages 2019-11-12 19:21:20 +01:00
frame_vector.c mm/frame_vector.c: release a semaphore in 'get_vaddr_frames()' 2017-12-14 16:00:48 -08:00
frontswap.c mm: use octal not symbolic permissions 2018-06-15 07:55:25 +09:00
gup.c mm/gup.c: remove some BUG_ONs from get_gate_page() 2019-07-31 07:27:08 +02:00
gup_benchmark.c mm/gup_benchmark.c: prevent integer overflow in ioctl 2019-12-01 09:17:07 +01:00
highmem.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
hmm.c mm/memory_hotplug: shrink zones when offlining memory 2020-01-29 16:43:27 +01:00
huge_memory.c mm/huge_memory.c: thp: fix conflict of above-47bit hint address and PMD alignment 2020-01-23 08:21:32 +01:00
hugetlb.c hugetlbfs: don't access uninitialized memmaps in pfn_range_valid_gigantic() 2019-10-29 09:19:59 +01:00
hugetlb_cgroup.c mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup() 2019-11-20 18:45:20 +01:00
hwpoison-inject.c mm/memory_failure: Remove unused trapno from memory_failure 2018-01-23 12:17:42 -06:00
init-mm.c mm: Allocate the mm_cpumask (mm->cpu_bitmap[]) dynamically based on nr_cpu_ids 2018-07-17 09:35:30 +02:00
internal.h vmscan: return NODE_RECLAIM_NOSCAN in node_reclaim() when CONFIG_NUMA is n 2019-12-05 09:20:57 +01:00
interval_tree.c mm/interval_tree.c: use vma_pages() helper 2018-01-31 17:18:37 -08:00
Kconfig mm/hmm: select mmu notifier when selecting HMM 2019-06-15 11:54:00 +02:00
Kconfig.debug mm: clarify CONFIG_PAGE_POISONING and usage 2018-08-22 10:52:44 -07:00
khugepaged.c coredump: fix race condition between collapse_huge_page() and core dumping 2019-06-22 08:15:21 +02:00
kmemleak-test.c
kmemleak.c Revert "kmemleak: allow to coexist with fault injection" 2019-08-25 10:47:58 +02:00
ksm.c mm/ksm.c: don't WARN if page is still mapped in remove_stable_node() 2019-12-01 09:16:11 +01:00
list_lru.c mm/list_lru.c: fix memory leak in __memcg_init_list_lru_node 2019-06-19 08:17:59 +02:00
maccess.c mm: docs: fix parameter names mismatch 2018-02-06 18:32:48 -08:00
madvise.c mm: madvise(MADV_DODUMP): allow hugetlbfs pages 2018-10-05 16:32:05 -07:00
Makefile vfs: implement readahead(2) using POSIX_FADV_WILLNEED 2018-08-30 20:01:32 +02:00
memblock.c mm/page_alloc.c: deduplicate __memblock_free_early() and memblock_free() 2019-12-05 09:20:58 +01:00
memcontrol.c mm: handle no memcg case in memcg_kmem_charge() properly 2019-12-01 09:17:14 +01:00
memfd.c memfd: Use radix_tree_deref_slot_protected to avoid the warning. 2019-11-20 18:47:53 +01:00
memory-failure.c mm/memory-failure: poison read receives SIGKILL instead of SIGBUS if mmaped more than once 2019-10-29 09:19:59 +01:00
memory.c mm, thp, proc: report THP eligibility for each vma 2019-12-17 20:35:45 +01:00
memory_hotplug.c mm/memory_hotplug: shrink zones when offlining memory 2020-01-29 16:43:27 +01:00
mempolicy.c mm: mempolicy: fix the wrong return value and potential pages leak of mbind 2019-11-20 18:45:19 +01:00
mempool.c mm/mempool.c: add missing parameter description 2018-08-22 10:52:44 -07:00
memtest.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
migrate.c mm: move_pages: return valid node id in status if the page is already on the target node 2020-01-09 10:19:00 +01:00
mincore.c mm/mincore.c: make mincore() more conservative 2019-05-22 07:37:40 +02:00
mlock.c mm/mlock.c: change count_mm_mlocked_page_nr return type 2019-07-10 09:53:40 +02:00
mm_init.c mm: access zone->node via zone_to_nid() and zone_set_nid() 2018-08-22 10:52:45 -07:00
mmap.c arm64: Revert support for execute-only user mappings 2020-01-09 10:19:03 +01:00
mmu_context.c
mmu_notifier.c mm/mmu_notifier: use hlist_add_head_rcu() 2019-07-31 07:27:08 +02:00
mmzone.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mprotect.c x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings 2018-06-20 19:10:01 +02:00
mremap.c mremap: properly flush TLB before releasing the page 2018-10-18 11:30:52 +02:00
msync.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
nobootmem.c mm/memblock: add a name for memblock flags enumeration 2018-08-02 12:17:27 -06:00
nommu.c mm: use down_read_killable for locking mmap_sem in access_remote_vm 2019-07-31 07:27:09 +02:00
oom_kill.c memcg, oom: don't require __GFP_FS when invoking memcg OOM killer 2019-10-05 13:10:07 +02:00
page-writeback.c mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio() 2020-01-23 08:21:31 +01:00
page_alloc.c mm/page_alloc.c: use a single function to free page 2019-12-05 09:20:58 +01:00
page_counter.c memcg: introduce memory.min 2018-06-07 17:34:36 -07:00
page_ext.c mm/page_ext.c: fix an imbalance with kmemleak 2019-04-05 22:32:58 +02:00
page_idle.c mm/page_idle.c: fix oops because end_pfn is larger than max_pfn 2019-07-03 13:14:45 +02:00
page_io.c mm/page_io.c: do not free shared swap slots 2019-12-01 09:17:35 +01:00
page_isolation.c mm, migrate: remove reason argument from new_page_t 2018-04-11 10:28:32 -07:00
page_owner.c mm/page_owner: don't access uninitialized memmaps when reading /proc/pagetypeinfo 2019-10-29 09:19:58 +01:00
page_poison.c page_poison: play nicely with KASAN 2019-04-05 22:32:59 +02:00
page_vma_mapped.c mm/rmap: map_pte() was not handling private ZONE_DEVICE page properly 2018-11-13 11:08:46 -08:00
pagewalk.c mm: kernel-doc: add missing parameter descriptions 2018-04-05 21:36:27 -07:00
percpu-internal.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
percpu-km.c percpu: convert spin_lock_irq to spin_lock_irqsave. 2019-02-12 19:47:12 +01:00
percpu-stats.c treewide: Use array_size() in vmalloc() 2018-06-12 16:19:22 -07:00
percpu-vm.c percpu: allow select gfp to be passed to underlying allocators 2018-02-18 05:33:01 -08:00
percpu.c percpu: do not search past bitmap when allocating an area 2019-06-15 11:54:11 +02:00
pgtable-generic.c mm: do not lose dirty and accessed bits in pmdp_invalidate() 2018-01-31 17:18:38 -08:00
process_vm_access.c mm: docs: add blank lines to silence sphinx "Unexpected indentation" errors 2018-02-06 18:32:48 -08:00
quicklist.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
readahead.c vfs: implement readahead(2) using POSIX_FADV_WILLNEED 2018-08-30 20:01:32 +02:00
rmap.c mm/hmm: fix bad subpage pointer in try_to_unmap_one 2019-08-25 10:47:43 +02:00
rodata_test.c mm: fix RODATA_TEST failure "rodata_test: test data was not read only" 2017-10-03 17:54:24 -07:00
shmem.c mm/shmem.c: thp, shmem: fix conflict of above-47bit hint address and PMD alignment 2020-01-23 08:21:30 +01:00
slab.c mm/slab.c: fix an infinite loop in leaks_show() 2019-06-15 11:54:01 +02:00
slab.h mm: add support for kmem caches in DMA32 zone 2019-04-03 06:26:28 +02:00
slab_common.c mm: memcg/slab: call flush_memcg_workqueue() only if memcg workqueue is valid 2020-01-23 08:21:30 +01:00
slob.c slab: __GFP_ZERO is incompatible with a constructor 2018-06-07 17:34:34 -07:00
slub.c mm/slub: fix a deadlock in show_slab_objects() 2019-10-29 09:19:58 +01:00
sparse-vmemmap.c mm/sparse: delete old sparse_init and enable new one 2018-08-17 16:20:32 -07:00
sparse.c mm/memory_hotplug: remove "zone" parameter from sparse_remove_one_section 2020-01-29 16:43:26 +01:00
swap.c mm/swap: fix release_pages() when releasing devmap pages 2019-07-31 07:27:03 +02:00
swap_cgroup.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
swap_slots.c mm, swap, get_swap_pages: use entry_size instead of cluster in parameter 2018-08-22 10:52:44 -07:00
swap_state.c treewide: kvzalloc() -> kvcalloc() 2018-06-12 16:19:22 -07:00
swapfile.c mm, swap: bounds check swap_info array accesses to avoid NULL derefs 2019-04-05 22:32:58 +02:00
truncate.c mm: cleancache: fix corruption on missed inode invalidation 2018-12-05 19:32:13 +01:00
usercopy.c usercopy: Avoid HIGHMEM pfn warning 2019-10-11 18:20:58 +02:00
userfaultfd.c hugetlb: use same fault hash key for shared and private mappings 2019-05-22 07:37:40 +02:00
util.c mm: page_mapped: don't assume compound page is huge or THP 2019-01-16 22:04:36 +01:00
vmacache.c mm: get rid of vmacache_flush_all() entirely 2018-09-13 15:18:04 -10:00
vmalloc.c mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy() 2019-08-16 10:12:40 +02:00
vmpressure.c mm/vmpressure.c: fix a signedness bug in vmpressure_register_event() 2019-10-17 13:45:19 -07:00
vmscan.c mm: vmscan: check if mem cgroup is disabled or not before calling memcg slab shrinker 2019-08-06 19:06:54 +02:00
vmstat.c mm/vmstat.c: fix NUMA statistics updates 2019-12-13 08:51:27 +01:00
workingset.c mm/list_lru: introduce list_lru_shrink_walk_irq() 2018-08-17 16:20:32 -07:00
z3fold.c z3fold: fix possible reclaim races 2018-12-01 09:37:33 +01:00
zbud.c mm: docs: fix parameter names mismatch 2018-02-06 18:32:48 -08:00
zpool.c mm/zpool.c: zpool_evictable: fix mismatch in parameter name and kernel-doc 2018-02-21 15:35:43 -08:00
zsmalloc.c mm/zsmalloc.c: fix the migrated zspage statistics. 2020-01-09 10:19:00 +01:00
zswap.c zswap: re-check zswap_is_full() after do zswap_shrink() 2018-07-26 19:38:03 -07:00