linux-uconsole/mm
Johannes Weiner f26bff271a mm: vmscan: clear kswapd's special reclaim powers before exiting
commit 71abdc15ad upstream.

When kswapd exits, it can end up taking locks that were previously held
by allocating tasks while they waited for reclaim.  Lockdep currently
warns about this:

On Wed, May 28, 2014 at 06:06:34PM +0800, Gu Zheng wrote:
>  inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
>  kswapd2/1151 [HC0[0]:SC0[0]:HE1:SE1] takes:
>   (&sig->group_rwsem){+++++?}, at: exit_signals+0x24/0x130
>  {RECLAIM_FS-ON-W} state was registered at:
>     mark_held_locks+0xb9/0x140
>     lockdep_trace_alloc+0x7a/0xe0
>     kmem_cache_alloc_trace+0x37/0x240
>     flex_array_alloc+0x99/0x1a0
>     cgroup_attach_task+0x63/0x430
>     attach_task_by_pid+0x210/0x280
>     cgroup_procs_write+0x16/0x20
>     cgroup_file_write+0x120/0x2c0
>     vfs_write+0xc0/0x1f0
>     SyS_write+0x4c/0xa0
>     tracesys+0xdd/0xe2
>  irq event stamp: 49
>  hardirqs last  enabled at (49):  _raw_spin_unlock_irqrestore+0x36/0x70
>  hardirqs last disabled at (48):  _raw_spin_lock_irqsave+0x2b/0xa0
>  softirqs last  enabled at (0):  copy_process.part.24+0x627/0x15f0
>  softirqs last disabled at (0):            (null)
>
>  other info that might help us debug this:
>   Possible unsafe locking scenario:
>
>         CPU0
>         ----
>    lock(&sig->group_rwsem);
>    <Interrupt>
>      lock(&sig->group_rwsem);
>
>   *** DEADLOCK ***
>
>  no locks held by kswapd2/1151.
>
>  stack backtrace:
>  CPU: 30 PID: 1151 Comm: kswapd2 Not tainted 3.10.39+ #4
>  Call Trace:
>    dump_stack+0x19/0x1b
>    print_usage_bug+0x1f7/0x208
>    mark_lock+0x21d/0x2a0
>    __lock_acquire+0x52a/0xb60
>    lock_acquire+0xa2/0x140
>    down_read+0x51/0xa0
>    exit_signals+0x24/0x130
>    do_exit+0xb5/0xa50
>    kthread+0xdb/0x100
>    ret_from_fork+0x7c/0xb0

This is because the kswapd thread is still marked as a reclaimer at the
time of exit.  But because it is exiting, nobody is actually waiting on
it to make reclaim progress anymore, and it's nothing but a regular
thread at this point.  Be tidy and strip it of all its powers
(PF_MEMALLOC, PF_SWAPWRITE, PF_KSWAPD, and the lockdep reclaim state)
before returning from the thread function.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30 20:09:42 -07:00
..
backing-dev.c bdi: avoid oops on device removal 2014-04-26 17:15:35 -07:00
balloon_compaction.c mm: introduce a common interface for balloon pages mobility 2012-12-11 17:22:26 -08:00
bootmem.c mm: Add alloc_bootmem_low_pages_nopanic() 2013-01-29 19:32:59 -08:00
bounce.c mm/bounce.c: fix a regression where MS_SNAP_STABLE (stable pages snapshotting) was ignored 2013-10-13 16:08:33 -07:00
cleancache.c mm: cleancache: clean up cleancache_enabled 2013-04-30 17:04:01 -07:00
compaction.c mm/compaction: make isolate_freepages start at pageblock boundary 2014-06-16 13:42:53 -07:00
debug-pagealloc.c
dmapool.c dmapool: make DMAPOOL_DEBUG detect corruption of free marker 2012-12-11 17:22:24 -08:00
fadvise.c teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long 2013-03-03 22:46:22 -05:00
failslab.c
filemap.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
filemap_xip.c lift sb_start_write() out of ->write() 2013-04-09 14:12:56 -04:00
fremap.c mm: fix use-after-free in sys_remap_file_pages 2014-01-09 12:24:24 -08:00
frontswap.c frontswap: fix incorrect zeroing and allocation size for frontswap_map 2013-06-12 16:29:46 -07:00
highmem.c Some nice cleanups, and even a patch my wife did as a "live" demo for 2012-12-20 08:37:05 -08:00
huge_memory.c thp: fix copy_page_rep GPF by testing is_huge_zero_pmd once only 2014-01-25 08:27:12 -08:00
hugetlb.c mm/hugetlb.c: add cond_resched_lock() in return_unused_surplus_pages() 2014-05-30 21:52:12 -07:00
hugetlb_cgroup.c mm/hugetlb: create hugetlb cgroup file in hugetlb_init 2012-12-18 15:02:15 -08:00
hwpoison-inject.c memcg: rename config variables 2012-07-31 18:42:43 -07:00
init-mm.c
internal.h mm: accelerate munlock() treatment of THP pages 2013-02-27 19:10:09 -08:00
interval_tree.c mm: add CONFIG_DEBUG_VM_RB build option 2012-10-09 16:22:42 +09:00
Kconfig mmKconfig: add an option to disable bounce 2013-04-29 15:54:40 -07:00
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
ksm.c mm: close PageTail race 2014-04-03 12:01:05 -07:00
maccess.c
madvise.c mm: madvise: complete input validation before taking lock 2013-04-29 15:54:37 -07:00
Makefile memcg: add memory.pressure_level events 2013-04-29 15:54:38 -07:00
memblock.c memblock: fix missing comment of memblock_insert_region() 2013-04-29 15:54:38 -07:00
memcontrol.c memcg: reparent charges of children before processing parent 2014-03-23 21:38:20 -07:00
memory-failure.c mm/memory-failure.c: don't let collect_procs() skip over processes for MF_ACTION_REQUIRED 2014-06-30 20:09:42 -07:00
memory.c mm: make fixup_user_fault() check the vma access rights too 2014-06-07 13:25:29 -07:00
memory_hotplug.c mm/memory_hotplug.c: fix printk format warnings 2013-05-24 16:22:52 -07:00
mempolicy.c mm/mempolicy.c: fix mempolicy printing in numa_maps 2014-02-06 11:08:12 -08:00
mempool.c mempool: add @gfp_mask to mempool_create_node() 2012-06-25 11:53:47 +02:00
migrate.c mm: numa: avoid unnecessary work on the failure path 2014-01-09 12:24:23 -08:00
mincore.c swap: make each swap partition have one address_space 2013-02-23 17:50:17 -08:00
mlock.c mm: try_to_unmap_cluster() should lock_page() before mlocking 2014-05-06 07:55:32 -07:00
mm_init.c mm: init: report on last-nid information stored in page->flags 2013-02-23 17:50:18 -08:00
mmap.c mm: ensure get_unmapped_area() returns higher address than mmap_min_addr 2013-12-04 10:56:39 -08:00
mmu_context.c mm: remove old aio use_mm() comment 2013-05-07 18:38:27 -07:00
mmu_notifier.c mm: mmu_notifier: re-fix freed page still mapped in secondary MMU 2013-05-24 16:22:51 -07:00
mmzone.c mm: rename page struct field helpers 2013-02-23 17:50:18 -08:00
mprotect.c mm: fix TLB flush race between migration, and change_protection_range 2014-01-09 12:24:23 -08:00
mremap.c mm, thp: close race between mremap() and split_huge_page() 2014-06-07 13:25:31 -07:00
msync.c
nobootmem.c mm, nobootmem: do memset() after memblock_reserve() 2013-04-29 15:54:39 -07:00
nommu.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2013-05-01 07:21:43 -07:00
oom_kill.c mm, oom: base root bonus on current usage 2014-02-13 13:48:02 -08:00
page-writeback.c mm: __set_page_dirty_nobuffers() uses spin_lock_irqsave() instead of spin_lock_irq() 2014-02-20 11:06:11 -08:00
page_alloc.c mm: close PageTail race 2014-04-03 12:01:05 -07:00
page_cgroup.c memcontrol: use N_MEMORY instead N_HIGH_MEMORY 2012-12-12 17:38:32 -08:00
page_io.c Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-block 2013-05-08 10:13:35 -07:00
page_isolation.c mm: fix zone_watermark_ok_safe() accounting of isolated pages 2013-01-04 16:11:46 -08:00
pagewalk.c mm/pagewalk.c: fix walk_page_range() access of wrong PTEs 2013-11-13 12:05:34 +09:00
percpu-km.c
percpu-vm.c mm: fix kernel-doc warnings 2012-06-20 14:39:36 -07:00
percpu.c percpu: make pcpu_alloc_chunk() use pcpu_mem_free() instead of kfree() 2014-06-07 13:25:37 -07:00
pgtable-generic.c mm: fix TLB flush race between migration, and change_protection_range 2014-01-09 12:24:23 -08:00
process_vm_access.c Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys 2013-03-12 11:05:45 -07:00
quicklist.c
readahead.c teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long 2013-03-03 22:46:22 -05:00
rmap.c mm: fix sleeping function warning from __put_anon_vma 2014-06-30 20:09:42 -07:00
shmem.c cope with potentially long ->d_dname() output for shmem/hugetlb 2013-10-18 07:45:45 -07:00
slab.c slab: fix init_lock_keys 2013-07-21 18:21:26 -07:00
slab.h memcg: check that kmem_cache has memcg_params before accessing it 2013-09-07 22:09:58 -07:00
slab_common.c slab: prevent warnings when allocating with __GFP_NOWARN 2013-06-13 10:01:58 +03:00
slob.c mm: rename page struct field helpers 2013-02-23 17:50:18 -08:00
slub.c slub: Fix calculation of cpu slabs 2014-02-13 13:48:00 -08:00
sparse-vmemmap.c sparse-vmemmap: specify vmemmap population range in bytes 2013-04-29 15:54:35 -07:00
sparse.c mm, hotplug: avoid compiling memory hotremove functions when disabled 2013-04-29 15:54:37 -07:00
swap.c mm: close PageTail race 2014-04-03 12:01:05 -07:00
swap_state.c swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion 2013-06-12 16:29:45 -07:00
swapfile.c frontswap: fix incorrect zeroing and allocation size for frontswap_map 2013-06-12 16:29:46 -07:00
truncate.c mm: drop vmtruncate 2012-12-20 18:46:29 -05:00
util.c swap: make each swap partition have one address_space 2013-02-23 17:50:17 -08:00
vmalloc.c mm/vmalloc.c: fix an overflow bug in alloc_vmap_area() 2013-11-13 12:05:34 +09:00
vmpressure.c memcg: add memory.pressure_level events 2013-04-29 15:54:38 -07:00
vmscan.c mm: vmscan: clear kswapd's special reclaim powers before exiting 2014-06-30 20:09:42 -07:00
vmstat.c mm: numa: return the number of base pages altered by protection changes 2013-12-08 07:29:27 -08:00