linux-uconsole/drivers/md
Guoqing Jiang 5d8a64e6f2 md/bitmap: avoid race window between md_bitmap_resize and bitmap_file_clear_bit
[ Upstream commit fadcbd2901 ]

We need to move "spin_lock_irq(&bitmap->counts.lock)" before unmap previous
storage, otherwise panic like belows could happen as follows.

[  902.353802] sdl: detected capacity change from 1077936128 to 3221225472
[  902.616948] general protection fault: 0000 [#1] SMP
[snip]
[  902.618588] CPU: 12 PID: 33698 Comm: md0_raid1 Tainted: G           O    4.14.144-1-pserver #4.14.144-1.1~deb10
[  902.618870] Hardware name: Supermicro SBA-7142G-T4/BHQGE, BIOS 3.00       10/24/2012
[  902.619120] task: ffff9ae1860fc600 task.stack: ffffb52e4c704000
[  902.619301] RIP: 0010:bitmap_file_clear_bit+0x90/0xd0 [md_mod]
[  902.619464] RSP: 0018:ffffb52e4c707d28 EFLAGS: 00010087
[  902.619626] RAX: ffe8008b0d061000 RBX: ffff9ad078c87300 RCX: 0000000000000000
[  902.619792] RDX: ffff9ad986341868 RSI: 0000000000000803 RDI: ffff9ad078c87300
[  902.619986] RBP: ffff9ad0ed7a8000 R08: 0000000000000000 R09: 0000000000000000
[  902.620154] R10: ffffb52e4c707ec0 R11: ffff9ad987d1ed44 R12: ffff9ad0ed7a8360
[  902.620320] R13: 0000000000000003 R14: 0000000000060000 R15: 0000000000000800
[  902.620487] FS:  0000000000000000(0000) GS:ffff9ad987d00000(0000) knlGS:0000000000000000
[  902.620738] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  902.620901] CR2: 000055ff12aecec0 CR3: 0000001005207000 CR4: 00000000000406e0
[  902.621068] Call Trace:
[  902.621256]  bitmap_daemon_work+0x2dd/0x360 [md_mod]
[  902.621429]  ? find_pers+0x70/0x70 [md_mod]
[  902.621597]  md_check_recovery+0x51/0x540 [md_mod]
[  902.621762]  raid1d+0x5c/0xeb0 [raid1]
[  902.621939]  ? try_to_del_timer_sync+0x4d/0x80
[  902.622102]  ? del_timer_sync+0x35/0x40
[  902.622265]  ? schedule_timeout+0x177/0x360
[  902.622453]  ? call_timer_fn+0x130/0x130
[  902.622623]  ? find_pers+0x70/0x70 [md_mod]
[  902.622794]  ? md_thread+0x94/0x150 [md_mod]
[  902.622959]  md_thread+0x94/0x150 [md_mod]
[  902.623121]  ? wait_woken+0x80/0x80
[  902.623280]  kthread+0x119/0x130
[  902.623437]  ? kthread_create_on_node+0x60/0x60
[  902.623600]  ret_from_fork+0x22/0x40
[  902.624225] RIP: bitmap_file_clear_bit+0x90/0xd0 [md_mod] RSP: ffffb52e4c707d28

Because mdadm was running on another cpu to do resize, so bitmap_resize was
called to replace bitmap as below shows.

PID: 38801  TASK: ffff9ad074a90e00  CPU: 0   COMMAND: "mdadm"
   [exception RIP: queued_spin_lock_slowpath+56]
   [snip]
-- <NMI exception stack> --
 #5 [ffffb52e60f17c58] queued_spin_lock_slowpath at ffffffff9c0b27b8
 #6 [ffffb52e60f17c58] bitmap_resize at ffffffffc0399877 [md_mod]
 #7 [ffffb52e60f17d30] raid1_resize at ffffffffc0285bf9 [raid1]
 #8 [ffffb52e60f17d50] update_size at ffffffffc038a31a [md_mod]
 #9 [ffffb52e60f17d70] md_ioctl at ffffffffc0395ca4 [md_mod]

And the procedure to keep resize bitmap safe is allocate new storage
space, then quiesce, copy bits, replace bitmap, and re-start.

However the daemon (bitmap_daemon_work) could happen even the array is
quiesced, which means when bitmap_file_clear_bit is triggered by raid1d,
then it thinks it should be fine to access store->filemap since
counts->lock is held, but resize could change the storage without the
protection of the lock.

Cc: Jack Wang <jinpu.wang@cloud.ionos.com>
Cc: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-31 16:35:25 +01:00
..
bcache bcache: do not mark writeback_running too early 2019-12-05 09:20:07 +01:00
persistent-data dm btree: increase rebalance threshold in __rebalance2() 2019-12-21 10:57:41 +01:00
dm-bio-prison-v1.c
dm-bio-prison-v1.h
dm-bio-prison-v2.c
dm-bio-prison-v2.h
dm-bio-record.h
dm-bufio.c Revert "dm bufio: fix deadlock with loop device" 2019-08-29 08:28:49 +02:00
dm-builtin.c
dm-cache-background-tracker.c
dm-cache-background-tracker.h
dm-cache-block-types.h
dm-cache-metadata.c dm cache metadata: Fix loading discard bitset 2019-05-25 18:23:38 +02:00
dm-cache-metadata.h
dm-cache-policy-internal.h
dm-cache-policy-smq.c
dm-cache-policy.c
dm-cache-policy.h
dm-cache-target.c dm cache: fix bugs when a GFP_NOWAIT allocation fails 2019-10-29 09:20:03 +01:00
dm-core.h dm: disable DISCARD if the underlying storage no longer supports it 2019-08-25 10:48:01 +02:00
dm-crypt.c dm crypt: move detailed message into debug level 2019-09-16 08:22:14 +02:00
dm-delay.c dm delay: fix a crash when invalid device is specified 2019-05-25 18:23:39 +02:00
dm-era-target.c
dm-exception-store.c
dm-exception-store.h
dm-flakey.c dm flakey: Properly corrupt multi-page bios. 2019-12-05 09:20:36 +01:00
dm-integrity.c dm integrity: fix a crash due to BUG_ON in __journal_read_write() 2019-08-29 08:28:55 +02:00
dm-io.c
dm-ioctl.c dm ioctl: harden copy_params()'s copy_from_user() from malicious users 2018-11-13 11:08:49 -08:00
dm-kcopyd.c dm kcopyd: always complete failed jobs 2019-08-29 08:28:55 +02:00
dm-linear.c dm: Check for device sector overflow if CONFIG_LBDAF is not set 2019-01-26 09:32:42 +01:00
dm-log-userspace-base.c
dm-log-userspace-transfer.c
dm-log-userspace-transfer.h
dm-log-writes.c dm log writes: make sure super sector log updates are written in order 2019-07-03 13:14:45 +02:00
dm-log.c
dm-mpath.c dm mpath: remove harmful bio-based optimization 2019-12-21 10:57:40 +01:00
dm-mpath.h
dm-path-selector.c
dm-path-selector.h
dm-queue-length.c
dm-raid.c dm raid: fix false -EBUSY when handling check/repair message 2019-12-05 09:20:37 +01:00
dm-raid1.c dm: Check for device sector overflow if CONFIG_LBDAF is not set 2019-01-26 09:32:42 +01:00
dm-region-hash.c
dm-round-robin.c
dm-rq.c blk-mq: add callback of .cleanup_rq 2019-10-05 13:10:03 +02:00
dm-rq.h
dm-service-time.c
dm-snap-persistent.c
dm-snap-transient.c
dm-snap.c dm snapshot: rework COW throttling to fix deadlock 2019-11-06 13:05:11 +01:00
dm-stats.c
dm-stats.h
dm-stripe.c
dm-switch.c
dm-sysfs.c
dm-table.c dm table: fix invalid memory accesses with too high sector number 2019-08-29 08:28:56 +02:00
dm-target.c dm mpath: fix missing call of path selector type->end_io 2019-09-16 08:22:12 +02:00
dm-thin-metadata.c dm thin metadata: check if in fail_io mode when setting needs_check 2019-09-16 08:22:21 +02:00
dm-thin-metadata.h dm thin: fix passdown_double_checking_shared_status() 2019-01-31 08:14:38 +01:00
dm-thin.c dm thin: add sanity checks to thin-pool and external snapshot creation 2019-04-05 22:32:59 +02:00
dm-uevent.c
dm-uevent.h
dm-unstripe.c dm: Check for device sector overflow if CONFIG_LBDAF is not set 2019-01-26 09:32:42 +01:00
dm-verity-fec.c
dm-verity-fec.h
dm-verity-target.c dm verity: use message limit for data block corruption message 2019-07-21 09:03:08 +02:00
dm-verity.h
dm-writecache.c dm writecache: handle REQ_FUA 2019-12-17 20:34:52 +01:00
dm-zero.c
dm-zoned-metadata.c dm zoned: reduce overhead of backing device checks 2019-12-17 20:34:53 +01:00
dm-zoned-reclaim.c dm zoned: reduce overhead of backing device checks 2019-12-17 20:34:53 +01:00
dm-zoned-target.c dm zoned: reduce overhead of backing device checks 2019-12-17 20:34:53 +01:00
dm-zoned.h dm zoned: reduce overhead of backing device checks 2019-12-17 20:34:53 +01:00
dm.c dm: disable DISCARD if the underlying storage no longer supports it 2019-08-25 10:48:01 +02:00
dm.h
Kconfig
Makefile
md-bitmap.c md/bitmap: avoid race window between md_bitmap_resize and bitmap_file_clear_bit 2019-12-31 16:35:25 +01:00
md-bitmap.h
md-cluster.c md-cluster: release RESYNC lock after the last resync message 2018-08-31 17:38:10 -07:00
md-cluster.h
md-faulty.c
md-linear.c md: improve handling of bio with REQ_PREFLUSH in md_flush_request() 2019-12-17 20:34:55 +01:00
md-linear.h
md-multipath.c md: improve handling of bio with REQ_PREFLUSH in md_flush_request() 2019-12-17 20:34:55 +01:00
md-multipath.h
md.c md: improve handling of bio with REQ_PREFLUSH in md_flush_request() 2019-12-17 20:34:55 +01:00
md.h md: improve handling of bio with REQ_PREFLUSH in md_flush_request() 2019-12-17 20:34:55 +01:00
raid0.c md: improve handling of bio with REQ_PREFLUSH in md_flush_request() 2019-12-17 20:34:55 +01:00
raid0.h md/raid0: avoid RAID0 data corruption due to layout confusion. 2019-10-05 13:10:12 +02:00
raid1-10.c
raid1.c md: improve handling of bio with REQ_PREFLUSH in md_flush_request() 2019-12-17 20:34:55 +01:00
raid1.h
raid5-cache.c md/raid5: fix 'out of memory' during raid cache recovery 2019-02-06 17:30:16 +01:00
raid5-log.h md/raid5-cache: disable reshape completely 2018-08-31 17:38:09 -07:00
raid5-ppl.c
raid5.c raid5: need to set STRIPE_HANDLE for batch head 2019-12-17 20:36:00 +01:00
raid5.h
raid10.c md: improve handling of bio with REQ_PREFLUSH in md_flush_request() 2019-12-17 20:34:55 +01:00
raid10.h