linux-uconsole/drivers/md
Guilherme G. Piccoli 869eec8946 md/raid0: Do not bypass blocking queue entered for raid0 bios
-----------------------------------------------------------------
This patch is not on mainline and is meant to 4.19 stable *only*.
After the patch description there's a reasoning about that.
-----------------------------------------------------------------

Commit cd4a4ae468 ("block: don't use blocking queue entered for
recursive bio submits") introduced the flag BIO_QUEUE_ENTERED in order
split bios bypass the blocking queue entering routine and use the live
non-blocking version. It was a result of an extensive discussion in
a linux-block thread[0], and the purpose of this change was to prevent
a hung task waiting on a reference to drop.

Happens that md raid0 split bios all the time, and more important,
it changes their underlying device to the raid member. After the change
introduced by this flag's usage, we experience various crashes if a raid0
member is removed during a large write. This happens because the bio
reaches the live queue entering function when the queue of the raid0
member is dying.

A simple reproducer of this behavior is presented below:
a) Build kernel v4.19.56-stable with CONFIG_BLK_DEV_THROTTLING=y.

b) Create a raid0 md array with 2 NVMe devices as members, and mount
it with an ext4 filesystem.

c) Run the following oneliner (supposing the raid0 is mounted in /mnt):
(dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;
echo 1 > /sys/block/nvme1n1/device/device/remove
(whereas nvme1n1 is the 2nd array member)

This will trigger the following warning/oops:

------------[ cut here ]------------
BUG: unable to handle kernel NULL pointer dereference at 0000000000000155
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
RIP: 0010:blk_throtl_bio+0x45/0x970
[...]
Call Trace:
 generic_make_request_checks+0x1bf/0x690
 generic_make_request+0x64/0x3f0
 raid0_make_request+0x184/0x620 [raid0]
 ? raid0_make_request+0x184/0x620 [raid0]
 md_handle_request+0x126/0x1a0
 md_make_request+0x7b/0x180
 generic_make_request+0x19e/0x3f0
 submit_bio+0x73/0x140
[...]

This patch changes raid0 driver to fallback to the "old" blocking queue
entering procedure, by clearing the BIO_QUEUE_ENTERED from raid0 bios.
This prevents the crashes and restores the regular behavior of raid0
arrays when a member is removed during a large write.

[0] lore.kernel.org/linux-block/343bbbf6-64eb-879e-d19e-96aebb037d47@I-love.SAKURA.ne.jp

----------------------------
Why this is not on mainline?
----------------------------

The patch was originally submitted upstream in linux-raid and
linux-block mailing-lists - it was initially accepted by Song Liu,
but Christoph Hellwig[1] observed that there was a clean-up series
ready to be accepted from Ming Lei[2] that fixed the same issue.

The accepted patches from Ming's series in upstream are: commit
47cdee29ef ("block: move blk_exit_queue into __blk_release_queue") and
commit fe2008640a ("block: don't protect generic_make_request_checks
with blk_queue_enter"). Those patches basically do a clean-up in the
block layer involving:

1) Putting back blk_exit_queue() logic into __blk_release_queue(); that
path was changed in the past and the logic from blk_exit_queue() was
added to blk_cleanup_queue().

2) Removing the guard/protection in generic_make_request_checks() with
blk_queue_enter().

The problem with Ming's series for -stable is that it relies in the
legacy request IO path removal. So it's "backport-able" to v5.0+,
but doing that for early versions (like 4.19) would incur in complex
code changes. Hence, it was suggested by Christoph and Song Liu that
this patch was submitted to stable only; otherwise merging it upstream
would add code to fix a path removed in a subsequent commit.

[1] lore.kernel.org/linux-block/20190521172258.GA32702@infradead.org
[2] lore.kernel.org/linux-block/20190515030310.20393-1-ming.lei@redhat.com

Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: cd4a4ae468 ("block: don't use blocking queue entered for recursive bio submits")
Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10 09:53:30 +02:00
..
bcache bcache: only set BCACHE_DEV_WB_RUNNING when cached device attached 2019-06-19 08:18:00 +02:00
persistent-data dm: Avoid namespace collision with bitmap API 2018-08-01 15:49:38 -07:00
dm-bio-prison-v1.c dm: adjust structure members to improve alignment 2018-06-08 11:53:14 -04:00
dm-bio-prison-v1.h
dm-bio-prison-v2.c dm: adjust structure members to improve alignment 2018-06-08 11:53:14 -04:00
dm-bio-prison-v2.h
dm-bio-record.h block: replace bi_bdev with a gendisk pointer and partitions index 2017-08-23 12:49:55 -06:00
dm-bufio.c dm bufio: fix buffer alignment 2018-04-30 11:51:39 -04:00
dm-builtin.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dm-cache-background-tracker.c dm cache background tracker: fix sparse warning 2018-04-30 15:40:40 -04:00
dm-cache-background-tracker.h
dm-cache-block-types.h
dm-cache-metadata.c dm cache metadata: Fix loading discard bitset 2019-05-25 18:23:38 +02:00
dm-cache-metadata.h
dm-cache-policy-internal.h
dm-cache-policy-smq.c treewide: Use array_size() in vzalloc() 2018-06-12 16:19:22 -07:00
dm-cache-policy.c
dm-cache-policy.h
dm-cache-target.c dm cache: destroy migration_cache if cache target registration failed 2018-10-09 13:53:03 -04:00
dm-core.h dm: adjust structure members to improve alignment 2018-06-08 11:53:14 -04:00
dm-crypt.c dm crypt: don't overallocate the integrity tag space 2019-02-20 10:25:49 +01:00
dm-delay.c dm delay: fix a crash when invalid device is specified 2019-05-25 18:23:39 +02:00
dm-era-target.c dm: allow targets to return output from messages they are sent 2018-04-03 15:04:10 -04:00
dm-exception-store.c
dm-exception-store.h
dm-flakey.c dm: Check for device sector overflow if CONFIG_LBDAF is not set 2019-01-26 09:32:42 +01:00
dm-integrity.c dm integrity: correctly calculate the size of metadata area 2019-05-25 18:23:39 +02:00
dm-io.c dm: Use kzalloc for all structs with embedded biosets/mempools 2018-06-05 08:47:43 -06:00
dm-ioctl.c dm ioctl: harden copy_params()'s copy_from_user() from malicious users 2018-11-13 11:08:49 -08:00
dm-kcopyd.c dm kcopyd: Fix bug causing workqueue stalls 2019-01-26 09:32:41 +01:00
dm-linear.c dm: Check for device sector overflow if CONFIG_LBDAF is not set 2019-01-26 09:32:42 +01:00
dm-log-userspace-base.c dm: convert to bioset_init()/mempool_init() 2018-05-30 15:33:32 -06:00
dm-log-userspace-transfer.c
dm-log-userspace-transfer.h
dm-log-writes.c dm log writes: make sure super sector log updates are written in order 2019-07-03 13:14:45 +02:00
dm-log.c
dm-mpath.c dm mpath: always free attached_handler_name in parse_path() 2019-05-25 18:23:40 +02:00
dm-mpath.h
dm-path-selector.c
dm-path-selector.h
dm-queue-length.c dm mpath selector: more evenly distribute ties 2018-01-29 13:44:58 -05:00
dm-raid.c dm raid: remove bogus const from decipher_sync_action() return type 2018-09-17 22:46:50 -04:00
dm-raid1.c dm: Check for device sector overflow if CONFIG_LBDAF is not set 2019-01-26 09:32:42 +01:00
dm-region-hash.c - Error path bug fix for overflow tests (Dan) 2018-06-12 18:28:00 -07:00
dm-round-robin.c
dm-rq.c dm: convert to bioset_init()/mempool_init() 2018-05-30 15:33:32 -06:00
dm-rq.h dm rq: do not update rq partially in each ending bio 2017-08-28 10:23:28 -04:00
dm-service-time.c dm mpath selector: more evenly distribute ties 2018-01-29 13:44:58 -05:00
dm-snap-persistent.c dm bufio: move dm-bufio.h to include/linux/ 2018-04-03 15:04:23 -04:00
dm-snap-transient.c
dm-snap.c dm snapshot: Fix excessive memory usage and workqueue stalls 2019-01-26 09:32:41 +01:00
dm-stats.c treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
dm-stats.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dm-stripe.c dax: Introduce a ->copy_to_iter dax operation 2018-05-22 23:18:31 -07:00
dm-switch.c treewide: Use array_size() in vmalloc() 2018-06-12 16:19:22 -07:00
dm-sysfs.c
dm-table.c dm table: propagate BDI_CAP_STABLE_WRITES to fix sporadic checksum errors 2019-04-17 08:38:54 +02:00
dm-target.c dm: remove unused macro DM_MOD_NAME_SIZE 2018-04-03 15:04:15 -04:00
dm-thin-metadata.c dm thin: fix passdown_double_checking_shared_status() 2019-01-31 08:14:38 +01:00
dm-thin-metadata.h dm thin: fix passdown_double_checking_shared_status() 2019-01-31 08:14:38 +01:00
dm-thin.c dm thin: add sanity checks to thin-pool and external snapshot creation 2019-04-05 22:32:59 +02:00
dm-uevent.c
dm-uevent.h
dm-unstripe.c dm: Check for device sector overflow if CONFIG_LBDAF is not set 2019-01-26 09:32:42 +01:00
dm-verity-fec.c Refactors rslib and callers to provide a per-instance allocation area 2018-06-05 10:48:05 -07:00
dm-verity-fec.h dm: convert to bioset_init()/mempool_init() 2018-05-30 15:33:32 -06:00
dm-verity-target.c dm verity: fix crash on bufio buffer that was allocated with vmalloc 2018-09-04 11:25:25 -04:00
dm-verity.h dm verity: add 'check_at_most_once' option to only validate hashes once 2018-04-03 15:04:29 -04:00
dm-writecache.c libnvdimm-for-4.19_misc 2018-08-25 18:13:10 -07:00
dm-zero.c
dm-zoned-metadata.c dm zoned: Fix zone report handling 2019-05-25 18:23:39 +02:00
dm-zoned-reclaim.c dm kcopyd: return void from dm_kcopyd_copy() 2018-07-31 17:33:21 -04:00
dm-zoned-target.c dm zoned: Fix target BIO completion handling 2018-12-19 19:19:53 +01:00
dm-zoned.h
dm.c dm: revert 8f50e35815 ("dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE") 2019-04-17 08:38:54 +02:00
dm.h dm: move dm_table_destroy() to same header as dm_table_create() 2018-01-17 09:16:06 -05:00
Kconfig dm: add writecache target 2018-06-08 11:59:51 -04:00
Makefile dm: add writecache target 2018-06-08 11:59:51 -04:00
md-bitmap.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2018-08-18 16:48:07 -07:00
md-bitmap.h md: Avoid namespace collision with bitmap API 2018-08-01 15:49:39 -07:00
md-cluster.c md-cluster: release RESYNC lock after the last resync message 2018-08-31 17:38:10 -07:00
md-cluster.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
md-faulty.c md: convert to bioset_init()/mempool_init() 2018-05-30 15:33:32 -06:00
md-linear.c md: convert to bioset_init()/mempool_init() 2018-05-30 15:33:32 -06:00
md-linear.h Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2017-11-14 16:07:26 -08:00
md-multipath.c treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
md-multipath.h md: convert to bioset_init()/mempool_init() 2018-05-30 15:33:32 -06:00
md.c md: add mddev->pers to avoid potential NULL pointer dereference 2019-05-25 18:23:25 +02:00
md.h md: batch flush requests. 2019-05-25 18:23:25 +02:00
raid0.c md/raid0: Do not bypass blocking queue entered for raid0 bios 2019-07-10 09:53:30 +02:00
raid0.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
raid1-10.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
raid1.c md/raid1: don't clear bitmap bits on interrupted recovery. 2019-02-20 10:25:48 +01:00
raid1.h md: convert to bioset_init()/mempool_init() 2018-05-30 15:33:32 -06:00
raid5-cache.c md/raid5: fix 'out of memory' during raid cache recovery 2019-02-06 17:30:16 +01:00
raid5-log.h md/raid5-cache: disable reshape completely 2018-08-31 17:38:09 -07:00
raid5-ppl.c md: convert to bioset_init()/mempool_init() 2018-05-30 15:33:32 -06:00
raid5.c md/raid: raid5 preserve the writeback action after the parity check 2019-05-25 18:23:47 +02:00
raid5.h Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2018-06-09 12:01:36 -07:00
raid10.c md: Fix failed allocation of md_register_thread 2019-03-23 20:10:11 +01:00
raid10.h md: convert to bioset_init()/mempool_init() 2018-05-30 15:33:32 -06:00