linux-uconsole/drivers
Guilherme G. Piccoli 869eec8946 md/raid0: Do not bypass blocking queue entered for raid0 bios
-----------------------------------------------------------------
This patch is not on mainline and is meant for 4.19 stable *only*.
The reasoning for that is given after the patch description.
-----------------------------------------------------------------

Commit cd4a4ae468 ("block: don't use blocking queue entered for
recursive bio submits") introduced the flag BIO_QUEUE_ENTERED so that
split bios bypass the blocking queue entering routine and use the live
non-blocking version. It was the result of an extensive discussion in
a linux-block thread[0], and the purpose of this change was to prevent
a hung task waiting on a reference to drop.

It so happens that md raid0 splits bios all the time and, more
importantly, changes their underlying device to the raid member. Since
the change introduced by this flag's usage, we experience various crashes
if a raid0 member is removed during a large write. This happens because
the bio reaches the live queue entering function when the queue of the
raid0 member is dying.

A simple reproducer of this behavior is presented below:
a) Build kernel v4.19.56-stable with CONFIG_BLK_DEV_THROTTLING=y.

b) Create a raid0 md array with 2 NVMe devices as members, and mount
it with an ext4 filesystem.

c) Run the following one-liner (supposing the raid0 is mounted in /mnt):
(dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;
echo 1 > /sys/block/nvme1n1/device/device/remove
(where nvme1n1 is the 2nd array member)

This will trigger the following warning/oops:

------------[ cut here ]------------
BUG: unable to handle kernel NULL pointer dereference at 0000000000000155
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
RIP: 0010:blk_throtl_bio+0x45/0x970
[...]
Call Trace:
 generic_make_request_checks+0x1bf/0x690
 generic_make_request+0x64/0x3f0
 raid0_make_request+0x184/0x620 [raid0]
 ? raid0_make_request+0x184/0x620 [raid0]
 md_handle_request+0x126/0x1a0
 md_make_request+0x7b/0x180
 generic_make_request+0x19e/0x3f0
 submit_bio+0x73/0x140
[...]

This patch changes the raid0 driver to fall back to the "old" blocking
queue entering procedure, by clearing the BIO_QUEUE_ENTERED flag from
raid0 bios. This prevents the crashes and restores the regular behavior
of raid0 arrays when a member is removed during a large write.
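
The effect of clearing the flag can be illustrated with a small user-space
model. This is only a sketch and not the patch itself: every model_* name
and the MODEL_BIO_QUEUE_ENTERED constant are hypothetical stand-ins, and the
model assumes (based on the description above) that the live entering path
does not check for a dying queue while the blocking path does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's BIO_QUEUE_ENTERED flag bit. */
#define MODEL_BIO_QUEUE_ENTERED (1u << 0)

/* Toy request queue: only tracks whether the device is being removed. */
struct model_queue {
	bool dying;
};

/* Toy bio: a flag word plus the queue it currently targets. */
struct model_bio {
	unsigned int flags;
	struct model_queue *q;
};

enum model_enter_result {
	MODEL_ENTER_REFUSED,        /* blocking path noticed the dying queue */
	MODEL_ENTER_BLOCKING_OK,    /* blocking path, queue is healthy */
	MODEL_ENTER_LIVE_UNCHECKED  /* live path, dying state never looked at */
};

/* Assumption: flagged bios take the live path, which skips the dying check. */
static enum model_enter_result model_queue_enter(const struct model_bio *bio)
{
	if (bio->flags & MODEL_BIO_QUEUE_ENTERED)
		return MODEL_ENTER_LIVE_UNCHECKED;
	return bio->q->dying ? MODEL_ENTER_REFUSED : MODEL_ENTER_BLOCKING_OK;
}

/*
 * Models raid0 remapping a (split) bio to a member device and resubmitting
 * it. With clear_flag set, the flag is dropped first, so the bio goes
 * through the blocking entering path for the member queue.
 */
static enum model_enter_result
model_raid0_submit(struct model_bio *bio, struct model_queue *member,
		   bool clear_flag)
{
	assert(bio != NULL && member != NULL);

	bio->q = member;	/* remap to the raid member */
	if (clear_flag)
		bio->flags &= ~MODEL_BIO_QUEUE_ENTERED;
	return model_queue_enter(bio);
}
```

In this model, a flagged bio remapped to a dying member yields
MODEL_ENTER_LIVE_UNCHECKED without the flag clearing and
MODEL_ENTER_REFUSED with it, which mirrors the before/after behavior
described above.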

[0] lore.kernel.org/linux-block/343bbbf6-64eb-879e-d19e-96aebb037d47@I-love.SAKURA.ne.jp

----------------------------
Why is this not on mainline?
----------------------------

The patch was originally submitted upstream to the linux-raid and
linux-block mailing lists. It was initially accepted by Song Liu,
but Christoph Hellwig[1] observed that there was a clean-up series
ready to be accepted from Ming Lei[2] that fixed the same issue.

The accepted patches from Ming's series in upstream are: commit
47cdee29ef ("block: move blk_exit_queue into __blk_release_queue") and
commit fe2008640a ("block: don't protect generic_make_request_checks
with blk_queue_enter"). Those patches basically do a clean-up in the
block layer involving:

1) Putting back blk_exit_queue() logic into __blk_release_queue(); that
path was changed in the past and the logic from blk_exit_queue() was
added to blk_cleanup_queue().

2) Removing the guard/protection in generic_make_request_checks() with
blk_queue_enter().

The problem with Ming's series for -stable is that it relies on the
legacy request IO path removal. So it's "backport-able" to v5.0+,
but doing that for earlier versions (like 4.19) would incur complex
code changes. Hence, it was suggested by Christoph and Song Liu that
this patch be submitted to stable only; otherwise merging it upstream
would add code to fix a path removed in a subsequent commit.

[1] lore.kernel.org/linux-block/20190521172258.GA32702@infradead.org
[2] lore.kernel.org/linux-block/20190515030310.20393-1-ming.lei@redhat.com

Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: cd4a4ae468 ("block: don't use blocking queue entered for recursive bio submits")
Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10 09:53:30 +02:00
..
accessibility
acpi ACPI/PCI: PM: Add missing wakeup.flags.valid checks 2019-06-22 08:15:17 +02:00
amba
android binder: fix race between munmap() and direct reclaim 2019-06-09 09:17:23 +02:00
ata libata: Extend quirks for the ST1000LM024 drives with NOLPM quirk 2019-06-19 08:17:59 +02:00
atm atm: he: fix sign-extension overflow on large shift 2019-02-27 10:08:57 +01:00
auxdisplay auxdisplay: hd44780: Fix memory leak on ->remove() 2019-04-20 09:15:55 +02:00
base PM / core: Propagate dev->power.wakeup_path when no callbacks 2019-05-31 06:46:23 -07:00
bcma
block xen-blkfront: switch kcalloc to kvcalloc for large array allocation 2019-06-11 12:20:53 +02:00
bluetooth Bluetooth: hci_qca: Give enough time to ROME controller to bootup. 2019-05-31 06:46:16 -07:00
bus
cdrom cdrom: Fix race condition in cdrom_sysctl_register 2019-04-05 22:33:10 +02:00
char hwrng: omap - Set default quality 2019-05-31 06:46:31 -07:00
clk clk: socfpga: stratix10: fix divider entry for the emac clocks 2019-07-03 13:14:44 +02:00
clocksource clocksource/drivers/oxnas: Fix OX820 compatible 2019-05-16 19:41:21 +02:00
connector connector: fix unsafe usage of ->real_parent 2019-03-19 13:12:38 +01:00
cpufreq cpufreq: kirkwood: fix possible object reference leak 2019-05-31 06:46:24 -07:00
cpuidle cpuidle: big.LITTLE: fix refcount leak 2019-02-12 19:47:08 +01:00
crypto crypto: vmx - ghash: do nosimd fallback manually 2019-06-04 08:02:34 +02:00
dax mm/huge_memory: fix vmf_insert_pfn_{pmd, pud}() crash, handle unaligned addresses 2019-05-22 07:37:40 +02:00
dca
devfreq
dio
dma dmaengine: sprd: Fix block length overflow 2019-06-25 11:35:55 +08:00
dma-buf
edac EDAC/mpc85xx: Prevent building as a module 2019-06-15 11:54:03 +02:00
eisa
extcon extcon: arizona: Disable mic detect if running when driver is removed 2019-05-31 06:46:23 -07:00
firewire
firmware pstore: Convert buf_lock to semaphore 2019-06-11 12:20:52 +02:00
fmc
fpga fpga: dfl: Add lockdep classes for pdata->lock 2019-06-25 11:35:55 +08:00
fsi
gnss gnss: sirf: fix premature wakeup interrupt enable 2019-03-10 07:17:21 +01:00
gpio gpio: fix gpio-adp5588 build errors 2019-06-22 08:15:16 +02:00
gpu drm/vmwgfx: Use the backdoor port if the HB port is not available 2019-06-25 11:36:01 +08:00
hid HID: wacom: Sync INTUOSP2_BT touch state after each frame if necessary 2019-06-19 08:17:59 +02:00
hsi
hv Drivers: hv: vmbus: Remove the undesired put_cpu_ptr() in hv_synic_cleanup() 2019-05-10 17:54:04 +02:00
hwmon hwmon: (pmbus/core) Treat parameters as paged if on multiple pages 2019-06-25 11:35:59 +08:00
hwspinlock
hwtracing intel_th: msu: Fix single mode with IOMMU 2019-05-25 18:23:26 +02:00
i2c i2c: dev: fix potential memory leak in i2cdev_ioctl_rdwr 2019-06-22 08:15:17 +02:00
ide ide: fix a typo in the settings proc file name 2019-01-31 08:14:42 +01:00
idle x86/cpu: Sanitize FAM6_ATOM naming 2019-05-14 19:17:53 +02:00
iio iio: temperature: mlx90632 Relax the compatibility check 2019-06-25 11:35:54 +08:00
infiniband RDMA: Directly cast the sockaddr union to sockaddr 2019-07-03 13:14:49 +02:00
input Input: silead - add MSSL0017 to acpi_device_id 2019-06-25 11:35:54 +08:00
iommu iommu/arm-smmu: Avoid constant zero in TLBI writes 2019-06-19 08:18:00 +02:00
ipack
irqchip irqchip/mips-gic: Use the correct local interrupt map registers 2019-07-03 13:14:46 +02:00
isdn mISDN: make sure device name is NUL terminated 2019-06-22 08:15:16 +02:00
leds leds: trigger: netdev: use memcpy in device_name_store 2019-05-04 09:20:22 +02:00
lightnvm lightnvm: pblk: add lock protection to list operations 2019-02-12 19:47:08 +01:00
macintosh
mailbox mailbox: stm32-ipcc: check invalid irq 2019-06-15 11:54:04 +02:00
mcb
md md/raid0: Do not bypass blocking queue entered for raid0 bios 2019-07-10 09:53:30 +02:00
media media: uvcvideo: Fix uvc_alloc_entity() allocation alignment 2019-06-09 09:17:24 +02:00
memory memory: tegra: Fix integer overflow on tick value calculation 2019-05-25 18:23:32 +02:00
memstick memstick: Prevent memstick host from getting runtime suspended during card detection 2019-02-12 19:47:10 +01:00
message
mfd mfd: twl6040: Fix device init errors for ACCCTL register 2019-06-15 11:54:03 +02:00
misc eeprom: at24: fix unexpected timeout under high load 2019-07-03 13:14:46 +02:00
mmc mmc: core: Prevent processing SDIO IRQs when the card is suspended 2019-06-25 11:35:53 +08:00
mtd mtd: spinand: macronix: Fix ECC Status Read 2019-06-11 12:20:50 +02:00
mux
net tun: wake up waitqueues after IFF_UP is set 2019-07-03 13:14:48 +02:00
nfc spi: ST ST95HF NFC: declare missing of table 2019-05-16 19:41:25 +02:00
ntb
nubus
nvdimm libnvdimm: Fix compilation warnings with W=1 2019-06-19 08:18:03 +02:00
nvme nvme: Fix u32 overflow in the number of namespace list calculation 2019-06-25 11:35:59 +08:00
nvmem nvmem: sunxi_sid: Support SID on A83T and H5 2019-06-15 11:54:07 +02:00
of of: overlay: set node fields from properties when add new overlay node 2019-06-09 09:17:24 +02:00
opp OPP: Use opp_table->regulators to verify no regulator case 2019-02-12 19:47:08 +01:00
oprofile
parisc parisc: Use implicit space register selection for loading the coherence index of I/O pdirs 2019-06-11 12:20:51 +02:00
parport parport: Fix mem leak in parport_register_dev_model 2019-06-25 11:35:55 +08:00
pci ACPI/PCI: PM: Add missing wakeup.flags.valid checks 2019-06-22 08:15:17 +02:00
pcmcia
perf perf/arm-cci: Remove broken race mitigation 2019-05-31 06:46:17 -07:00
phy phy: mapphone-mdm6600: add gpiolib dependency 2019-05-31 06:46:20 -07:00
pinctrl pinctrl: samsung: fix leaked of_node references 2019-05-31 06:46:17 -07:00
platform platform/x86: pmc_atom: Add several Beckhoff Automation boards to critclk_systems DMI table 2019-06-19 08:18:03 +02:00
pnp
power power: supply: max14656: fix potential use-before-alloc 2019-06-15 11:54:09 +02:00
powercap x86/cpu: Sanitize FAM6_ATOM naming 2019-05-14 19:17:53 +02:00
pps
ps3
ptp ptp: Fix pass zero to ERR_PTR() in ptp_clock_register 2019-02-12 19:47:01 +01:00
pwm pwm: Fix deadlock warning when removing PWM device 2019-06-15 11:54:10 +02:00
rapidio rapidio: fix a NULL pointer dereference when create_workqueue() fails 2019-06-15 11:53:59 +02:00
ras RAS/CEC: Fix binary search function 2019-06-19 08:18:06 +02:00
regulator regulator: act8865: Fix act8600_sudcdc_voltage_ranges setting 2019-04-05 22:33:15 +02:00
remoteproc
reset reset: meson-audio-arb: Fix missing .owner setting of reset_controller_dev 2019-05-08 07:21:47 +02:00
rpmsg
rtc rtc: pcf8523: don't return invalid date when battery is low 2019-06-19 08:18:07 +02:00
s390 s390/qeth: fix VLAN attribute in bridge_hostnotify udev event 2019-06-25 11:35:59 +08:00
sbus
scsi scsi: vmw_pscsi: Fix use-after-free in pvscsi_queue_lck() 2019-07-03 13:14:45 +02:00
sfi
sh
siox
slimbus slimbus: fix a potential NULL pointer dereference in of_qcom_slim_ngd_register 2019-05-31 06:46:14 -07:00
sn
soc soc: renesas: Identify R-Car M3-W ES1.3 2019-06-15 11:54:11 +02:00
soundwire
spi dmaengine: idma64: Use actual device for DMA transfers 2019-06-15 11:54:10 +02:00
spmi
ssb ssb: Fix possible NULL pointer dereference in ssb_host_pcmcia_exit 2019-05-31 06:46:04 -07:00
staging staging: erofs: add requirements field in superblock 2019-06-25 11:36:01 +08:00
target scsi: target/iscsi: Avoid iscsit_release_commands_from_conn() deadlock 2019-03-23 20:09:59 +01:00
tc
tee tee: optee: avoid possible double list_del() 2019-02-12 19:47:08 +01:00
thermal drivers: thermal: tsens: Don't print error message on -EPROBE_DEFER 2019-06-15 11:54:02 +02:00
thunderbolt thunderbolt: property: Fix a NULL pointer dereference 2019-05-31 06:46:31 -07:00
tty sunhv: Fix device naming inconsistency between sunhv_console and sunhv_reg 2019-06-22 08:15:14 +02:00
uio
usb usb: dwc3: Reset num_trbs after skipping 2019-07-03 13:14:49 +02:00
uwb
vfio vfio: Fix WARNING "do not call blocking ops when !TASK_RUNNING" 2019-06-15 11:54:07 +02:00
vhost vhost: reject zero size iova range 2019-04-27 09:36:31 +02:00
video video: imsttfb: fix potential NULL pointer dereferences 2019-06-15 11:54:10 +02:00
virt drivers/virt/fsl_hypervisor.c: prevent integer overflow in ioctl 2019-05-16 19:41:31 +02:00
virtio virtio_pci: fix a NULL pointer reference in vp_del_vqs 2019-05-10 17:54:08 +02:00
visorbus
vlynq
vme
w1 w1: fix the resume command API 2019-05-31 06:46:14 -07:00
watchdog watchdog: fix compile time error of pretimeout governors 2019-06-15 11:54:06 +02:00
xen xenbus: Avoid deadlock during suspend due to open transactions 2019-06-22 08:15:19 +02:00
zorro
Kconfig
Makefile