linux-uconsole/drivers
Kevin Barnett 2ba55c9851 scsi: smartpqi: correct lun reset issues
Problem:
The Linux kernel takes a logical volume offline after a LUN reset.  This is
generally accompanied by this message in the dmesg output:

Device offlined - not ready after error recovery

Root Cause:
The root cause is a "quirk" in the timeout handling in the Linux SCSI
layer. The Linux kernel places a 30-second timeout on most media access
commands (reads and writes) that it sends to device drivers.  When a media
access command times out, the Linux kernel goes into error recovery mode
for the LUN that was the target of the command that timed out. Every
command that timed out is kept on a list inside of the Linux kernel to be
retried later. The kernel attempts to recover the command(s) that timed out
by issuing a LUN reset followed by a TEST UNIT READY. If the LUN reset and
TEST UNIT READY commands are successful, the kernel retries the command(s)
that timed out.

Each SCSI command issued by the kernel has a result field associated with
it. This field indicates the final result of the command (success or
error). When a command times out, the kernel places a value in this result
field indicating that the command timed out.

The "quirk" is that after the LUN reset and TEST UNIT READY commands are
completed, the kernel checks each command on the timed-out command list
before retrying it. If the result field is still "timed out", the kernel
treats that command as not having been successfully recovered for a
retry. If the number of commands in this state is greater than two, the
kernel takes the LUN offline.
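
The check described above can be pictured with a small stand-alone model.
This is only a sketch of the behavior as this commit message describes it,
not the kernel's actual error-handling code: the types, the result values
and the function names are all hypothetical, and the threshold of two is
taken from the description above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes; stand-ins for the kernel's real values. */
enum cmd_result { RESULT_OK = 0, RESULT_TIMED_OUT = 1 };

/* Hypothetical, stripped-down command: only the result field matters here. */
struct cmd {
	enum cmd_result result;
};

/* Threshold from the description above: more than two stale commands. */
#define MAX_UNRECOVERED 2

/* Count commands whose result field still says "timed out". */
size_t count_unrecovered(const struct cmd *cmds, size_t n)
{
	size_t stale = 0;

	for (size_t i = 0; i < n; i++)
		if (cmds[i].result == RESULT_TIMED_OUT)
			stale++;
	return stale;
}

/* Model of the post-recovery decision: offline the LUN if too many are stale. */
bool lun_goes_offline(const struct cmd *cmds, size_t n)
{
	return count_unrecovered(cmds, n) > MAX_UNRECOVERED;
}
```

In this model, three commands left in the "timed out" state take the LUN
offline, while two do not.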

Fix:
When our RAIDStack receives a LUN reset, it simply waits until all
outstanding commands complete. Generally, all of these outstanding commands
complete successfully. Therefore, the fix in the smartpqi driver is to
always set the command result field to indicate success when a request
completes successfully. This normally isn’t necessary because the result
field is always initialized to success when the command is submitted to the
driver. So when the command completes successfully, the result field is
left untouched. But in this case, the kernel changes the result field
behind the driver’s back and then expects the field to be changed by the
driver as the commands that timed out complete.
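
The difference between the old and new completion behavior can be sketched
as follows. This is an illustrative model only and not the smartpqi source:
the struct, the result values and both function names are hypothetical, and
the real driver works on the kernel's SCSI command structure.

```c
#include <stdbool.h>

/* Hypothetical result values; stand-ins for the kernel's real codes. */
enum {
	SKETCH_RESULT_OK = 0,
	SKETCH_RESULT_TIMED_OUT = 1,
	SKETCH_RESULT_ERROR = 2,
};

/* Hypothetical, stripped-down command: only the result field matters here. */
struct sketch_cmd {
	int result;
};

/*
 * Old behavior: on success the result field is left untouched, on the
 * assumption that it still holds the "success" value it was given at
 * submission time.
 */
void complete_without_fix(struct sketch_cmd *c, bool io_ok)
{
	if (!io_ok)
		c->result = SKETCH_RESULT_ERROR;
}

/*
 * New behavior: the result field is always written, so a "timed out"
 * value placed there by the upper layer is overwritten on success.
 */
void complete_with_fix(struct sketch_cmd *c, bool io_ok)
{
	c->result = io_ok ? SKETCH_RESULT_OK : SKETCH_RESULT_ERROR;
}
```

If the result field was changed to "timed out" while the command was
outstanding, the first variant leaves that stale value in place on a
successful completion, while the second replaces it with success.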

Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-12-19 22:23:35 -05:00
accessibility
acpi pwm: Changes for v4.20-rc1 2018-11-02 11:22:45 -07:00
amba
android
ata libata: Apply NOLPM quirk for SAMSUNG MZ7TD256HAFV-000L9 2018-10-26 08:21:04 -06:00
atm atm: zatm: Fix empty body Clang warnings 2018-10-18 15:39:10 -07:00
auxdisplay The Compiler Attributes series 2018-11-01 18:34:46 -07:00
base mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock 2018-10-31 08:54:17 -07:00
bcma
block for-linus-20181102 2018-11-02 11:25:48 -07:00
bluetooth Merge branch 'work.tty-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-10-24 14:43:41 +01:00
bus ARM: SoC driver updates for 4.17 2018-10-29 15:16:01 -07:00
cdrom gdrom: fix mistake in assignment of error 2018-10-25 11:17:40 -06:00
char RTC for 4.20 2018-10-27 09:24:24 -07:00
clk This time it looks like a quieter release cycle in the clk tree. I guess that's 2018-10-31 11:08:30 -07:00
clocksource Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-11-04 08:15:15 -08:00
connector
cpufreq cpufreq: remove unused arm_big_little_dt driver 2018-10-25 18:39:02 +02:00
cpuidle More power management updates for 4.20-rc1 2018-10-30 09:08:07 -07:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2018-10-25 16:43:35 -07:00
dax
dca
devfreq
dio
dma pci-v4.20-changes 2018-10-25 06:50:48 -07:00
dma-buf
edac * skx_edac: Address translation for NVDIMMs (Tony Luck and Qiuxu Zhuo) 2018-11-02 11:17:22 -07:00
eisa
extcon
firewire scsi: flip the default on use_clustering 2018-12-18 23:13:12 -05:00
firmware Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-11-03 18:25:17 -07:00
fmc
fpga fpga: add devm_fpga_region_create 2018-10-16 11:13:50 +02:00
fsi iov_iter: Separate type from direction and use accessor functions 2018-10-24 00:41:07 +01:00
gnss
gpio pci-v4.20-changes 2018-10-25 06:50:48 -07:00
gpu drm, i915, amdgpu, bridge + core quirk 2018-11-02 10:58:20 -07:00
hid platform-drivers-x86 for v4.20-1 2018-11-01 08:42:21 -07:00
hsi
hv hv_balloon: Replace spin_is_locked() with lockdep 2018-10-15 20:54:17 +02:00
hwmon Lots of small changes to the IPMI driver. Most of the changes 2018-10-23 09:42:05 +01:00
hwspinlock
hwtracing
i2c i2c: Clear client->irq in i2c_device_remove 2018-10-31 23:33:34 +00:00
ide
idle Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-10-23 13:32:18 +01:00
iio Staging/IIO patches for 4.20-rc1 2018-10-29 10:38:10 -07:00
infiniband scsi: flip the default on use_clustering 2018-12-18 23:13:12 -05:00
input Merge branch 'xarray' of git://git.infradead.org/users/willy/linux-dax 2018-10-28 11:35:40 -07:00
iommu mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
ipack
irqchip irqchip/irq-mvebu-sei: Fix a NULL vs IS_ERR() bug in probe function 2018-11-01 12:38:48 +01:00
isdn Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-11-01 19:58:52 -07:00
leds leds: gpio: set led_dat->gpiod pointer for OF defined GPIO leds 2018-10-26 20:51:36 +02:00
lightnvm
macintosh memblock: stop using implicit alignment to SMP_CACHE_BYTES 2018-10-31 08:54:16 -07:00
mailbox - Convert print users to use the %pOFn format specifier 2018-10-29 10:30:44 -07:00
mcb
md for-linus-20181102 2018-11-02 11:25:48 -07:00
media media updates for v4.20-rc1 2018-10-31 10:53:29 -07:00
memory
memstick
message scsi: flip the default on use_clustering 2018-12-18 23:13:12 -05:00
mfd chrome-platform for v4.20 2018-10-31 16:47:55 -07:00
misc Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-11-01 19:58:52 -07:00
mmc Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-10-23 13:32:18 +01:00
mtd This pull request contains updates for UBIFS: 2018-11-04 14:46:04 -08:00
mux This is the bulk of GPIO changes for the v4.20 series: 2018-10-23 08:45:05 +01:00
net NTB IDT thermal changes and hook into hwmon, ntb_netdev clean-up of 2018-11-04 08:12:44 -08:00
nfc NFC: nfcmrvl_uart: fix OF child-node lookup 2018-10-23 13:28:53 -05:00
ntb ntb: idt: Alter the driver info comments 2018-11-01 10:33:12 -04:00
nubus
nvdimm libnvdimm for 4.20 2018-10-25 06:31:56 -07:00
nvme for-linus-20181102 2018-11-02 11:25:48 -07:00
nvmem nvmem: hide unused nvmem_find_cell_by_index function 2018-10-15 15:56:15 +02:00
of Devicetree fixes for v4.20-rc1: 2018-11-01 14:45:38 -07:00
opp
oprofile
parisc parisc: Add alternative coding infrastructure 2018-10-17 17:22:26 +02:00
parport
pci Merge branch 'xarray' of git://git.infradead.org/users/willy/linux-dax 2018-10-28 11:35:40 -07:00
pcmcia powerpc updates for 4.20 2018-10-26 14:36:21 -07:00
perf arm64 updates for 4.20: 2018-10-22 17:30:06 +01:00
phy USB/PHY patches for 4.20-rc1 2018-10-26 08:14:13 -07:00
pinctrl This is the bulk of GPIO changes for the v4.20 series: 2018-10-23 08:45:05 +01:00
platform platform-drivers-x86 for v4.20-1 2018-11-01 08:42:21 -07:00
pnp
power Devicetree updates for 4.20: 2018-10-26 12:09:58 -07:00
powercap Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-10-23 13:32:18 +01:00
pps
ps3
ptp ptp: drop redundant kasprintf() to create worker name 2018-10-28 19:20:06 -07:00
pwm pwm: lpss: Only set update bit if we are actually changing the settings 2018-10-16 13:16:15 +02:00
rapidio
ras
regulator regulator: Regulator updates for next release 2018-10-23 01:54:44 +01:00
remoteproc remoteproc: qcom: q6v5-mss: Register segments/dumpfn for coredump 2018-10-19 12:54:03 -07:00
reset ARM: SoC driver updates for 4.17 2018-10-29 15:16:01 -07:00
rpmsg
rtc rtc: sc27xx: Always read normal alarm when registering RTC device 2018-10-25 02:35:42 +02:00
s390 scsi: flip the default on use_clustering 2018-12-18 23:13:12 -05:00
sbus
scsi scsi: smartpqi: correct lun reset issues 2018-12-19 22:23:35 -05:00
sfi mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
sh
siox
slimbus
sn
soc soc: ti: QMSS: Fix usage of irq_set_affinity_hint 2018-11-02 11:22:09 -07:00
soundwire
spi - New Drivers 2018-10-25 06:19:15 -07:00
spmi
ssb
staging scsi: flip the default on use_clustering 2018-12-18 23:13:12 -05:00
target scsi: remove the use_clustering flag 2018-12-18 23:19:21 -05:00
tc
tee
thermal Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux 2018-10-31 11:28:12 -07:00
thunderbolt
tty mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
uio
usb scsi: remove the use_clustering flag 2018-12-18 23:19:21 -05:00
uwb
vfio VFIO updates for v4.20 2018-10-31 11:01:38 -07:00
vhost scsi: target: replace fabric_ops.name with fabric_alias 2018-11-28 18:50:59 -05:00
video fbdev changes for v4.20: 2018-10-31 11:41:37 -07:00
virt
virtio virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON 2018-10-24 20:57:55 -04:00
visorbus
vlynq
vme
w1 w1: IAD Register is yet readable trough iad sys file. Fix snprintf (%u for unsigned, count for max size). 2018-10-15 20:50:32 +02:00
watchdog watchdog: ts4800: release syscon device node in ts4800_wdt_probe() 2018-10-22 10:16:28 +02:00
xen scsi: target: replace fabric_ops.name with fabric_alias 2018-11-28 18:50:59 -05:00
zorro
Kconfig
Makefile