linux-uconsole/drivers
Lyude Paul e0547c81bf PCI: Reset Lenovo ThinkPad P50 nvgpu at boot if necessary
On ThinkPad P50 SKUs with an Nvidia Quadro M1000M instead of the M2000M
variant, the BIOS does not always reset the secondary Nvidia GPU during
reboot if the laptop is configured in Hybrid Graphics mode.  The reason is
unknown, but the following steps and possibly a good bit of patience will
reproduce the issue:

  1. Boot up the laptop normally in Hybrid Graphics mode
  2. Make sure nouveau is loaded and that the GPU is awake
  3. Allow the Nvidia GPU to runtime suspend itself after being idle
  4. Reboot the machine, the more sudden the better (e.g. sysrq-b may help)
  5. If nouveau loads up properly, reboot the machine again and go back to
     step 2 until you reproduce the issue

This results in some very strange behavior: the GPU will be left in exactly
the same state it was in when the previously booted kernel started the
reboot.  This has all sorts of bad side effects: for starters, this
completely breaks nouveau starting with a mysterious EVO channel failure
that happens well before we've actually used the EVO channel for anything:

  nouveau 0000:01:00.0: disp: chid 0 mthd 0000 data 00000400 00001000 00000002

This causes a timeout trying to bring up the GR ctx:

  nouveau 0000:01:00.0: timeout
  WARNING: CPU: 0 PID: 12 at drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgf100.c:1547 gf100_grctx_generate+0x7b2/0x850 [nouveau]
  Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET82W (1.55 ) 12/18/2018
  Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
  ...
  nouveau 0000:01:00.0: gr: wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
  nouveau 0000:01:00.0: gr: wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
  nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000008000 engine 00 [GR] client 15 [HUB/SCC_NB] reason c4 [] on channel -1 [0000000000 unknown]

The GPU never manages to recover.  Booting without loading nouveau causes
issues as well, since the GPU starts sending spurious interrupts that cause
other device's IRQs to get disabled by the kernel:

  irq 16: nobody cared (try booting with the "irqpoll" option)
  ...
  handlers:
  [<000000007faa9e99>] i801_isr [i2c_i801]
  Disabling IRQ #16
  ...
  serio: RMI4 PS/2 pass-through port at rmi4-00.fn03
  i801_smbus 0000:00:1f.4: Timeout waiting for interrupt!
  i801_smbus 0000:00:1f.4: Transaction timeout
  rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-110).
  i801_smbus 0000:00:1f.4: Timeout waiting for interrupt!
  i801_smbus 0000:00:1f.4: Transaction timeout
  rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts!

This causes the touchpad and sometimes other things to get disabled.

Since this happens without nouveau, we can't fix this problem from nouveau
itself.

Add a PCI quirk for the specific P50 variant of this GPU.  Make sure the
GPU is advertising NoReset- so we don't reset the GPU when the machine is
in Dedicated graphics mode (where the GPU being initialized by the BIOS is
normal and expected).  Map the GPU MMIO space and read the magic 0x2240c
register, which will have bit 1 set if the device was POSTed during a
previous boot.  Once we've confirmed all of this, reset the GPU and
re-disable it - bringing it back to a healthy state.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=203003
Link: https://lore.kernel.org/lkml/20190212220230.1568-1-lyude@redhat.com
Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: nouveau@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Ben Skeggs <skeggsb@gmail.com>
Cc: stable@vger.kernel.org
2019-04-25 07:55:18 -05:00
..
accessibility
acpi device-dax for 5.1 2019-03-16 13:05:32 -07:00
amba ARM: 8836/1: drivers: amba: Update component matching to use the CoreSight UCI values. 2019-02-26 11:23:49 +00:00
android
ata SCSI misc on 20190306 2019-03-09 16:53:47 -08:00
atm
auxdisplay
base device-dax for 5.1 2019-03-16 13:05:32 -07:00
bcma
block for-5.1/block-post-20190315 2019-03-16 12:36:39 -07:00
bluetooth Bluetooth: mediatek: add support for MediaTek MT7663U and MT7668U UART devices 2019-03-02 19:51:23 +01:00
bus ARM: SoC driver updates for 5.1 2019-03-06 09:41:12 -08:00
cdrom
char Merge branch 'next-tpm' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-03-10 17:37:29 -07:00
clk We have a fairly balanced mix of clk driver updates and clk framework 2019-03-14 08:46:17 -07:00
clocksource ARM: some cleanups, direct physical timer assignment, cache sanitization 2019-03-15 15:00:28 -07:00
connector connector: fix unsafe usage of ->real_parent 2019-03-08 15:06:38 -08:00
cpufreq cpufreq: intel_pstate: Fix up iowait_boost computation 2019-03-12 09:47:30 +01:00
cpuidle cpuidle: governor: Add new governors to cpuidle_governors again 2019-03-12 23:46:55 +01:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2019-03-13 09:51:17 -07:00
dax device-dax for 5.1 2019-03-16 13:05:32 -07:00
dca
devfreq
dio
dma dmaengine updates for v5.1-rc1 2019-03-14 09:11:54 -07:00
dma-buf
edac Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-03-08 09:11:39 -08:00
eisa
extcon
firewire
firmware memblock: drop memblock_alloc_*_nopanic() variants 2019-03-12 10:04:02 -07:00
fmc
fpga
fsi
gnss
gpio pci-v5.1-changes 2019-03-09 14:57:08 -08:00
gpu drm i915, amdgpu, qxl and etnaviv fixes 2019-03-15 13:58:35 -07:00
hid Merge branch 'for-5.1/wacom' into for-linus 2019-03-05 15:43:05 +01:00
hsi
hv Char/Misc driver patches for 5.1-rc1 2019-03-06 14:18:59 -08:00
hwmon hwmon: (ad7418) Add device tree probing 2019-02-25 09:06:00 -08:00
hwspinlock
hwtracing ARM updates for 5.1-rc1 2019-03-15 14:37:46 -07:00
i2c i2c: i2c-designware-platdrv: Always use a dynamic adapter number 2019-03-13 18:07:10 +01:00
i3c - Add a /* fall-through */ comment in the dw-i3c-master driver 2019-03-04 19:05:02 -08:00
ide Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide 2019-03-11 09:34:00 -07:00
idle
iio - New Drivers 2019-03-08 10:02:58 -08:00
infiniband XArray updates for 5.1-rc1 2019-03-11 20:06:18 -07:00
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2019-03-11 10:57:11 -07:00
interconnect
iommu IOMMU Fix for Linux v5.1-rc1 2019-03-15 14:41:30 -07:00
ipack
irqchip arm64 updates for 5.1: 2019-03-10 10:17:23 -07:00
isdn isdn: hfcpci: fix potential NULL pointer dereference 2019-03-12 14:36:02 -07:00
leds platform-drivers-x86 for v5.1-1 2019-03-10 13:16:37 -07:00
lightnvm pblk: fix max_io calculation 2019-03-07 08:59:26 -07:00
macintosh treewide: add checks for the return value of memblock_alloc*() 2019-03-12 10:04:02 -07:00
mailbox mailbox: imx: keep MU irq working during suspend/resume 2019-03-11 02:51:43 -05:00
mcb
md for-5.1/block-post-20190315 2019-03-16 12:36:39 -07:00
media DMA mapping updates for 5.1 2019-03-10 11:54:48 -07:00
memory
memstick
message
mfd DMA mapping updates for 5.1 2019-03-10 11:54:48 -07:00
misc 5.1 Merge Window Pull Request 2019-03-09 15:53:03 -08:00
mmc for-5.1/block-20190302 2019-03-08 14:12:17 -08:00
mtd This pull request contains updates for both UBI and UBIFS: 2019-03-13 09:34:35 -07:00
mux
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-03-14 09:28:12 -07:00
nfc
ntb Fixes for switchtec debugability and mapping table entries, NTB 2019-03-15 14:32:59 -07:00
nubus
nvdimm device-dax for 5.1 2019-03-16 13:05:32 -07:00
nvme for-5.1/block-post-20190315 2019-03-16 12:36:39 -07:00
nvmem Char/Misc driver patches for 5.1-rc1 2019-03-06 14:18:59 -08:00
of of: fix kmemleak crash caused by imbalance in early memory reservation 2019-03-12 10:04:02 -07:00
opp PM / OPP: Update performance state when freq == old_freq 2019-03-12 09:45:56 +01:00
oprofile
parisc DMA mapping updates for 5.1 2019-03-10 11:54:48 -07:00
parport
pci PCI: Reset Lenovo ThinkPad P50 nvgpu at boot if necessary 2019-04-25 07:55:18 -05:00
pcmcia
perf arm64 updates for 5.1: 2019-03-10 10:17:23 -07:00
phy drm next pull request for 5.1 2019-03-08 08:23:15 -08:00
pinctrl This is the bulk of pin control changes for the v5.1 kernel cycle. 2019-03-11 11:12:50 -07:00
platform chrome platform changes for v5.1 2019-03-12 09:46:32 -07:00
pnp ACPI/ACPICA: Trivial: fix spelling mistakes and fix whitespace formatting 2019-02-24 21:12:01 +01:00
power
powercap
pps
ps3
ptp Merge branch 'timers-2038-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-03-05 14:08:26 -08:00
pwm pwm: atmel: Remove useless symbolic definitions 2019-03-04 12:52:49 +01:00
rapidio rapidio/mport_cdev: mark expected switch fall-through 2019-03-07 18:32:02 -08:00
ras
regulator regulator: mc13xxx: Constify regulator_ops variables 2019-03-04 00:01:08 +00:00
remoteproc remoteproc updates for v5.1 2019-03-14 09:00:06 -07:00
reset
rpmsg
rtc chrome platform changes for v5.1 2019-03-12 09:46:32 -07:00
s390 ARM: some cleanups, direct physical timer assignment, cache sanitization 2019-03-15 15:00:28 -07:00
sbus
scsi SCSI misc on 20190315 2019-03-16 12:51:50 -07:00
sfi
sh
siox
slimbus
sn
soc ARM: SoC driver updates for 5.1 2019-03-06 09:41:12 -08:00
soundwire
spi pci-v5.1-changes 2019-03-09 14:57:08 -08:00
spmi
ssb
staging media updates for v5.1-rc1 2019-03-09 14:45:54 -08:00
target SCSI misc on 20190315 2019-03-16 12:51:50 -07:00
tc
tee ARM: SoC driver updates for 5.1 2019-03-06 09:41:12 -08:00
thermal Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal 2019-03-08 09:52:41 -08:00
thunderbolt
tty dmaengine updates for v5.1-rc1 2019-03-14 09:11:54 -07:00
uio
usb memblock: drop memblock_alloc_*_nopanic() variants 2019-03-12 10:04:02 -07:00
uwb
vfio powerpc updates for 5.1 2019-03-07 12:56:26 -08:00
vhost virtio: fixes, cleanups 2019-03-10 12:47:57 -07:00
video fbdev changes for v5.1: 2019-03-15 14:22:59 -07:00
virt virt: vbox: Mark expected switch fall-through 2019-02-27 16:00:20 +01:00
virtio virtio: hint if callbacks surprisingly might sleep 2019-03-06 11:19:57 -05:00
visorbus
vlynq
vme
w1
watchdog linux-watchdog 5.1-rc1 tag 2019-03-11 11:22:15 -07:00
xen xen/balloon: Fix mapping PG_offline pages to user space 2019-03-15 15:35:35 +01:00
zorro
Kconfig
Makefile IOMMU Updates for Linux v5.1 2019-03-10 12:29:52 -07:00