linux-uconsole/drivers
Daniel Vetter d553bc6d55 drm/i915: fix gpu hang vs. flip stall deadlocks
commit 122f46bada upstream.

Since we've started to clean up pending flips when the gpu hangs in

commit 96a02917a0
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Mon Feb 18 19:08:49 2013 +0200

    drm/i915: Finish page flips and update primary planes after a GPU reset

the gpu reset work now also grabs modeset locks. But since work items
on our private work queue are not allowed to do that due to the
flush_workqueue from the pageflip code this results in a neat
deadlock:

INFO: task kms_flip:14676 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kms_flip        D ffff88019283a5c0     0 14676  13344 0x00000004
 ffff88018e62dbf8 0000000000000046 ffff88013bdb12e0 ffff88018e62dfd8
 ffff88018e62dfd8 00000000001d3b00 ffff88019283a5c0 ffff88018ec21000
 ffff88018f693f00 ffff88018eece000 ffff88018e62dd60 ffff88018eece898
Call Trace:
 [<ffffffff8138ee7b>] schedule+0x60/0x62
 [<ffffffffa046c0dd>] intel_crtc_wait_for_pending_flips+0xb2/0x114 [i915]
 [<ffffffff81050ff4>] ? finish_wait+0x60/0x60
 [<ffffffffa0478041>] intel_crtc_set_config+0x7f3/0x81e [i915]
 [<ffffffffa031780a>] drm_mode_set_config_internal+0x4f/0xc6 [drm]
 [<ffffffffa0319cf3>] drm_mode_setcrtc+0x44d/0x4f9 [drm]
 [<ffffffff810e44da>] ? might_fault+0x38/0x86
 [<ffffffffa030d51f>] drm_ioctl+0x2f9/0x447 [drm]
 [<ffffffff8107a722>] ? trace_hardirqs_off+0xd/0xf
 [<ffffffffa03198a6>] ? drm_mode_setplane+0x343/0x343 [drm]
 [<ffffffff8112222f>] ? mntput_no_expire+0x3e/0x13d
 [<ffffffff81117f33>] vfs_ioctl+0x18/0x34
 [<ffffffff81118776>] do_vfs_ioctl+0x396/0x454
 [<ffffffff81396b37>] ? sysret_check+0x1b/0x56
 [<ffffffff81118886>] SyS_ioctl+0x52/0x7d
 [<ffffffff81396b12>] system_call_fastpath+0x16/0x1b
2 locks held by kms_flip/14676:
 #0:  (&dev->mode_config.mutex){+.+.+.}, at: [<ffffffffa0316545>] drm_modeset_lock_all+0x22/0x59 [drm]
 #1:  (&crtc->mutex){+.+.+.}, at: [<ffffffffa031656b>] drm_modeset_lock_all+0x48/0x59 [drm]
INFO: task kworker/u8:4:175 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u8:4    D ffff88018de9a5c0     0   175      2 0x00000000
Workqueue: i915 i915_error_work_func [i915]
 ffff88018e37dc30 0000000000000046 ffff8801938ab8a0 ffff88018e37dfd8
 ffff88018e37dfd8 00000000001d3b00 ffff88018de9a5c0 ffff88018ec21018
 0000000000000246 ffff88018e37dca0 000000005a865a86 ffff88018de9a5c0
Call Trace:
 [<ffffffff8138ee7b>] schedule+0x60/0x62
 [<ffffffff8138f23d>] schedule_preempt_disabled+0x9/0xb
 [<ffffffff8138d0cd>] mutex_lock_nested+0x205/0x3b1
 [<ffffffffa0477094>] ? intel_display_handle_reset+0x7e/0xbd [i915]
 [<ffffffffa0477094>] ? intel_display_handle_reset+0x7e/0xbd [i915]
 [<ffffffffa0477094>] intel_display_handle_reset+0x7e/0xbd [i915]
 [<ffffffffa044e0a2>] i915_error_work_func+0x128/0x147 [i915]
 [<ffffffff8104a89a>] process_one_work+0x1d4/0x35a
 [<ffffffff8104a821>] ? process_one_work+0x15b/0x35a
 [<ffffffff8104b4a5>] worker_thread+0x144/0x1f0
 [<ffffffff8104b361>] ? rescuer_thread+0x275/0x275
 [<ffffffff8105076d>] kthread+0xac/0xb4
 [<ffffffff81059d30>] ? finish_task_switch+0x3b/0xc0
 [<ffffffff810506c1>] ? __kthread_parkme+0x60/0x60
 [<ffffffff81396a6c>] ret_from_fork+0x7c/0xb0
 [<ffffffff810506c1>] ? __kthread_parkme+0x60/0x60
3 locks held by kworker/u8:4/175:
 #0:  (i915){.+.+.+}, at: [<ffffffff8104a821>] process_one_work+0x15b/0x35a
 #1:  ((&dev_priv->gpu_error.work)){+.+.+.}, at: [<ffffffff8104a821>] process_one_work+0x15b/0x35a
 #2:  (&crtc->mutex){+.+.+.}, at: [<ffffffffa0477094>] intel_display_handle_reset+0x7e/0xbd [i915]

This blew up while running kms_flip/flip-vs-panning-vs-hang-interruptible
on one of my older machines.

Unfortunately (despite the proper lockdep annotations for
flush_workqueue) lockdep still doesn't detect this correctly, so we
need to rely on chance to discover these bugs.

Apply the usual bugfix and schedule the reset work on the system
workqueue to keep our own driver workqueue free of any modeset lock
grabbing.

Note that this is not a terribly serious regression since before the
offending commit we'd simply have stalled userspace forever due to
failing to abort all outstanding pageflips.

v2: Add a comment as requested by Chris.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-01 09:17:46 -07:00
..
accessibility
acpi ACPI / LPSS: don't crash if a device has no MMIO resources 2013-09-26 17:18:05 -07:00
amba
ata libata: apply behavioral quirks to sil3826 PMP 2013-08-29 09:47:38 -07:00
atm atm: he: use mdelay instead of large udelay constants 2013-04-29 13:26:48 -04:00
auxdisplay
base regmap: rbtree: Fix overlapping rbnodes. 2013-09-07 22:10:00 -07:00
bcma bcma: add more core IDs 2013-05-17 14:31:05 -04:00
block rbd: fix I/O error propagation for reads 2013-09-26 17:18:29 -07:00
bluetooth Bluetooth: fix wrong use of PTR_ERR() in btusb 2013-08-11 18:35:23 -07:00
bus
cdrom drivers/cdrom/cdrom.c: use kzalloc() for failing hardware 2013-07-13 11:42:26 -07:00
char virtio: console: return -ENODEV on all read operations after unplug 2013-08-14 22:59:09 -07:00
clk clk: wm831x: Initialise wm831x pointer on init 2013-09-26 17:18:30 -07:00
clocksource clocksource: dw_apb: Fix error check 2013-07-25 14:07:29 -07:00
connector
cpufreq cpufreq: rename ignore_nice as ignore_nice_load 2013-08-14 22:59:06 -07:00
cpuidle cpuidle: coupled: fix race condition between pokes and safe state 2013-09-26 17:18:02 -07:00
crypto crypto: caam - Fixed the memory out of bound overwrite issue 2013-08-04 16:50:57 +08:00
dca
devfreq
dio
dma dma: pl330: Fix cyclic transfers 2013-08-11 18:35:21 -07:00
edac amd64_edac: Fix single-channel setups 2013-09-26 17:18:28 -07:00
eisa PCI changes for the v3.10 merge window: 2013-04-29 09:30:25 -07:00
extcon Removal of GENERIC_GPIO for v3.10 2013-05-09 09:59:16 -07:00
firewire firewire: fix libdc1394/FlyCap2 iso event regression 2013-08-04 16:50:38 +08:00
firmware efivar: fix oops in efivar_update_sysfs_entries() caused by memory reuse 2013-05-13 20:20:02 +01:00
gpio gpio/omap: don't use linear domain mapping for OMAP1 2013-06-25 23:13:40 -07:00
gpu drm/i915: fix gpu hang vs. flip stall deadlocks 2013-10-01 09:17:46 -07:00
hid HID: logitech-dj: validate output report details 2013-10-01 09:17:46 -07:00
hsi
hv Drivers: hv: balloon: Do not post pressure status if interrupted 2013-08-04 16:50:58 +08:00
hwmon hwmon: (k10temp) Add support for Fam16h (Kabini) 2013-09-07 22:09:59 -07:00
hwspinlock A single patch from Vincent extending OMAP's hwspinlock support to OMAP5. 2013-05-07 14:01:27 -07:00
i2c i2c: i2c-mxs: Use DMA mode even for small transfers 2013-08-14 22:59:06 -07:00
ide block_device_operations->release() should return void 2013-05-07 02:16:21 -04:00
idle Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux 2013-05-11 15:23:17 -07:00
iio iio: inkern: fix iio_convert_raw_to_processed_unlocked 2013-07-25 14:07:23 -07:00
infiniband iscsi-target: Fix iscsit_sequence_cmd reject handling for iser 2013-08-04 16:51:17 +08:00
input HID: Correct the USB IDs for the new Macbook Air 6 2013-09-26 17:18:15 -07:00
iommu intel-iommu: Fix leaks in pagetable freeing 2013-09-26 17:18:27 -07:00
ipack
irqchip ARM: SoC fixes for 3.10-rc 2013-06-22 09:44:45 -10:00
isdn isdn/kcapi: fix a small underflow 2013-05-20 13:38:14 -07:00
leds leds: wm831x-status: Request a REG resource 2013-09-26 17:18:27 -07:00
lguest lguest: clear cached last cpu when guest_set_pgd() called. 2013-05-08 10:49:18 +09:30
macintosh powerpc/windfarm: Fix noisy slots-fan on Xserve (rm31) 2013-08-11 18:35:20 -07:00
mailbox
md bcache: FUA fixes 2013-08-29 09:47:40 -07:00
media media: siano: fix divide error on 0 counters 2013-09-26 17:18:27 -07:00
memory drivers/memory: don't check resource with devm_ioremap_resource 2013-05-18 11:55:52 +02:00
memstick block_device_operations->release() should return void 2013-05-07 02:16:21 -04:00
message Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-block 2013-05-08 10:13:35 -07:00
mfd mfd: tps6586x: correct device name of the regulator cell 2013-06-24 12:37:47 +01:00
misc drivers/misc/hpilo: Correct panic when an AUX iLO is detected 2013-09-07 22:09:59 -07:00
mmc mmc: tmio_mmc_dma: fix PIO fallback on SDHI 2013-09-26 17:18:29 -07:00
mtd mtd: nand: fix NAND_BUSWIDTH_AUTO for x16 devices 2013-09-26 17:18:29 -07:00
net rt2800: fix wrong TX power compensation 2013-10-01 09:17:45 -07:00
nfc NFC: mei: Do not disable MEI devices from their remove routine 2013-05-21 10:48:41 +02:00
ntb NTB: Multiple NTB client fix 2013-05-15 10:58:22 -07:00
nubus nubus: Kill nubus_proc_detach_device() 2013-05-04 14:47:26 -04:00
of of: Fix missing memory initialization on FDT unflattening 2013-09-26 17:18:29 -07:00
oprofile
parisc parisc: Fix interrupt routing for C8000 serial ports 2013-08-11 18:35:21 -07:00
parport parisc: parport0: fix this legacy no-device port driver! 2013-06-01 14:46:42 +02:00
pci PCI / ACPI / PM: Clear pme_poll for devices in D3cold on wakeup 2013-10-01 09:17:45 -07:00
pcmcia pcmcia: at91_cf: fix gpio_get_value in at91_cf_get_status 2013-07-21 18:21:25 -07:00
pinctrl pinctrl: at91: fix get_pullup/down function return 2013-09-26 17:18:14 -07:00
platform drivers/platform/olpc/olpc-ec.c: initialise earlier 2013-08-29 09:47:38 -07:00
pnp Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc 2013-05-02 10:16:16 -07:00
power charger-manager: Ensure event is not used as format string 2013-07-13 11:42:26 -07:00
pps Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
ps3
ptp ptp_pch: fix error handling in pch_probe() 2013-05-25 21:24:15 -07:00
pwm drivers/pwm: don't check resource with devm_ioremap_resource 2013-05-18 11:55:58 +02:00
rapidio RAPIDIO: IDT_GEN2: Fix build error. 2013-07-28 16:30:07 -07:00
regulator mfd: tps6586x: correct device name of the regulator cell 2013-06-24 12:37:47 +01:00
remoteproc This pull request contains: 2013-05-07 14:04:56 -07:00
reset
rpmsg A small pull request consisting of: 2013-05-07 14:02:00 -07:00
rtc drivers/rtc/rtc-max77686.c: Fix wrong register 2013-09-14 06:54:57 -07:00
s390 SCSI: zfcp: fix schedule-inside-lock in scsi_device list loops 2013-08-29 09:47:39 -07:00
sbus
scsi SCSI: sd: Fix potential out-of-bounds access 2013-09-26 17:18:01 -07:00
sfi
sh
sn
spi spi: spi-davinci: Fix direction in dma_map_single() 2013-08-11 18:35:25 -07:00
ssb - Lots of cleanups from Artem, including deletion of some obsolete drivers 2013-05-09 10:15:46 -07:00
ssbi
staging iio: mxs-lradc: Remove useless check in read_raw 2013-09-26 17:18:05 -07:00
target target: Fix >= v3.9+ regression in PR APTPL + ALUA metadata write-out 2013-09-26 17:18:27 -07:00
tc
thermal drivers/thermal: don't check resource with devm_ioremap_resource 2013-05-18 11:57:30 +02:00
tty tty: disassociate_ctty() sends the extra SIGCONT 2013-09-26 17:18:04 -07:00
uio uio: UIO_DMEM_GENIRQ should depend on HAS_DMA 2013-05-21 10:13:23 -07:00
usb usb: gadget: fix a bug and a WARN_ON in dummy-hcd 2013-10-01 09:17:46 -07:00
uwb uwb: rename random32() to prandom_u32() 2013-04-29 18:28:43 -07:00
vfio vfio: fix crash on rmmod 2013-06-05 08:54:16 -06:00
vhost vhost_net: poll vhost queue after marking DMA is done 2013-09-14 06:54:56 -07:00
video atmel_lcdfb: blank the backlight on remove 2013-06-01 03:18:55 +08:00
virt
virtio virtio: support unlocked queue poll 2013-07-28 16:29:55 -07:00
vlynq
vme
w1 drivers/w1/masters: don't check resource with devm_ioremap_resource 2013-05-18 11:58:03 +02:00
watchdog drivers/watchdog: don't check resource with devm_ioremap_resource 2013-05-18 11:58:04 +02:00
xen xen-gnt: prevent adding duplicate gnt callbacks 2013-09-26 17:18:02 -07:00
zorro proc: Supply PDE attribute setting accessor functions 2013-05-01 17:29:18 -04:00
Kconfig ARM: arm-soc driver changes for 3.10 2013-05-04 12:31:18 -07:00
Makefile ARM: arm-soc driver changes for 3.10 2013-05-04 12:31:18 -07:00