linux-uconsole/drivers
Jack Morgenstein ab7ebca418 IB/mlx4: Reduce SRIOV multicast cleanup warning message to debug level
commit fb7a91746a upstream.

A warning message during SRIOV multicast cleanup should have actually been
a debug level message. The condition generating the warning does no harm
and can fill the message log.

In some cases, during testing, some tests were so intense as to swamp the
message log with these warning messages, causing a stall in the console
message log output task. This stall caused an NMI to be sent to all CPUs
(so that they all dumped their stacks into the message log).
Aside from the message flood causing an NMI, the tests all passed.

Once the message flood which caused the NMI is removed (by reducing the
warning message to debug level), the NMI no longer occurs.

Sample message log (console log) output illustrating the flood and
resultant NMI (snippets with comments and modified with ... instead
of hex digits, to satisfy checkpatch.pl):

 <mlx4_ib> _mlx4_ib_mcg_port_cleanup: ... WARNING: group refcount 1!!!...
 *** About 4000 almost identical lines in less than one second ***
 <mlx4_ib> _mlx4_ib_mcg_port_cleanup: ... WARNING: group refcount 1!!!...
 INFO: rcu_sched detected stalls on CPUs/tasks: { 17} (...)
 *** { 17} above indicates that CPU 17 was the one that stalled ***
 sending NMI to all CPUs:
 ...
 NMI backtrace for cpu 17
 CPU: 17 PID: 45909 Comm: kworker/17:2
 Hardware name: HP ProLiant DL360p Gen8, BIOS P71 09/08/2013
 Workqueue: events fb_flashcursor
 task: ffff880478...... ti: ffff88064e...... task.ti: ffff88064e......
 RIP: 0010:[ffffffff81......]  [ffffffff81......] io_serial_in+0x15/0x20
 RSP: 0018:ffff88064e257cb0  EFLAGS: 00000002
 RAX: 0000000000...... RBX: ffffffff81...... RCX: 0000000000......
 RDX: 0000000000...... RSI: 0000000000...... RDI: ffffffff81......
 RBP: ffff88064e...... R08: ffffffff81...... R09: 0000000000......
 R10: 0000000000...... R11: ffff88064e...... R12: 0000000000......
 R13: 0000000000...... R14: ffffffff81...... R15: 0000000000......
 FS:  0000000000......(0000) GS:ffff8804af......(0000) knlGS:000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080......
 CR2: 00007f2a2f...... CR3: 0000000001...... CR4: 0000000000......
 DR0: 0000000000...... DR1: 0000000000...... DR2: 0000000000......
 DR3: 0000000000...... DR6: 00000000ff...... DR7: 0000000000......
 Stack:
 ffff88064e...... ffffffff81...... ffffffff81...... 0000000000......
 ffffffff81...... ffff88064e...... ffffffff81...... ffffffff81......
 ffffffff81...... ffff88064e...... ffffffff81...... 0000000000......
 Call Trace:
[<ffffffff813d099b>] wait_for_xmitr+0x3b/0xa0
[<ffffffff813d0b5c>] serial8250_console_putchar+0x1c/0x30
[<ffffffff813d0b40>] ? serial8250_console_write+0x140/0x140
[<ffffffff813cb5fa>] uart_console_write+0x3a/0x80
[<ffffffff813d0aae>] serial8250_console_write+0xae/0x140
[<ffffffff8107c4d1>] call_console_drivers.constprop.15+0x91/0xf0
[<ffffffff8107d6cf>] console_unlock+0x3bf/0x400
[<ffffffff813503cd>] fb_flashcursor+0x5d/0x140
[<ffffffff81355c30>] ? bit_clear+0x120/0x120
[<ffffffff8109d5fb>] process_one_work+0x17b/0x470
[<ffffffff8109e3cb>] worker_thread+0x11b/0x400
[<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
[<ffffffff810a5aef>] kthread+0xcf/0xe0
[<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[<ffffffff81645858>] ret_from_fork+0x58/0x90
[<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
Code: 48 89 e5 d3 e6 48 63 f6 48 03 77 10 8b 06 5d c3 66 0f 1f 44 00 00 66 66 66 6

As indicated in the stack trace above, the console output task got swamped.

Fixes: b9c5d6a643 ("IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:27:01 +02:00
..
accessibility
acpi ACPI / power: Avoid maybe-uninitialized warning 2017-04-27 09:09:33 +02:00
amba
android ANDROID: binder: Clear binder and cookie when setting handle in flat binder struct 2016-11-10 16:36:33 +01:00
ata libata: apply MAX_SEC_1024 to all CX1-JB*-HP devices 2017-02-09 08:02:45 +01:00
atm
auxdisplay
base base/memory, hotplug: fix a kernel oops in show_valid_zones() 2017-02-09 08:02:47 +01:00
bcma bcma: use (get|put)_device when probing/removing device driver 2017-03-12 06:37:30 +01:00
block drbd: avoid redefinition of BITS_PER_PAGE 2017-05-08 07:46:01 +02:00
bluetooth Bluetooth: Add another AR3012 04ca:3018 device 2017-03-15 09:57:11 +08:00
bus bus: vexpress-config: fix device reference leak 2017-01-19 20:17:22 +01:00
cdrom
char virtio-console: avoid DMA from stack 2017-04-21 09:30:07 +02:00
clk clk: Make x86/ conditional on CONFIG_COMMON_CLK 2017-05-14 13:32:55 +02:00
clocksource clocksource/exynos_mct: Clear interrupt when cpu is shut down 2017-01-26 08:23:48 +01:00
connector
cpufreq cpufreq: Restore policy min/max limits on CPU online 2017-03-30 09:35:18 +02:00
cpuidle ARM: cpuidle: Fix error return code 2016-10-16 17:36:15 +02:00
crypto crypto: caam - fix RNG deinstantiation error checking 2017-04-18 07:14:36 +02:00
dca
devfreq
dio
dma dmaengine: ipu: Make sure the interrupt routine checks all interrupts. 2017-03-12 06:37:30 +01:00
dma-buf
edac EDAC: Increment correct counter in edac_inc_ue_error() 2016-09-07 08:32:41 +02:00
eisa
extcon extcon: max77843: Use correct size for reading the interrupt register 2016-05-04 14:48:54 -07:00
firewire firewire: net: fix fragmented datagram_size off-by-one 2016-11-10 16:36:35 +01:00
firmware efi: Expose non-blocking set_variable() wrapper to efivars 2016-05-04 14:48:49 -07:00
fmc
fpga
gpio gpio: mpc8xxx: Correct irq handler function 2016-10-28 03:01:25 -04:00
gpu drm/ttm: fix use-after-free races in vm fault handling 2017-05-14 13:32:59 +02:00
hid HID: wacom: Fix poor prox handling in 'wacom_pl_irq' 2017-02-09 08:02:46 +01:00
hsi
hv hv: don't reset hv_context.tsc_page on crash 2017-04-27 09:09:34 +02:00
hwmon hwmon: (g762) Fix overflows and crash seen when writing limit attributes 2017-01-12 11:22:48 +01:00
hwspinlock drivers/hwspinlock: fix race between radix tree insertion and lookup 2016-02-25 12:01:23 -08:00
hwtracing intel_th: Fix a deadlock in modprobing 2016-08-10 11:49:30 +02:00
i2c i2c: fix kernel memory disclosure in dev interface 2017-01-19 20:17:20 +01:00
ide
idle intel_idle: Support for Intel Xeon Phi Processor x200 Product Family 2016-09-15 08:27:46 +02:00
iio iio: bmg160: reset chip when probing 2017-04-12 12:38:33 +02:00
infiniband IB/mlx4: Reduce SRIOV multicast cleanup warning message to debug level 2017-05-20 14:27:01 +02:00
input Input: i8042 - add Clevo P650RS to the i8042 reset list 2017-05-02 21:19:55 -07:00
iommu iommu/vt-d: Fix NULL pointer dereference in device_to_iommu 2017-03-30 09:35:18 +02:00
ipack
irqchip irqchip/irq-imx-gpcv2: Fix spinlock initialization 2017-04-21 09:30:06 +02:00
isdn isdn/gigaset: fix NULL-deref at probe 2017-03-26 12:13:19 +02:00
leds leds: ktd2692: avoid harmless maybe-uninitialized warning 2017-05-14 13:32:55 +02:00
lguest
lightnvm lightnvm: put bio before return 2016-09-24 10:07:35 +02:00
macintosh
mailbox
mcb mcb: Fixed bar number assignment for the gdd 2016-06-01 12:15:53 -07:00
md dm era: save spacemap metadata root after the pre-commit 2017-05-20 14:27:00 +02:00
media xc2028: unlock on error in xc2028_set_config() 2017-05-02 21:19:47 -07:00
memory memory: omap-gpmc: Fix omap gpmc EXTRADELAY timing 2016-07-27 09:47:35 -07:00
memstick memstick: rtsx_usb_ms: Manage runtime PM when accessing the device 2016-10-28 03:01:35 -04:00
message
mfd mfd: core: Fix device reference leak in mfd_clone_cell 2016-11-26 09:54:53 +01:00
misc mei: bus: fix mei_cldev_enable KDoc 2017-01-12 11:22:47 +01:00
mmc mmc: sdhci-esdhc-imx: increase the pad I/O drive strength for DDR50 card 2017-04-27 09:09:33 +02:00
mtd mtd: avoid stack overflow in MTD CFI code 2017-05-08 07:46:01 +02:00
net bnxt_en: allocate enough space for ->ntp_fltr_bmap 2017-05-14 13:32:58 +02:00
nfc mei: bus: fix received data size check in NFC fixup 2016-11-18 10:48:36 +01:00
ntb ntb_transport: Pick an unused queue 2017-02-23 17:43:10 +01:00
nubus
nvdimm libnvdimm: fix reconfig_mutex, mmap_sem, and jbd2_handle lockdep splat 2017-04-21 09:30:06 +02:00
nvme nvme: Call pci_disable_device on the error path. 2016-09-15 08:27:51 +02:00
nvmem nvmem: mxs-ocotp: fix buffer overflow in read 2016-05-11 11:21:21 +02:00
of of: silence warnings due to max() usage 2016-11-15 07:46:39 +01:00
oprofile
parisc
parport parport: fix attempt to write duplicate procfiles 2017-03-30 09:35:17 +02:00
pci PCI: Do any VF BAR updates before enabling the BARs 2017-03-30 09:35:20 +02:00
pcmcia pcmcia: db1xxx_ss: fix last irq_to_gpio user 2016-04-20 15:42:09 +09:00
perf drivers/perf: arm_pmu: Fix leak in error path 2016-10-07 15:23:41 +02:00
phy phy: qcom-usb-hs: Add depends on EXTCON 2017-05-14 13:32:57 +02:00
pinctrl pinctrl: qcom: Don't clear status bit on irq_unmask 2017-03-31 09:49:53 +02:00
platform platform/x86: acer-wmi: setup accelerometer when machine has appropriate notify event 2017-04-21 09:30:07 +02:00
pnp PNP: Add Broadwell to Intel MCH size workaround 2016-08-16 09:30:48 +02:00
power power: supply: bq24190_charger: Handle fault before status on interrupt 2017-05-14 13:32:54 +02:00
powercap
pps pps: do not crash when failed to register 2016-08-10 11:49:25 +02:00
ps3
ptp
pwm pwm: pca9685: Fix period change with same duty cycle 2017-03-15 09:57:14 +08:00
rapidio
ras
regulator regulator: core: Clear the supply pointer if enabling fails 2017-05-02 21:19:49 -07:00
remoteproc remoteproc: Fix potential race condition in rproc_add 2016-08-20 18:09:20 +02:00
reset
rpmsg
rtc rtc: tegra: Implement clock handling 2017-04-21 09:30:07 +02:00
s390 s390/zcrypt: Introduce CEX6 toleration 2017-03-30 09:35:20 +02:00
sbus
scsi scsi: mac_scsi: Fix MAC_SCSI=m option when SCSI=m 2017-05-14 13:32:57 +02:00
sfi
sh drivers: sh: Restore legacy clock domain on SuperH platforms 2016-03-09 15:34:49 -08:00
sn
soc soc: qcom/spm: shut up uninitialized variable warning 2016-09-24 10:07:42 +02:00
spi spi: mvebu: fix baudrate calculation for armada variant 2017-01-15 13:41:36 +01:00
spmi
ssb ssb: Fix error routine when fallback SPROM fails 2017-01-09 08:07:42 +01:00
staging staging: comedi: jr3_pci: cope with jiffies wraparound 2017-05-20 14:26:59 +02:00
target iscsi-target: Set session_fall_back_to_erl0 when forcing reinstatement 2017-05-20 14:26:58 +02:00
tc
thermal thermal: hwmon: Properly report critical temperature in sysfs 2017-01-09 08:07:44 +01:00
thunderbolt thunderbolt: Fix double free of drom buffer 2016-06-01 12:15:53 -07:00
tty serial: 8250_omap: Fix probe and remove for PM runtime 2017-05-14 13:32:57 +02:00
uio uio: fix dmem_region_start computation 2016-10-31 04:13:59 -06:00
usb usb: hub: Do not attempt to autosuspend disconnected devices 2017-05-20 14:26:59 +02:00
uwb uwb: hwa-rc: fix NULL-deref at probe 2017-03-30 09:35:17 +02:00
vfio vfio/type1: Remove locked page accounting workqueue 2017-05-20 14:27:00 +02:00
vhost vhost/scsi: fix reuse of &vq->iov[out] in response 2016-09-15 08:27:53 +02:00
video xen, fbfront: fix connecting to backend 2017-04-21 09:30:06 +02:00
virt
virtio virtio_balloon: init 1st buffer in stats vq 2017-03-31 09:49:53 +02:00
vlynq
vme vme: Fix wrong pointer utilization in ca91cx42_slave_get 2017-01-19 20:17:21 +01:00
w1 w1: ds2490: USB transfer buffers need to be DMAable 2017-03-12 06:37:29 +01:00
watchdog watchdog: rc32434_wdt: fix ioctl error handling 2016-04-12 09:08:54 -07:00
xen xen/acpi: upload PM state from init-domain to Xen 2017-03-30 09:35:18 +02:00
zorro
Kconfig
Makefile usb: Make sure usb/phy/of gets built-in 2017-05-20 14:26:59 +02:00