When EEH error happens to one specific PE, some devices with drivers
supporting EEH won't except hotplug on the device. However, there
might have other deivces without driver, or with driver without EEH
support. For the case, we need do partial hotplug in order to make
sure that the PE becomes absolutely quite during reset. Otherise,
the PE reset might fail and leads to failure of error recovery.
The current code doesn't handle that 'mixed' case properly, it either
uses the error callbacks to the drivers, or tries hotplug, but doesn't
handle a PE (EEH domain) composed of a combination of the two.
The patch intends to support so-called "partial" hotplug for EEH:
Before we do reset, we stop and remove those PCI devices without
EEH sensitive driver. The corresponding EEH devices are not detached
from its PE, but with special flag. After the reset is done, those
EEH devices with the special flag will be scanned one by one.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When EEH error happens to one specific PE, the device drivers
of its attached EEH devices (PCI devices) are checked to see
the further action: reset with complete hotplug, or reset without
hotplug. However, that's not enough for those PCI devices whose
drivers can't support EEH, or those PCI devices without driver.
So we need do so-called "partial hotplug" on basis of PCI devices.
In the situation, part of PCI devices of the specific PE are
unplugged and plugged again after PE reset.
The patch changes pcibios_add_pci_devices() so that it can support
full hotplug and so-called "partial" hotplug based on device-tree
or real hardware. It's notable that pci_of_scan.c has been changed
for a bit in order to support the "partial" hotplug based on dev-tree.
Most of the generic code already supports that, we just need to
plumb it properly on our side.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, we're trasversing the EEH devices list using list_for_each_entry().
That's not safe enough because the EEH devices might be removed from
its parent PE while doing iteration. The patch replaces that with
list_for_each_entry_safe().
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When we do normal hotplug, the PE (shadow EEH structure) shouldn't be
kept around.
However, we need to keep it if the hotplug an artifial one caused by
EEH errors recovery.
Since we remove EEH device through the PCI hook pcibios_release_device(),
the flag "purge_pe" passed to various functions is meaningless. So the patch
removes the meaningless flag and introduce new flag "EEH_PE_KEEP"
to save the PE while doing hotplug during EEH error recovery.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Since pcibios_release_device() called by pci_stop_and_remove_bus_device()
has removed the device from the EEH cache, we needn't do that again.
Cc: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Make some functions public in order to support hotplug on either specific
PCI bus or PCI device in future.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We will rely on pcibios_release_device() to remove the EEH cache
and unbind EEH device for the specific PCI device. So we shouldn't
hold the reference to the PCI device from EEH cache and EEH device.
Otherwise, pcibios_release_device() won't be called as we expected.
The patch removes the reference to the PCI device in EEH core.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
During Machine Check interrupt on pseries platform, R3 generally points to
memory region inside RTAS (FWNMI) area. We see r3 corruption because when RTAS
delivers the machine check exception it passes the address inside FWNMI area
with the top most bit set. This patch fixes this issue by masking top two bit
in machine check exception handler.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The presence or absence of EBB is advertised to userspace via the presence
or absence of PPC_FEATURE2_EBB in cpu_user_features2.
Because the kernel can be built without PMU support, we should only add
PPC_FEATURE2_EBB to cpu_user_features2 when we successfully register the
power8 PMU support.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The Kconfig symbol HOTPLUG was removed with commit 40b313608a ("Finally
eradicate CONFIG_HOTPLUG"). But there's still one select statement for
that symbol. It seems that select statement was added after the patch to
remove CONFIG_HOTPLUG was submitted. Anyhow, it is useless and can be
dropped.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In hard_irq_disable(), we accessed the PACA before we hard disabled
the interrupts, potentially causing a warning as get_paca() will
us debug_smp_processor_id().
Move that to after the disabling, and also use local_paca directly
rather than get_paca() to avoid several redundant and useless checks.
Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Module CRCs are implemented as absolute symbols that get resolved by
a linker script. We build an intermediate .o that contains an
unresolved symbol for each CRC. genksysms parses this .o, calculates
the CRCs and writes a linker script that "resolves" the symbols to
the calculated CRC.
Unfortunately the ppc64 relocatable kernel sees these CRCs as symbols
that need relocating and relocates them at boot. Commit d4703aef
(module: handle ppc64 relocating kcrctabs when CONFIG_RELOCATABLE=y)
added a hook to reverse the bogus relocations. Part of this patch
created a symbol at 0x0:
# head -2 /proc/kallsyms
0000000000000000 T reloc_start
c000000000000000 T .__start
This reloc_start symbol is causing lots of confusion to perf. It
thinks reloc_start is a massive function that stretches from 0x0 to
0xc000000000000000 and we get various cryptic errors out of perf,
including:
problem incrementing symbol count, skipping event
This patch removes the reloc_start linker script label and instead
defines it as PHYSICAL_START. We also need to wrap it with
CONFIG_PPC64 because the ppc32 kernel can set a non zero
PHYSICAL_START at compile time and we wouldn't want to subtract
it from the CRCs in that case.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
POWER8 comes with two different PVRs. This patch enables the additional
PVR in the cputable.
The existing entry (PVR=0x4b) is renamed to POWER8E and the new entry
(PVR=0x4d) is given POWER8.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
1. MEI_INTEROP_TIMEOUT is in seconds not in jiffies
so we use mei_secs_to_jiffies macro
While cold boot is fast this is relevant in resume
2. wait_event_interruptible_timeout can return with
-ERESTARTSYS so do not override it with -ETIMEDOUT
3.Adjust error message
Tested-by: Shuah Khan <shuah.kh@samsung.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When powering up, we don't have to clean up the device state
nothing is connected.
Tested-by: Shuah Khan <shuah.kh@samsung.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ME HW ready bit is down after hw reset was asserted or on error.
Only on error we need to enter the reset flow, additional reset
need to be prevented when reset was triggered during
initialization , power up/down or a reset is already in progress
Tested-by: Shuah Khan <shuah.kh@samsung.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The recent addition of lockdep support to reservations and their subsequent
use by TTM showed up a number of potential problems with the way qxl was using
TTM objects.
a) it was allocating objects, and reserving them later without validating
underneath the reservation, which meant in extreme conditions the objects could
be evicted before the reservation ever used them.
b) it was reserving objects straight after allocating them, but with no
ability to back off should the reservations fail. It now allocates the necessary
objects then does a complete reservation pass on them to avoid deadlocks.
c) it had two lists per release tracking objects, unnecessary complicating
the reservation process.
This patch removes the dual object tracking, adds reservations ticket support
to the release and fence object handling. It then ports the internal fb
drawing code and the userspace facing ioctl to use the new interfaces properly,
along with cleanup up the error path handling in some codepaths.
Signed-off-by: Dave Airlie <airlied@redhat.com>
In order to fix an issue with reservations we need to create the releases
as pre-pinned objects, this changes the placement interface and bo creation
interface to allow creating pinned objects to save nested reservations later.
This is just a stepping stone to main fix which follows to actually fix how
qxl deals with reservations.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Due to the nature of qxl hw we cannot queue operations while in an irq
context, so we queue these operations as best we can until atomic allocations
fail, and dequeue them later in a work queue.
Daniel looked over the locking on the list and agrees it should be sufficent.
The atomic allocs use no warn, as the last thing we want if we haven't memory
to allocate space for a printk in an irq context is more printks.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Use the generic gma functions instead of the medfield functions where
they are identical.
Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Use the generic gma functions instead of the oaktrail functions where
they are identical.
Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
This takes care of the remaining chips using the old generic code.
We don't check if the pipe number is valid but the old code peeked in
the register map before checking anyways so just ignore it.
Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
There is a slight difference in how we pick the palette register in the
generic function but we should be ok as long as psb_intel_crtc->pipe and
the register map is sane.
Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
This patch makes psb use the gma_xxx counterparts that are identical. I
took them in one sweep as they should not cause any regressions.
Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
This patch makes cdv use the gma_xxx counterparts that are identical. I
took them in one sweep as they should not cause any regressions.
Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>