These were introduced by commit 03875ad52f (arm64: add
kc_offset_to_vaddr and kc_vaddr_to_offset macro).
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Certain SoCs require the PCIe outbound mapping to be configured in
software. Add support for those chips.
[jonmason: Use %pap format when printing size_t to avoid warnings in 32-bit
build.]
[arnd: Use div64_u64() instead of "%" to avoid __aeabi_uldivmod link error
in 32-bit build.]
Signed-off-by: Ray Jui <rjui@broadcom.com>
Signed-off-by: Jon Mason <jonmason@broadcom.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Since polling for tx completion is handled whenever target to host
messages are received, removing the unnecessary polling mechanism for
send completion at HTC level.
Reviewed-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Since polling for received messages not supported, remove unused
dl_is_polled.
Reviewed-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Currently target to host (T2H) HTT messages are received at copy engine 1.
These messages are processed by HTC layer in both host and target.
To avoid HTC level processing overhead in both host and target,
the unused copy engine 5 is being used for receiving HTT T2H messages.
This will speedup the receive data processing as well as htt tx completion.
Hence host and target copy engine configuration tables are updated
to enable CE5 pipe. The in-direction HTT mapping is now pointing to CE5
for all HTT T2H.
Moreover HTT send completion messages are polled from HTC handler
as CE 4 is not interrupt-driven. For faster tx completion, CE4 polling
needs to be done whenever CE pipe which transports HTT Rx (target->host)
is processed. This avoids overhead of polling HTT messages from HTC
layer. Servicing CE 4 faster is helping to solve "failed to transmit
packet, dropping: -105".
Reviewed-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Some special copy engines delivers messages directly to HTT by
bypassing HTC layer. Hence exporting tx_completion and rx_handler
for delivering the data to HTT layer.
Reviewed-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Register receive callbacks for every copy engines (CE) separately
instead of having common receive handler. Some of the copy engines
receives different type of messages (i.e HTT/HTC/pktlog) from target.
Hence to service them accordingly, register per copy engine receive
callbacks.
Reviewed-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Register send completion callbacks for every copy engines (CE) separately
instead of having common completion handler. Since some of the copy
engines delivers different type of messages, per-CE callbacks help to
service them differently.
Reviewed-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Export HTC layer tx and rx handlers. This will be used by HIF layer
for per-CE data processing. Instead of callback mechanism, HIF will
call appropriate upper layers API directly.
Reviewed-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
PCM timer is not always used. For embedded device, we need an interface
to disable it when it is not needed, to shrink the kernel size and
memory footprint, here add CONFIG_SND_PCM_TIMER for it.
When both CONFIG_SND_PCM_TIMER and CONFIG_SND_TIMER is unselected,
about 25KB saving bonus we can get.
Please be noted that when disabled, those stubs who using pcm timer
(e.g. dmix, dsnoop & co) may work incorrectlly.
Suggested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jie Yang <yang.jie@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The Nocturn needs the MIDI_RAW_BYTES quirk, like other Novation devices.
Tested that the Nocturn shows up in aconnect, and that it can be used
as a control surface (using the xtor synthesizer patch editor).
Signed-off-by: Ricard Wanderlof <ricardw@axis.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
- New map-map property to describe the remapping of requester-ids,
and the routing of MSIs to controllers
- New hooks to make MSI domains per-device if required
- Extension of msi-parent to provide sideband information
- Extensive documentation for both msi-map and msi-parent
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWIOjtAAoJECPQ0LrRPXpDU9gP/iq0eIWB0t4ZssF/+SUhS3DJ
0UyinQojUhwU4NYEL7hO1F0A2ZEHPmlkx+pT6OaJPydzXSAN8rtHvL/jYkNHEdDZ
V6lHFCKDQvlmlO/bUAyqH86cnqzjGXrbah5w+lfKuxmN6yoj8YnKbRLFKK/Kc8XJ
iJ4ULspEMJnv+IXq45rwXcO1VYydQopTAmeMin4ebCT8p28dEjkkdpWl31k6jPVZ
d400pa2McTJQWb12w+Y/fAsKYN4NrvA+mfh12fyIDWcY0xnQX+abdZ4fnm6Y6hH5
Z0rnfsTfs3oz04szD2hnd+cxkAtVaGA0uxvqQC+0YAtHcOUXBI9/6WidQHU53gph
nrXygI7X4msKuJybjNcv+F5wxmVWwrTt4SqJZNYd2FLBQ19gnNMrUPTDc+qe80Ax
Z8N3/nYUxbKchObbrYjRE2qTN08RUNTng1jQKN/YiCHKPdSE1RjWc6pYDC1SjaZw
3Y1qS8mNNOocvUYERl7whiBTMc1JWsiAA+yJMzuj1uymb5z33BYshumvNyGI7c4/
YdU3pvwpTKZJQjoLHhH95hxyPdxO2oBC9wj3cjLI1lxMQrSow4ejEJLjLdZsu6mX
BnicSIrxWeXNFagK+/zMlh8nK6cLMeQGlypR+s0eFlKvrtdw2BN6WHAjJB6Iw7Sx
GiuOIoHIp1bVBFXHJltC
=7PT3
-----END PGP SIGNATURE-----
Merge tag 'msi-map-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Support for msi-map, and msi-parent update from Marc Zyngier:
- New map-map property to describe the remapping of requester-ids,
and the routing of MSIs to controllers
- New hooks to make MSI domains per-device if required
- Extension of msi-parent to provide sideband information
- Extensive documentation for both msi-map and msi-parent
This driver fails to link without CONFIG_IIO, since
there are no stubs for the iio_channels functions.
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Acked-by: Marek Belisko <marek@goldelico.com>
Acked-by: Nikolaus Schaller <hns@goldelico.com>
Sigma Designs Tango platforms provide a 27 MHz crystal oscillator.
Use it for clocksource, sched_clock, and delay_timer.
Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
So far, we've always considered that for a given PCI device, its
MSI controller was either set by the architecture-specific
pcibios hook, or simply inherited from the host bridge.
This doesn't cover things like firmware-defined topologies like
msi-map (DT) or IORT (ACPI), which can provide information about
which MSI controller to use on a per-device basis.
This patch adds the necessary hook into the MSI code to allow this
feature, and provides the msi-map functionnality as a first
implementation.
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
So far, we have considered that the MSI domain for a device was
either set via the architecture-dependent pcibios implementation
or inherited from the host bridge.
As we're about to break that assumption, add pci_dev_msi_domain
which is the equivalent of pci_host_bridge_msi_domain, but for
a single device.
Other than moving things around a bit, this patch on its own
has no effect.
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
While msi-parent is used to point to the MSI controller that
works for all the devices behind a root complex, it doesn't
allow configurations where each individual device can be routed
to a separate MSI controller.
The msi-map property provides this flexibility (and much more),
so let's add a utility function that parses it, and return the
corresponding MSI domain.
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The msi-map property is also used to identify the MSI controller
as a form of grown-up msi-parent property.
Looking it up is complicated enough, and since of_msi_map_rid
already implements this, let's turn it into an internal utility
function. We'll put that to good use later on.
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Now that 126b16e2ad ("Docs: dt: add generic MSI bindings")
has made it into the tree, the time has come to get rid of the
old hack, and to parse msi-parent in its full glory.
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Now that we have a function that implements the complexity of the
"msi-parent" property parsing, switch to that.
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Now that we have a function that implements the complexity of the
"msi-parent" property parsing, switch to that.
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Since 126b16e2ad ("Docs: dt: add generic MSI bindings"),
the definition of "msi-parent" has evolved, while maintaining
some degree of compatibility. It can now express multiple MSI
controllers as parents, as well as some sideband data being
communicated to the controller.
This patch adds the parsing of the property, iterating over
the multiple parents until a suitable irqdomain is found.
It can also fallback to the original parsing if the old
binding is detected.
This support code gets used in the subsequent patches.
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Replace open coded generation PCI/MSI requester id with call to the
new function pci_msi_domain_get_msi_rid() which applies the "msi-map"
to the id value.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Add pci_msi_domain_get_msi_rid() to return the MSI requester id (RID).
Initially needed by gic-v3 based systems. It will be used by follow on
patch to drivers/irqchip/irq-gic-v3-its-pci-msi.c
Initially supports mapping the RID via OF device tree. In the future,
this could be extended to use ACPI _IORT tables as well.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
With 64 VFs, we can easily overwhelm the AQ on the PF if we have too low
a limit on the number of AQ requests. This leads to ARQ overflow errors,
and occasionally VFs that fail to initialize.
Since we really only hit this condition on initial VF driver load, the
requests that we process are lightweight, so this extra work doesn't
cause problems for the PF driver.
Change-ID: I620221520d8af987df6ace9ba938ffaf22107681
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
On some devices, in some systems, in some configurations, the VFs would
fail to initialize the first time you loaded the driver.
To correct this, increase the delay time for the init task slightly, and
wait longer before giving up.
If we enable VFs and load the VF driver in the same kernel as the PF
driver, we can totally overwhelm the PF driver with AQ requests because
all of the instances try to initialize at the same time.
To help alleviate this, stagger the initial scheduling of the init task
using the PCIe function as a multiplier. We mask off the function to
only three bits so no instance has to wait too long.
With these two changes, initializing 128 VFs on a single device goes
from four minutes to just a few seconds.
Change-ID: If3d8720c1c4e838ab36d8781d9ec295a62380936
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
1000Base_T_Optical got added to the function that figures out what
is supported when link is down but not when link is up. Add it in there
too so that we display the correct information.
Change-ID: I85ebcdfa7c02d898c44c673b1500552a53c8042e
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The vlan_features field was correctly being set to the same value as the
netdev features field. However, this was being done before the features
were actually being set up, leaving the vlan_features empty.
Also, after a reset, vlan_features will be incorrectly assigned the
previous netdev feature flags, which can contain VLAN feature bits. This
makes the VLAN code angry and will cause a stack dump.
To fix these issues, set up the netdev features first, then mask out the
VLAN feature bits when assigning vlan_features.
Change-ID: Ib0548869dc83cf6a841cb8697dd94c12359ba4d2
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When the number of invalid messages from a VF is exceeded, the VF
will be disabled, due to the invalid messages. This happens if
other VF drivers (like DPDK) send a message through the driver's
mailbox (aka virtchannel) interface, but the message is not
supported by the i40e pf driver, such as CONFIG_PROMISCUOUS_MODE.
This patch changes the num_invalid_msgs in struct i40e_vf to record
the continuous invalid msgs, and it will be reset when a valid msg
is received.
Change-ID: Iaec42fd3dcdd281476b3518be23261dd46fc3718
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The XL710 hardware has a different interrupt moderation design
that can support a limit of total interrupts per second per
vector, in addition to the "number of interrupts per second"
controls already established in the driver. This combination
of hardware features allows us to set very low default latency
settings but minimize the total CPU utilization by not
making too many interrupts, should the user desire.
The current driver implementation is still enabling the dynamic
moderation in the driver, and only using the rx/tx-usecs
limit in ethtool to limit the interrupt rate per second, by default.
The new code implemented in this patch
2) adds init/use of the new "Interrupt Limit" register
3) adds ethtool knob to control/report the limits above
Usage is ethtool -C ethx rx-usecs-high <value> Where <value> is number
of microseconds to create a rate of 1/N interrupts per second,
regardless of rx-usecs or tx-usecs values. Since there is a credit based
scheme in the hardware, the rx-usecs and tx-usecs can be configured for
very low latency for short bursts, but once the credit runs out the
refill rate on the credits is limited by rx-usecs-high.
Change-ID: I3a1075d3296123b0f4f50623c779b027af5b188d
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Adds support for setting a new bit in the Set Local LLDP MIB AQ command
Type field. When set to 1, the bit indicates to FW that Apps should be
treated as non-willing. When 0, FW behaves as before.
Change-ID: I0d2101c1606c59c7188d3e6a0c7810e0f205233a
Signed-off-by: Greg Bowers <gregory.j.bowers@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add an ethtool priv flag to enable and disable printing
the VEB statistics.
Change-ID: I7654054a3a73b08aa8310d94ee8fce6219107dd8
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Two defines that are not used are causing customer confusion - remove
them.
Change-ID: Icef0325aca8e0f4fcdfc519e026bdd375e791200
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Allow the nvmupdate application to decide when a read or write error
should be exposed to the user. Since the application needs to use
write probes to find the ReadOnly sections on a potentially unknown NVM
version in the HW and read probes to check the status of the last write,
some error messages are expected, but need not be shown to the users.
The driver doesn't know which are ignorable from real errors, so needs
to let the application make the decision.
Change-ID: I78fca8ab672bede11c10c820b83c26adfd536d03
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add const to functions that return strings that aren't going to be
modified. This addresses some reported compile complaints.
Change-ID: Ic56b1e814ab4d23a50480e7fdec652445f776ee8
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cut down on the number of startup log entries by putting a couple behind
debug flags and combining a couple others into a single line.
Change-ID: I708089f086308f84d43f8b6f0e8a634a02d058fb
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
As per Eric Dumazet's previous patches:
(see commit (24d2e4a507) - tg3: use napi_complete_done())
Quoting verbatim:
Using napi_complete_done() instead of napi_complete() allows
us to use /sys/class/net/ethX/gro_flush_timeout
GRO layer can aggregate more packets if the flush is delayed a bit,
without having to set too big coalescing parameters that impact
latencies.
</end quote>
Tested
configuration: low latency via ethtool -C ethx adaptive-rx off
rx-usecs 10 adaptive-tx off tx-usecs 15
workload: streaming rx using netperf TCP_MAERTS
igb:
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.1 () port 0 AF_INET : demo
...
Interim result: 941.48 10^6bits/s over 1.000 seconds ending at 1440193171.589
Alignment Offset Bytes Bytes Recvs Bytes Sends
Local Remote Local Remote Xfered Per Per
Recv Send Recv Send Recv (avg) Send (avg)
8 8 0 0 1176930056 1475.36 797726 16384.00 71905
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.1 () port 0 AF_INET : demo
...
Interim result: 941.49 10^6bits/s over 0.997 seconds ending at 1440193142.763
Alignment Offset Bytes Bytes Recvs Bytes Sends
Local Remote Local Remote Xfered Per Per
Recv Send Recv Send Recv (avg) Send (avg)
8 8 0 0 1175182320 50476.00 23282 16384.00 71816
i40e:
Hard to test because the traffic is incoming so fast (24Gb/s) that GRO
always receives 87kB, even at the highest interrupt rate.
Other drivers were only compile tested.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The code in i40e and i40evf is using an "IN_NETPOLL" flag that has never
added any value due to the fact that the Rx clean-up is handled in NAPI.
As such the flag was set, the queue was scheduled via NAPI, and then polled
from the netpoll controller and if any Rx packets were processed the were
processed in the wrong context.
In addition the flag itself just added an unneeded conditional to the
hot-path so it can safely be dropped and save us a few instructions.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The polling routine for i40e was rounding up the budget for Rx cleanup to
1. This is incorrect as the netpoll poll call is expecting no Rx to be
processed as the budget passed was 0.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The device tree property "msi-map" specifies how to create the PCI
requester id used in some MSI controllers. Add a new function
of_msi_map_rid() that finds the msi-map property and applies its
translation to a given requester id.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Currently msi-parent is used by a few bindings to describe the
relationship between a PCI root complex and a single MSI controller, but
this property does not have a generic binding document.
Additionally, msi-parent is insufficient to describe more complex
relationships between MSI controllers and devices under a root complex,
where devices may be able to target multiple MSI controllers, or where
MSI controllers use (non-probeable) sideband information to distinguish
devices.
This patch adds a generic binding for mapping PCI devices to MSI
controllers. This document covers msi-parent, and a new msi-map property
(specific to PCI*) which may be used to map devices (identified by their
Requester ID) to sideband data for each MSI controller that they may
target.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
When we create a generic MSI domain, that MSI_FLAG_USE_DEF_CHIP_OPS
is set, and that any of .mask or .unmask are NULL in the irq_chip
structure, we set them to pci_msi_[un]mask_irq.
This is a bad idea for at least two reasons:
- PCI_MSI might not be selected, kernel fails to build (yes, this is
legitimate, at least on arm64!)
- This may not be a PCI/MSI domain at all (platform MSI, for example)
Either way, this looks wrong. Move the overriding of mask/unmask to
the PCI counterpart, and panic is any of these two methods is not
set in the core code (they really should be present).
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/1444760085-27857-1-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
New users of these files still start showing up in linux-next, so it's
better to have a migration strategy. All existing users as of 4.3-rc4
are converted to use linux/io-64-nonatomic-*.h, and after 4.4-rc1
we can change all the new ones that have come in since, and then
remove this file again.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: LKP project <lkp@linux.intel.com>
to cover the entire kernel range. This fixes a triple fault on
non-PAE kernels when booting on 32-bit EFI due to accessing an
unmapped GDT in efi_call_phys_prolog() - Paolo Bonzini
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWIMkHAAoJEC84WcCNIz1VS0cQAIsUoM/Rr4kDBeDsRDEGF02E
gxjIejflBnis75dXigPZjT830w4bcs64HdivtCTQUgX4dqcQOFUK6NUXmxzUvTGd
uDBZR7H7uvFlQ8XnW3NkS6D4dRQJXtcI37nLcgWM6lPlR7+s5IAaeY3tGx0rRr/p
mpDeyknK7KjHXdJWBjzOdQaIFT0fyzZjIWI0+M1XpUntT5dHu6PUQuxSl367aq5p
B13JYLD99Ahm320yg0eP3w3hB5aQiNS6IBXP+JiQrUpzqbhsEmS410tCkFCIh9xO
1qzGZh8acs8RwONJePYgb2P8Vox4alAy5nBjUhJMu1dvycFaDfGX70GVhUCRv3sx
aldBv+SQ8tyDuuxEPQjlW5RKUlqq9aYuzIBD76SODqneQgA5eJveZPUOOBjK3eUD
TAWKo4XmT6J2QDY7P2+lVGqpIH3VqHriZ2yR05+R4rlmYvh8r85aQKOeGK72x4YS
atnQUK4z3hVCF1/rpeZJiCekgcxwmVmJjk5uwR+dcitXU+uu+owb5n6b3e/1rJrP
mumX+2NXD0GCrjaW5t1s+pDjQXub0ahx1SA2GArDKH493w3tAjc6Yp1FC3OxgOA1
AOhtR33TEUbjaULBS0DOt/GbkkDzN2bRj/eI2zMyXmxK0K5cE6JcrsWcICdHubLu
81+ewHa8yd7tQz6+YwjV
=2uOg
-----END PGP SIGNATURE-----
Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent
Pull EFI fix from Matt Fleming:
- Ensure that the identity mapping in initial_page_table is updated
to cover the entire kernel range. This fixes a triple fault on
non-PAE kernels when booting on 32-bit EFI due to accessing an
unmapped GDT in efi_call_phys_prolog(). (Paolo Bonzini)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On 32-bit systems, the initial_page_table is reused by
efi_call_phys_prolog as an identity map to call
SetVirtualAddressMap. efi_call_phys_prolog takes care of
converting the current CPU's GDT to a physical address too.
For PAE kernels the identity mapping is achieved by aliasing the
first PDPE for the kernel memory mapping into the first PDPE
of initial_page_table. This makes the EFI stub's trick "just work".
However, for non-PAE kernels there is no guarantee that the identity
mapping in the initial_page_table extends as far as the GDT; in this
case, accesses to the GDT will cause a page fault (which quickly becomes
a triple fault). Fix this by copying the kernel mappings from
swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at
identity mapping.
For some reason, this is only reproducible with QEMU's dynamic translation
mode, and not for example with KVM. However, even under KVM one can clearly
see that the page table is bogus:
$ qemu-system-i386 -pflash OVMF.fd -M q35 vmlinuz0 -s -S -daemonize
$ gdb
(gdb) target remote localhost:1234
(gdb) hb *0x02858f6f
Hardware assisted breakpoint 1 at 0x2858f6f
(gdb) c
Continuing.
Breakpoint 1, 0x02858f6f in ?? ()
(gdb) monitor info registers
...
GDT= 0724e000 000000ff
IDT= fffbb000 000007ff
CR0=0005003b CR2=ff896000 CR3=032b7000 CR4=00000690
...
The page directory is sane:
(gdb) x/4wx 0x32b7000
0x32b7000: 0x03398063 0x03399063 0x0339a063 0x0339b063
(gdb) x/4wx 0x3398000
0x3398000: 0x00000163 0x00001163 0x00002163 0x00003163
(gdb) x/4wx 0x3399000
0x3399000: 0x00400003 0x00401003 0x00402003 0x00403003
but our particular page directory entry is empty:
(gdb) x/1wx 0x32b7000 + (0x724e000 >> 22) * 4
0x32b7070: 0x00000000
[ It appears that you can skate past this issue if you don't receive
any interrupts while the bogus GDT pointer is loaded, or if you avoid
reloading the segment registers in general.
Andy Lutomirski provides some additional insight:
"AFAICT it's entirely permissible for the GDTR and/or LDT
descriptor to point to unmapped memory. Any attempt to use them
(segment loads, interrupts, IRET, etc) will try to access that memory
as if the access came from CPL 0 and, if the access fails, will
generate a valid page fault with CR2 pointing into the GDT or
LDT."
Up until commit 23a0d4e8fa ("efi: Disable interrupts around EFI
calls, not in the epilog/prolog calls") interrupts were disabled
around the prolog and epilog calls, and the functional GDT was
re-installed before interrupts were re-enabled.
Which explains why no one has hit this issue until now. ]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: Laszlo Ersek <lersek@redhat.com>
Cc: <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
[ Updated changelog. ]