OpenFirmware wasn't quite following the protocol described in boot.txt
and the kernel has detected this through use of the sentinel value
in boot_params. OFW does zero out almost all of the stuff that it should
do, but not the sentinel.
This causes the kernel to clear olpc_ofw_header, which breaks x86 OLPC
support.
OpenFirmware has now been fixed. However, it would be nice if we could
maintain Linux compatibility with old firmware versions. To do that, we just
have to avoid zeroing out olpc_ofw_header.
OFW does not write to any other parts of the header that are being zapped
by the sentinel-detection code, and all users of olpc_ofw_header are
somewhat protected through checking for the OLPC_OFW_SIG magic value
before using it. So this should not cause any problems for anyone.
Signed-off-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/20130809221420.618E6FAB03@dev.laptop.org
Acked-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v3.9+
- ACPI-based memory hotplug stopped working after a recent change,
because it's not possible to associate sufficiently many "physical"
devices with one ACPI device object due to an artificial limit.
Fix from Rafael J Wysocki removes that limit and makes memory
hotplug work again.
- A change made in 3.9 uncovered a bug in the ACPI processor driver
preventing NUMA nodes from being put offline due to an ordering
issue. Fix from Yasuaki Ishimatsu changes the ordering to make
things work again.
- One of the recent ACPI video commits (that hasn't been reverted
so far) uncovered a bug in the code handling quirky BIOSes that
caused some Asus machines to boot with backlight completely off
which made it quite difficult to use them afterward. Fix from
Felipe Contreras improves the quirk to cover this particular
case correctly.
- A cpufreq user space interface change made in 3.10 inadvertently
renamed the ignore_nice_load sysfs attribute to ignore_nice which
resulted in some confusion. Fix from Viresh Kumar changes the name
back to ignore_nice_load.
- An initialization ordering change made in 3.9 broke cpufreq on
loongson2 boards. Fix from Aaro Koskinen restores the correct
initialization ordering there.
- Fix breakage resulting from a mistake made in 3.9 and causing the
detection of some graphics adapters (that were detected correctly
before) to fail. There are two objects representing the same PCIe
port in the affected systems' ACPI tables and both appear as
"enabled" and we are expected to guess which one to use. We used
to choose the right one before by pure luck, but when we tried to
address another similar corner case, the luck went away. This time
we try to make our guessing a bit more educated which is reported
to work on those systems.
- The /proc/acpi/wakeup interface code is missing some locking
which may lead to breakage if that file is written or read during
hotplug of wakeup devices. That should be rare but still possible,
so it's better to start using the appropriate locking there.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJSBQJtAAoJEKhOf7ml8uNs9FQP/1dhuih2SxwGG07//58AdQg8
lIktdutgAxO8NvhYwP8SWH+SasBywkn2I6LSK2Ys8UMfUrfCMI/8jK1JMRmj+IfN
tcd/13CdxV/sFDnnoQf1N/FCoskn8nNfbSw030lR3vHcBSc6CpqGHnQc8A78FNuC
HohtV4xhHU3SCHOOiCP0HeVC3F3JMAlfRMbm/du1U9lrRpKXvU65J0JGvVG/9jn/
HNd3OVvvJ7l7kbNy7D99MO2QbX8O21qgKnd26Rk2gdERygSdrrBG4uCL1TMH1+Kn
Q4tKvUR6Ja/b7f8lyWjPApjhCsGYmzeYfSJtEFZunZwcHShmgliiofD1E3lLhnY+
ZhUTgaalIETdcZy2GhzX42hi9YfqGUSeLccvqBEAC2eWCHhvKGw6A7kyBf6F9Sre
eQpnnmOeKMNFwqEniYEcABWoG+XEtKWzxgIn9sX2aW/qkWLa/8i6PYOWe6Aze1Mr
8DduFy6HuU9LSALdlPKhl8zlD1y3vY5UDNFN3X2Vpzx33NYv5Ft2aykFSxWpzgb9
hVzWaYLS25SrxPncTgJZL8eR+/u7eY2B/OMczpfP+PmTztyCSTdPa0t6zn5xr5vq
vLgfCh9lPXEaMz85Jlw5PZ+ZBAQ5PfjmIVB7ac0yM/G/J9N+h7q/J/BzbQEJDQI0
qovn9XAIW6wpG9MiMJs3
=0u8M
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:
- ACPI-based memory hotplug stopped working after a recent change,
because it's not possible to associate sufficiently many "physical"
devices with one ACPI device object due to an artificial limit. Fix
from Rafael J Wysocki removes that limit and makes memory hotplug
work again.
- A change made in 3.9 uncovered a bug in the ACPI processor driver
preventing NUMA nodes from being put offline due to an ordering
issue. Fix from Yasuaki Ishimatsu changes the ordering to make
things work again.
- One of the recent ACPI video commits (that hasn't been reverted so
far) uncovered a bug in the code handling quirky BIOSes that caused
some Asus machines to boot with backlight completely off which made
it quite difficult to use them afterward. Fix from Felipe Contreras
improves the quirk to cover this particular case correctly.
- A cpufreq user space interface change made in 3.10 inadvertently
renamed the ignore_nice_load sysfs attribute to ignore_nice which
resulted in some confusion. Fix from Viresh Kumar changes the name
back to ignore_nice_load.
- An initialization ordering change made in 3.9 broke cpufreq on
loongson2 boards. Fix from Aaro Koskinen restores the correct
initialization ordering there.
- Fix breakage resulting from a mistake made in 3.9 and causing the
detection of some graphics adapters (that were detected correctly
before) to fail. There are two objects representing the same PCIe
port in the affected systems' ACPI tables and both appear as
"enabled" and we are expected to guess which one to use. We used to
choose the right one before by pure luck, but when we tried to
address another similar corner case, the luck went away. This time
we try to make our guessing a bit more educated which is reported to
work on those systems.
- The /proc/acpi/wakeup interface code is missing some locking which
may lead to breakage if that file is written or read during hotplug
of wakeup devices. That should be rare but still possible, so it's
better to start using the appropriate locking there.
* tag 'pm+acpi-3.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: Try harder to resolve _ADR collisions for bridges
cpufreq: rename ignore_nice as ignore_nice_load
cpufreq: loongson2: fix regression related to clock management
ACPI / processor: move try_offline_node() after acpi_unmap_lsapic()
ACPI: Drop physical_node_id_bitmap from struct acpi_device
ACPI / PM: Walk physical_node_list under physical_node_lock
ACPI / video: improve quirk check in acpi_video_bqc_quirk()
Pull media fixes from Mauro Carvalho Chehab:
"Some driver fixes (em28xx, coda, usbtv, s5p, hdpvr and ml86v7667) and
a fix for media DocBook"
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] em28xx: fix assignment of the eeprom data
[media] hdpvr: fix iteration over uninitialized lists in hdpvr_probe()
[media] usbtv: fix dependency
[media] usbtv: Throw corrupted frames away
[media] usbtv: Fix deinterlacing
[media] v4l2: added missing mutex.h include to v4l2-ctrls.h
[media] DocBook: upgrade media_api DocBook version to 4.2
[media] ml86v7667: fix compile warning: 'ret' set but not used
[media] s5p-g2d: Fix registration failure
[media] media: coda: Fix DT driver data pointer for i.MX27
[media] s5p-mfc: Fix input/output format reporting
This patch fixed the condition of extend_desc for jumbo frame.
There is no check routine for extend_desc in the stmmac_jumbo_frm function.
Even though extend_desc is set if dma_tx is used instead of dma_etx.
It causes kernel panic.
Signed-off-by: Byungho An <bh74.an@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull HID fix from Jiri Kosina:
"Revert of a patch which breaks enumeration workaround in
hid-logitech-dj"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
Revert "HID: hid-logitech-dj: querying_devices was never set"
- omapdss: compilation fix and DVI fix for PandaBoard
- mxsfb: fix colors when using 18bit LCD bus
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJSBMT+AAoJEPo9qoy8lh71q+EP/jALUWxJQUpswFsuhKu6Mubf
IUg9F06EH5D+P5IIUWTE5rU5rtEQVm6ANg3gfTiKmjepxgF0zb/odpFjNmhKiFXu
HKqiJWixi07veAxWUidkF2YEPzmoRHQ7rLCVrE4Tg4n0MOzD43sp6HrNWHY3YN4m
XT+V8ZFN0AyLBQveE70tGjuk8Th+tMb4Xoja6vGhpE4MvQJpdBHu9IencBIVlINW
IXbVTCbApOVSbkLuZkXIivrhbWZQhmIYqDO5/kwgaTI7xpTcYlSzjHrhnbHZ/Lw9
ZNCC6DP63v2PsPwakjSbAyQeFIj8k/iqvlpcGK7p7RHq2vIA/UL1Vr3gN/nNGC+k
/ZeVyGOFfLKWJCvIcVEkQpZcQPg1bJafIEc4DexQc5kKU07GcnLi2bEwY7JezbYm
skmiMTaDN4M4JFBltVy69bW/92Z1b1O2Ei+zulwP39OcgHLTooNqk8Y780EQROSM
enuPtgxWiKjJJ1VJy1p4ZWqIn7mtaXuKsFsdMOC9ClQBpBdxrBw9PtV3rjkht7ov
uU/5YPb/ccuyGW7dRK7zrmMMryaEq/YhJS/XKzT/5Xju2XvZk1XSSTJjvLeFyD3+
Qwh+lYPCHLQM2ICajiAGSPmpwdvTRY9ViexnWESaZNwe8p1if0PIyE/tHpYsAYCy
mzzoatK5N96HjIdrXBPK
=bHE/
-----END PGP SIGNATURE-----
Merge tag 'fbdev-fixes-3.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux
Pull fbdev fixes from Tomi Valkeinen:
- omapdss: compilation fix and DVI fix for PandaBoard
- mxsfb: fix colors when using 18bit LCD bus
* tag 'fbdev-fixes-3.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
ARM: OMAP: dss-common: fix Panda's DVI DDC channel
video: mxsfb: fix color settings for 18bit data bus and 32bpp
OMAPDSS: analog-tv-connector: compile fix
Pull drm fixes from Dave Airlie:
"Mostly radeon, more fixes for dynamic power management which is is off
by default for this release anyways, but there are a large number of
testers, so I'd like to keep merging the fixes.
Otherwise, radeon UVD fixes affecting suspend/resume regressions, i915
regression fixes, one for your mac mini, ast, mgag200, cirrus ttm fix
and one regression fix in the core"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (25 commits)
drm: Don't pass negative delta to ktime_sub_ns()
drm/radeon: make missing smc ucode non-fatal
drm/radeon/dpm: require rlc for dpm
drm/radeon/cik: use a mutex to properly lock srbm instanced registers
drm/radeon: remove unnecessary unpin
drm/radeon: add more UVD CS checking
drm/radeon: stop sending invalid UVD destroy msg
drm/radeon: only save UVD bo when we have open handles
drm/radeon: always program the MC on startup
drm/radeon: fix audio dto calculation on DCE3+ (v3)
drm/radeon/dpm: disable sclk ss on rv6xx
drm/radeon: fix halting UVD
drm/radeon/dpm: adjust power state properly for UVD on SI
drm/radeon/dpm: fix spread spectrum setup (v2)
drm/radeon/dpm: adjust thermal protection requirements
drm/radeon: select audio dto based on encoder id for DCE3
drm/radeon: properly handle pm on gpu reset
drm/i915: do not disable backlight on vgaswitcheroo switch off
drm/i915: Don't call encoder's get_config unless encoder is active
drm/i915: avoid brightness overflow when doing scale
...
This is a regression introduced by:
commit fe5c3561e6
Author: stephen hemminger <stephen@networkplumber.org>
Date: Sat Jul 13 10:18:18 2013 -0700
vxlan: add necessary locking on device removal
The problem is that vxlan_dellink(), which is called with RTNL lock
held, tries to flush the workqueue synchronously, but apparently
igmp_join and igmp_leave work need to hold RTNL lock too, therefore we
have a soft lockup!
As suggested by Stephen, probably the flush_workqueue can just be
removed and let the normal refcounting work. The workqueue has a
reference to device and socket, therefore the cleanups should work
correctly.
Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Tested-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a regression introduced by:
commit 3fc2de2fab
Author: stephen hemminger <stephen@networkplumber.org>
Date: Thu Jul 18 08:40:15 2013 -0700
vxlan: fix igmp races
Before this commit, the old code was:
if (vxlan_group_used(vn, vxlan->default_dst.remote_ip))
ip_mc_join_group(sk, &mreq);
else
ip_mc_leave_group(sk, &mreq);
therefore we shoud check vxlan_group_used(), not its opposite,
for igmp_join.
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename mib counter from "low latency" to "busy poll"
v1 also moved the counter to the ip MIB (suggested by Shawn Bohrer)
Eric Dumazet suggested that the current location is better.
So v2 just renames the counter to fit the new naming convention.
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduced in cf3c4c0306
("8139cp: Add dma_mapping_error checking")
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Same behavior than 802.1q : finds the encapsulated protocol and
skip 32bit header.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix ipgre_header() (header_ops->create) to return the correct
amount of bytes pushed. Most callers of dev_hard_header() seem
to care only if it was success, but af_packet.c uses it as
offset to the skb to copy from userspace only once. In practice
this fixes packet socket sendto()/sendmsg() to gre tunnels.
Regression introduced in c544193214
("GRE: Refactor GRE tunneling code.")
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We want the data stored in "addr" and "qual", but the extra ampersands
mean we are copying stack data instead.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Patch fixes zd1201 not to use stack as URB transfer_buffer. URB buffers need
to be DMA-able, which stack is not.
Patch is only compile tested.
Cc: stable@vger.kernel.org
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
device_close()->recalc_sigpending() is not needed, sigprocmask() takes
care of TIF_SIGPENDING correctly.
And without ->siglock it is racy and wrong, it can wrongly clear
TIF_SIGPENDING and miss a signal.
But even with this patch device_close() is still buggy:
1. sigprocmask() should not be used, we have set_task_blocked(),
but this is minor.
2. We should never block SIGKILL or SIGSTOP, and this is what
the code tries to do.
3. This can't protect against SIGKILL or SIGSTOP anyway. Another
thread can do signal_wake_up(), say, do_signal_stop() or
complete_signal() or debugger.
4. sigprocmask(SIG_BLOCK, allsigs) doesn't necessarily clears
TIF_SIGPENDING, say, freezing() or ->jobctl.
5. device_write() looks equally wrong by the same reason.
Looks like, this tries to protect some wait_event_interruptible() logic
from signals, it should be turned into uninterruptible wait. Or we need
to implement something like signals_stop/start for such a use-case.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The ACTIVITY and ERROR signals were reversed in the original commit.
Fix that so that hard drive activity does not show up on the error
light, and attempts to indicate that the hard drive is failing do
not show up as hard drive activity. This fixes a fairly serious
functional bug in the driver, but failing to apply this patch will
not cause any stability issues on the system.
Signed-off-by: Mark Langsdorf <mark.langsdorf@calxeda.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
swiotlb-xen has an implicit dependency on CONFIG_NEED_SG_DMA_LENGTH.
Remove it by replacing dma_length with sg_dma_len.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
This patch replace dma_length in "lib/swiotlb.c" to sg_dma_len() macro,
because the build error can occur if CONFIG_NEED_SG_DMA_LENGTH is not
set, and CONFIG_SWIOTLB is set.
Singed-off-by: EunBong Song <eunb.song@samsung.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Currently ballooned out pages are mapped to 0 and have INVALID_P2M_ENTRY
in the p2m. These ballooned out pages are used to map foreign grants
by gntdev and blkback (see alloc_xenballooned_pages).
Allocate a page per cpu and map all the ballooned out pages to the
corresponding mfn. Set the p2m accordingly. This way reading from a
ballooned out page won't cause a kernel crash (see
http://lists.xen.org/archives/html/xen-devel/2012-12/msg01154.html).
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
CC: alex@alex.org.uk
CC: dcrisan@flexiant.com
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The global array of port users and the port_user_lock limits
scalability of the evtchn device. Instead of the global array lookup,
use a per-use (per-fd) tree of event channels bound by that user and
protect the tree with a per-user lock.
This is also a prerequiste for extended the number of supported event
channels, by removing the fixed size, per-event channel array.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
In m2p_remove_override() when removing the grant map from the kernel
mapping and replacing with a mapping to the original page, the grant
unmap will already have flushed the TLB and it is not necessary to do
it again after updating the mapping.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Both Boris and David have graciously volunteered to help in
maintaining the Xen subsystem tree. Cementing this in the
MAINTAINERS file so they are copied on Xen related patches.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
And also drop the virtualization one since we don't really use it.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com> [for netback]
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
The usage of strict_strtoul() is not preferred, because
strict_strtoul() is obsolete. Thus, kstrtoul() should be
used.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
With the current implementation, the callback in the tail of the list
can be added twice, because the check done in
gnttab_request_free_callback is bogus, callback->next can be NULL if
it is the last callback in the list. If we add the same callback twice
we end up with an infinite loop, were callback == callback->next.
Replace this check with a proper one that iterates over the list to
see if the callback has already been added.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Matt Wilson <msw@amazon.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
CC: stable@vger.kernel.org
This is a complete rewrite of the Xen TPM frontend driver, taking
advantage of a simplified frontend/backend interface and adding support
for cancellation and timeouts. The backend for this driver is provided
by a vTPM stub domain using the interface in Xen 4.3.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Matthew Fioravante <matthew.fioravante@jhuapl.edu>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Peter Huewe <peterhuewe@gmx.de>
Reviewed-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
This is based on a patch that Zhenzhong Duan had sent - which
was missing some of the remaining pieces. The kernel has the
logic to handle Xen-type-exceptions using the paravirt interface
in the assembler code (see PARAVIRT_ADJUST_EXCEPTION_FRAME -
pv_irq_ops.adjust_exception_frame and and INTERRUPT_RETURN -
pv_cpu_ops.iret).
That means the nmi handler (and other exception handlers) use
the hypervisor iret.
The other changes that would be neccessary for this would
be to translate the NMI_VECTOR to one of the entries on the
ipi_vector and make xen_send_IPI_mask_allbutself use different
events.
Fortunately for us commit 1db01b4903
(xen: Clean up apic ipi interface) implemented this and we piggyback
on the cleanup such that the apic IPI interface will pass the right
vector value for NMI.
With this patch we can trigger NMIs within a PV guest (only tested
x86_64).
For this to work with normal PV guests (not initial domain)
we need the domain to be able to use the APIC ops - they are
already implemented to use the Xen event channels. For that
to be turned on in a PV domU we need to remove the masking
of X86_FEATURE_APIC.
Incidentally that means kgdb will also now work within
a PV guest without using the 'nokgdbroundup' workaround.
Note that the 32-bit version is different and this patch
does not enable that.
CC: Lisa Nguyen <lisa@xenapiadmin.com>
CC: Ben Guthro <benjamin.guthro@citrix.com>
CC: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v1: Fixed up per David Vrabel comments]
Reviewed-by: Ben Guthro <benjamin.guthro@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
These are needed by both guest and host.
Originally-from: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1376058122-8248-13-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Acked-by: Gleb Natapov <gleb@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
If interrupts were enabled when taking the spinlock, we can leave them
enabled while blocking to get the lock.
If we can enable interrupts while waiting for the lock to become
available, and we take an interrupt before entering the poll,
and the handler takes a spinlock which ends up going into
the slow state (invalidating the per-cpu "lock" and "want" values),
then when the interrupt handler returns the event channel will
remain pending so the poll will return immediately, causing it to
return out to the main spinlock loop.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Link: http://lkml.kernel.org/r/1376058122-8248-12-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Maintain a flag in the LSB of the ticket lock tail which indicates
whether anyone is in the lock slowpath and may need kicking when
the current holder unlocks. The flags are set when the first locker
enters the slowpath, and cleared when unlocking to an empty queue (ie,
no contention).
In the specific implementation of lock_spinning(), make sure to set
the slowpath flags on the lock just before blocking. We must do
this before the last-chance pickup test to prevent a deadlock
with the unlocker:
Unlocker Locker
test for lock pickup
-> fail
unlock
test slowpath
-> false
set slowpath flags
block
Whereas this works in any ordering:
Unlocker Locker
set slowpath flags
test for lock pickup
-> fail
block
unlock
test slowpath
-> true, kick
If the unlocker finds that the lock has the slowpath flag set but it is
actually uncontended (ie, head == tail, so nobody is waiting), then it
clears the slowpath flag.
The unlock code uses a locked add to update the head counter. This also
acts as a full memory barrier so that its safe to subsequently
read back the slowflag state, knowing that the updated lock is visible
to the other CPUs. If it were an unlocked add, then the flag read may
just be forwarded from the store buffer before it was visible to the other
CPUs, which could result in a deadlock.
Unfortunately this means we need to do a locked instruction when
unlocking with PV ticketlocks. However, if PV ticketlocks are not
enabled, then the old non-locked "add" is the only unlocking code.
Note: this code relies on gcc making sure that unlikely() code is out of
line of the fastpath, which only happens when OPTIMIZE_SIZE=n. If it
doesn't the generated code isn't too bad, but its definitely suboptimal.
Thanks to Srivatsa Vaddagiri for providing a bugfix to the original
version of this change, which has been folded in.
Thanks to Stephan Diestelhorst for commenting on some code which relied
on an inaccurate reading of the x86 memory ordering rules.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Link: http://lkml.kernel.org/r/1376058122-8248-11-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stephan Diestelhorst <stephan.diestelhorst@amd.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Commit b202952075 ("perf, core: Rate limit
perf_sched_events jump_label patching") introduced rate limiting
for jump label disabling. The changes were made in the jump label code
in order to be more widely available and to keep things tidier. This is
all fine, except now jump_label.h includes linux/workqueue.h, which
makes it impossible to include jump_label.h from anything that
workqueue.h needs. For example, it's now impossible to include
jump_label.h from asm/spinlock.h, which is done in proposed
pv-ticketlock patches. This patch splits out the rate limiting related
changes from jump_label.h into a new file, jump_label_ratelimit.h, to
resolve the issue.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Link: http://lkml.kernel.org/r/1376058122-8248-10-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Increment ticket head/tails by 2 rather than 1 to leave the LSB free
to store a "is in slowpath state" bit. This halves the number
of possible CPUs for a given ticket size, but this shouldn't matter
in practice - kernels built for 32k+ CPU systems are probably
specially built for the hardware rather than a generic distro
kernel.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Link: http://lkml.kernel.org/r/1376058122-8248-9-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Attilio Rao <attilio.rao@citrix.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Although the lock_spinning calls in the spinlock code are on the
uncommon path, their presence can cause the compiler to generate many
more register save/restores in the function pre/postamble, which is in
the fast path. To avoid this, convert it to using the pvops callee-save
calling convention, which defers all the save/restores until the actual
function is called, keeping the fastpath clean.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Link: http://lkml.kernel.org/r/1376058122-8248-8-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Attilio Rao <attilio.rao@citrix.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Link: http://lkml.kernel.org/r/1376058122-8248-7-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Replace the old Xen implementation of PV spinlocks with and implementation
of xen_lock_spinning and xen_unlock_kick.
xen_lock_spinning simply registers the cpu in its entry in lock_waiting,
adds itself to the waiting_cpus set, and blocks on an event channel
until the channel becomes pending.
xen_unlock_kick searches the cpus in waiting_cpus looking for the one
which next wants this lock with the next ticket, if any. If found,
it kicks it by making its event channel pending, which wakes it up.
We need to make sure interrupts are disabled while we're relying on the
contents of the per-cpu lock_waiting values, otherwise an interrupt
handler could come in, try to take some other lock, block, and overwrite
our values.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Link: http://lkml.kernel.org/r/1376058122-8248-6-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[ Raghavendra: use function + enum instead of macro, cmpxchg for zero status reset
Reintroduce break since we know the exact vCPU to send IPI as suggested by Konrad.]
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
There's no need to do it at very early init, and doing it there
makes it impossible to use the jump_label machinery.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Link: http://lkml.kernel.org/r/1376058122-8248-5-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Now that the paravirtualization layer doesn't exist at the spinlock
level any more, we can collapse the __ticket_ functions into the arch_
functions.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Link: http://lkml.kernel.org/r/1376058122-8248-4-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Attilio Rao <attilio.rao@citrix.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The code size expands somewhat, and its better to just call
a function rather than inline it.
Thanks Jeremy for original version of ARCH_NOINLINE_SPIN_UNLOCK config patch,
which is simplified.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1376058122-8248-3-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Rather than outright replacing the entire spinlock implementation in
order to paravirtualize it, keep the ticket lock implementation but add
a couple of pvops hooks on the slow patch (long spin on lock, unlocking
a contended lock).
Ticket locks have a number of nice properties, but they also have some
surprising behaviours in virtual environments. They enforce a strict
FIFO ordering on cpus trying to take a lock; however, if the hypervisor
scheduler does not schedule the cpus in the correct order, the system can
waste a huge amount of time spinning until the next cpu can take the lock.
(See Thomas Friebel's talk "Prevent Guests from Spinning Around"
http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)
To address this, we add two hooks:
- __ticket_spin_lock which is called after the cpu has been
spinning on the lock for a significant number of iterations but has
failed to take the lock (presumably because the cpu holding the lock
has been descheduled). The lock_spinning pvop is expected to block
the cpu until it has been kicked by the current lock holder.
- __ticket_spin_unlock, which on releasing a contended lock
(there are more cpus with tail tickets), it looks to see if the next
cpu is blocked and wakes it if so.
When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub
functions causes all the extra code to go away.
Results:
=======
setup: 32 core machine with 32 vcpu KVM guest (HT off) with 8GB RAM
base = 3.11-rc
patched = base + pvspinlock V12
+-----------------+----------------+--------+
dbench (Throughput in MB/sec. Higher is better)
+-----------------+----------------+--------+
| base (stdev %)|patched(stdev%) | %gain |
+-----------------+----------------+--------+
| 15035.3 (0.3) |15150.0 (0.6) | 0.8 |
| 1470.0 (2.2) | 1713.7 (1.9) | 16.6 |
| 848.6 (4.3) | 967.8 (4.3) | 14.0 |
| 652.9 (3.5) | 685.3 (3.7) | 5.0 |
+-----------------+----------------+--------+
pvspinlock shows benefits for overcommit ratio > 1 for PLE enabled cases,
and undercommits results are flat
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Link: http://lkml.kernel.org/r/1376058122-8248-2-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Attilio Rao <attilio.rao@citrix.com>
[ Raghavendra: Changed SPIN_THRESHOLD, fixed redefinition of arch_spinlock_t]
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
'target' will be set to '-1' in kvm_arch_vcpu_init(), and it need check
'target' whether less than zero or not in kvm_vcpu_initialized().
So need define target as 'int' instead of 'u32', just like ARM has done.
The related warning:
arch/arm64/kvm/../../../arch/arm/kvm/arm.c:497:2: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
Signed-off-by: Chen Gang <gang.chen@asianux.com>
[Marc: reformated the Subject line to fit the series]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
When performing a Stage-2 TLB invalidation, it is necessary to
make sure the write to the page tables is observable by all CPUs.
For this purpose, add dsb instructions to __kvm_tlb_flush_vmid_ipa
and __kvm_flush_vm_context before doing the TLB invalidation itself.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Not saving PAR_EL1 is an unfortunate oversight. If the guest
performs an AT* operation and gets scheduled out before reading
the result of the translation from PAREL1, it could become
corrupted by another guest or the host.
Saving this register is made slightly more complicated as KVM also
uses it on the permission fault handling path, leading to an ugly
"stash and restore" sequence. Fortunately, this is already a slow
path so we don't really care. Also, Linux doesn't do any AT*
operation, so Linux guests are not impacted by this bug.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
This reverts commit 407a2c2a4d.
Explanation provided by Benjamin Tissoires:
Commit "HID: hid-logitech-dj, querying_devices was never set" activate
a flag which guarantees that we do not ask the receiver for too many
enumeration. When the flag is set, each following enumeration call is
discarded (the usb request is not forwarded to the receiver). The flag
is then released when the driver receive a pairing information event,
which normally follows the enumeration request.
However, the USB3 bug makes the driver think the enumeration request
has been forwarded to the receiver. However, it is actually not the
case because the USB stack returns -EPIPE. So, when a new unknown
device appears, the workaround consisting in asking for a new
enumeration is not working anymore: this new enumeration is discarded
because of the flag, which is never reset.
A solution could be to trigger a timeout before releasing it, but for
now, let's just revert the patch.
Reported-by: Benjamin Tissoires <benjamin.tissoires@gmail.com>
Tested-by: Sune Mølgaard <sune@molgaard.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
If a transaction is rolled back, the Target Address Register (TAR), Processor
Priority Register (PPR) and Data Stream Control Register (DSCR) should be
restored to the checkpointed values before the transaction began. Any changes
to these SPRs inside the transaction should not be visible in the abort
handler.
Currently Linux doesn't save or restore the checkpointed TAR, PPR or DSCR. If
we preempt a processes inside a transaction which has modified any of these, on
process restore, that same transaction may be aborted we but we won't see the
checkpointed versions of these SPRs.
This adds checkpointed versions of these SPRs to the thread_struct and adds the
save/restore of these three SPRs to the treclaim/trechkpt code.
Without this if any of these SPRs are modified during a transaction, users may
incorrectly see a speculated SPR value even if the transaction is aborted.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> [v3.10]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This moves us to save the Target Address Register (TAR) a earlier in
__switch_to. It introduces a new function save_tar() to do this.
We need to save the TAR earlier as we will overwrite it in the transactional
memory reclaim/recheckpoint path. We are going to do this in a subsequent
patch which will fix saving the TAR register when it's modified inside a
transaction.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> [v3.10]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
POWER8 allows the DSCR to be accessed directly from userspace via a new SPR
number 0x3 (Rather than 0x11. DSCR SPR number 0x11 is still used on POWER8 but
like POWER7, is only accessible in HV and OS modes). Currently, we allow this
by setting H/FSCR DSCR bit on boot.
Unfortunately this doesn't work, as the kernel needs to see the DSCR change so
that it knows to no longer restore the system wide version of DSCR on context
switch (ie. to set thread.dscr_inherit).
This clears the H/FSCR DSCR bit initially. If a process then accesses the DSCR
(via SPR 0x3), it'll trap into the kernel where we set thread.dscr_inherit in
facility_unavailable_exception().
We also change _switch() so that we set or clear the H/FSCR DSCR bit based on
the thread.dscr_inherit.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> [v3.10]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>