Commit graph

113,842 commits

Author SHA1 Message Date
Guilherme G. Piccoli
4d9aac397a powerpc/PCI: Disable MSI/MSI-X interrupts at PCI probe time in OF case
Since commit 1851617cd2 ("PCI/MSI: Disable MSI at enumeration even if
kernel doesn't support MSI"), the setup of dev->msi_cap/msix_cap and the
disable of MSI/MSI-X interrupts isn't being done at PCI probe time, as
the logic responsible for this was moved in the aforementioned commit
from pci_device_add() to pci_setup_device(). The latter function is not
reachable on PowerPC pseries platform during Open Firmware PCI probing
time.

This exhibits as drivers not being able to enable MSI, eg:

  bnx2x 0000:01:00.0: no msix capability found

This patch calls pci_msi_setup_pci_dev() explicitly to disable MSI/MSI-X
during PCI probe time on pSeries platform.

Fixes: 1851617cd2 ("PCI/MSI: Disable MSI at enumeration even if kernel doesn't support MSI")
[mpe: Flesh out change log and clarify comment]
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-26 21:40:50 +10:00
Stephen Boyd
a7c602bf42 clk: tegra: Changes for v4.3-rc1
This contains the DFLL driver needed to implement CPU frequency scaling
 on Tegra.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJVzfAzAAoJEN0jrNd/PrOhmjwP/iUYYxJ2kQ8lOxIxJ6bujBAF
 2PTdT5NOAKu0HJkKDHDtGO8gKRqZOPpyfoI8Ol5Xy/C0lIJ/uYoXqCZxfGjb2671
 XzGQFcd8j12qHu0lR5YD8wbivfBXzi4RCwFudTtdZi59dZqat8qUxtvHhtSVs1X9
 I6bN63ykXuHENAhVLdfz0+Trh4sMffOQ+jILQneB6OoKg+GGZpPkJM4WUhrilVKI
 Rw8MHxOlI5/2BUwxSRpu6K5BO1YVfWN5kgD/oyYrzRpd7p2Lxf15NTK/1zMOQ0Pt
 MD3J3enNqLT4brvnLstCChdaXz9r+CN4ut28Tnp7ocvBHIA/+1RjOfRTmsdcYRCE
 CelKsymLgEIFGZfQO57PDZXf5nEbnjhgshj1vNEWrXq6B8tgYeemwFKl1f7r6qEO
 pYmhLDV+AdP6+97FX1ySvBpBVY+GSJxOTupWYb3EvU2iwl9kfdcUs/EOQeo0oz+j
 coMWHQE9lcawcsQlbdhgq5uRGNrD9s4OkZAC4Ga5+ZiNo88gUzgE84WqYhQ6GJNE
 If+7RcJeCY4g2tpQJJtBM6kObtd+MWa7zSzHBjks7Il7y9C8uqpI5zgG3sufh/sG
 TmgxPv1FUEZBNowv6eA36euo4iw2XhZ3MFg2tlJdwgRj8+cfyK+fue/skCXuToaH
 fqeiJYcfpOv8s5JXfU3P
 =QWKO
 -----END PGP SIGNATURE-----

Merge tag 'tegra-for-4.3-clk' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into clk-next

clk: tegra: Changes for v4.3-rc1

This contains the DFLL driver needed to implement CPU frequency scaling
on Tegra.
2015-08-25 15:55:28 -07:00
Stephen Boyd
9205b797db ARM: 8421/1: smp: Collapse arch_cpu_idle_dead() into cpu_die()
The only caller of cpu_die() on ARM is arch_cpu_idle_dead(), so
let's simplify the code by renaming cpu_die() to
arch_cpu_idle_dead(). While were here, drop the __ref annotation
because __cpuinit is gone nowadays.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-08-25 18:19:19 +01:00
Olof Johansson
1ec6f70170 Fix for wrong error-codes in rk3288 suspend code.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABCAAGBQJV3BkkAAoJEPOmecmc0R2B0T8H/23dDYmb1aKxpujzQqqZGNgx
 Aj3j50haQ23rst2UqJ9u3z+w1uhCEaLnDeXSC54VsyhhGOwczqmqA4XtH9cCdsIy
 R1AGfyXooOtKvu/IC+AGEIzVKL6bYc4BJ3jgj8tbW9RxdkyoSIYTUbUgYE9Scl3q
 3XIiCIAYY3C1zfh+4IuoxKY1hRokh9V068OEfad2/2HbIzYZdSBliO8EFw6St2DX
 cjlCCPyH8ipgA0G2IS2IA4yFobNac6xnqVwZb9/YVnYd9uxdVBBJGpWB9Lr/vOrz
 LaGILxWqPYLrtayJdtWN04TON4o/xAZmaJMFGay7QVWhEj6tVgbwEVw8QfytoOc=
 =uZaq
 -----END PGP SIGNATURE-----

Merge tag 'v4.3-rockchip32-soc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into next/soc

Fix for wrong error-codes in rk3288 suspend code.

* tag 'v4.3-rockchip32-soc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
  ARM: rockchip: pm: Fix PTR_ERR() argument

Signed-off-by: Olof Johansson <olof@lixom.net>
2015-08-25 10:16:48 -07:00
Olof Johansson
4c80a00388 Fixes for non-standard and inverted regulator-suspend-properties
on veyron boards.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABCAAGBQJV3Bg7AAoJEPOmecmc0R2BJ5kH/ReLNj9v5iCw1inJpBMuB2rk
 R1wECYOWSP3rDcORDZjjBChnYt02ft/3CnUstqE5vDJo1bNo/c9PvU5BPUGoXdOv
 Zx6ZXY5Q8nqGL0I39iP3gkboRzODeO+IpHzWlyLtnDIFRSzB8DBcHxm2wDvTPmqI
 qz4N+ObioycXqlOeimwE0o30qZeNTX/dIZ/pLlDojN2y0Kw71vnJPpgAcK6N+xEi
 w8VEMUW9oe0w31lDUfXJIL5ArjeCeOjoHjfGEwclsaOp0ic+OTVUmNOpo6MZ0Oni
 khuU/xvaWepE34+cvMA0BTzjY/t8L8dHhsw/ivEwrhRe+Uv8F/hgkVine/iS+/U=
 =AOly
 -----END PGP SIGNATURE-----

Merge tag 'v4.3-rockchip32-dts3' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into next/dt

Fixes for non-standard and inverted regulator-suspend-properties
on veyron boards.

* tag 'v4.3-rockchip32-dts3' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
  ARM: dts: rockchip: correct regulator power states for suspend
  ARM: dts: rockchip: correct regulator PM properties

Signed-off-by: Olof Johansson <olof@lixom.net>
2015-08-25 10:15:27 -07:00
Linus Torvalds
b1713b135f Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
 "A single fix for a APIC regression introduced in 4.0 which went
  undetected until now.

  I screwed up the x2apic cleanup in a subtle way.  The screwup is only
  visible on systems which have x2apic preenabled in the BIOS and need
  to disable it during boot"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Fix fallout from x2apic cleanup
2015-08-25 09:01:05 -07:00
Russell King
3fba7e23f7 ARM: uaccess: provide uaccess_save_and_enable() and uaccess_restore()
Provide uaccess_save_and_enable() and uaccess_restore() to permit
control of userspace visibility to the kernel, and hook these into
the appropriate places in the kernel where we need to access
userspace.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-08-25 16:14:43 +01:00
Russell King
08446b129b ARM: mm: improve do_ldrd_abort macro
Improve the do_ldrd_abort macro code - firstly, it inefficiently checks
for the LDRD encoding by doing a multi-stage test of various bits.  This
can be simplified by generating a mask, bitmasking the instruction and
then comparing the result.

Secondly, we want to be able to test the result rather than branching
to do_DataAbort, so remove the branch at the end and rename the macro
to 'teq_ldrd' to reflect it's new usage.  teq_ldrd macro returns 'eq'
if the instruction was a LDRD.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-08-25 16:14:42 +01:00
Russell King
e0aa3a6657 ARM: entry: ensure that IRQs are enabled when calling syscall_trace_exit()
The audit code looks like it's been written to cope with being called
with IRQs enabled.  However, it's unclear whether IRQs should be
enabled or disabled when calling the syscall tracing infrastructure.

Right now, sometimes we call this with IRQs enabled, and other times
with IRQs disabled.  Opt for IRQs being enabled for consistency.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-08-25 10:32:50 +01:00
Russell King
3302caddf1 ARM: entry: efficiency cleanups
Make the "fast" syscall return path fast again.  The addition of IRQ
tracing and context tracking has made this path grossly inefficient.
We can do much better if these options are enabled if we save the
syscall return code on the stack - we then don't need to save a bunch
of registers around every single callout to C code.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-08-25 10:32:48 +01:00
Russell King
01e09a2816 ARM: entry: get rid of asm_trace_hardirqs_on_cond
There's no need for this macro, it can use a default for the
condition argument.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-08-25 10:32:46 +01:00
Russell King
b64d1f6651 ARM: uaccess: simplify user access assembly
The user assembly for byte and word accesses was virtually identical.
Rather than duplicating this, use a macro instead.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-08-25 10:32:39 +01:00
Ingo Molnar
8d58b66ed2 Linux 4.2-rc8
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJV2pUkAAoJEHm+PkMAQRiGCIoH/Rb29ZjdCoZJp9OtnjAG+qRc
 bG3YuomIdib86x7xHRKKaLWBa7din7IYjuwT/X4S4duO5a1R5Lp1sRG3IlGfhT0W
 nBNbjFl4q4bOyiTPtTRTYyh4g5UQv4IuyCnCmZyCTJyVi/O6HVM9TWKUzm68P2dJ
 30LwLUcQJ+mHueGJwFBAXe2BaojEpvYCdSX6tvbrQ/8X3FrVExZXuJl4uMYNFYNK
 ZwG/v5t7tYOiAe76JGbrEuVFPZWLPEW7amHOWR0T4Ye4nWTlBgx7fENiNRlfgcvI
 CM16l/xkyrZQ3Q5jZy1qYDfdHYF++dyEDysX4w1ae/X0aaLZn7l+u5VQD6WpkQQ=
 =IF6I
 -----END PGP SIGNATURE-----

Merge tag 'v4.2-rc8' into x86/mm, before applying new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-25 09:59:19 +02:00
Paul Gortmaker
13fe86f465 x86/mm: Make kernel/check.c explicitly non-modular
The Kconfig currently controlling compilation of this code is:

  arch/x86/Kconfig:config X86_CHECK_BIOS_CORRUPTION
  arch/x86/Kconfig:       bool "Check for low memory corruption"

...meaning that it currently is not being built as a module by
anyone.

Lets remove the couple traces of modularity so that when reading
the code there is no doubt it is builtin-only.

Since module_init translates to device_initcall in the
non-modular case, the init ordering remains unchanged with this
commit.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1440459295-21814-4-git-send-email-paul.gortmaker@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-25 09:48:38 +02:00
Paul Gortmaker
8f45fe441a x86/mm/pat: Make mm/pageattr[-test].c explicitly non-modular
The file pageattr.c is obj-y and it includes pageattr-test.c
based on CPA_DEBUG (a bool), meaning that no code here is
currently being built as a module by anyone.

Lets remove the couple traces of modularity so that when reading
the code there is no doubt it is builtin-only.

Since module_init translates to device_initcall in the
non-modular case, the init ordering remains unchanged with this
commit.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1440459295-21814-3-git-send-email-paul.gortmaker@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-25 09:48:38 +02:00
Paul Gortmaker
e971aa2cba x86/platform: Make atom/pmc_atom.c explicitly non-modular
The Kconfig currently controlling compilation of this code is:

config PMC_ATOM
        def_bool y

...meaning that it currently is not being built as a module by
anyone.

Lets remove the couple traces of modularity so that when reading
the driver there is no doubt it is builtin-only.

Since module_init() translates to device_initcall() in the
non-modular case, the init ordering remains unchanged with this
commit.

We leave some tags like MODULE_AUTHOR() for documentation
purposes.

Also note that MODULE_DEVICE_TABLE() is a no-op for non-modular
code. We correct a comment that indicates the data was only used
by that macro, as it actually is used by the code directly.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1440459295-21814-2-git-send-email-paul.gortmaker@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-25 09:47:50 +02:00
Ingo Molnar
f612a7b1a7 Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU cleanup from Paul E. McKenney:

 "Privatize smp_mb__after_unlock_lock().  This commit moves the
  definition of smp_mb__after_unlock_lock() to kernel/rcu/tree.h,
  in recognition of the fact that RCU is the only thing using
  this, that nothing else is likely to use it, and that it is
  likely to go away completely."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-25 09:44:49 +02:00
Linus Walleij
5dc0fe199b clk/ARM: move Ux500 PRCC bases to the device tree
The base addresses for the Ux500 PRCC controllers are hardcoded,
let's move them to the clock node in the device tree and delete
the constants.

Cc: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Michael Turquette <mturquette@baylibre.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2015-08-24 16:49:14 -07:00
Stephen Boyd
617b8272a6 MIPS: alchemy: Convert to clk_hw based provider APIs
We're removing struct clk from the clk provider API, so switch
this code to using the clk_hw based provider APIs.

Cc: Manuel Lauss <manuel.lauss@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Linux-MIPS <linux-mips@linux-mips.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2015-08-24 16:48:46 -07:00
Stephen Boyd
7819189779 ARM: OMAP: Convert __clk_get_rate() to provider/consumer APIs
We're removing struct clk from the clk provider API, so switch to
clk_get_rate() and clk_hw_get_rate() here appropriately.

Acked-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2015-08-24 16:48:45 -07:00
Rafael J. Wysocki
82bb70c599 Merge branch 'turbostat' of https://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux into pm-tools
Pull turbostat changes for v4.3 from Len Brown.

* 'turbostat' of https://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: fix typo on DRAM column in Joules-mode
  tools/power turbostat: fix parameter passing for forked command
  tools/power turbostat: dump CONFIG_TDP
  tools/power turbostat: cpu0 is no longer hard-coded, so  update output
  tools/power turbostat: update turbostat(8)
2015-08-24 23:10:02 +02:00
Will Deacon
5166c20ef9 arm64: makefile: fix perf_callchain.o kconfig dependency
Commit 4b3dc9679c ("arm64: force CONFIG_SMP=y and remove redundant
#ifdefs") incorrectly resolved a conflict on arch/arm64/kernel/Makefile
which resulted in a partial revert of 52da443ec4 ("arm64: perf: factor
out callchain code"), leading to perf_callchain.o depending on
CONFIG_HW_PERF_EVENTS instead of CONFIG_PERF_EVENTS.

This patch restores the kconfig dependency for perf_callchain.o.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-08-24 13:44:08 +01:00
Fabio Estevam
2a03c025fd ARM: rockchip: pm: Fix PTR_ERR() argument
PTR_ERR should access the value just tested by IS_ERR.

The semantic patch that makes this change is available
in scripts/coccinelle/tests/odd_ptr_err.cocci.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2015-08-24 12:39:14 +02:00
Ard Biesheuvel
34ba2c4247 arm64: set MAX_MEMBLOCK_ADDR according to linear region size
The linear region size of a 39-bit VA kernel is only 256 GB, which
may be insufficient to cover all of system RAM, even on platforms
that have much less than 256 GB of memory but which is laid out
very sparsely.

So make sure we clip the memory we will not be able to map before
installing it into the memblock memory table, by setting
MAX_MEMBLOCK_ADDR accordingly.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-08-24 10:23:19 +01:00
Alexander Kuleshov
5d3c2c3529 arm64: Fix source code file path in comments
Architecture specific code for i386 and x86_64 was unified and merged to
the arch/x86. This patch fix old path of x86 architecture in a comment
from the arch/arm64/include/asm/fixmap.h.

Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-08-24 10:18:11 +01:00
Dave Airlie
3732ce72b4 Linux 4.2-rc8
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJV2pUkAAoJEHm+PkMAQRiGCIoH/Rb29ZjdCoZJp9OtnjAG+qRc
 bG3YuomIdib86x7xHRKKaLWBa7din7IYjuwT/X4S4duO5a1R5Lp1sRG3IlGfhT0W
 nBNbjFl4q4bOyiTPtTRTYyh4g5UQv4IuyCnCmZyCTJyVi/O6HVM9TWKUzm68P2dJ
 30LwLUcQJ+mHueGJwFBAXe2BaojEpvYCdSX6tvbrQ/8X3FrVExZXuJl4uMYNFYNK
 ZwG/v5t7tYOiAe76JGbrEuVFPZWLPEW7amHOWR0T4Ye4nWTlBgx7fENiNRlfgcvI
 CM16l/xkyrZQ3Q5jZy1qYDfdHYF++dyEDysX4w1ae/X0aaLZn7l+u5VQD6WpkQQ=
 =IF6I
 -----END PGP SIGNATURE-----

Merge tag 'v4.2-rc8' into drm-next

Linux 4.2-rc8

Backmerge required for Intel so they can fix their -next tree up properly.
2015-08-24 16:36:42 +10:00
Alexander Kuleshov
50e48bd067 m68k/coldfire: use PFN_DOWN macro
Replace ((x) >> PAGE_SHIFT) with the predefined PFN_DOWN macro.

Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
2015-08-24 14:31:38 +10:00
Viresh Kumar
5bbc08fb0f m68k/coldfire/pit: Migrate to new 'set-state' interface
Migrate m68k driver to the new 'set-state' interface provided by
clockevents core, the earlier 'set-mode' interface is marked obsolete
now.

This also enables us to implement callbacks for new states of clockevent
devices, for example: ONESHOT_STOPPED.

We weren't doing anything in ->set_mode(RESUME) and so tick_resume()
isn't implemented.

Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: linux-m68k@lists.linux-m68k.org
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
2015-08-24 14:31:38 +10:00
Linus Torvalds
eb63b34bdf Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS bug fixes from Ralf Baechle:
 "Two more fixes for 4.2.

  One fixes a build issue with the LLVM assembler - LLVM assembler macro
  names are case sensitive, GNU as macro names are insensitive; the
  other corrects a license string (GPL v2, not GPLv2) such that the
  module loader will recognice the license correctly"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  FIRMWARE: bcm47xx_nvram: Fix module license.
  MIPS: Fix LLVM build issue.
2015-08-23 07:23:09 -07:00
Andy Lutomirski
47edb65178 x86/asm/msr: Make wrmsrl() a function
As of cf991de2f6 ("x86/asm/msr: Make wrmsrl_safe() a
function"), wrmsrl_safe is a function, but wrmsrl is still a
macro.  The wrmsrl macro performs invalid shifts if the value
argument is 32 bits. This makes it unnecessarily awkward to
write code that puts an unsigned long into an MSR.

To make this work, syscall_init needs tweaking to stop passing
a function pointer to wrmsrl.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Willy Tarreau <w@1wt.eu>
Link: http://lkml.kernel.org/r/690f0c629a1085d054e2d1ef3da073cfb3f7db92.1437678821.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-23 13:25:38 +02:00
Linus Torvalds
b7dec838b5 Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
 "Another couple of small ARM fixes.

  A patch from Masahiro Yamada who noticed that "make -jN all zImage"
  would end up generating bad images where N > 1, and a patch from
  Nicolas to fix the Marvell CPU user access optimisation code when page
  faults are disabled"

* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: 8418/1: add boot image dependencies to not generate invalid images
  ARM: 8414/1: __copy_to_user_memcpy: fix mmap semaphore usage
2015-08-22 15:48:04 -07:00
Paolo Bonzini
e3dbc572fe Patch queue for ppc - 2015-08-22
Highlights for KVM PPC this time around:
 
   - Book3S: A few bug fixes
   - Book3S: Allow micro-threading on POWER8
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABAgAGBQJV2D6aAAoJECszeR4D/txggBMP/3nHD3UjEAFUhhA6VjfK2wNw
 IW2aXQ5+2T51l1K8iSGMyKpW2w4zG5Bv9LdBP2badhaVpgM4//nVf7kcEBrdhjYq
 ns7V3klzTuNY5RBbWZz3Zri0mgCkJVF1XlC3xBzGPSNKpZyrkORhlxfg5GXig8lj
 pvUcku7XgkCFabAIIZmf0pg9hpDHpH3k1G9yZxuA8pys951IPRoo1CgsYmWSbmzh
 jfA2CxBl10dHZOuk/ENyJveJgtthmBB4ezCbWXy+wcMzBKhMC5R93LUoiKXMLWpM
 HkziNGjHA1gFSxDtfUVgkcXfan3a5JmlC+u50dLCTetXOVL7m2beIiXwv3smfjLn
 AkpcChceEChxn0MxwKJjNvU+RVh3kmv8rklfPlBXHTtQ5ZSXxlcxYrmgL64stmrt
 e27dzvJd9J7KX6wEpNyuZINsmFyn3lM3IoxqmSsVCRd43fzhZt9QGcYEXMIe1+lb
 E7QncsYMuuWB/sfSieyPaXtmK5ym2+R220xlKezBZdzWdtisPrpCRyl7BdiqCj6O
 1gROi6qEyj3m5Qw/eGbFKBF0d8oVXqo1wBJkbihMl55D+jMeZMk673aeGhno8au1
 kH+Im+H5xU3oEzdqvC9y3c9kE2sRkzj43GjepIb86Y463fg6KQ5j2gbZUZolGsGH
 AnRSGcbbVer/q+9kymPw
 =t+9t
 -----END PGP SIGNATURE-----

Merge tag 'signed-kvm-ppc-next' of git://github.com/agraf/linux-2.6 into kvm-queue

Patch queue for ppc - 2015-08-22

Highlights for KVM PPC this time around:

  - Book3S: A few bug fixes
  - Book3S: Allow micro-threading on POWER8
2015-08-22 14:57:59 -07:00
Linus Torvalds
d0b89bd548 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Various low level fixes: fix more fallout from the FPU rework and the
  asm entry code rework, plus an MSI rework fix, and an idle-tracing fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu/math-emu: Fix crash in fork()
  x86/fpu/math-emu: Fix math-emu boot crash
  x86/idle: Restore trace_cpu_idle to mwait_idle() calls
  x86/irq: Build correct vector mapping for multiple MSI interrupts
  Revert "sched/x86_64: Don't save flags on context switch"
2015-08-22 08:15:36 -07:00
Thomas Gleixner
a57e456a7b x86/apic: Fix fallout from x2apic cleanup
In the recent x2apic cleanup I got two things really wrong:
1) The safety check in __disable_x2apic which allows the function to
   be called unconditionally is backwards. The check is there to
   prevent access to the apic MSR in case that the machine has no
   apic. Though right now it returns if the machine has an apic and
   therefor the disabling of x2apic is never invoked.

2) x2apic_disable() sets x2apic_mode to 0 after registering the local
   apic. That's wrong, because register_lapic_address() checks x2apic
   mode and therefor takes the wrong code path.

This results in boot failures on machines with x2apic preenabled by
BIOS and can also lead to an fatal MSR access on machines without
apic.

The solutions are simple:
1) Correct the sanity check for apic availability
2) Clear x2apic_mode _before_ calling register_lapic_address()

Fixes: 659006bf3a 'x86/x2apic: Split enable and setup function'
Reported-and-tested-by: Javier Monteagudo <javiermon@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1224764
Cc: stable@vger.kernel.org # 4.0+
Cc: Laura Abbott <labbott@redhat.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
2015-08-22 17:01:48 +02:00
Linus Torvalds
84f3fe4608 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
 "A series of small fixlets for a regression visible on OMAP devices
  caused by the conversion of the OMAP interrupt chips to hierarchical
  interrupt domains.  Mostly one liners on the driver side plus a small
  helper function in the core to avoid open coded mess in the drivers"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/crossbar: Restore set_wake functionality
  irqchip/crossbar: Restore the mask on suspend behaviour
  ARM: OMAP: wakeupgen: Restore the irq_set_type() mechanism
  irqchip/crossbar: Restore the irq_set_type() mechanism
  genirq: Introduce irq_chip_set_type_parent() helper
  genirq: Don't return ENOSYS in irq_chip_retrigger_hierarchy
2015-08-22 07:45:36 -07:00
Andrey Ryabinin
69786cdb37 x86/kasan, mm: Introduce generic kasan_populate_zero_shadow()
Introduce generic kasan_populate_zero_shadow(shadow_start,
shadow_end). This function maps kasan_zero_page to the
[shadow_start, shadow_end] addresses.

This replaces x86_64 specific populate_zero_shadow() and will
be used for ARM64 in follow on patches.

The main changes from original version are:

 * Use p?d_populate*() instead of set_p?d()
 * Use memblock allocator directly instead of vmemmap_alloc_block()
 * __pa() instead of __pa_nodebug(). __pa() causes troubles
   iff we use it before kasan_early_init(). kasan_populate_zero_shadow()
   will be used later, so we ok with __pa() here.

Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexey Klimov <klimov.linux@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Keitel <dkeitel@codeaurora.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yury <yury.norov@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1439444244-26057-3-git-send-email-ryabinin.a.a@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-22 14:54:55 +02:00
Andrey Ryabinin
920e277e17 x86/kasan: Define KASAN_SHADOW_OFFSET per architecture
Current definition of  KASAN_SHADOW_OFFSET in
include/linux/kasan.h will not work for upcomming arm64, so move
it to the arch header.

Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexey Klimov <klimov.linux@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Keitel <dkeitel@codeaurora.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yury <yury.norov@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1439444244-26057-2-git-send-email-ryabinin.a.a@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-22 14:54:55 +02:00
Huang Rui
b466bdb614 x86/asm/delay: Introduce an MWAITX-based delay with a configurable timer
MWAITX can enable a timer and a corresponding timer value
specified in SW P0 clocks. The SW P0 frequency is the same as
TSC. The timer provides an upper bound on how long the
instruction waits before exiting.

This way, a delay function in the kernel can leverage that
MWAITX timer of MWAITX.

When a CPU core executes MWAITX, it will be quiesced in a
waiting phase, diminishing its power consumption. This way, we
can save power in comparison to our default TSC-based delays.

A simple test shows that:

	$ cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc
	$ sleep 10000s
	$ cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc

Results:

	* TSC-based default delay:      485115 uWatts average power
	* MWAITX-based delay:           252738 uWatts average power

Thus, that's about 240 milliWatts less power consumption. The
test method relies on the support of AMD CPU accumulated power
algorithm in fam15h_power for which patches are forthcoming.

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Suggested-by: Borislav Petkov <bp@suse.de>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Huang Rui <ray.huang@amd.com>
[ Fix delay truncation. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Andreas Herrmann <herrmann.der.user@gmail.com>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Li <tony.li@amd.com>
Link: http://lkml.kernel.org/r/1438744732-1459-3-git-send-email-ray.huang@amd.com
Link: http://lkml.kernel.org/r/1439201994-28067-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-22 14:52:16 +02:00
Huang Rui
f96756746c x86/asm: Add MONITORX/MWAITX instruction support
AMD Carrizo processors (Family 15h, Models 60h-6fh) added a new
feature called MWAITX (MWAIT with extensions) as an extension to
MONITOR/MWAIT.

This new instruction controls a configurable timer which causes
the core to exit wait state on timer expiration, in addition to
"normal" MWAIT condition of reading from a monitored VA.

Compared to MONITOR/MWAIT, there are minor differences in opcode
and input parameters:

MWAITX ECX[1]: enable timer if set
MWAITX EBX[31:0]: max wait time expressed in SW P0 clocks ==
TSC. The software P0 frequency is the same as the TSC frequency.

                MWAIT                           MWAITX
opcode          0f 01 c9           |            0f 01 fb
ECX[0]                  value of RFLAGS.IF seen by instruction
ECX[1]          unused/#GP if set  |            enable timer if set
ECX[31:2]                     unused/#GP if set
EAX                           unused (reserve for hint)
EBX[31:0]       unused             |            max wait time (SW P0 == TSC)

                MONITOR                         MONITORX
opcode          0f 01 c8           |            0f 01 fa
EAX                     (logical) address to monitor
ECX                     #GP if not zero

Max timeout = EBX/(TSC frequency)

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andreas Herrmann <herrmann.der.user@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dirk Brandewie <dirk.j.brandewie@intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Li <tony.li@amd.com>
Link: http://lkml.kernel.org/r/1439201994-28067-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-22 14:52:16 +02:00
Michael Ellerman
5d53be7d8c powerpc/powernv: Fix mis-merge of OPAL support for LEDS driver
When I merged the OPAL support for the powernv LEDS driver I missed a
hunk.

This is slightly modified from the original patch, as the original added
code to opal-api.h which is not in the skiboot version, which is
discouraged.

Instead those values are moved into the driver, which is the only place
they are used.

Fixes: 8a8d91817a ("powerpc/powernv: Add OPAL interfaces for accessing and modifying system LED states")
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-22 21:09:12 +10:00
Tim Chen
488ca7d72d x86/cpufeatures: Enable cpuid for Intel SHA extensions
Add Intel CPUID for Intel Secure Hash Algorithm Extensions. This feature
provides new instructions for accelerated computation of SHA-1 and SHA-256.
This allows the feature to be shown in the /proc/cpuinfo for cpus that
support it.

Refer to SHA extension programming guide in chapter 8.2 of the Intel
Architecture Instruction Set Extensions Programming reference
for definition of this feature's cpuid: CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29] = 1
https://software.intel.com/sites/default/files/managed/07/b7/319433-023.pdf

Originally-by: Chandramouli Narayanan <mouli_7982@yahoo.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Link: http://lkml.kernel.org/r/1440194206.3940.6.camel@schen9-mobl2
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-22 11:17:31 +02:00
Sam bobroff
c63517c2e3 KVM: PPC: Book3S: correct width in XER handling
In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64
bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is
accessed as such.

This patch corrects places where it is accessed as a 32 bit field by a
64 bit kernel.  In some cases this is via a 32 bit load or store
instruction which, depending on endianness, will cause either the
lower or upper 32 bits to be missed.  In another case it is cast as a
u32, causing the upper 32 bits to be cleared.

This patch corrects those places by extending the access methods to
64 bits.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-08-22 11:16:19 +02:00
Paul Mackerras
563a1e93af KVM: PPC: Book3S HV: Fix preempted vcore stolen time calculation
Whenever a vcore state is VCORE_PREEMPT we need to be counting stolen
time for it.  This currently isn't the case when we have a vcore that
no longer has any runnable threads in it but still has a runner task,
so we do an explicit call to kvmppc_core_start_stolen() in that case.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-08-22 11:16:19 +02:00
Paul Mackerras
402813fe39 KVM: PPC: Book3S HV: Fix preempted vcore list locking
When a vcore gets preempted, we put it on the preempted vcore list for
the current CPU.  The runner task then calls schedule() and comes back
some time later and takes itself off the list.  We need to be careful
to lock the list that it was put onto, which may not be the list for the
current CPU since the runner task may have moved to another CPU.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-08-22 11:16:19 +02:00
Paul Mackerras
cdeee51842 KVM: PPC: Book3S HV: Implement H_CLEAR_REF and H_CLEAR_MOD
This adds implementations for the H_CLEAR_REF (test and clear reference
bit) and H_CLEAR_MOD (test and clear changed bit) hypercalls.

When clearing the reference or change bit in the guest view of the HPTE,
we also have to clear it in the real HPTE so that we can detect future
references or changes.  When we do so, we transfer the R or C bit value
to the rmap entry for the underlying host page so that kvm_age_hva_hv(),
kvm_test_age_hva_hv() and kvmppc_hv_get_dirty_log() know that the page
has been referenced and/or changed.

These hypercalls are not used by Linux guests.  These implementations
have been tested using a FreeBSD guest.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-08-22 11:16:18 +02:00
Paul Mackerras
08fe1e7bd2 KVM: PPC: Book3S HV: Fix bug in dirty page tracking
This fixes a bug in the tracking of pages that get modified by the
guest.  If the guest creates a large-page HPTE, writes to memory
somewhere within the large page, and then removes the HPTE, we only
record the modified state for the first normal page within the large
page, when in fact the guest might have modified some other normal
page within the large page.

To fix this we use some unused bits in the rmap entry to record the
order (log base 2) of the size of the page that was modified, when
removing an HPTE.  Then in kvm_test_clear_dirty_npages() we use that
order to return the correct number of modified pages.

The same thing could in principle happen when removing a HPTE at the
host's request, i.e. when paging out a page, except that we never
page out large pages, and the guest can only create large-page HPTEs
if the guest RAM is backed by large pages.  However, we also fix
this case for the sake of future-proofing.

The reference bit is also subject to the same loss of information.  We
don't make the same fix here for the reference bit because there isn't
an interface for userspace to find out which pages the guest has
referenced, whereas there is one for userspace to find out which pages
the guest has modified.  Because of this loss of information, the
kvm_age_hva_hv() and kvm_test_age_hva_hv() functions might incorrectly
say that a page has not been referenced when it has, but that doesn't
matter greatly because we never page or swap out large pages.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-08-22 11:16:18 +02:00
Paul Mackerras
1e5bf454f5 KVM: PPC: Book3S HV: Fix race in reading change bit when removing HPTE
The reference (R) and change (C) bits in a HPT entry can be set by
hardware at any time up until the HPTE is invalidated and the TLB
invalidation sequence has completed.  This means that when removing
a HPTE, we need to read the HPTE after the invalidation sequence has
completed in order to obtain reliable values of R and C.  The code
in kvmppc_do_h_remove() used to do this.  However, commit 6f22bd3265
("KVM: PPC: Book3S HV: Make HTAB code LE host aware") removed the
read after invalidation as a side effect of other changes.  This
restores the read of the HPTE after invalidation.

The user-visible effect of this bug would be that when migrating a
guest, there is a small probability that a page modified by the guest
and then unmapped by the guest might not get re-transmitted and thus
the destination might end up with a stale copy of the page.

Fixes: 6f22bd3265
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-08-22 11:16:18 +02:00
Paul Mackerras
b4deba5c41 KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8
This builds on the ability to run more than one vcore on a physical
core by using the micro-threading (split-core) modes of the POWER8
chip.  Previously, only vcores from the same VM could be run together,
and (on POWER8) only if they had just one thread per core.  With the
ability to split the core on guest entry and unsplit it on guest exit,
we can run up to 8 vcpu threads from up to 4 different VMs, and we can
run multiple vcores with 2 or 4 vcpus per vcore.

Dynamic micro-threading is only available if the static configuration
of the cores is whole-core mode (unsplit), and only on POWER8.

To manage this, we introduce a new kvm_split_mode struct which is
shared across all of the subcores in the core, with a pointer in the
paca on each thread.  In addition we extend the core_info struct to
have information on each subcore.  When deciding whether to add a
vcore to the set already on the core, we now have two possibilities:
(a) piggyback the vcore onto an existing subcore, or (b) start a new
subcore.

Currently, when any vcpu needs to exit the guest and switch to host
virtual mode, we interrupt all the threads in all subcores and switch
the core back to whole-core mode.  It may be possible in future to
allow some of the subcores to keep executing in the guest while
subcore 0 switches to the host, but that is not implemented in this
patch.

This adds a module parameter called dynamic_mt_modes which controls
which micro-threading (split-core) modes the code will consider, as a
bitmap.  In other words, if it is 0, no micro-threading mode is
considered; if it is 2, only 2-way micro-threading is considered; if
it is 4, only 4-way, and if it is 6, both 2-way and 4-way
micro-threading mode will be considered.  The default is 6.

With this, we now have secondary threads which are the primary thread
for their subcore and therefore need to do the MMU switch.  These
threads will need to be started even if they have no vcpu to run, so
we use the vcore pointer in the PACA rather than the vcpu pointer to
trigger them.

It is now possible for thread 0 to find that an exit has been
requested before it gets to switch the subcore state to the guest.  In
that case we haven't added the guest's timebase offset to the
timebase, so we need to be careful not to subtract the offset in the
guest exit path.  In fact we just skip the whole path that switches
back to host context, since we haven't switched to the guest context.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-08-22 11:16:17 +02:00
Paul Mackerras
ec25716508 KVM: PPC: Book3S HV: Make use of unused threads when running guests
When running a virtual core of a guest that is configured with fewer
threads per core than the physical cores have, the extra physical
threads are currently unused.  This makes it possible to use them to
run one or more other virtual cores from the same guest when certain
conditions are met.  This applies on POWER7, and on POWER8 to guests
with one thread per virtual core.  (It doesn't apply to POWER8 guests
with multiple threads per vcore because they require a 1-1 virtual to
physical thread mapping in order to be able to use msgsndp and the
TIR.)

The idea is that we maintain a list of preempted vcores for each
physical cpu (i.e. each core, since the host runs single-threaded).
Then, when a vcore is about to run, it checks to see if there are
any vcores on the list for its physical cpu that could be
piggybacked onto this vcore's execution.  If so, those additional
vcores are put into state VCORE_PIGGYBACK and their runnable VCPU
threads are started as well as the original vcore, which is called
the master vcore.

After the vcores have exited the guest, the extra ones are put back
onto the preempted list if any of their VCPUs are still runnable and
not idle.

This means that vcpu->arch.ptid is no longer necessarily the same as
the physical thread that the vcpu runs on.  In order to make it easier
for code that wants to send an IPI to know which CPU to target, we
now store that in a new field in struct vcpu_arch, called thread_cpu.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-08-22 11:16:17 +02:00
Tudor Laurentiu
845ac985cf KVM: PPC: add missing pt_regs initialization
On this switch branch the regs initialization
doesn't happen so add it.
This was found with the help of a static
code analysis tool.

Signed-off-by: Laurentiu Tudor <Laurentiu.Tudor@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-08-22 11:16:17 +02:00