Commit graph

17,058 commits

Author SHA1 Message Date
Masami Hiramatsu
06aeaaeabf ftrace: Move ARCH_SUPPORTS_FTRACE_SAVE_REGS in Kconfig
Move SAVE_REGS support flag into Kconfig and rename
it to CONFIG_DYNAMIC_FTRACE_WITH_REGS. This also introduces
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS which indicates
the architecture depending part of ftrace has a code
that saves full registers.
On the other hand, CONFIG_DYNAMIC_FTRACE_WITH_REGS indicates
the code is enabled.

Link: http://lkml.kernel.org/r/20120928081516.3560.72534.stgit@ltc138.sdl.hitachi.co.jp

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-21 13:22:35 -05:00
Rusty Russell
373d4d0997 taint: add explicit flag to show whether lock dep is still OK.
Fix up all callers as they were before, with make one change: an
unsigned module taints the kernel, but doesn't turn off lockdep.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-01-21 17:17:57 +10:30
Herbert Xu
7983627657 crypto: crc32-pclmul - Kill warning on x86-32
This patch removes a gratuitous warning on x86-32:

arch/x86/crypto/crc32-pclmul_asm.S:87:2: warning: #warning Using 32bit code support [-Wcpp]

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20 18:05:02 +11:00
Jussi Kivilinna
d3f5188dfe crypto: x86/twofish - assembler clean-ups: use ENTRY/ENDPROC, localize jump labels
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20 10:16:51 +11:00
Jussi Kivilinna
ac9d55dd42 crypto: x86/sha1 - assembler clean-ups: use ENTRY/ENDPROC
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20 10:16:51 +11:00
Jussi Kivilinna
2dcfd44dee crypto: x86/serpent - use ENTRY/ENDPROC for assember functions and localize jump targets
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20 10:16:50 +11:00
Jussi Kivilinna
044438082c crypto: x86/salsa20 - assembler cleanup, use ENTRY/ENDPROC for assember functions and rename ECRYPT_* to salsa20_*
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20 10:16:50 +11:00
Jussi Kivilinna
b05d3f3756 crypto: x86/ghash - assembler clean-up: use ENDPROC at end of assember functions
Signed-off-by: Jussi Kivilinna <jussi.kivilinn@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20 10:16:49 +11:00
Jussi Kivilinna
698a5abbb0 crypto: x86/crc32c - assembler clean-up: use ENTRY/ENDPROC
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20 10:16:49 +11:00
Jussi Kivilinna
1985fecf01 crypto: cast6-avx: use ENTRY()/ENDPROC() for assembler functions
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20 10:16:49 +11:00
Jussi Kivilinna
e17e209ea4 crypto: cast5-avx: use ENTRY()/ENDPROC() for assembler functions and localize jump targets
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20 10:16:48 +11:00
Jussi Kivilinna
59990684b0 crypto: camellia-x86_64/aes-ni: use ENTRY()/ENDPROC() for assembler functions and localize jump targets
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20 10:16:48 +11:00
Jussi Kivilinna
5186e395fe crypto: blowfish-x86_64: use ENTRY()/ENDPROC() for assembler functions and localize jump targets
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20 10:16:48 +11:00
Jussi Kivilinna
8309b745bb crypto: aesni-intel - add ENDPROC statements for assembler functions
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20 10:16:47 +11:00
Jussi Kivilinna
3f29974383 crypto: x86/aes - assembler clean-ups: use ENTRY/ENDPROC, localize jump targets
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20 10:16:47 +11:00
Alexander Boyko
78c37d191d crypto: crc32 - add crc32 pclmulqdq implementation and wrappers for table implementation
This patch adds crc32 algorithms to shash crypto api. One is wrapper to
gerneric crc32_le function. Second is crc32 pclmulqdq implementation. It
use hardware provided PCLMULQDQ instruction to accelerate the CRC32 disposal.
This instruction present from Intel Westmere and AMD Bulldozer CPUs.

For intel core i5 I got 450MB/s for table implementation and 2100MB/s
for pclmulqdq implementation.

Signed-off-by: Alexander Boyko <alexander_boyko@xyratex.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20 10:16:45 +11:00
H. Peter Anvin
021ef050fc x86-32: Start out cr0 clean, disable paging before modifying cr3/4
Patch

  5a5a51db78 x86-32: Start out eflags and cr4 clean

... made x86-32 match x86-64 in that we initialize %eflags and %cr4
from scratch.  This broke OLPC XO-1.5, because the XO enters the
kernel with paging enabled, which the kernel doesn't expect.

Since we no longer support 386 (the source of most of the variability
in %cr0 configuration), we can simply match further x86-64 and
initialize %cr0 to a fixed value -- the one variable part remaining in
%cr0 is for FPU control, but all that is handled later on in
initialization; in particular, configuring %cr0 as if the FPU is
present until proven otherwise is correct and necessary for the probe
to work.

To deal with the XO case sanely, explicitly disable paging in %cr0
before we muck with %cr3, %cr4 or EFER -- those operations are
inherently unsafe with paging enabled.

NOTE: There is still a lot of 386-related junk in head_32.S which we
can and should get rid of, however, this is intended as a minimal fix
whereas the cleanup can be deferred to the next merge window.

Reported-by: Andres Salomon <dilinger@queued.net>
Tested-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/50FA0661.2060400@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-19 11:01:22 -08:00
Linus Torvalds
5c69bed266 Fixes:
- CVE-2013-0190/XSA-40 (or stack corruption for 32-bit PV kernels)
  - Fix racy vma access spotted by Al Viro
  - Fix mmap batch ioctl potentially resulting in large O(n) page allcations.
  - Fix vcpu online/offline BUG:scheduling while atomic..
  - Fix unbound buffer scanning for more than 32 vCPUs.
  - Fix grant table being incorrectly initialized
  - Fix incorrect check in pciback
  - Allow privcmd in backend domains.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEcBAABAgAGBQJQ+L7qAAoJEFjIrFwIi8fJLNIH/jUsneraEggWeh0L4GGWZvWL
 cNCf0zjQt/pi1Q5drbleW2/6Wv6s6N1QA9pGRsJ+rrliC73HVTqIWFh0TjpwmCVy
 hZal7jDXOuFVIR7GbGEPn004T6mkEnYDb/O2fyojwMVg0NQYwtMYJfTBkKdjKnmV
 z6sWpQPVqO3/nZ17k2DipYRldbeiqS6LLOiUWd72b2W8bV4ySY5iVPVsqFusSEr6
 PNyW33RPs5H0jEPR1uJlLD+l/uIbENykpEPeAS2uHGlch129+xHH5h79dwYJTbw6
 x5nAOveO9VNJscUoqhpE7YbySzJmrUwxnBerZ6YTW6WCknYXrx4uiVAlfWem7uY=
 =26Sq
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.8-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull Xen fixes from Konrad Rzeszutek Wilk:
 - CVE-2013-0190/XSA-40 (or stack corruption for 32-bit PV kernels)
 - Fix racy vma access spotted by Al Viro
 - Fix mmap batch ioctl potentially resulting in large O(n) page allcations.
 - Fix vcpu online/offline BUG:scheduling while atomic..
 - Fix unbound buffer scanning for more than 32 vCPUs.
 - Fix grant table being incorrectly initialized
 - Fix incorrect check in pciback
 - Allow privcmd in backend domains.

Fix up whitespace conflict due to ugly merge resolution in Xen tree in
arch/arm/xen/enlighten.c

* tag 'stable/for-linus-3.8-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: Fix stack corruption in xen_failsafe_callback for 32bit PVOPS guests.
  Revert "xen/smp: Fix CPU online/offline bug triggering a BUG: scheduling while atomic."
  xen/gntdev: remove erronous use of copy_to_user
  xen/gntdev: correctly unmap unlinked maps in mmu notifier
  xen/gntdev: fix unsafe vma access
  xen/privcmd: Fix mmap batch ioctl.
  Xen: properly bound buffer access when parsing cpu/*/availability
  xen/grant-table: correctly initialize grant table version 1
  x86/xen : Fix the wrong check in pciback
  xen/privcmd: Relax access control in privcmd_ioctl_mmap
2013-01-18 12:02:52 -08:00
Rafael J. Wysocki
a090b22f3f Merge branch 'acpica' into acpi-lpss
The following commits depend on the 'acpica' material.
2013-01-18 13:49:29 +01:00
Nathan Zimmer
b8f2c21db3 efi, x86: Pass a proper identity mapping in efi_call_phys_prelog
Update efi_call_phys_prelog to install an identity mapping of all available
memory.  This corrects a bug on very large systems with more then 512 GB in
which bios would not be able to access addresses above not in the mapping.

The result is a crash that looks much like this.

BUG: unable to handle kernel paging request at 000000effd870020
IP: [<0000000078bce331>] 0x78bce330
PGD 0
Oops: 0000 [#1] SMP
Modules linked in:
CPU 0
Pid: 0, comm: swapper/0 Tainted: G        W    3.8.0-rc1-next-20121224-medusa_ntz+ #2 Intel Corp. Stoutland Platform
RIP: 0010:[<0000000078bce331>]  [<0000000078bce331>] 0x78bce330
RSP: 0000:ffffffff81601d28  EFLAGS: 00010006
RAX: 0000000078b80e18 RBX: 0000000000000004 RCX: 0000000000000004
RDX: 0000000078bcf958 RSI: 0000000000002400 RDI: 8000000000000000
RBP: 0000000078bcf760 R08: 000000effd870000 R09: 0000000000000000
R10: 0000000000000000 R11: 00000000000000c3 R12: 0000000000000030
R13: 000000effd870000 R14: 0000000000000000 R15: ffff88effd870000
FS:  0000000000000000(0000) GS:ffff88effe400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000effd870020 CR3: 000000000160c000 CR4: 00000000000006b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper/0 (pid: 0, threadinfo ffffffff81600000, task ffffffff81614400)
Stack:
 0000000078b80d18 0000000000000004 0000000078bced7b ffff880078b81fff
 0000000000000000 0000000000000082 0000000078bce3a8 0000000000002400
 0000000060000202 0000000078b80da0 0000000078bce45d ffffffff8107cb5a
Call Trace:
 [<ffffffff8107cb5a>] ? on_each_cpu+0x77/0x83
 [<ffffffff8102f4eb>] ? change_page_attr_set_clr+0x32f/0x3ed
 [<ffffffff81035946>] ? efi_call4+0x46/0x80
 [<ffffffff816c5abb>] ? efi_enter_virtual_mode+0x1f5/0x305
 [<ffffffff816aeb24>] ? start_kernel+0x34a/0x3d2
 [<ffffffff816ae5ed>] ? repair_env_string+0x60/0x60
 [<ffffffff816ae2be>] ? x86_64_start_reservations+0xba/0xc1
 [<ffffffff816ae120>] ? early_idt_handlers+0x120/0x120
 [<ffffffff816ae419>] ? x86_64_start_kernel+0x154/0x163
Code:  Bad RIP value.
RIP  [<0000000078bce331>] 0x78bce330
 RSP <ffffffff81601d28>
CR2: 000000effd870020
---[ end trace ead828934fef5eab ]---

Cc: stable@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-01-18 09:43:43 +00:00
Greg Kroah-Hartman
ed408f7c0f Merge 3.9-rc4 into driver-core-next
This is to fix up a build problem with a wireless driver due to the
dynamic-debug patches in this branch.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17 19:48:18 -08:00
Bjorn Helgaas
708b59bfe1 Merge branch 'pci/rafael-set-root-bridge-handle' into next
* pci/rafael-set-root-bridge-handle:
  ACPI / PCI: Set root bridge ACPI handle in advance
2013-01-17 16:00:36 -07:00
Jacob Pan
29c6fb7be1 x86/nmi: export local_touch_nmi() symbol for modules
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2013-01-17 22:25:57 +08:00
Andrew Cooper
9174adbee4 xen: Fix stack corruption in xen_failsafe_callback for 32bit PVOPS guests.
This fixes CVE-2013-0190 / XSA-40

There has been an error on the xen_failsafe_callback path for failed
iret, which causes the stack pointer to be wrong when entering the
iret_exc error path.  This can result in the kernel crashing.

In the classic kernel case, the relevant code looked a little like:

        popl %eax      # Error code from hypervisor
        jz 5f
        addl $16,%esp
        jmp iret_exc   # Hypervisor said iret fault
5:      addl $16,%esp
                       # Hypervisor said segment selector fault

Here, there are two identical addls on either option of a branch which
appears to have been optimised by hoisting it above the jz, and
converting it to an lea, which leaves the flags register unaffected.

In the PVOPS case, the code looks like:

        popl_cfi %eax         # Error from the hypervisor
        lea 16(%esp),%esp     # Add $16 before choosing fault path
        CFI_ADJUST_CFA_OFFSET -16
        jz 5f
        addl $16,%esp         # Incorrectly adjust %esp again
        jmp iret_exc

It is possible unprivileged userspace applications to cause this
behaviour, for example by loading an LDT code selector, then changing
the code selector to be not-present.  At this point, there is a race
condition where it is possible for the hypervisor to return back to
userspace from an interrupt, fault on its own iret, and inject a
failsafe_callback into the kernel.

This bug has been present since the introduction of Xen PVOPS support
in commit 5ead97c84 (xen: Core Xen implementation), in 2.6.23.

Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: stable@vger.kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-16 16:17:42 -05:00
Linus Torvalds
2409c873be Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "This is mainly a workaround for a bug in Sandy Bridge graphics which
  causes corruption of certain memory pages."

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/Sandy Bridge: Sandy Bridge workaround depends on CONFIG_PCI
  x86/Sandy Bridge: mark arrays in __init functions as __initconst
  x86/Sandy Bridge: reserve pages when integrated graphics is present
  x86, efi: correct precedence of operators in setup_efi_pci
2013-01-16 09:11:50 -08:00
Konrad Rzeszutek Wilk
d55bf532d7 Revert "xen/smp: Fix CPU online/offline bug triggering a BUG: scheduling while atomic."
This reverts commit 41bd956de3.

The fix is incorrect and not appropiate for the latest kernels.
In fact it _causes_ the BUG: scheduling while atomic while
doing vCPU hotplug.

Suggested-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-15 22:41:27 -05:00
John Stultz
e90c83f757 x86: Select HAS_PERSISTENT_CLOCK on x86
Select HAS_PERSISTENT_CLOCK on x86 to simplify RTC options
and allow the compiler to remove unused code.

Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-01-15 18:16:09 -08:00
Bernd Faust
2353b47bff Round the calculated scale factor in set_cyc2ns_scale()
During some experiments with an external clock (in a FPGA), we saw that
the TSC clock drifted approx. 2.5ms per second.

This drift was caused by the current way of calculating the scale.
In our case cpu_khz had a value of 3292725. This resulted in a scale
value of 310. But when doing the calculation by hand it shows that the
actual value is 310.9886188491, so a value of 311 would be more precise.

With this change the value is rounded.

Signed-off-by: Bernd Faust <berndfaust@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-01-15 18:16:07 -08:00
Konrad Rzeszutek Wilk
7bcc1ec077 Linux 3.7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJQxqj1AAoJEHm+PkMAQRiG9MQH/j21UwP2QGpdpXbWAnFMjtlv
 uE/yCFhPoqR1QjjE6oRlO6MHFA41xGDbr5RQki9Ik2AfSYiastt4ZWYvtSJKVTCr
 O0Lj+Cdt/2qBkGiARHqVEBZ4S/l/cw4/EHPb5StFyu3ggnPPQhoPIP7oAmRn0+mh
 NNb5CEcJOLqIaJSteqMP71Q899ncbLayBnimYCaC2f6r00beqNXIqxSHipcPlUsf
 ehNxqCX+5z5Q788EL33EL8GpBcy4Ueevu6nvnuVI8qIEnBnrBVngsiaQ4Hti+2eK
 A//4DYoF2N1wLjQv7hFeiwMURQ16OlxXoc/Z66sv2QQRwPxOIQlxdhWuey4KebA=
 =7LYr
 -----END PGP SIGNATURE-----

Merge tag 'v3.7' into stable/for-linus-3.8

Linux 3.7

* tag 'v3.7': (833 commits)
  Linux 3.7
  Input: matrix-keymap - provide proper module license
  Revert "revert "Revert "mm: remove __GFP_NO_KSWAPD""" and associated damage
  ipv4: ip_check_defrag must not modify skb before unsharing
  Revert "mm: avoid waking kswapd for THP allocations when compaction is deferred or contended"
  inet_diag: validate port comparison byte code to prevent unsafe reads
  inet_diag: avoid unsafe and nonsensical prefix matches in inet_diag_bc_run()
  inet_diag: validate byte code to prevent oops in inet_diag_bc_run()
  inet_diag: fix oops for IPv4 AF_INET6 TCP SYN-RECV state
  mm: vmscan: fix inappropriate zone congestion clearing
  vfs: fix O_DIRECT read past end of block device
  net: gro: fix possible panic in skb_gro_receive()
  tcp: bug fix Fast Open client retransmission
  tmpfs: fix shared mempolicy leak
  mm: vmscan: do not keep kswapd looping forever due to individual uncompactable zones
  mm: compaction: validate pfn range passed to isolate_freepages_block
  mmc: sh-mmcif: avoid oops on spurious interrupts (second try)
  Revert misapplied "mmc: sh-mmcif: avoid oops on spurious interrupts"
  mmc: sdhci-s3c: fix missing clock for gpio card-detect
  lib/Makefile: Fix oid_registry build dependency
  ...

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Conflicts:
	arch/arm/xen/enlighten.c
	drivers/xen/Makefile

[We need to have the v3.7 base as the 'for-3.8' was based off v3.7-rc3
and there are some patches in v3.7-rc6 that we to have in our branch]
2013-01-15 15:58:25 -05:00
Takuya Yoshikawa
6b81b05e44 KVM: MMU: Conditionally reschedule when kvm_mmu_slot_remove_write_access() takes a long time
If the userspace starts dirty logging for a large slot, say 64GB of
memory, kvm_mmu_slot_remove_write_access() needs to hold mmu_lock for
a long time such as tens of milliseconds.  This patch controls the lock
hold time by asking the scheduler if we need to reschedule for others.

One penalty for this is that we need to flush TLBs before releasing
mmu_lock.  But since holding mmu_lock for a long time does affect not
only the guest, vCPU threads in other words, but also the host as a
whole, we should pay for that.

In practice, the cost will not be so high because we can protect a fair
amount of memory before being rescheduled: on my test environment,
cond_resched_lock() was called only once for protecting 12GB of memory
even without THP.  We can also revisit Avi's "unlocked TLB flush" work
later for completely suppressing extra TLB flushes if needed.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-14 11:14:28 +02:00
Takuya Yoshikawa
9d1beefb71 KVM: Make kvm_mmu_slot_remove_write_access() take mmu_lock by itself
Better to place mmu_lock handling and TLB flushing code together since
this is a self-contained function.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-14 11:14:17 +02:00
Takuya Yoshikawa
b34cb590fb KVM: Make kvm_mmu_change_mmu_pages() take mmu_lock by itself
No reason to make callers take mmu_lock since we do not need to protect
kvm_mmu_change_mmu_pages() and kvm_mmu_slot_remove_write_access()
together by mmu_lock in kvm_arch_commit_memory_region(): the former
calls kvm_mmu_commit_zap_page() and flushes TLBs by itself.

Note: we do not need to protect kvm->arch.n_requested_mmu_pages by
mmu_lock as can be seen from the fact that it is read locklessly.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-14 11:14:09 +02:00
Takuya Yoshikawa
e12091ce7b KVM: Remove unused slot_bitmap from kvm_mmu_page
Not needed any more.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-14 11:13:58 +02:00
Takuya Yoshikawa
b99db1d352 KVM: MMU: Make kvm_mmu_slot_remove_write_access() rmap based
This makes it possible to release mmu_lock and reschedule conditionally
in a later patch.  Although this may increase the time needed to protect
the whole slot when we start dirty logging, the kernel should not allow
the userspace to trigger something that will hold a spinlock for such a
long time as tens of milliseconds: actually there is no limit since it
is roughly proportional to the number of guest pages.

Another point to note is that this patch removes the only user of
slot_bitmap which will cause some problems when we increase the number
of slots further.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-14 11:13:47 +02:00
Takuya Yoshikawa
245c3912ea KVM: MMU: Remove unused parameter level from __rmap_write_protect()
No longer need to care about the mapping level in this function.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-14 11:13:31 +02:00
Takuya Yoshikawa
c972f3b125 KVM: Write protect the updated slot only when dirty logging is enabled
Calling kvm_mmu_slot_remove_write_access() for a deleted slot does
nothing but search for non-existent mmu pages which have mappings to
that deleted memory; this is safe but a waste of time.

Since we want to make the function rmap based in a later patch, in a
manner which makes it unsafe to be called for a deleted slot, we makes
the caller see if the slot is non-zero and being dirty logged.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-14 11:13:15 +02:00
H. Peter Anvin
e43b3cec71 x86/Sandy Bridge: Sandy Bridge workaround depends on CONFIG_PCI
early_pci_allowed() and read_pci_config_16() are only available if
CONFIG_PCI is defined.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
2013-01-13 20:58:57 -08:00
H. Peter Anvin
ab3cd8670e x86/Sandy Bridge: mark arrays in __init functions as __initconst
Mark static arrays as __initconst so they get removed when the init
sections are flushed.

Reported-by: Mathias Krause <minipli@googlemail.com>
Link: http://lkml.kernel.org/r/75F4BEE6-CB0E-4426-B40B-697451677738@googlemail.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-13 20:36:39 -08:00
Rafael J. Wysocki
6c0cc950ae ACPI / PCI: Set root bridge ACPI handle in advance
The ACPI handles of PCI root bridges need to be known to
acpi_bind_one(), so that it can create the appropriate
"firmware_node" and "physical_node" files for them, but currently
the way it gets to know those handles is not exactly straightforward
(to put it lightly).

This is how it works, roughly:

  1. acpi_bus_scan() finds the handle of a PCI root bridge,
     creates a struct acpi_device object for it and passes that
     object to acpi_pci_root_add().

  2. acpi_pci_root_add() creates a struct acpi_pci_root object,
     populates its "device" field with its argument's address
     (device->handle is the ACPI handle found in step 1).

  3. The struct acpi_pci_root object created in step 2 is passed
     to pci_acpi_scan_root() and used to get resources that are
     passed to pci_create_root_bus().

  4. pci_create_root_bus() creates a struct pci_host_bridge object
     and passes its "dev" member to device_register().

  5. platform_notify(), which for systems with ACPI is set to
     acpi_platform_notify(), is called.

So far, so good.  Now it starts to be "interesting".

  6. acpi_find_bridge_device() is used to find the ACPI handle of
     the given device (which is the PCI root bridge) and executes
     acpi_pci_find_root_bridge(), among other things, for the
     given device object.

  7. acpi_pci_find_root_bridge() uses the name (sic!) of the given
     device object to extract the segment and bus numbers of the PCI
     root bridge and passes them to acpi_get_pci_rootbridge_handle().

  8. acpi_get_pci_rootbridge_handle() browses the list of ACPI PCI
     root bridges and finds the one that matches the given segment
     and bus numbers.  Its handle is then used to initialize the
     ACPI handle of the PCI root bridge's device object by
     acpi_bind_one().  However, this is *exactly* the ACPI handle we
     started with in step 1.

Needless to say, this is quite embarassing, but it may be avoided
thanks to commit f3fd0c8 (ACPI: Allow ACPI handles of devices to be
initialized in advance), which makes it possible to initialize the
ACPI handle of a device before passing it to device_register().

Accordingly, add a new __weak routine, pcibios_root_bridge_prepare(),
defaulting to an empty implementation that can be replaced by the
interested architecutres (x86 and ia64 at the moment) with functions
that will set the root bridge's ACPI handle before its dev member is
passed to device_register().  Make both x86 and ia64 provide such
implementations of pcibios_root_bridge_prepare() and remove
acpi_pci_find_root_bridge() and acpi_get_pci_rootbridge_handle() that
aren't necessary any more.

Included is a fix for breakage on systems with non-ACPI PCI host
bridges from Bjorn Helgaas.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-01-13 17:14:28 -07:00
Jesse Barnes
a9acc5365d x86/Sandy Bridge: reserve pages when integrated graphics is present
SNB graphics devices have a bug that prevent them from accessing certain
memory ranges, namely anything below 1M and in the pages listed in the
table.  So reserve those at boot if set detect a SNB gfx device on the
CPU to avoid GPU hangs.

Stephane Marchesin had a similar patch to the page allocator awhile
back, but rather than reserving pages up front, it leaked them at
allocation time.

[ hpa: made a number of stylistic changes, marked arrays as static
  const, and made less verbose; use "memblock=debug" for full
  verbosity. ]

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-11 14:26:38 -08:00
Kees Cook
01b35ab723 arch/x86/um: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

CC: Jeff Dike <jdike@addtoit.com>
CC: Richard Weinberger <richard@nod.at>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Richard Weinberger <richard@nod.at>
2013-01-11 11:38:05 -08:00
Kees Cook
6ea3038648 arch/x86: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
2013-01-11 11:38:04 -08:00
Xiao Guangrong
7751babd3c KVM: MMU: fix infinite fault access retry
We have two issues in current code:
- if target gfn is used as its page table, guest will refault then kvm will use
  small page size to map it. We need two #PF to fix its shadow page table

- sometimes, say a exception is triggered during vm-exit caused by #PF
  (see handle_exception() in vmx.c), we remove all the shadow pages shadowed
  by the target gfn before go into page fault path, it will cause infinite
  loop:
  delete shadow pages shadowed by the gfn -> try to use large page size to map
  the gfn -> retry the access ->...

To fix these, we can adjust page size early if the target gfn is used as page
table

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-10 15:28:30 -02:00
Xiao Guangrong
c22885050e KVM: MMU: fix Dirty bit missed if CR0.WP = 0
If the write-fault access is from supervisor and CR0.WP is not set on the
vcpu, kvm will fix it by adjusting pte access - it sets the W bit on pte
and clears U bit. This is the chance that kvm can change pte access from
readonly to writable

Unfortunately, the pte access is the access of 'direct' shadow page table,
means direct sp.role.access = pte_access, then we will create a writable
spte entry on the readonly shadow page table. It will cause Dirty bit is
not tracked when two guest ptes point to the same large page. Note, it
does not have other impact except Dirty bit since cr0.wp is encoded into
sp.role

It can be fixed by adjusting pte access before establishing shadow page
table. Also, after that, no mmu specified code exists in the common function
and drop two parameters in set_spte

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-10 15:28:08 -02:00
Linus Torvalds
ccae663cd4 Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM bugfixes from Marcelo Tosatti.

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: use dynamic percpu allocations for shared msrs area
  KVM: PPC: Book3S HV: Fix compilation without CONFIG_PPC_POWERNV
  powerpc: Corrected include header path in kvm_para.h
  Add rcu user eqs exception hooks for async page fault
2013-01-10 09:05:18 -08:00
Daniel J Blueman
8b84c8df38 x86, AMD, NB: Use u16 for northbridge IDs in amd_get_nb_id
Change amd_get_nb_id to return u16 to support >255 memory controllers,
and related consistency fixes.

Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
Link: http://lkml.kernel.org/r/1353997932-8475-2-git-send-email-daniel@numascale-asia.com
Signed-off-by: Borislav Petkov <bp@alien8.de>
2013-01-10 16:17:58 +01:00
Daniel J Blueman
772c3ff385 x86, AMD, NB: Add multi-domain support
Fix get_node_id to match northbridge IDs from the array of detected
ones, allowing multi-server support such as with Numascale's
NumaConnect, renaming to 'amd_get_node_id' for consistency.

Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
Link: http://lkml.kernel.org/r/1353997932-8475-1-git-send-email-daniel@numascale-asia.com
[Boris: shorten lines to fit 80 cols]
Signed-off-by: Borislav Petkov <bp@alien8.de>
2013-01-10 16:17:58 +01:00
David Ahern
a706d965dc perf x86: revert 20b279 - require exclude_guest to use PEBS - kernel side
This patch is brought to you by the letter 'H'.

Commit 20b279 breaks compatiblity with older perf binaries when run with
precise modifier (:p or :pp) by requiring the exclude_guest attribute to be
set. Older binaries default exclude_guest to 0 (ie., wanting guest-based
samples) unless host only profiling is requested (:H modifier). The workaround
for older binaries is to add H to the modifier list (e.g., -e cycles:ppH -
toggles exclude_guest to 1). This was deemed unacceptable by Linus:

https://lkml.org/lkml/2012/12/12/570

Between family in town and the fresh snow in Breckenridge there is no time left
to be working on the proper fix for this over the holidays. In the New Year I
have more pressing problems to resolve -- like some memory leaks in perf which
are proving to be elusive -- although the aforementioned snow is probably why
they are proving to be elusive. Either way I do not have any spare time to work
on this and from the time I have managed to spend on it the solution is more
difficult than just moving to a new exclude_guest flag (does not work) or
flipping the logic to include_guest (which is not as trivial as one would
think).

So, two options: silently force exclude_guest on as suggested by Gleb which
means no impact to older perf binaries or revert the original patch which
caused the breakage.

This patch does the latter -- reverts the original patch that introduced the
regression. The problem can be revisited in the future as time allows.

Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
Link: http://lkml.kernel.org/r/1356749767-17322-1-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-10 09:21:19 -03:00
Lv Zheng
0947c6dee3 ACPICA: Update compilation environment settings.
This patch does not affect the generation of the Linux binary.
This patch decreases 300 lines of 20121018 divergence.diff.

This patch updates architecture specific environment settings for compiling
ACPICA as such enhancement already has been done in ACPICA.

Note that the appended compiler default settings in the
<acpi/platform/acenv.h> will deprecate some of the macros defined in the
architecture specific <asm/acpi.h>. Thus two of the <asm/acpi.h> headers
have been cleaned up in this patch accordingly.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-01-10 12:36:17 +01:00
Avi Kivity
fb864fbc72 KVM: x86 emulator: convert basic ALU ops to fastop
Opcodes:
	TEST
	CMP
	ADD
	ADC
	SUB
	SBB
	XOR
	OR
	AND

Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-09 17:39:30 -02:00