Commit graph

18,351 commits

Author SHA1 Message Date
Roy Franz
876dc36ace efi: Add system table pointer argument to shared functions.
Add system table pointer argument to shared EFI stub related functions
so they no longer use a global system table pointer as they did when part
of eboot.c.  For the ARM EFI stub this allows us to avoid global
variables completely and thereby not have to deal with GOT fixups.
Not having the EFI stub fixup its GOT, which is shared with the
decompressor, simplifies the relocating of the zImage to a
bootable address.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-09-25 12:34:35 +01:00
Roy Franz
7721da4c1e efi: Move common EFI stub code from x86 arch code to common location
No code changes made, just moving functions and #define from x86 arch
directory to common location.  Code is shared using #include, similar
to how decompression code is shared among architectures.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Mark Salter <msalter@redhat.com>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-09-25 12:34:34 +01:00
Roy Franz
ed37ddffe2 efi: Add proper definitions for some EFI function pointers.
The x86/AMD64 EFI stubs must use a call wrapper to convert between
the Linux and EFI ABIs, so void pointers are sufficient.  For ARM,
the ABIs are compatible, so we can directly invoke the function
pointers.  The functions that are used by the ARM stub are updated
to match the EFI definitions.
Also add some EFI types used by EFI functions.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Mark Salter <msalter@redhat.com>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-09-25 12:34:33 +01:00
Roy Franz
4172fe2f8a EFI stub documentation updates
Move efi-stub.txt out of x86 directory and into common directory
in preparation for adding ARM EFI stub support.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-09-25 12:34:32 +01:00
Gleb Natapov
bcd1c29495 KVM: VMX: do not check bit 12 of EPT violation exit qualification when undefined
Bit 12 is undefined in any of the following cases:
- If the "NMI exiting" VM-execution control is 1 and the "virtual NMIs"
  VM-execution control is 0.
- If the VM exit sets the valid bit in the IDT-vectoring information field

Signed-off-by: Gleb Natapov <gleb@redhat.com>
[Add parentheses around & within && - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-25 11:38:26 +02:00
Toshi Kani
1cad5e9a39 hotplug / x86: Disable ARCH_CPU_PROBE_RELEASE on x86
Commit d7c53c9e enabled ARCH_CPU_PROBE_RELEASE on x86 in order to
serialize CPU online/offline operations.  Although it is the config
option to enable CPU hotplug test interfaces, probe & release, it is
also the option to enable cpu_hotplug_driver_lock() as well.  Therefore,
this option had to be enabled on x86 with dummy arch_cpu_probe() and
arch_cpu_release().

Since then, lock_device_hotplug() was introduced to serialize CPU
online/offline & hotplug operations.  Therefore, this config option
is no longer required for the serialization.  This patch disables
this config option on x86 and revert the changes made by commit
d7c53c9e.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-25 10:38:10 +02:00
Toshi Kani
574b851e99 hotplug / x86: Add hotplug lock to missing places
lock_device_hotplug[_sysfs]() serializes CPU & Memory online/offline
and hotplug operations.  However, this lock is not held in the debug
interfaces below that initiate CPU online/offline operations.

 - _debug_hotplug_cpu(), cpu0 hotplug test interface enabled by
   CONFIG_DEBUG_HOTPLUG_CPU0.
 - cpu_probe_store() and cpu_release_store(), cpu hotplug test interface
   enabled by CONFIG_ARCH_CPU_PROBE_RELEASE.

This patch changes the above interfaces to hold lock_device_hotplug().

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-25 10:38:09 +02:00
Toshi Kani
f6913f9902 hotplug / x86: Fix online state in cpu0 debug interface
_debug_hotplug_cpu() is a debug interface that puts cpu0 offline during
boot-up when CONFIG_DEBUG_HOTPLUG_CPU0 is set.  After cpu0 is put offline
in this interface, however, /sys/devices/system/cpu/cpu0/online still
shows 1 (online).

This patch fixes _debug_hotplug_cpu() to update dev->offline when CPU
online/offline operation succeeded.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-25 10:38:09 +02:00
Dave Jones
7a20c2fad6 x86/reboot: Fix apparent cut-n-paste mistake in Dell reboot workaround
This seems to have been copied from the Optiplex 990 entry
above, but somoene forgot to change the ident text.

Signed-off-by: Dave Jones <davej@fedoraproject.org>
Link: http://lkml.kernel.org/r/20130925001344.GA13554@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-25 08:41:10 +02:00
Konrad Rzeszutek Wilk
a945928ea2 xen: Do not enable spinlocks before jump_label_init() has executed
xen_init_spinlocks() currently calls static_key_slow_inc() before
jump_label_init() is invoked. When CONFIG_JUMP_LABEL is set (which usually is
the case) the effect of this static_key_slow_inc() is deferred until after
jump_label_init(). This is different from when CONFIG_JUMP_LABEL is not set, in
which case the key is set immediately. Thus, depending on the value of config
option, we may observe different behavior.

In addition, when we come to __jump_label_transform() from jump_label_init(),
the key (paravirt_ticketlocks_enabled) is already enabled. On processors where
ideal_nop is not the same as default_nop this will cause a BUG() since it is
expected that before a key is enabled the latter is replaced by the former
during initialization.

To address this problem we need to move
static_key_slow_inc(&paravirt_ticketlocks_enabled) so that it is called
after jump_label_init(). We also need to make sure that this is done before
other cpus start to boot. early_initcall appears to be  a good place to do so.
(Note that we cannot move whole xen_init_spinlocks() there since pv_lock_ops
need to be set before alternative_instructions() runs.)

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v2: Added extra comments in the code]
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
2013-09-24 16:22:26 -04:00
Jan Kiszka
92fbc7b195 KVM: nVMX: Enable unrestricted guest mode support
Now that we provide EPT support, there is no reason to torture our
guests by hiding the relieving unrestricted guest mode feature. We just
need to relax CR0 checks for always-on bits as PE and PG can now be
switched off.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-24 19:12:15 +02:00
Jan Kiszka
10ba54a589 KVM: nVMX: Implement support for EFER saving on VM-exit
Implement and advertise VM_EXIT_SAVE_IA32_EFER. L0 traps EFER writes
unconditionally, so we always find the current L2 value in the
architectural state.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-24 19:12:14 +02:00
Jan Kiszka
59ab5a8f44 KVM: nVMX: Do not set identity page map for L2
Fiddling with CR3 for L2 is L1's job. It may set its own, different
identity map or simple leave it alone if unrestricted guest mode is
enabled. This also fixes reading back the current CR3 on L2 exits for
reporting it to L1.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-24 19:12:14 +02:00
Jan Kiszka
9e3e4dbf44 KVM: nVMX: Replace kvm_set_cr0 with vmx_set_cr0 in load_vmcs12_host_state
kvm_set_cr0 performs checks on the state transition that may prevent
loading L1's cr0. For now we rely on the hardware to catch invalid
states loaded by L1 into its VMCS. Still, consistency checks on the host
state part of the VMCS on guest entry will have to be improved later on.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-24 19:12:13 +02:00
Ingo Molnar
8a1f4653f2 x86/UV: Check for alloc_cpumask_var() failures properly in uv_nmi_setup()
GCC warned about:

   arch/x86/platform/uv/uv_nmi.c: In function ‘uv_nmi_setup’:
   arch/x86/platform/uv/uv_nmi.c:664:2: warning: the address of ‘uv_nmi_cpu_mask’ will always evaluate as ‘true’

The reason is this code:

        alloc_cpumask_var(&uv_nmi_cpu_mask, GFP_KERNEL);
        BUG_ON(!uv_nmi_cpu_mask);

which is not the way to check for alloc_cpumask_var() failures - its
return code should be checked instead.

Cc: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/n/tip-2pXRemsjupmvonbpmmnzleo1@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-24 09:52:40 +02:00
Mike Travis
8eba18428a x86/UV: Add uvtrace support
This patch adds support for the uvtrace module by providing a
skeleton call to the registered trace function.  It also
provides another separate 'NMI' tracer that is triggered by the
system wide 'power nmi' command.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Hedi Berriche <hedi@sgi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jason Wessel <jason.wessel@windriver.com>
Link: http://lkml.kernel.org/r/20130923212501.185052551@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-24 09:02:04 +02:00
Mike Travis
12ba6c990f x86/UV: Add kdump to UV NMI handler
If a system has hung and it no longer responds to external
events, this patch adds the capability of doing a standard kdump
and system reboot then triggered by the system NMI command.

It is enabled when the nmi action is changed to "kdump" and the
kernel is built with CONFIG_KEXEC enabled.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Hedi Berriche <hedi@sgi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jason Wessel <jason.wessel@windriver.com>
Link: http://lkml.kernel.org/r/20130923212500.660567460@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-24 09:02:03 +02:00
Mike Travis
3c121d9a21 x86/UV: Add summary of cpu activity to UV NMI handler
The standard NMI handler dumps the states of all the cpus.  This
includes a full register dump and stack trace.  This can be way
more information than what is needed.  This patch adds a
"summary" dump that is basically a form of the "ps" command.  It
includes the symbolic IP address as well as the command field
and basic process information.

It is enabled when the nmi action is changed to "ips".

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Hedi Berriche <hedi@sgi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jason Wessel <jason.wessel@windriver.com>
Link: http://lkml.kernel.org/r/20130923212500.507922930@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-24 09:02:03 +02:00
Mike Travis
0d12ef0c90 x86/UV: Update UV support for external NMI signals
The current UV NMI handler has not been updated for the changes
in the system NMI handler and the perf operations.  The UV NMI
handler reads an MMR in the UV Hub to check to see if the NMI
event was caused by the external 'system NMI' that the operator
can initiate on the System Mgmt Controller.

The problem arises when the perf tools are running, causing
millions of perf events per second on very large CPU count
systems.  Previously this was okay because the perf NMI handler
ran at a higher priority on the NMI call chain and if the NMI
was a perf event, it would stop calling other NMI handlers
remaining on the NMI call chain.

Now the system NMI handler calls all the handlers on the NMI
call chain including the UV NMI handler.  This causes the UV NMI
handler to read the MMRs at the same millions per second rate.
This can lead to significant performance loss and possible
system failures.  It also can cause thousands of 'Dazed and
Confused' messages being sent to the system console.  This
effectively makes perf tools unusable on UV systems.

To avoid this excessive overhead when perf tools are running,
this code has been optimized to minimize reading of the MMRs as
much as possible, by moving to the NMI_UNKNOWN notifier chain.
This chain is called only when all the users on the standard
NMI_LOCAL call chain have been called and none of them have
claimed this NMI.

There is an exception where the NMI_LOCAL notifier chain is
used.  When the perf tools are in use, it's possible that the UV
NMI was captured by some other NMI handler and then either
ignored or mistakenly processed as a perf event.  We set a
per_cpu ('ping') flag for those CPUs that ignored the initial
NMI, and then send them an IPI NMI signal.  The NMI_LOCAL
handler on each cpu does not need to read the MMR, but instead
checks the in memory flag indicating it was pinged.  There are
two module variables, 'ping_count' indicating how many requested
NMI events occurred, and 'ping_misses' indicating how many stray
NMI events.  These most likely are perf events so it shows the
overhead of the perf NMI interrupts and how many MMR reads were avoided.

This patch also minimizes the reads of the MMRs by having the
first cpu entering the NMI handler on each node set a per HUB
in-memory atomic value.  (Having a per HUB value avoids sending
lock traffic over NumaLink.)  Both types of UV NMIs from the SMI
layer are supported.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Hedi Berriche <hedi@sgi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jason Wessel <jason.wessel@windriver.com>
Link: http://lkml.kernel.org/r/20130923212500.353547733@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-24 09:02:02 +02:00
Mike Travis
1e019421bc x86/UV: Move NMI support
This patch moves the UV NMI support from the x2apic file to a
new separate uv_nmi.c file in preparation for the next sequence
of patches. It prevents upcoming bloat of the x2apic file, and
has the added benefit of putting the upcoming /sys/module
parameters under the name 'uv_nmi' instead of 'x2apic_uv_x',
which was obscure.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Hedi Berriche <hedi@sgi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jason Wessel <jason.wessel@windriver.com>
Link: http://lkml.kernel.org/r/20130923212500.183295611@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-24 09:02:02 +02:00
Jiang Liu
7e1f85f96d x86 / ACPI: simplify _acpi_map_lsapic()
In acpi_register_lapic(), it will generates a new logical cpu
number and maps to the local APIC id, this logical cpu number
can be returned to simplify _acpi_map_lsapic() implementation.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-24 01:39:40 +02:00
Jiang Liu
d536bf3dc9 ACPI / processor: use apic_id and remove duplicated _MAT evaluation
Since APIC id is saved in processor struct, just use it and
remove the duplicated _MAT evaluation.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-24 01:39:39 +02:00
Zhang Rui
b408a05493 olpc_xo15_sci: convert acpi_evaluate_object() to acpi_execute_simple_method()
acpi_execute_simple_method() is a new ACPI API introduced to invoke
an ACPI control method that has single integer parameter and no return value.

Convert acpi_evaluate_object() to acpi_execute_simple_method()
in arch/x86/platform/olpc/olpc-xo15-sci.c

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
CC: Daniel Drake <dsd@laptop.org>
CC: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-24 01:37:53 +02:00
Yijing Wang
f8a26fe637 x86/pci: Use cached pci_dev->pcie_cap to simplify code
The PCI core caches the PCIe Capability offset in pci_dev->pcie_cap, so
use that instead of pci_find_capability().  Use pci_bus_set_ops() when
replacing the device pci_ops.  And use #defines instead of numeric
constants.

[bhelgaas: changelog, also use PCI_EXP_LNKCTL_ASPMC]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-09-23 17:30:03 -06:00
Masoud Sharbiani
4f0acd31c3 x86/reboot: Add quirk to make Dell C6100 use reboot=pci automatically
Dell PowerEdge C6100 machines fail to completely reboot about 20% of the time.

Signed-off-by: Masoud Sharbiani <msharbiani@twitter.com>
Signed-off-by: Vinson Lee <vlee@twitter.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1379717947-18042-1-git-send-email-vlee@freedesktop.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-23 10:26:08 +02:00
Yan, Zheng
cf3b425dd8 perf/x86/intel: Add model number for Avoton Silvermont
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Cc: a.p.zijlstra@chello.nl
Cc: eranian@google.com
Cc: ak@linux.intel.com
Link: http://lkml.kernel.org/r/1379837953-17755-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-23 10:22:00 +02:00
Peter Zijlstra
92519bbc8a perf/x86/intel: Fix build warning in intel_pmu_drain_pebs_nhm()
Fengguang Wu reported this build warning:

    arch/x86/kernel/cpu/perf_event_intel_ds.c: In function 'intel_pmu_drain_pebs_nhm':
    arch/x86/kernel/cpu/perf_event_intel_ds.c:964:2: warning: format '%ld' expects argument of type 'long int', but argument 4 has type 'int'

Because pointer arithmetics result type is bitness dependent there's no natural
type to use here, cast it to long.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-jbpauwxJqtf24luewcsdFith@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-20 13:15:27 +02:00
Ingo Molnar
ee0424218c * Fix WARNING on i386 by only enabling a workaround for x86-64 since
we've never encountered the bug on i386 - Josh Boyer
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJSObASAAoJEC84WcCNIz1VziUP+gOaYTf/RQSpXW6cNYMTsgHQ
 e125s6WB2auNvPM0AuH+SCqK3Nia3hRovpMR2Pv0GirWzDVxrFfOiyD3n9jM/gdk
 m+5CoxZnA+ka5mgOGN7xWhCd5gr8bkdqgOoehrQW4lsA3F7uYBZqopLPRuV3W5Oo
 axpRkIgH1gwux5hPnddNohvYb7JfGugQgLsZRZI+kzZ4DfPQU/mklJVydcynjiYW
 NZVwr6g1gQwZ263/BtLcrEcBGTQ4GaynXgq3knBEawdw55Myzfj6S6Q0wwkGPfL5
 YhzxJPsXIvmlMhE+YBn0ONBW/5bvTk+0/XFW2/KZLPqhqGJbguQNYMH/4DVq9qHv
 XemOvMhzIiLFaKutVvEXa5DqADmBHaPHEAjgwbF6TFWCD9GU3tCNuQ336Rdl+5l+
 lDM5PWWdw6w6F4XpjEmJzNB49ZHlnCcka5SV0YWLbgAEEKjnzY3L9flp3B7hEQZg
 F0kP/IVOh+rDCDQkGLX+u2jaBdTCIwatctA1KJgNXhUPFZwLWLLNeKb+FkzjJkUz
 hd0407NvN76/7X6440bL0sfaEuTW+CdTSAPdnNQ2YmO//0PWhHRsVAr/sQxpnAGT
 IPfTaF/7AMjhrp95afz4fHvcCWIuOF9TEXG2Kp87Em4a+A/0Sq/tjhDJwOmpY0fz
 nRqAu6gfZ3Su7BgY5zf4
 =nm0S
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent

Pull EFI fix from Matt Flemin:

" * Fix WARNING on i386 by only enabling a workaround for x86-64 since
    we've never encountered the bug on i386 - Josh Boyer "

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-20 09:52:09 +02:00
Peter Zijlstra
fa73158710 perf: Fix capabilities bitfield compatibility in 'struct perf_event_mmap_page'
Solve the problems around the broken definition of perf_event_mmap_page::
cap_usr_time and cap_usr_rdpmc fields which used to overlap, partially
fixed by:

  860f085b74 ("perf: Fix broken union in 'struct perf_event_mmap_page'")

The problem with the fix (merged in v3.12-rc1 and not yet released
officially), noticed by Vince Weaver is that the new behavior is
not detectable by new user-space, and that due to the reuse of the
field names it's easy to mis-compile a binary if old headers are used
on a new kernel or new headers are used on an old kernel.

To solve all that make this change explicit, detectable and self-contained,
by iterating the ABI the following way:

 - Always clear bit 0, and rename it to usrpage->cap_bit0, to at least not
   confuse old user-space binaries. RDPMC will be marked as unavailable
   to old binaries but that's within the ABI, this is a capability bit.

 - Rename bit 1 to ->cap_bit0_is_deprecated and always set it to 1, so new
   libraries can reliably detect that bit 0 is deprecated and perma-zero
   without having to check the kernel version.

 - Use bits 2, 3, 4 for the newly defined, correct functionality:

	cap_user_rdpmc		: 1, /* The RDPMC instruction can be used to read counts */
	cap_user_time		: 1, /* The time_* fields are used */
	cap_user_time_zero	: 1, /* The time_zero field is used */

 - Rename all the bitfield names in perf_event.h to be different from the
   old names, to make sure it's not possible to mis-compile it
   accidentally with old assumptions.

The 'size' field can then be used in the future to add new fields and it
will act as a natural ABI version indicator as well.

Also adjust tools/perf/ userspace for the new definitions, noticed by
Adrian Hunter.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Also-Fixed-by: Adrian Hunter <adrian.hunter@intel.com>
Link: http://lkml.kernel.org/n/tip-zr03yxjrpXesOzzupszqglbv@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-20 09:45:11 +02:00
Yan, Zheng
73c4427c6c perf/x86/intel/uncore: Don't use smp_processor_id() in validate_group()
uncore_validate_group() can't call smp_processor_id() because it is
in preemptible context. Pass NUMA_NO_NODE to the allocator instead.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1379400493-11505-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-20 06:54:36 +02:00
Peter Zijlstra
eb8417aa70 perf/x86/intel: Remove division from the intel_pmu_drain_pebs_nhm() hot path
Only do the division in case we have to print the result out in a warning.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-43nl31erfbajwpfj254f6zji@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-20 06:53:44 +02:00
Linus Torvalds
f42bcf1aa8 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/intel/lpss: Add pin control support to Intel low power subsystem
  perf/x86/intel: Mark MEM_LOAD_UOPS_MISS_RETIRED as precise on SNB
  x86: Remove now-unused save_rest()
  x86/smpboot: Fix announce_cpu() to printk() the last "OK" properly
2013-09-18 11:26:17 -05:00
Linus Torvalds
186844b292 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Two small fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix UAPI export of PERF_EVENT_IOC_ID
  perf/x86/intel: Fix Silvermont offcore masks
2013-09-18 11:22:53 -05:00
Josh Boyer
700870119f x86, efi: Don't map Boot Services on i386
Add patch to fix 32bit EFI service mapping (rhbz 726701)

Multiple people are reporting hitting the following WARNING on i386,

  WARNING: at arch/x86/mm/ioremap.c:102 __ioremap_caller+0x3d3/0x440()
  Modules linked in:
  Pid: 0, comm: swapper Not tainted 3.9.0-rc7+ #95
  Call Trace:
   [<c102b6af>] warn_slowpath_common+0x5f/0x80
   [<c1023fb3>] ? __ioremap_caller+0x3d3/0x440
   [<c1023fb3>] ? __ioremap_caller+0x3d3/0x440
   [<c102b6ed>] warn_slowpath_null+0x1d/0x20
   [<c1023fb3>] __ioremap_caller+0x3d3/0x440
   [<c106007b>] ? get_usage_chars+0xfb/0x110
   [<c102d937>] ? vprintk_emit+0x147/0x480
   [<c1418593>] ? efi_enter_virtual_mode+0x1e4/0x3de
   [<c102406a>] ioremap_cache+0x1a/0x20
   [<c1418593>] ? efi_enter_virtual_mode+0x1e4/0x3de
   [<c1418593>] efi_enter_virtual_mode+0x1e4/0x3de
   [<c1407984>] start_kernel+0x286/0x2f4
   [<c1407535>] ? repair_env_string+0x51/0x51
   [<c1407362>] i386_start_kernel+0x12c/0x12f

Due to the workaround described in commit 916f676f8 ("x86, efi: Retain
boot service code until after switching to virtual mode") EFI Boot
Service regions are mapped for a period during boot. Unfortunately, with
the limited size of the i386 direct kernel map it's possible that some
of the Boot Service regions will not be directly accessible, which
causes them to be ioremap()'d, triggering the above warning as the
regions are marked as E820_RAM in the e820 memmap.

There are currently only two situations where we need to map EFI Boot
Service regions,

  1. To workaround the firmware bug described in 916f676f8
  2. To access the ACPI BGRT image

but since we haven't seen an i386 implementation that requires either,
this simple fix should suffice for now.

[ Added to changelog - Matt ]

Reported-by: Bryan O'Donoghue <bryan.odonoghue.lkml@nexus-software.ie>
Acked-by: Tom Zanussi <tom.zanussi@intel.com>
Acked-by: Darren Hart <dvhart@linux.intel.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-09-18 14:42:33 +01:00
Gleb Natapov
0be9c7a89f KVM: VMX: set "blocked by NMI" flag if EPT violation happens during IRET from NMI
Set "blocked by NMI" flag if EPT violation happens during IRET from NMI
otherwise NMI can be called recursively causing stack corruption.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-09-17 19:09:47 +03:00
Gleb Natapov
72f857950f KVM: nEPT: reset PDPTR register cache on nested vmentry emulation
After nested vmentry stale cache can be used to reload L2 PDPTR pointers
which will cause L2 guest to fail. Fix it by invalidating cache on nested
vmentry emulation.

https://bugzilla.kernel.org/show_bug.cgi?id=60830

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-17 12:52:42 +03:00
Paolo Bonzini
ba6a354154 KVM: mmu: allow page tables to be in read-only slots
Page tables in a read-only memory slot will currently cause a triple
fault because the page walker uses gfn_to_hva and it fails on such a slot.

OVMF uses such a page table; however, real hardware seems to be fine with
that as long as the accessed/dirty bits are set.  Save whether the slot
is readonly, and later check it when updating the accessed and dirty bits.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-17 12:52:31 +03:00
Bruce Rogers
3261107ebf KVM: x86 emulator: emulate RETF imm
Opcode CA

This gets used by a DOS based NetWare guest.

Signed-off-by: Bruce Rogers <brogers@suse.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-17 12:51:35 +03:00
Mathias Nyman
0f531431d3 x86/intel/lpss: Add pin control support to Intel low power subsystem
x86 chips with LPSS (low power subsystem) such as Lynxpoint and
Baytrail have SoC like peripheral support and controllable pins.

At the moment, Baytrail needs the pinctrl-baytrail driver to let
peripherals control their gpio resources, but more pincontrol
functions such as pin muxing and grouping are possible to add
later.

Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: http://lkml.kernel.org/r/1379080949-21734-1-git-send-email-mathias.nyman@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-14 08:06:28 +02:00
Stephane Eranian
9d8e3f9693 perf/x86/intel: Mark MEM_LOAD_UOPS_MISS_RETIRED as precise on SNB
On Intel SNB (SNB, SNB-EP), the event MEM_LOAD_UOPS_MISS_RETIRED
supports PEBS. It was missing for the SNB PEBS event constraint
table thereby preventing any measurement with PEBS for it.

This patch adds the event to the PEBS table for SNB.

WARNING: it should be noted that this event like a few others
are subject to the erratum BT241 for Xeon E5 (SNB-EP). As such,
the event may undercount when used with PEBS unless the
workaround is implemented. But without this patch and just the
workaround, the kernel would not allow precise sampling on this
event. BT241 is documented in:

  http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-family-spec-update.pdf

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Cc: zheng.z.yan@intel.com
Link: http://lkml.kernel.org/r/20130913201646.GA23981@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-14 08:00:18 +02:00
Martin Schwidefsky
0244ad004a Remove GENERIC_HARDIRQ config option
After the last architecture switched to generic hard irqs the config
options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code
for !CONFIG_GENERIC_HARDIRQS can be removed.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-09-13 15:09:52 +02:00
Linus Torvalds
ac4de9543a Merge branch 'akpm' (patches from Andrew Morton)
Merge more patches from Andrew Morton:
 "The rest of MM.  Plus one misc cleanup"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (35 commits)
  mm/Kconfig: add MMU dependency for MIGRATION.
  kernel: replace strict_strto*() with kstrto*()
  mm, thp: count thp_fault_fallback anytime thp fault fails
  thp: consolidate code between handle_mm_fault() and do_huge_pmd_anonymous_page()
  thp: do_huge_pmd_anonymous_page() cleanup
  thp: move maybe_pmd_mkwrite() out of mk_huge_pmd()
  mm: cleanup add_to_page_cache_locked()
  thp: account anon transparent huge pages into NR_ANON_PAGES
  truncate: drop 'oldsize' truncate_pagecache() parameter
  mm: make lru_add_drain_all() selective
  memcg: document cgroup dirty/writeback memory statistics
  memcg: add per cgroup writeback pages accounting
  memcg: check for proper lock held in mem_cgroup_update_page_stat
  memcg: remove MEMCG_NR_FILE_MAPPED
  memcg: reduce function dereference
  memcg: avoid overflow caused by PAGE_ALIGN
  memcg: rename RESOURCE_MAX to RES_COUNTER_MAX
  memcg: correct RESOURCE_MAX to ULLONG_MAX
  mm: memcg: do not trap chargers with full callstack on OOM
  mm: memcg: rework and document OOM waiting and wakeup
  ...
2013-09-12 15:44:27 -07:00
Johannes Weiner
3a13c4d761 x86: finish user fault error path with fatal signal
The x86 fault handler bails in the middle of error handling when the
task has a fatal signal pending.  For a subsequent patch this is a
problem in OOM situations because it relies on pagefault_out_of_memory()
being called even when the task has been killed, to perform proper
per-task OOM state unwinding.

Shortcutting the fault like this is a rather minor optimization that
saves a few instructions in rare cases.  Just remove it for
user-triggered faults.

Use the opportunity to split the fault retry handling from actual fault
errors and add locking documentation that reads suprisingly similar to
ARM's.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-12 15:38:01 -07:00
Johannes Weiner
759496ba64 arch: mm: pass userspace fault flag to generic fault handler
Unlike global OOM handling, memory cgroup code will invoke the OOM killer
in any OOM situation because it has no way of telling faults occuring in
kernel context - which could be handled more gracefully - from
user-triggered faults.

Pass a flag that identifies faults originating in user space from the
architecture-specific fault handlers to generic code so that memcg OOM
handling can be improved.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-12 15:38:01 -07:00
Linus Torvalds
26935fb06e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 4 from Al Viro:
 "list_lru pile, mostly"

This came out of Andrew's pile, Al ended up doing the merge work so that
Andrew didn't have to.

Additionally, a few fixes.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits)
  super: fix for destroy lrus
  list_lru: dynamically adjust node arrays
  shrinker: Kill old ->shrink API.
  shrinker: convert remaining shrinkers to count/scan API
  staging/lustre/libcfs: cleanup linux-mem.h
  staging/lustre/ptlrpc: convert to new shrinker API
  staging/lustre/obdclass: convert lu_object shrinker to count/scan API
  staging/lustre/ldlm: convert to shrinkers to count/scan API
  hugepage: convert huge zero page shrinker to new shrinker API
  i915: bail out earlier when shrinker cannot acquire mutex
  drivers: convert shrinkers to new count/scan API
  fs: convert fs shrinkers to new scan/count API
  xfs: fix dquot isolation hang
  xfs-convert-dquot-cache-lru-to-list_lru-fix
  xfs: convert dquot cache lru to list_lru
  xfs: rework buffer dispose list tracking
  xfs-convert-buftarg-lru-to-generic-code-fix
  xfs: convert buftarg LRU to generic code
  fs: convert inode and dentry shrinking to be node aware
  vmscan: per-node deferred work
  ...
2013-09-12 15:01:38 -07:00
Linus Torvalds
75acebf242 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Various fixes.

  The -g perf report lockup you reported is only partially addressed,
  patches that fix the excessive runtime are still being worked on"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix uncore PCI fixed counter handling
  uprobes: Fix utask->depth accounting in handle_trampoline()
  perf/x86: Add constraint for IVB CYCLE_ACTIVITY:CYCLES_LDM_PENDING
  perf: Fix up MMAP2 buffer space reservation
  perf tools: Add attr->mmap2 support
  perf kvm: Fix sample_type manipulation
  perf evlist: Fix id pos in perf_evlist__open()
  perf trace: Handle perf.data files with no tracepoints
  perf session: Separate progress bar update when processing events
  perf trace: Check if MAP_32BIT is defined
  perf hists: Fix formatting of long symbol names
  perf evlist: Fix parsing with no sample_id_all bit set
  perf tools: Add test for parsing with no sample_id_all bit
  perf trace: Check control+C more often
2013-09-12 10:44:54 -07:00
Ingo Molnar
7f2ee91f54 perf/x86/intel: Clean up EVENT_ATTR_STR() muck
Make the code a bit more readable by removing stray whitespaces et al.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/n/tip-lzEnychz1ylqy8zjenxOmeht@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 19:17:50 +02:00
Peter Zijlstra
d2beea4a34 perf/x86/intel: Clean-up/reduce PEBS code
Get rid of some pointless duplication introduced by the Haswell code.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-8q6y4davda9aawwv5yxe7klp@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 19:13:37 +02:00
Peter Zijlstra
2b9e344df3 perf/x86/intel: Clean up checkpoint-interrupt bits
Clean up the weird CP interrupt exception code by keeping a CP mask.

Andi suggested this implementation but weirdly didn't actually
implement it himself, do so now because it removes the conditional in
the interrupt handler and avoids the assumption its only on cnt2.

Suggested-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-dvb4q0rydkfp00kqat4p5bah@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 19:13:37 +02:00
Andi Kleen
4b2c4f1f1b perf/x86/intel: Add Haswell TSX event aliases
Add TSX event aliases, and export them from the kernel to perf.

These are used by perf stat -T and to allow
more user friendly access to events. The events are designed to
be fairly generic and may also apply to other architectures
implementing HTM.  They all cover common situations that
happens during tuning of transactional code.

For Haswell we have to separate the HLE and RTM events,
as they are separate in the PMU.

This adds the following events:

 tx-start	Count start transaction (used by perf stat -T)
 tx-commit	Count commit of transaction
 tx-abort	Count all aborts
 tx-conflict	Count aborts due to conflict with another CPU.
 tx-capacity	Count capacity aborts (transaction too large)

Then matching el-* events for HLE

 cycles-t	Transactional cycles (used by perf stat -T)
  * also exists on POWER8

 cycles-ct	Transactional cycles commited (used by perf stat -T)
  * according to Michael Ellerman POWER8 has a cycles-transactional-committed,
  * perf stat -T handles both cases

Note for useful abort profiling often precise has to be set,
as Haswell can only report the point inside the transaction
with precise=2.

For some classes of aborts, like conflicts, this is not needed,
as it makes more sense to look at the complete critical section.

This gives a clean set of generalized events to examine transaction
success and aborts. Haswell has additional events for TSX, but those
are more specialized for very specific situations.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1378438661-24765-4-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 19:13:36 +02:00