Commit graph

2,985 commits

Author SHA1 Message Date
Ingo Molnar
936c663aed Merge branch 'perf/x86' into perf/core, because it's ready
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 09:46:19 +01:00
Ingo Molnar
072e5a1cfa Merge branch 'perf/urgent' into perf/core, to pick up fixes and to refresh the tree
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 09:46:03 +01:00
Andi Kleen
294fe0f52a perf/x86/intel: Add INST_RETIRED.ALL workarounds
On Broadwell INST_RETIRED.ALL cannot be used with any period
that doesn't have the lowest 6 bits cleared. And the period
should not be smaller than 128.

This is erratum BDM11 and BDM55:

  http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/5th-gen-core-family-spec-update.pdf

BDM11: When using a period < 100; we may get incorrect PEBS/PMI
interrupts and/or an invalid counter state.
BDM55: When bit0-5 of the period are !0 we may get redundant PEBS
records on overflow.

Add a new callback to enforce this, and set it for Broadwell.

How does this handle the case when an app requests a specific
period with some of the bottom bits set?

Short answer:

Any useful instruction sampling period needs to be 4-6 orders
of magnitude larger than 128, as an PMI every 128 instructions
would instantly overwhelm the system and be throttled.
So the +-64 error from this is really small compared to the
period, much smaller than normal system jitter.

Long answer (by Peterz):

IFF we guarantee perf_event_attr::sample_period >= 128.

Suppose we start out with sample_period=192; then we'll set period_left
to 192, we'll end up with left = 128 (we truncate the lower bits). We
get an interrupt, find that period_left = 64 (>0 so we return 0 and
don't get an overflow handler), up that to 128. Then we trigger again,
at n=256. Then we find period_left = -64 (<=0 so we return 1 and do get
an overflow). We increment with sample_period so we get left = 128. We
fire again, at n=384, period_left = 0 (<=0 so we return 1 and get an
overflow). And on and on.

So while the individual interrupts are 'wrong' we get then with
interval=256,128 in exactly the right ratio to average out at 192. And
this works for everything >=128.

So the num_samples*fixed_period thing is still entirely correct +- 127,
which is good enough I'd say, as you already have that error anyhow.

So no need to 'fix' the tools, al we need to do is refuse to create
INST_RETIRED:ALL events with sample_period < 128.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
[ Updated comments and changelog a bit. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1424225886-18652-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 09:14:03 +01:00
Andi Kleen
91f1b70582 perf/x86/intel: Add Broadwell core support
Add Broadwell support for Broadwell to perf.

The basic support is very similar to Haswell. We use the new cache
event list added for Haswell earlier. The only differences
are a few bits related to remote nodes. To avoid an extra,
mostly identical, table these are patched up in the initialization code.

The constraint list has one new event that needs to be handled over Haswell.

Includes code and testing from Kan Liang.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1424225886-18652-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 09:14:02 +01:00
Andi Kleen
0f1b5ca240 perf/x86/intel: Add new cache events table for Haswell
Haswell offcore events are quite different from Sandy Bridge.
Add a new table to handle Haswell properly.

Note that the offcore bits listed in the SDM are not quite correct
(this is currently being fixed). An uptodate list of bits is
in the patch.

The basic setup is similar to Sandy Bridge. The prefetch columns
have been removed, as prefetch counting is not very reliable
on Haswell. One L1 event that is not in the event list anymore
has been also removed.

- data reads do not include code reads (comparable to earlier Sandy Bridge tables)
- data counts include speculative execution (except L1 write, dtlb, bpu)
- remote node access includes both remote memory, remote cache, remote mmio.
- prefetches are not included in the counts for consistency
  (different from Sandy Bridge, which includes prefetches in the remote node)

Signed-off-by: Andi Kleen <ak@linux.intel.com>
[ Removed the HSM30 comments; we don't have them for SNB/IVB either. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1424225886-18652-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 09:14:01 +01:00
Ingo Molnar
d56fe4bf5f x86/asm/entry/64: Always set up SYSENTER MSRs
On CONFIG_IA32_EMULATION=y kernels we set up
MSR_IA32_SYSENTER_CS/ESP/EIP, but on !CONFIG_IA32_EMULATION
kernels we leave them unchanged.

Clear them to make sure the instruction is disabled properly.

SYSCALL is set up properly in both cases.

Acked-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24 20:57:25 +01:00
Denys Vlasenko
ef593260f0 x86/asm/entry: Get rid of KERNEL_STACK_OFFSET
PER_CPU_VAR(kernel_stack) was set up in a way where it points
five stack slots below the top of stack.

Presumably, it was done to avoid one "sub $5*8,%rsp"
in syscall/sysenter code paths, where iret frame needs to be
created by hand.

Ironically, none of them benefits from this optimization,
since all of them need to allocate additional data on stack
(struct pt_regs), so they still have to perform subtraction.

This patch eliminates KERNEL_STACK_OFFSET.

PER_CPU_VAR(kernel_stack) now points directly to top of stack.
pt_regs allocations are adjusted to allocate iret frame as well.
Hopefully we can merge it later with 32-bit specific
PER_CPU_VAR(cpu_current_top_of_stack) variable...

Net result in generated code is that constants in several insns
are changed.

This change is necessary for changing struct pt_regs creation
in SYSCALL64 code path from MOV to PUSH instructions.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1426785469-15125-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24 19:42:38 +01:00
Aravind Gopalakrishnan
43eaa2a1ad x86/mce: Define mce_severity function pointer
Rename mce_severity() to mce_severity_intel() and assign the
mce_severity function pointer to mce_severity_amd() during init on AMD.
This way, we can avoid a test to call mce_severity_amd every time we get
into mce_severity(). And it's cleaner to do it this way.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Suggested-by: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1427125373-2918-3-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-24 12:14:15 +01:00
Aravind Gopalakrishnan
bf80bbd7dc x86/mce: Add an AMD severities-grading function
Add a severities function that caters to AMD processors. This allows us
to do some vendor-specific work within the function if necessary.

Also, introduce a vendor flag bitfield for vendor-specific settings. The
severities code uses this to define error scope based on the prescence
of the flags field.

This is based off of work by Boris Petkov.

Testing details:
Fam10h, Model 9h (Greyhound)
Fam15h: Models 0h-0fh (Orochi), 30h-3fh (Kaveri) and 60h-6fh (Carrizo),
Fam16h Model 00h-0fh (Kabini)

Boris:
Intel SNB
AMD K8 (JH-E0)

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/1427125373-2918-2-git-send-email-Aravind.Gopalakrishnan@amd.com
[ Fixup build, clean up comments. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-24 12:13:34 +01:00
Denys Vlasenko
a76c7f4604 x86/asm/entry/64: Fold syscall32_cpu_init() into its sole user
Having syscall32/sysenter32 initialization in a separate tiny
function, called from within a function that is already syscall
init specific, serves no real purpose.

Its existense also caused an unintended effect of having
wrmsrl(MSR_CSTAR) performed twice: once we set it to a dummy
function returning -ENOSYS, and immediately after
(if CONFIG_IA32_EMULATION), we set it to point to the proper
syscall32 entry point, ia32_cstar_target.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24 08:20:51 +01:00
Andy Lutomirski
383f3af3f8 x86/asm/entry, perf: Explicitly optimize vm86 handling in code_segment_base()
There's no point in checking the VM bit on 64-bit, and, since
we're explicitly checking it, we can use user_mode_ignore_vm86()
after the check.

While we're at it, rearrange the #ifdef slightly to make the code
flow a bit clearer.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/dc1457a734feccd03a19bb3538a7648582f57cdd.1426728647.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 11:13:41 +01:00
Peter Zijlstra
50f16a8bf9 perf: Remove type specific target pointers
The only reason CQM had to use a hard-coded pmu type was so it could use
cqm_target in hw_perf_event.

Do away with the {tp,bp,cqm}_target pointers and provide a non type
specific one.

This allows us to do away with that silly pmu type as well.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Vince Weaver <vince@deater.net>
Cc: acme@kernel.org
Cc: acme@redhat.com
Cc: hpa@zytor.com
Cc: jolsa@redhat.com
Cc: kanaka.d.juvva@intel.com
Cc: matt.fleming@intel.com
Cc: tglx@linutronix.de
Cc: torvalds@linux-foundation.org
Cc: vikas.shivappa@linux.intel.com
Link: http://lkml.kernel.org/r/20150305211019.GU21418@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:58:04 +01:00
Matt Fleming
4e16ed9941 perf/x86/intel: Fix Makefile to actually build the cqm driver
Someone fat fingered a merge conflict and lost the Makefile hunk.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <acme@redhat.com>
Cc: <hpa@zytor.com>
Cc: <jolsa@redhat.com>
Cc: <kanaka.d.juvva@intel.com>
Cc: <tglx@linutronix.de>
Cc: <torvalds@linux-foundation.org>
Cc: <vikas.shivappa@linux.intel.com>
Link: http://lkml.kernel.org/r/1424976420.15321.35.camel@mfleming-mobl1.ger.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:58:03 +01:00
Sudeep Holla
37dea8c52c x86/cpu/cacheinfo: Fix cache_get_priv_group() for Intel processors
The private pointer provided by the cacheinfo code is used to implement
the AMD L3 cache-specific attributes using a pointer to the northbridge
descriptor. It is needed for performing L3-specific operations and for
that we need a couple of PCI devices and other service information, all
contained in the northbridge descriptor.

This results in failure of cacheinfo setup as shown below as
cache_get_priv_group() returns the uninitialised private attributes which
are not valid for Intel processors.

  ------------[ cut here ]------------
  WARNING: CPU: 3 PID: 1 at fs/sysfs/group.c:102
  internal_create_group+0x151/0x280()
  sysfs: (bin_)attrs not set by subsystem for group: index3/
  Modules linked in:
  CPU: 3 PID: 1 Comm: swapper/0 Not tainted 4.0.0-rc3+ #1
  Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A13 05/11/2014
  ...
  Call Trace:
    dump_stack
    warn_slowpath_common
    warn_slowpath_fmt
    internal_create_group
    sysfs_create_groups
    device_add
    cpu_device_create
    ? __kmalloc
    cache_add_dev
    cacheinfo_sysfs_init
    ? container_dev_init
    do_one_initcall
    kernel_init_freeable
    ? rest_init
    kernel_init
    ret_from_fork
    ? rest_init

This patch fixes the issue by checking if the L3 cache indices are
populated correctly (AMD-specific) before initializing the private
attributes.

Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:22:38 +01:00
Borislav Petkov
c9ce871283 x86/mce: Reindent __mcheck_cpu_apply_quirks() properly
Had some strange 3 tabs + 2 chars indentation, probably from me. Fix it.

No code changed:

  # arch/x86/kernel/cpu/mcheck/mce.o:

   text    data     bss     dec     hex filename
  21371    5923     264   27558    6ba6 mce.o.before
  21371    5923     264   27558    6ba6 mce.o.after

md5:
   eb3996c84d15e08ed836f043df2cbb01  mce.o.before.asm
   eb3996c84d15e08ed836f043df2cbb01  mce.o.after.asm

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:16:44 +01:00
Jesse Larrew
f77ac507f8 x86/mce: Use safe MSR accesses for AMD quirk
Certain MSRs are only relevant to a kernel in host mode, and kvm had
chosen not to implement these MSRs at all for guests. If a guest kernel
ever tried to access these MSRs, the result was a general protection
fault.

KVM will be separately patched to return 0 when these MSRs are read,
and this patch ensures that MSR accesses are tolerant of exceptions.

Signed-off-by: Jesse Larrew <jesse.larrew@amd.com>
[ Drop {} braces around loop ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Joel Schopp <joel.schopp@amd.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/1426262619-5016-1-git-send-email-jesse.larrew@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:16:43 +01:00
Andy Lutomirski
c56716af8d x86/asm/entry, perf: Fix incorrect TIF_IA32 check in code_segment_base()
We want to check whether user code is in 32-bit mode, not
whether the task is nominally 32-bit.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/33e5107085ce347a8303560302b15c2cadd62c4c.1426728647.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:08:21 +01:00
Ingo Molnar
8b6c0ab1a1 x86/asm/entry: Document and clean up the enable_sep_cpu() and syscall32_cpu_init() functions
Clean up the flow and document the functions a bit better.

Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 09:25:29 +01:00
Denys Vlasenko
d828c71fba x86/asm/entry/32: Document the 32-bit SYSENTER "emergency stack" better
Before the patch, the 'tss_struct::stack' field was not referenced anywhere.

It was used only to set SYSENTER's stack to point after the last byte
of tss_struct, thus the trailing field, stack[64], was used.

But grep would not know it. You can comment it out, compile,
and kernel will even run until an unlucky NMI corrupts
io_bitmap[] (which is also not easily detectable).

This patch changes code so that the purpose and usage of this
field is not mysterious anymore, and can be easily grepped for.

This does change generated code, for a subtle reason:
since tss_struct is ____cacheline_aligned, there happens to be
5 longs of padding at the end. Old code was using the padding
too; new code will strictly use it only for SYSENTER_stack[].

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1425912738-559-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 09:25:29 +01:00
Ingo Molnar
56544d29c3 Linux 4.0-rc3
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJU/NacAAoJEHm+PkMAQRiGdUcIAJU5dHclwd9HRc7LX5iOwYN6
 mN0aCsYjMD8Pjx2VcPCgJvkIoESQO5pkwYpFFWCwILup1bVEidqXfr8EPOdThzdh
 kcaT0FwUvd19K+0jcKVNCX1RjKBtlUfUKONk6sS2x4RrYZpv0Ur8Gh+yXV8iMWtf
 fAusNEYlxQJvEz5+NSKw86EZTr4VVcykKLNvj+/t/JrXEuue7IG8EyoAO/nLmNd2
 V/TUKKttqpE6aUVBiBDmcMQl2SUVAfp5e+KJAHmizdDpSE80nU59UC1uyV8VCYdM
 qwHXgttLhhKr8jBPOkvUxl4aSXW7S0QWO8TrMpNdEOeB3ZB8AKsiIuhe1JrK0ro=
 =Xkue
 -----END PGP SIGNATURE-----

Merge tag 'v4.0-rc3' into x86/build, to refresh an older tree before applying new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-13 14:21:04 +01:00
Sudeep Holla
0d55ba46bf x86/cacheinfo: Move cacheinfo sysfs code to generic infrastructure
This patch removes the redundant sysfs cacheinfo code by reusing
the newly introduced generic cacheinfo infrastructure through the
commit

  246246cbde ("drivers: base: support cpu cache information
		 interface to userspace via sysfs")

The private pointer provided by the cacheinfo is used to implement
the AMD L3 cache-specific attributes.

Note that with v4.0-rc1, commit

  513e3d2d11 ("cpumask: always use nr_cpu_ids in formatting and parsing
		 functions")

in particular changes from long format to shorter one for all cpumasks
sysfs entries. As the consequence of the same, even the shared_cpu_map
in the cacheinfo sysfs was also changed.

This patch neither alters any existing sysfs entries nor their
formating, however since the generic cacheinfo has switched to use the
device attributes instead of the traditional raw kobjects, a directory
named "power" along with its standard attributes are added similar to
any other device.

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Link: http://lkml.kernel.org/r/1425470416-20691-1-git-send-email-sudeep.holla@arm.com
[ Add a check for uninitialized this_cpu_ci for the cpu_has_topoext case too
  in __cache_amd_cpumap_setup() ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-09 09:32:24 +01:00
Andy Lutomirski
a7fcf28d43 x86/asm/entry: Replace this_cpu_sp0() with current_top_of_stack() and fix it on x86_32
I broke 32-bit kernels.  The implementation of sp0 was correct
as far as I can tell, but sp0 was much weirder on x86_32 than I
realized.  It has the following issues:

 - Init's sp0 is inconsistent with everything else's: non-init tasks
   are offset by 8 bytes.  (I have no idea why, and the comment is unhelpful.)

 - vm86 does crazy things to sp0.

Fix it up by replacing this_cpu_sp0() with
current_top_of_stack() and using a new percpu variable to track
the top of the stack on x86_32.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 75182b1632 ("x86/asm/entry: Switch all C consumers of kernel_stack to this_cpu_sp0()")
Link: http://lkml.kernel.org/r/d09dbe270883433776e0cbee3c7079433349e96d.1425692936.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-07 09:34:03 +01:00
Andy Lutomirski
24933b82c0 x86/asm/entry: Rename 'init_tss' to 'cpu_tss'
It has nothing to do with init -- there's only one TSS per cpu.

Other names considered include:

 - current_tss: Confusing because we never switch the tss.
 - singleton_tss: Too long.

This patch was generated with 's/init_tss/cpu_tss/g'.  Followup
patches will fix INIT_TSS and INIT_TSS_IST by hand.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/da29fb2a793e4f649d93ce2d1ed320ebe8516262.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-06 08:32:58 +01:00
Ingo Molnar
f8e92fb4b0 A more involved rework of the alternatives framework to be able to
pad instructions and thus make using the alternatives macros more
 straightforward and without having to figure out old and new instruction
 sizes but have the toolchain figure that out for us.
 
 Furthermore, it optimizes JMPs used so that fetch and decode can be
 relieved with smaller versions of the JMPs, where possible.
 
 Some stats:
 
 x86_64 defconfig:
 
 Alternatives sites total:               2478
 Total padding added (in Bytes):         6051
 
 The padding is currently done for:
 
 X86_FEATURE_ALWAYS
 X86_FEATURE_ERMS
 X86_FEATURE_LFENCE_RDTSC
 X86_FEATURE_MFENCE_RDTSC
 X86_FEATURE_SMAP
 
 This is with the latest version of the patchset. Of course, on each
 machine the alternatives sites actually being patched are a proper
 subset of the total number.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU9ekpAAoJEBLB8Bhh3lVKyjYP/AiHEiHkkjnpwTt49kUtUMI6
 GIlGfJVNjp5LLnSRD/fkL/wdkBgQtMzr9O1g8Qi/lbFqxsOFteU9f1OtLx34ZwZw
 MhtdiHcrKGMsaIxTJh4FaqPHBT5ussm2yn1jlAX+LgILd3dpqe3oytsO8JihcK9j
 t2u9V/Lq92TV7zXxGgWJsPc86WhhgdldlU3X96S++Di18bnDaKbGkzthU6WzZG/H
 qtFZ5bfK8TlVHYduft+D9ZPzFYGp1WCOa03qU4+Djaxw02HDB6Ltysend9zg0lB1
 RT/BP0PwHD3mOL11qpgtV1ChCbR8FJMN/z5+YdSNJgzDQA0H5Sf0UueTweosfAz+
 /iC5t/wkegdYtqtA0nKVypYOJCS+UdfMZXenYgtSUJl6drB6I5BCW4mVft3AuWo+
 EilPGpblvmjWRx1HiF4/Q/5zrSWHzmKQDyXuyxI9m0OUxAGAM0+8CY6wOqRA5pX+
 /f5MjZ1hXELQGhl5Qdj4nqJacICGevJ8WYdZ53B+uYVxz7fbXk9hSYcZKT94UshD
 qSdaV4XJSuC7pDKqiWoNWXp5N1g+D2BgfwoQEr/RnodFZRlfc+cmOv/visak0OLr
 E/pp1vJvCi3+T3ImX1MCDiXmflQtFctiL3hNgMXYK2IGhJb2RDC2bFeZkksOHuAE
 BGgrn+usQDjVlikEnfI3
 =0KXp
 -----END PGP SIGNATURE-----

Merge tag 'alternatives_padding' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/asm

Pull alternative instructions framework improvements from Borislav Petkov:

 "A more involved rework of the alternatives framework to be able to
  pad instructions and thus make using the alternatives macros more
  straightforward and without having to figure out old and new instruction
  sizes but have the toolchain figure that out for us.

  Furthermore, it optimizes JMPs used so that fetch and decode can be
  relieved with smaller versions of the JMPs, where possible.

  Some stats:

    x86_64 defconfig:

    Alternatives sites total:               2478
    Total padding added (in Bytes):         6051

  The padding is currently done for:

    X86_FEATURE_ALWAYS
    X86_FEATURE_ERMS
    X86_FEATURE_LFENCE_RDTSC
    X86_FEATURE_MFENCE_RDTSC
    X86_FEATURE_SMAP

  This is with the latest version of the patchset. Of course, on each
  machine the alternatives sites actually being patched are a proper
  subset of the total number."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 06:36:15 +01:00
Ingo Molnar
25efdcb43c The first part of the scrubbing of the intel early microcode loader.
There's more work to come but let's unload this pile first.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU9LsjAAoJEBLB8Bhh3lVKVIMP/0xUfRb/wV8P+HtJ6St41G8/
 OygkO/D4UJcZiZP2xrxNix5/waExpegtEIqJPbO6Wlyq0N+0imCZxcsPsgmdw2Zo
 CQv6eu4p0on/v8EUFTJV9+ZHqs6zhch1tNQMfLuq+nXH7f8okSzbDL25RWoo8QU5
 qOOHhHwjyQzivC1KpEodwVte1nT/KLNFio3moRwONKM0/1xCBHyvK4us6QqifWow
 hsDnVNdoXqTzqhY43u7zxNcSzo/RMCq/4sc90augdCdZFAVbKzPGM0o0Pq5FQTmv
 MbMuGF80LhfMFln0tBv30IGFuEc54BBD/x3d7YYadyeX0jIE4T27Pe4xbVLoTDTM
 T8PnaNn/sUyiBUGYO5ff5niMwzpqnuwgKi3wSe4fJ0HIHVE/SAHQogoomP1EKb59
 n66RTWV5eE9KkzdZCTdXhm8aalLK8QfbSlElOrbqwr7/qfFslNpnNUzRwhaHoN1k
 kk5PJ8PipZR/YmWapIU7K6lEZQoRixAb+StMiOvX5n++d+Z7d2/Mu3UqebpqlvUc
 nVFpbPpB9FuH0XQPfICQEUvfWf+MTP9cPw5OkMu+Zo4ok8fIt7MdHCtQJVa9d0R/
 3ZMc9daDgwU8bEYoMRn6xHSGaUjtyO6AUeFhDM7b7If9YWDg59H7nufa5ySkM5KB
 xhWDhQYiSjhdvToKKF/e
 =Jlz9
 -----END PGP SIGNATURE-----

Merge tag 'intel_microcode_cleanup_p1' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/microcode

Pull x86 microcode loader code cleanups from Borislav Petkov:

  "The first part of the scrubbing of the intel early microcode loader.
   There's more work to come but let's unload this pile first."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-03 13:53:16 +01:00
Borislav Petkov
a858b5e504 x86/microcode/intel: Fix printing of microcode blobs in show_saved_mc()
When doing

  echo 1 > /sys/devices/system/cpu/microcode/reload

in order to reload microcode, I get:

  microcode: Total microcode saved: 1
  BUG: using smp_processor_id() in preemptible [00000000] code: bash/2606
  caller is debug_smp_processor_id+0x17/0x20
  CPU: 1 PID: 2606 Comm: bash Not tainted 3.19.0-rc7+ #9
  Hardware name: LENOVO 2320CTO/2320CTO, BIOS G2ET86WW (2.06 ) 11/13/2012
   ffffffff81a4266d ffff8802131db808 ffffffff81666588 0000000000000007
   0000000000000001 ffff8802131db838 ffffffff812e6eef ffff8802131db868
   00000000000306a9 0000000000000010 0000000000000015 ffff8802131db848
  Call Trace:
   dump_stack
   check_preemption_disabled
   debug_smp_processor_id
   show_saved_mc
   ? save_microcode.constprop.8
   save_mc_for_early
   ? print_context_stack
   ? dump_trace
   ? __bfs
   ? mark_held_locks
   ? get_page_from_freelist
   ? trace_hardirqs_on_caller
   ? trace_hardirqs_on
   ? __alloc_pages_nodemask
   ? __get_vm_area_node
   ? map_vm_area
   ? __vmalloc_node_range
   ? generic_load_microcode
   generic_load_microcode
   ? microcode_fini_cpu
   request_microcode_fw
   reload_store
   dev_attr_store
   sysfs_kf_write
   kernfs_fop_write
   vfs_write
   ? sysret_check
   SyS_write
   system_call_fastpath
  microcode: CPU1: sig=0x306a9, pf=0x10, rev=0x15
  microcode: mc_saved[0]: sig=0x306a9, pf=0x12, rev=0x1b, toal size=0x3000, date = 2014-05-29

because we're using smp_processor_id() in preemtible context. And we
don't really need to use it there because the microcode container we're
dumping is global and CPU-specific info is irrelevant.

While at it, make pr_* stuff use "microcode: " prefix for easier
grepping and document how to enable the DEBUG build.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-02 20:32:34 +01:00
Borislav Petkov
4f1f605cfe x86/microcode/intel: Check scan_microcode()'s retval
... and do not attempt to load anything in case of error.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-02 20:32:20 +01:00
Borislav Petkov
140f74fced x86/microcode/intel: Sanitize microcode_pointer()
Shorten variable names and rename it to what it does.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-02 20:32:16 +01:00
Borislav Petkov
e3d8f67476 x86/microcode/intel: Move mc arg last in get_matching_{microcode|sig}
... arguments list so that it comes more natural for those functions to
have the signature, processor flags and revision together, before the
rest of the args.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-02 20:32:13 +01:00
Borislav Petkov
9e02bb46d3 x86/microcode/intel: Simplify generic_load_microcode_early()
* remove state variable and out label
* get rid of completely unused mc_size
* shorten variable names
* get rid of local variables
* don't do assignments in local var declarations for less cluttered code
* finally rename it to the shorter and perfectly fine load_microcode_early()

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-02 20:32:10 +01:00
Borislav Petkov
58ce8d6d3a x86/microcode: Consolidate family,model, ... code
... to the header. Split the family acquiring function into a
main one, doing CPUID and a helper which computes the extended
family and is used in multiple places. Get rid of the locally-grown
get_x86_{family,model}().

While at it, rename local variables to something more descriptive and
vertically align assignments for better readability.

There should be no functionality change resulting from this patch.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-02 20:32:07 +01:00
Borislav Petkov
4f5e5f2b57 x86/microcode/intel: Rename update_match_revision()
... to revision_is_newer() and push it up into the header and make it an
inline function.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-02 20:32:03 +01:00
Borislav Petkov
c868570e74 x86/microcode/intel: Sanitize _save_mc()
Shorten local variable names for better readability and flatten loop
indentation levels.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-02 20:32:00 +01:00
Borislav Petkov
a5de5e242b x86/microcode/intel: Make _save_mc() return the updated saved count
... of microcode patches instead of handing in a pointer which is used
for I/O in an otherwise void function.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-02 20:31:56 +01:00
Borislav Petkov
02f35177fb x86/microcode/intel: Simplify load_ucode_intel_bsp()
Don't compute start and end from start and size in order to compute size
again down the path in scan_microcode(). So pass size directly instead
and simplify a bunch. Shorten variable names and remove useless ones.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-02 20:31:51 +01:00
Borislav Petkov
2d48bb9b6e x86/microcode/intel: Get rid of last arg to load_ucode_intel_bsp()
Allocate it on the helper's _load_ucode_intel_bsp() stack instead and do
not hand it down.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-02 20:31:48 +01:00
Borislav Petkov
f9524e6f54 x86/microcode/intel: Do the mc_saved_src NULL check first
... and only then deref it. Also, shorten some variable names and rename
others so as to diminish the ubiquitous presence of the "mc_" prefix
everywhere and make it a bit more readable.

Use kcalloc so that we don't kfree() uninitialized memory on the unwind
path, as suggested by Quentin.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
2015-03-02 20:31:11 +01:00
Borislav Petkov
776d3cdc93 x86/microcode/intel: Check if microcode was found before applying
We should check the return value of the routines fishing out the proper
microcode and not try to apply if we haven't found a suitable blob.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-02 20:31:03 +01:00
Quentin Casasnovas
d496a002ae x86/microcode/intel: Fix out of bounds memory access to the extended header
Improper pointer arithmetics when calculating the address of the
extended header could lead to an out of bounds memory read and kernel
panic.

Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Link: http://lkml.kernel.org/r/20150225094125.GB30434@chrystal.uk.oracle.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-02 20:30:42 +01:00
Steven Rostedt
5b2bdbc845 x86: Init per-cpu shadow copy of CR4 on 32-bit CPUs too
Commit:

   1e02ce4ccc ("x86: Store a per-cpu shadow copy of CR4")

added a shadow CR4 such that reads and writes that do not
modify the CR4 execute much faster than always reading the
register itself.

The change modified cpu_init() in common.c, so that the
shadow CR4 gets initialized before anything uses it.

Unfortunately, there's two cpu_init()s in common.c. There's
one for 64-bit and one for 32-bit. The commit only added
the shadow init to the 64-bit path, but the 32-bit path
needs the init too.

Link: http://lkml.kernel.org/r/20150227125208.71c36402@gandalf.local.home Fixes: 1e02ce4ccc "x86: Store a per-cpu shadow copy of CR4"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20150227145019.2bdd4354@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-28 08:04:20 +01:00
Ingo Molnar
5838d18955 Merge branch 'linus' into x86/urgent, to merge dependent patch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-28 08:03:10 +01:00
Ingo Molnar
e9e4e44309 Linux 34.0-rc1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJU6pFJAAoJEHm+PkMAQRiG2OwH/24nDK+l9zkaRs0xJsVh+qiW
 8A2N1od0ickz43iMk48jfeWGkFOkd4izyvan/daJshJOE1Y5lCdSs7jq/OXVOv9L
 G0+KQUoC5NL0hqYKn1XJPFluNQ1yqMvrDwQt99grDGzruNGBbwHuBhAQmgzpj1nU
 do8KrGjr7ft1Rzm4mOAdET/ExWiF+mRSJSxxOv598HbsIRdM5wgn0hHjPlqDxmLN
 KH4r3YYEm0cHyjf4Krse0+YdhqdamRGJlmYxJgEsYNwCoMwkmHlLTc71diseUhrg
 r/VYIYQvpAA6Yvgw8rJ0N5gk/sJJig+WyyPhfQuc2bD5sbL9eO7mPnz2UP7z7ss=
 =vXB6
 -----END PGP SIGNATURE-----

Merge tag 'v4.0-rc1' into perf/core, to refresh the tree

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-26 12:24:50 +01:00
Matt Fleming
59bf7fd45c perf/x86/intel: Enable conflicting event scheduling for CQM
We can leverage the workqueue that we use for RMID rotation to support
scheduling of conflicting monitoring events. Allowing events that
monitor conflicting things is done at various other places in the perf
subsystem, so there's precedent there.

An example of two conflicting events would be monitoring a cgroup and
simultaneously monitoring a task within that cgroup.

This uses the cache_groups list as a queuing mechanism, where every
event that reaches the front of the list gets the chance to be scheduled
in, possibly descheduling any conflicting events that are running.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Link: http://lkml.kernel.org/r/1422038748-21397-10-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-25 13:53:36 +01:00
Matt Fleming
bff671dba7 perf/x86/intel: Perform rotation on Intel CQM RMIDs
There are many use cases where people will want to monitor more tasks
than there exist RMIDs in the hardware, meaning that we have to perform
some kind of multiplexing.

We do this by "rotating" the RMIDs in a workqueue, and assigning an RMID
to a waiting event when the RMID becomes unused.

This scheme reserves one RMID at all times for rotation. When we need to
schedule a new event we give it the reserved RMID, pick a victim event
from the front of the global CQM list and wait for the victim's RMID to
drop to zero occupancy, before it becomes the new reserved RMID.

We put the victim's RMID onto the limbo list, where it resides for a
"minimum queue time", which is intended to save ourselves an expensive
smp IPI when the RMID is unlikely to have a occupancy value below
__intel_cqm_threshold.

If we fail to recycle an RMID, even after waiting the minimum queue time
then we need to increment __intel_cqm_threshold. There is an upper bound
on this threshold, __intel_cqm_max_threshold, which is programmable from
userland as /sys/devices/intel_cqm/max_recycling_threshold.

The comments above __intel_cqm_rmid_rotate() have more details.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Link: http://lkml.kernel.org/r/1422038748-21397-9-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-25 13:53:35 +01:00
Matt Fleming
bfe1fcd268 perf/x86/intel: Support task events with Intel CQM
Add support for task events as well as system-wide events. This change
has a big impact on the way that we gather LLC occupancy values in
intel_cqm_event_read().

Currently, for system-wide (per-cpu) events we defer processing to
userspace which knows how to discard all but one cpu result per package.

Things aren't so simple for task events because we need to do the value
aggregation ourselves. To do this, we defer updating the LLC occupancy
value in event->count from intel_cqm_event_read() and do an SMP
cross-call to read values for all packages in intel_cqm_event_count().
We need to ensure that we only do this for one task event per cache
group, otherwise we'll report duplicate values.

If we're a system-wide event we want to fallback to the default
perf_event_count() implementation. Refactor this into a common function
so that we don't duplicate the code.

Also, introduce PERF_TYPE_INTEL_CQM, since we need a way to track an
event's task (if the event isn't per-cpu) inside of the Intel CQM PMU
driver.  This task information is only availble in the upper layers of
the perf infrastructure.

Other perf backends stash the target task in event->hw.*target so we
need to do something similar. The task is used to determine whether
events should share a cache group and an RMID.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: linux-api@vger.kernel.org
Link: http://lkml.kernel.org/r/1422038748-21397-8-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-25 13:53:34 +01:00
Matt Fleming
35298e554c perf/x86/intel: Implement LRU monitoring ID allocation for CQM
It's possible to run into issues with re-using unused monitoring IDs
because there may be stale cachelines associated with that ID from a
previous allocation. This can cause the LLC occupancy values to be
inaccurate.

To attempt to mitigate this problem we place the IDs on a least recently
used list, essentially a FIFO. The basic idea is that the longer the
time period between ID re-use the lower the probability that stale
cachelines exist in the cache.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Link: http://lkml.kernel.org/r/1422038748-21397-7-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-25 13:53:33 +01:00
Matt Fleming
4afbb24ce5 perf/x86/intel: Add Intel Cache QoS Monitoring support
Future Intel Xeon processors support a Cache QoS Monitoring feature that
allows tracking of the LLC occupancy for a task or task group, i.e. the
amount of data in pulled into the LLC for the task (group).

Currently the PMU only supports per-cpu events. We create an event for
each cpu and read out all the LLC occupancy values.

Because this results in duplicate values being written out to userspace,
we also export a .per-pkg event file so that the perf tools only
accumulate values for one cpu per package.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Link: http://lkml.kernel.org/r/1422038748-21397-6-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-25 13:53:32 +01:00
Peter P Waskiewicz Jr
cbc82b1726 x86: Add support for Intel Cache QoS Monitoring (CQM) detection
This patch adds support for the new Cache QoS Monitoring (CQM)
feature found in future Intel Xeon processors.  It includes the
new values to track CQM resources to the cpuinfo_x86 structure,
plus the CPUID detection routines for CQM.

CQM allows a process, or set of processes, to be tracked by the CPU
to determine the cache usage of that task group.  Using this data
from the CPU, software can be written to extract this data and
report cache usage and occupancy for a particular process, or
group of processes.

More information about Cache QoS Monitoring can be found in the
Intel (R) x86 Architecture Software Developer Manual, section 17.14.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chris Webb <chris@arachsys.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Honeyman <stevenhoneyman@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Link: http://lkml.kernel.org/r/1422038748-21397-5-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-25 13:53:31 +01:00
Borislav Petkov
a930dc4543 x86/asm: Cleanup prefetch primitives
This is based on a patch originally by hpa.

With the current improvements to the alternatives, we can simply use %P1
as a mem8 operand constraint and rely on the toolchain to generate the
proper instruction sizes. For example, on 32-bit, where we use an empty
old instruction we get:

  apply_alternatives: feat: 6*32+8, old: (c104648b, len: 4), repl: (c195566c, len: 4)
  c104648b: alt_insn: 90 90 90 90
  c195566c: rpl_insn: 0f 0d 4b 5c

  ...

  apply_alternatives: feat: 6*32+8, old: (c18e09b4, len: 3), repl: (c1955948, len: 3)
  c18e09b4: alt_insn: 90 90 90
  c1955948: rpl_insn: 0f 0d 08

  ...

  apply_alternatives: feat: 6*32+8, old: (c1190cf9, len: 7), repl: (c1955a79, len: 7)
  c1190cf9: alt_insn: 90 90 90 90 90 90 90
  c1955a79: rpl_insn: 0f 0d 0d a0 d4 85 c1

all with the proper padding done depending on the size of the
replacement instruction the compiler generates.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@linux.intel.com>
2015-02-23 13:44:17 +01:00
Yannick Guerrini
a927792c19 x86/cpu/intel: Fix trivial typo in intel_tlb_table[]
Change 'ssociative' to 'associative'

Signed-off-by: Yannick Guerrini <yguerrini@tomshardware.fr>
Cc: Borislav Petkov <bp@suse.de>
Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Cc: Chris Bainbridge <chris.bainbridge@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Steven Honeyman <stevenhoneyman@gmail.com>
Cc: trivial@kernel.org
Link: http://lkml.kernel.org/r/1424558510-1420-1-git-send-email-yguerrini@tomshardware.fr
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-22 08:55:58 +01:00