Commit graph

233 commits

Author SHA1 Message Date
Linus Torvalds
07f2d8c63f Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS changes from Ingo Molnar:
 "The main changes in this cycle were:

   - Simplify the CMCI storm logic on Intel CPUs after yet another
     report about a race in the code (Borislav Petkov)

   - Enable the MCE threshold irq on AMD CPUs by default (Aravind
     Gopalakrishnan)

   - Add AMD-specific MCE-severity grading function.  Further error
     recovery actions will be based on its output (Aravind Gopalakrishnan)

   - Documentation updates (Borislav Petkov)

   - ... assorted fixes and cleanups"

* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce/severity: Fix warning about indented braces
  x86/mce: Define mce_severity function pointer
  x86/mce: Add an AMD severities-grading function
  x86/mce: Reindent __mcheck_cpu_apply_quirks() properly
  x86/mce: Use safe MSR accesses for AMD quirk
  x86/MCE/AMD: Enable thresholding interrupts by default if supported
  x86/MCE: Make mce_panic() fatal machine check msg in the same pattern
  x86/MCE/intel: Cleanup CMCI storm logic
  Documentation/acpi/einj: Correct and streamline text
  x86/MCE/AMD: Drop bogus const modifier from AMD's bank4_names()
2015-04-13 13:33:20 -07:00
Aravind Gopalakrishnan
43eaa2a1ad x86/mce: Define mce_severity function pointer
Rename mce_severity() to mce_severity_intel() and assign the
mce_severity function pointer to mce_severity_amd() during init on AMD.
This way, we can avoid a test to call mce_severity_amd every time we get
into mce_severity(). And it's cleaner to do it this way.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Suggested-by: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1427125373-2918-3-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-24 12:14:15 +01:00
Aravind Gopalakrishnan
bf80bbd7dc x86/mce: Add an AMD severities-grading function
Add a severities function that caters to AMD processors. This allows us
to do some vendor-specific work within the function if necessary.

Also, introduce a vendor flag bitfield for vendor-specific settings. The
severities code uses this to define error scope based on the prescence
of the flags field.

This is based off of work by Boris Petkov.

Testing details:
Fam10h, Model 9h (Greyhound)
Fam15h: Models 0h-0fh (Orochi), 30h-3fh (Kaveri) and 60h-6fh (Carrizo),
Fam16h Model 00h-0fh (Kabini)

Boris:
Intel SNB
AMD K8 (JH-E0)

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/1427125373-2918-2-git-send-email-Aravind.Gopalakrishnan@amd.com
[ Fixup build, clean up comments. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-24 12:13:34 +01:00
Borislav Petkov
c9ce871283 x86/mce: Reindent __mcheck_cpu_apply_quirks() properly
Had some strange 3 tabs + 2 chars indentation, probably from me. Fix it.

No code changed:

  # arch/x86/kernel/cpu/mcheck/mce.o:

   text    data     bss     dec     hex filename
  21371    5923     264   27558    6ba6 mce.o.before
  21371    5923     264   27558    6ba6 mce.o.after

md5:
   eb3996c84d15e08ed836f043df2cbb01  mce.o.before.asm
   eb3996c84d15e08ed836f043df2cbb01  mce.o.after.asm

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:16:44 +01:00
Jesse Larrew
f77ac507f8 x86/mce: Use safe MSR accesses for AMD quirk
Certain MSRs are only relevant to a kernel in host mode, and kvm had
chosen not to implement these MSRs at all for guests. If a guest kernel
ever tried to access these MSRs, the result was a general protection
fault.

KVM will be separately patched to return 0 when these MSRs are read,
and this patch ensures that MSR accesses are tolerant of exceptions.

Signed-off-by: Jesse Larrew <jesse.larrew@amd.com>
[ Drop {} braces around loop ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Joel Schopp <joel.schopp@amd.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/1426262619-5016-1-git-send-email-jesse.larrew@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:16:43 +01:00
Ingo Molnar
fa45a45ca3 Merge tag 'ras_for_3.21' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras
Pull RAS updates from Borislav Petkov:

 "- Enable AMD thresholding IRQ by default if supported. (Aravind Gopalakrishnan)

  - Unify mce_panic() message pattern. (Derek Che)

  - A bit more involved simplification of the CMCI logic after yet another
    report about race condition with the adaptive logic. (Borislav Petkov)

  - ACPI APEI EINJ fleshing out of the user documentation. (Borislav Petkov)

  - Minor cleanup. (Jan Beulich.)"

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-19 13:31:33 +01:00
Derek Che
8af7043a3c x86/MCE: Make mce_panic() fatal machine check msg in the same pattern
There is another mce_panic call with "Fatal machine check on current CPU" in
the same mce.c file, why not keep them all in same pattern

	mce_panic("Fatal machine check on current CPU", &m, msg);

Signed-off-by: Derek Che <drc@yahoo-inc.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-19 13:24:47 +01:00
Borislav Petkov
3f2f0680d1 x86/MCE/intel: Cleanup CMCI storm logic
Initially, this started with the yet another report about a race
condition in the CMCI storm adaptive period length thing. Yes, we have
to admit, it is fragile and error prone. So let's simplify it.

The simpler logic is: now, after we enter storm mode, we go straight to
polling with CMCI_STORM_INTERVAL, i.e. once a second. We remain in storm
mode as long as we see errors being logged while polling.

Theoretically, if we see an uninterrupted error stream, we will remain
in storm mode indefinitely and keep polling the MSRs.

However, when the storm is actually a burst of errors, once we have
logged them all, we back out of it after ~5 mins of polling and no more
errors logged.

If we encounter an error during those 5 minutes, we reset the polling
interval to 5 mins.

Making machine_check_poll() return a bool and denoting whether it has
seen an error or not lets us simplify a bunch of code and move the storm
handling private to mce_intel.c.

Some minor cleanups while at it.

Reported-by: Calvin Owens <calvinowens@fb.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1417746575-23299-1-git-send-email-calvinowens@fb.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-19 13:24:25 +01:00
Linus Torvalds
d96c757efa Fix regression - functions on the mce notifier chain should
not be able to decide that an event should not be logged
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU47aGAAoJEKurIx+X31iBiX4P+wfq7uUKwQ4riD2jFppvhrcm
 W2Qx/iIv9QN77ZIw5I45VqGbDKXmlThl41ISem9BlKd8jKaldY3lUlQMfrPC5V11
 9bl/7LsZoQLlbuwYR6uiLdKqW9wd6d5Y1mczdSDM5wCtfMw/s+C/ETzuRVHsQZjF
 1LTB0rb0NPouX+y3D8aDrvk9Os5ozZsz3N/y6e3TsI/wV8d3rqwH8C8x3RjB2Evx
 3WRSwoSOq9kHEbeg1r7PMKYKWAoJs97Kwo4EgJELqn8fxYMWnSsoDZGr9P2PX8oT
 TKgSFPnhgCLw+qlWy81MM8hutnHKnN6oXcJKzE0nHtD8JlJ/M/HdDAPIg8G9aLIn
 ABxPg6OORs/4YJQYGFA8ixx3TfIMspMU2m9KGoCcerpGaHCHhrlylJyheUvhRkPP
 u8pjGz+31d3bVVRzCLJt1eqo3H/y0wcURWaemk23lcUIsdDqisjZDzZrZxyZuWaH
 eDTKmHsZB/I4wnOs4Ke+U7oo/u+NtBzPmBSJcshgKSONLPd7bSJtjckLoa3wSf5I
 q5DkZgxrUYkO6tIoAAi/N2tc/2qkjTOug79BP9YKN3elmv3nxW4gieSaUb5p16bd
 ORpZLt3SDKEPv+79a/1e7ZyR7ik9Evhzc+/72M1IZvmdr3uOjj+xkuE1uRU7vuvX
 44Jmx6mEtPK1mVBfwkdG
 =jAUs
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-fixmcelog' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull mcelog regression fix from Tony Luck:
 "Fix regression - functions on the mce notifier chain should not be
  able to decide that an event should not be logged"

* tag 'please-pull-fixmcelog' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  x86/mce: Fix regression. All error records should report via /dev/mcelog
2015-02-17 17:03:07 -08:00
Linus Torvalds
37507717de Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf updates from Ingo Molnar:
 "This series tightens up RDPMC permissions: currently even highly
  sandboxed x86 execution environments (such as seccomp) have permission
  to execute RDPMC, which may leak various perf events / PMU state such
  as timing information and other CPU execution details.

  This 'all is allowed' RDPMC mode is still preserved as the
  (non-default) /sys/devices/cpu/rdpmc=2 setting.  The new default is
  that RDPMC access is only allowed if a perf event is mmap-ed (which is
  needed to correctly interpret RDPMC counter values in any case).

  As a side effect of these changes CR4 handling is cleaned up in the
  x86 code and a shadow copy of the CR4 value is added.

  The extra CR4 manipulation adds ~ <50ns to the context switch cost
  between rdpmc-capable and rdpmc-non-capable mms"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks
  perf/x86: Only allow rdpmc if a perf_event is mapped
  perf: Pass the event to arch_perf_update_userpage()
  perf: Add pmu callbacks to track event mapping and unmapping
  x86: Add a comment clarifying LDT context switching
  x86: Store a per-cpu shadow copy of CR4
  x86: Clean up cr4 manipulation
2015-02-16 14:58:12 -08:00
Linus Torvalds
e07e0d4cb0 Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS update from Ingo Molnar:
 "The changes in this cycle were:

   - allow mmcfg access to APEI error injection handlers

   - improve MCE error messages

   - smaller cleanups"

* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, mce: Fix sparse errors
  x86, mce: Improve timeout error messages
  ACPI, EINJ: Enhance error injection tolerance level
2015-02-09 18:22:04 -08:00
Tony Luck
a2413d8b29 x86/mce: Fix regression. All error records should report via /dev/mcelog
I'm getting complaints from validation teams that have updated their
Linux kernels from ancient versions to current. They don't see the
error logs they expect. I tell the to unload any EDAC drivers[1], and
things start working again.  The problem is that we short-circuit
the logging process if any function on the decoder chain claims to
have dealt with the problem:

	ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
	if (ret == NOTIFY_STOP)
		return;

The logic we used when we added this code was that we did not want
to confuse users with double reports of the same error.

But it turns out users are not confused - they are upset that they
don't see a log where their tools used to find a log.

I could also get into a long description of how the consumer of this
log does more than just decode model specific details of the error.
It keeps counts, tracks thresholds, takes actions and runs scripts
that can alert administrators to problems.

[1] We've recently compounded the problem because the acpi_extlog
driver also registers for this notifier and also returns NOTIFY_STOP.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2015-02-09 09:36:53 -08:00
Andy Lutomirski
375074cc73 x86: Clean up cr4 manipulation
CR4 manipulation was split, seemingly at random, between direct
(write_cr4) and using a helper (set/clear_in_cr4).  Unfortunately,
the set_in_cr4 and clear_in_cr4 helpers also poke at the boot code,
which only a small subset of users actually wanted.

This patch replaces all cr4 access in functions that don't leave cr4
exactly the way they found it with new helpers cr4_set_bits,
cr4_clear_bits, and cr4_set_bits_and_update_boot.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vince Weaver <vince@deater.net>
Cc: "hillf.zj" <hillf.zj@alibaba-inc.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/495a10bdc9e67016b8fd3945700d46cfd5c12c2f.1414190806.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 12:10:41 +01:00
Luck, Tony
d4812e169d x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricks
We now switch to the kernel stack when a machine check interrupts
during user mode.  This means that we can perform recovery actions
in the tail of do_machine_check()

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
2015-01-07 07:47:42 -08:00
Andy Lutomirski
9592747538 x86, traps: Track entry into and exit from IST context
We currently pretend that IST context is like standard exception
context, but this is incorrect.  IST entries from userspace are like
standard exceptions except that they use per-cpu stacks, so they are
atomic.  IST entries from kernel space are like NMIs from RCU's
perspective -- they are not quiescent states even if they
interrupted the kernel during a quiescent state.

Add and use ist_enter and ist_exit to track IST context.  Even
though x86_32 has no IST stacks, we track these interrupts the same
way.

This fixes two issues:

 - Scheduling from an IST interrupt handler will now warn.  It would
   previously appear to work as long as we got lucky and nothing
   overwrote the stack frame.  (I don't know of any bugs in this
   that would trigger the warning, but it's good to be on the safe
   side.)

 - RCU handling in IST context was dangerous.  As far as I know,
   only machine checks were likely to trigger this, but it's good to
   be on the safe side.

Note that the machine check handlers appears to have been missing
any context tracking at all before this patch.

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
2015-01-02 10:22:46 -08:00
Borislav Petkov
83737691e5 x86, mce: Fix sparse errors
Make stuff used in mce.c only, static.

Signed-off-by: Borislav Petkov <bp@suse.de>
2014-12-22 21:04:31 +01:00
Andy Lutomirski
6c80f87ed4 x86, mce: Improve timeout error messages
There are four different possible types of timeouts.  Distinguish
them in the logs to help debug them.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/0fa6d2653a54a01c48b43a3583caf950ea99606e.1419178397.git.luto@amacapital.net
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-12-22 17:47:45 +01:00
Borislav Petkov
c7c9b3929b x86/mce: Spell "panicked" correctly
We need the additional "k" to make it a hard-c:

  https://en.wiktionary.org/wiki/panicked

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1417642605-15730-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-08 11:12:46 +01:00
Chen Yucong
fa92c58694 x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll
Uncorrected no action required (UCNA) - is a uncorrected recoverable
machine check error that is not signaled via a machine check exception
and, instead, is reported to system software as a corrected machine
check error. UCNA errors indicate that some data in the system is
corrupted, but the data has not been consumed and the processor state
is valid and you may continue execution on this processor. UCNA errors
require no action from system software to continue execution. Note that
UCNA errors are supported by the processor only when IA32_MCG_CAP[24]
(MCG_SER_P) is set.
                                               -- Intel SDM Volume 3B

Deferred errors are errors that cannot be corrected by hardware, but
do not cause an immediate interruption in program flow, loss of data
integrity, or corruption of processor state. These errors indicate
that data has been corrupted but not consumed. Hardware writes information
to the status and address registers in the corresponding bank that
identifies the source of the error if deferred errors are enabled for
logging. Deferred errors are not reported via machine check exceptions;
they can be seen by polling the MCi_STATUS registers.
                                                -- AMD64 APM Volume 2

Above two items, both UCNA and Deferred errors belong to detected
errors, but they can't be corrected by hardware, and this is very
similar to Software Recoverable Action Optional (SRAO) errors.
Therefore, we can take some actions that have been used for handling
SRAO errors to handle UCNA and Deferred errors.

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Chen Yucong <slaoub@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2014-11-19 10:56:51 -08:00
Chen Yucong
e3480271f5 x86, mce, severity: Extend the the mce_severity mechanism to handle UCNA/DEFERRED error
Until now, the mce_severity mechanism can only identify the severity
of UCNA error as MCE_KEEP_SEVERITY. Meanwhile, it is not able to filter
out DEFERRED error for AMD platform.

This patch extends the mce_severity mechanism for handling
UCNA/DEFERRED error. In order to do this, the patch introduces a new
severity level - MCE_UCNA/DEFERRED_SEVERITY.

In addition, mce_severity is specific to machine check exception,
and it will check MCIP/EIPV/RIPV bits. In order to use mce_severity
mechanism in non-exception context, the patch also introduces a new
argument (is_excp) for mce_severity. `is_excp' is used to explicitly
specify the calling context of mce_severity.

Reviewed-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Chen Yucong <slaoub@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2014-11-19 10:55:43 -08:00
Christoph Lameter
89cbc76768 x86: Replace __get_cpu_var uses
__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x).  This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.

Other use cases are for storing and retrieving data from the current
processors percpu area.  __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.

__get_cpu_var() is defined as :

#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))

__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.

this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.

This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset.  Thereby address calculations are avoided and less registers
are used when code is generated.

Transformations done to __get_cpu_var()

1. Determine the address of the percpu instance of the current processor.

	DEFINE_PER_CPU(int, y);
	int *x = &__get_cpu_var(y);

    Converts to

	int *x = this_cpu_ptr(&y);

2. Same as #1 but this time an array structure is involved.

	DEFINE_PER_CPU(int, y[20]);
	int *x = __get_cpu_var(y);

    Converts to

	int *x = this_cpu_ptr(y);

3. Retrieve the content of the current processors instance of a per cpu
variable.

	DEFINE_PER_CPU(int, y);
	int x = __get_cpu_var(y)

   Converts to

	int x = __this_cpu_read(y);

4. Retrieve the content of a percpu struct

	DEFINE_PER_CPU(struct mystruct, y);
	struct mystruct x = __get_cpu_var(y);

   Converts to

	memcpy(&x, this_cpu_ptr(&y), sizeof(x));

5. Assignment to a per cpu variable

	DEFINE_PER_CPU(int, y)
	__get_cpu_var(y) = x;

   Converts to

	__this_cpu_write(y, x);

6. Increment/Decrement etc of a per cpu variable

	DEFINE_PER_CPU(int, y);
	__get_cpu_var(y)++

   Converts to

	__this_cpu_inc(y)

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-26 13:45:49 -04:00
Daniel Walter
164109e3cd arch/x86: replace strict_strto calls
Replace obsolete strict_strto calls with appropriate kstrto calls

Signed-off-by: Daniel Walter <dwalter@google.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:28 -07:00
Borislav Petkov
27c934158c x86, MCE: Robustify mcheck_init_device
BorisO reports that misc_register() fails often on xen. The current code
unregisters the CPU hotplug notifier in that case. If then a CPU is
offlined and onlined back again, we end up with a second timer running
on that CPU, leading to soft lockups and system hangs.

So let's leave the hotcpu notifier always registered - even if
mce_device_create failed for some cores and never unreg it so that we
can deal with the timer handling accordingly.

Reported-and-Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1403274493-1371-1-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-06-24 15:17:01 +02:00
Borislav Petkov
38356c1fbd x86, MCE: Kill CPU_POST_DEAD
In conjunction with cleaning up CPU hotplug, we want to get rid of
CPU_POST_DEAD. Kill this instance here and rediscover CMCI banks at the
end of CPU_DEAD.

Link: http://lkml.kernel.org/r/http://lkml.kernel.org/r/1400750624-19238-1-git-send-email-bp@alien8.de
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-06-22 18:36:39 +02:00
Chen Yucong
65eb71823b hwpoison: remove unused global variable in do_machine_check()
Remove an unused global variable mce_entry and relative operations in
do_machine_check().

Signed-off-by: Chen Yucong <slaoub@gmail.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:11 -07:00
Borislav Petkov
716079f66e mce: Panic when a core has reached a timeout
There is very little and maybe practically nothing we can do to recover
from a system where at least one core has reached a timeout during the
whole monarch cores gathering. So panic when that happens.

Link: http://lkml.kernel.org/r/20140523091041.GA21332@pd.tnic
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-05-30 22:05:31 +02:00
Mathieu Souchaud
9c15a24b03 x86/mce: Improve mcheck_init_device() error handling
Check return code of every function called by mcheck_init_device().

Signed-off-by: Mathieu Souchaud <mattieu.souchaud@free.fr>
Link: http://lkml.kernel.org/r/1399151031-19905-1-git-send-email-mattieu.souchaud@free.fr
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-05-30 22:01:40 +02:00
Ingo Molnar
ea431643d6 x86/mce: Fix CMCI preemption bugs
The following commit:

  27f6c573e0 ("x86, CMCI: Add proper detection of end of CMCI storms")

Added two preemption bugs:

 - machine_check_poll() does a get_cpu_var() without a matching
   put_cpu_var(), which causes preemption imbalance and crashes upon
   bootup.

 - it does percpu ops without disabling preemption. Preemption is not
   disabled due to the mistaken use of a raw spinlock.

To fix these bugs fix the imbalance and change
cmci_discover_lock to a regular spinlock.

Reported-by: Owen Kibel <qmewlo@gmail.com>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Chen, Gong <gong.chen@linux.intel.com>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Todorov <atodorov@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/n/tip-jtjptvgigpfkpvtQxpEk1at2@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
--
 arch/x86/kernel/cpu/mcheck/mce.c       |    4 +---
 arch/x86/kernel/cpu/mcheck/mce_intel.c |   18 +++++++++---------
 2 files changed, 10 insertions(+), 12 deletions(-)
2014-04-17 10:28:42 +02:00
Linus Torvalds
8eab6cd031 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "This is a collection of minor fixes for x86, plus the IRET information
  leak fix (forbid the use of 16-bit segments in 64-bit mode)"

NOTE! We may have to relax the "forbid the use of 16-bit segments in
64-bit mode" part, since there may be people who still run and depend on
16-bit Windows binaries under Wine.

But I'm taking this in the current unconditional form for now to see who
(if anybody) screams bloody murder.  Maybe nobody cares.  And maybe
we'll have to update it with some kind of runtime enablement (like our
vm.mmap_min_addr tunable that people who run dosemu/qemu/wine already
need to tweak).

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels
  efi: Pass correct file handle to efi_file_{read,close}
  x86/efi: Correct EFI boot stub use of code32_start
  x86/efi: Fix boot failure with EFI stub
  x86/platform/hyperv: Handle VMBUS driver being a module
  x86/apic: Reinstate error IRQ Pentium erratum 3AP workaround
  x86, CMCI: Add proper detection of end of CMCI storms
2014-04-11 11:58:33 -07:00
Chen, Gong
27f6c573e0 x86, CMCI: Add proper detection of end of CMCI storms
When CMCI storm persists for a long time(at least beyond predefined
threshold. It's 30 seconds for now), we can watch CMCI storm is
detected immediately after it subsides.

...
Dec 10 22:04:29 kernel: CMCI storm detected: switching to poll mode
Dec 10 22:04:59 kernel: CMCI storm subsided: switching to interrupt mode
Dec 10 22:04:59 kernel: CMCI storm detected: switching to poll mode
Dec 10 22:05:29 kernel: CMCI storm subsided: switching to interrupt mode
...

The problem is that our logic that determines that the storm has
ended is incorrect. We announce the end, re-enable interrupts and
realize that the storm is still going on, so we switch back to
polling mode. Rinse, repeat.

When a storm happens we disable signaling of errors via CMCI and begin
polling machine check banks instead. If we find any logged errors,
then we need to set a per-cpu flag so that our per-cpu tests that
check whether the storm is ongoing will see that errors are still
being logged independently of whether mce_notify_irq() says that the
error has been fully processed.

cmci_clear() is not the right tool to disable a bank. It disables the
interrupt for the bank as desired, but it also clears the bit for
this bank in "mce_banks_owned" so we will skip the bank when polling
(so we fail to see that the storm continues because we stop looking).
New cmci_storm_disable_banks() just disables the interrupt while
allowing polling to continue.

Reported-by: William Dauchy <wdauchy@gmail.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2014-03-28 13:40:16 -07:00
Srivatsa S. Bhat
82a8f131aa x86, mce: Fix CPU hotplug callback registration
Subsystems that want to register CPU hotplug callbacks, as well as perform
initialization for the CPUs that are already online, often do it as shown
below:

	get_online_cpus();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	register_cpu_notifier(&foobar_cpu_notifier);

	put_online_cpus();

This is wrong, since it is prone to ABBA deadlocks involving the
cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
with CPU hotplug operations).

Instead, the correct and race-free way of performing the callback
registration is:

	cpu_notifier_register_begin();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	/* Note the use of the double underscored version of the API */
	__register_cpu_notifier(&foobar_cpu_notifier);

	cpu_notifier_register_done();

Fix the mce code in x86 by using this latter form of callback registration.

Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-20 13:43:42 +01:00
Borislav Petkov
4f75d84127 x86, mce: Fix mce_start_timer semantics
So mce_start_timer() has a 'cpu' argument which is supposed to mean to
start a timer on that cpu. However, the code currently starts a timer on
the *current* cpu the function runs on and causes the sanity-check in
mce_timer_fn to fire:

	WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/mcheck/mce.c:1286 mce_timer_fn

because it is running on the wrong cpu.

This was triggered by Prarit Bhargava <prarit@redhat.com> by offlining
all the cpus in succession.

Then, we were fiddling with the CMCI storm settings when starting the
timer whereas there's no need for that - if there's storm happening
on this newly restarted cpu, we're going to be in normal CMCI mode
initially and then when the CMCI interrupt starts firing, we're going to
go to the polling mode with the timer real soon.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Prarit Bhargava <prarit@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Reviewed-by: Chen, Gong <gong.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1387722156-5511-1-git-send-email-prarit@redhat.com
2014-01-12 15:22:25 +01:00
Levente Kurusa
853d9b18f1 x86, mce: Call put_device on device_register failure
This patch adds a call to put_device() when the device_register() call
has failed. This is required so that the last reference to the device is
given up.

Signed-off-by: Levente Kurusa <levex@linux.com>
Link: http://lkml.kernel.org/r/5298F900.9000208@linux.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-11-30 12:14:51 +01:00
Ingo Molnar
0237d7f355 Merge branch 'x86/mce' into x86/ras
Pursue a single RAS/MCE topic branch on x86.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-12 17:54:05 +02:00
Paul Gortmaker
148f9bb877 x86: delete __cpuinit usage from all x86 files
The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications.  For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.

After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out.  Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.

Note that some harmless section mismatch warnings may result, since
notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
are flagged as __cpuinit  -- so if we remove the __cpuinit from
arch specific callers, we will also get section mismatch warnings.
As an intermediate step, we intend to turn the linux/init.h cpuinit
content into no-ops as early as possible, since that will get rid
of these warnings.  In any case, they are temporary and harmless.

This removes all the arch/x86 uses of the __cpuinit macros from
all C files.  x86 only had the one __CPUINIT used in assembly files,
and it wasn't paired off with a .previous or a __FINIT, so we can
delete it directly w/o any corresponding additional change there.

[1] https://lkml.org/lkml/2013/5/20/589

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-07-14 19:36:56 -04:00
Naveen N. Rao
c3d1fb567a mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC
The Corrected Machine Check structure (CMC) in HEST has a flag which can be
set by the firmware to indicate to the OS that it prefers to process the
corrected error events first. In this scenario, the OS is expected to not
monitor for corrected errors (through CMCI/polling). Instead, the firmware
notifies the OS on corrected error events through GHES.

Linux already has support for GHES. This patch adds support for parsing CMC
structure and to disable CMCI/polling if the firmware first flag is set.

Further, the list of machine check bank structures at the end of CMC is used
to determine which MCA banks function in FF mode, so that we continue to
monitor error events on the other banks.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-07-08 11:53:01 -07:00
Naveen N. Rao
0644414e62 mce: acpi/apei: Add comments to clarify usage of the various bitfields in the MCA subsystem
There is some confusion about the 'mce_poll_banks' and 'mce_banks_owned'
per-cpu bitmaps.  Provide comments so that we all know exactly what these
are used for, and why.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-25 13:53:27 -07:00
Srivatsa S. Bhat
7a0c819d28 x86/mce: Rework cmci_rediscover() to play well with CPU hotplug
Dave Jones reports that offlining a CPU leads to this trace:

numa_remove_cpu cpu 1 node 0: mask now 0,2-3
smpboot: CPU 1 is now offline
BUG: using smp_processor_id() in preemptible [00000000] code:
cpu-offline.sh/10591
caller is cmci_rediscover+0x6a/0xe0
Pid: 10591, comm: cpu-offline.sh Not tainted 3.9.0-rc3+ #2
Call Trace:
 [<ffffffff81333bbd>] debug_smp_processor_id+0xdd/0x100
 [<ffffffff8101edba>] cmci_rediscover+0x6a/0xe0
 [<ffffffff815f5b9f>] mce_cpu_callback+0x19d/0x1ae
 [<ffffffff8160ea66>] notifier_call_chain+0x66/0x150
 [<ffffffff8107ad7e>] __raw_notifier_call_chain+0xe/0x10
 [<ffffffff8104c2e3>] cpu_notify+0x23/0x50
 [<ffffffff8104c31e>] cpu_notify_nofail+0xe/0x20
 [<ffffffff815ef082>] _cpu_down+0x302/0x350
 [<ffffffff815ef106>] cpu_down+0x36/0x50
 [<ffffffff815f1c9d>] store_online+0x8d/0xd0
 [<ffffffff813edc48>] dev_attr_store+0x18/0x30
 [<ffffffff81226eeb>] sysfs_write_file+0xdb/0x150
 [<ffffffff811adfb2>] vfs_write+0xa2/0x170
 [<ffffffff811ae16c>] sys_write+0x4c/0xa0
 [<ffffffff81613019>] system_call_fastpath+0x16/0x1b

However, a look at cmci_rediscover shows that it can be simplified quite
a bit, apart from solving the above issue. It invokes functions that
take spin locks with interrupts disabled, and hence it can run in atomic
context. Also, it is run in the CPU_POST_DEAD phase, so the dying CPU
is already dead and out of the cpu_online_mask. So take these points into
account and simplify the code, and thereby also fix the above issue.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-04-02 14:04:01 -07:00
Linus Torvalds
9043a2650c The sweeping change is to make add_taint() explicitly indicate whether to disable
lockdep, but it's a mechanical change.
 
 Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRJAcuAAoJENkgDmzRrbjxsw0P/3eXb+LddYnx0V0uHYdKpCUf
 4vdW7X0fX3Z+aUK69IWRL/6ahoO4TpaHYGHBDjEoivyQ0GDq14X7JNWsYYt3LdMf
 3wmDgRc2cn/mZOJbFeVpNV8ox5l/xc0CUvV+iQ8tMjfQItXMXgWUFZKMECsXKSO6
 eex3lrw9M2jAX2uL8LQPp9W8xtKu24nSZRC6tH5riE/8fCzi1cZPPAqfxP5c8Lee
 ZXtbCRSyAFENZLpKyMe1PC7HvtJyi5NDn9xwOQiXULZV/VOlvP94DGBLIKCM/6dn
 4QvZxpG0P0uOlpCgRAVLyh/z7g4XY4VF/fHopLCmEcqLsvgD+V2LQpQ9zWUalLPC
 Z+pUpz2vu0gIddPU1nR8R6oGpEdJ8O12aJle62p/RSXWZGx12qUQ+Tamu0tgKcv1
 AsiJfbUGNDYfxgU6sHsoQjl2f68LTVckCU1C1LqEbW/S104EIORtGx30CHM4LRiO
 32kDC5TtgYDBKQAIqJ4bL48ZMh+9W3uX40p7xzOI5khHQjvswUKa3jcxupU0C1uv
 lx8KXo7pn8WT33QGysWC782wJCgJuzSc2vRn+KQoqoynuHGM6agaEtR59gil3QWO
 rQEcxH63BBRDgHlg4FM9IkJwwsnC3PWKL8gbX0uAWXAPMbgapJkuuGZAwt0WDGVK
 +GszxsFkCjlW0mK0egTb
 =tiSY
 -----END PGP SIGNATURE-----

Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module update from Rusty Russell:
 "The sweeping change is to make add_taint() explicitly indicate whether
  to disable lockdep, but it's a mechanical change."

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  MODSIGN: Add option to not sign modules during modules_install
  MODSIGN: Add -s <signature> option to sign-file
  MODSIGN: Specify the hash algorithm on sign-file command line
  MODSIGN: Simplify Makefile with a Kconfig helper
  module: clean up load_module a little more.
  modpost: Ignore ARC specific non-alloc sections
  module: constify within_module_*
  taint: add explicit flag to show whether lock dep is still OK.
  module: printk message when module signature fail taints kernel.
2013-02-25 15:41:43 -08:00
Rusty Russell
373d4d0997 taint: add explicit flag to show whether lock dep is still OK.
Fix up all callers as they were before, with make one change: an
unsigned module taints the kernel, but doesn't turn off lockdep.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-01-21 17:17:57 +10:30
Tejun Heo
4d899be584 x86/mce: don't use [delayed_]work_pending()
There's no need to test whether a (delayed) work item in pending
before queueing, flushing or cancelling it.  Most uses are unnecessary
and quite a few of them are buggy.

Remove unnecessary pending tests from x86/mce.  Only compile tested.

v2: Local var work removed from mce_schedule_work() as suggested by
    Borislav.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac@vger.kernel.org
2012-12-28 13:40:16 -08:00
Borislav Petkov
1462594bf2 x86, MCA: Finish mca_config conversion
mce_ser, mce_bios_cmci_threshold and mce_disabled are the last three
bools which need conversion. Move them to the mca_config struct and
adjust usage sites accordingly.

Signed-off-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>
2012-10-26 14:37:58 +02:00
Borislav Petkov
7af19e4afd x86, MCA: Convert the next three variables batch
Move them into the mca_config struct and adjust code touching them
accordingly.

Signed-off-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>
2012-10-26 14:37:57 +02:00
Borislav Petkov
84c2559dee x86, MCA: Convert rip_msr, mce_bootlog, monarch_timeout
Move above configuration variables into struct mca_config and adjust
usage places accordingly.

Signed-off-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>
2012-10-26 14:37:57 +02:00
Borislav Petkov
d203f0b824 x86, MCA: Convert dont_log_ce, banks and tolerant
Move those MCA configuration variables into struct mca_config and adjust
the places they're used accordingly.

Signed-off-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>
2012-10-26 14:37:56 +02:00
Borislav Petkov
5bc66170dc x86, MCE: Remove bios_cmci_threshold sysfs attribute
450cc20103 ("x86/mce: Provide boot argument to honour bios-set CMCI
threshold") added the bios_cmci_threshold sysfs attribute which was
supposed to communicate to userspace tools that BIOS CMCI threshold has
been honoured.

However, this info is not of any importance to userspace - it should
rather get the actual error count it has been thresholded already from
MCi_STATUS[38:52].

So drop this before it becomes a used interface (good thing we caught
this early in 3.7-rc1, right after the merge window closed).

Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20121017105940.GA14590@x1.osrc.amd.com
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-10-19 15:22:29 +02:00
Naveen N. Rao
450cc20103 x86/mce: Provide boot argument to honour bios-set CMCI threshold
The ACPI spec doesn't provide for a way for the bios to pass down
recommended thresholds to the OS on a _per-bank_ basis. This patch adds
a new boot option, which if passed, tells Linux to use CMCI thresholds
set by the bios.

As fail-safe, we initialize threshold to 1 if some banks have not been
initialized by the bios and warn the user.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2012-09-27 10:08:00 -07:00
Ingo Molnar
f1f6524476 Linux 3.6-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQEcBAABAgAGBQJQVkutAAoJEHm+PkMAQRiGW8sH/36FVQ3zI75QH16AmR++2nMZ
 BRJGoxcRFMssrXTYVdkMyzygf8b7MZbNEn1qt2g63MNzGaJucPlw5NVL4GLzR+zr
 x/EglLrTEPCD5el9wJ3ls9iC1soudKQTvC2BjcdUjpoSwHrDM/7GKfbOacE54Wqc
 C1VHCcg5DWOD7F0RnYT2SQEVCeDODNmcyFdk7Oi4cUicTPJoYWJ9O9MGfBDBok0N
 M+dXxa9nvsl7EeEKpBKH9vo4TfXn3Gsj6LCRdedvI15ilZjfo8jdHYbSn7KBfQuZ
 JIKRnqkaQ1JfMFt+M/JJZ1b/+Wrd4HLMmmn5oUmrGGIvhpi32nJfi/97+nSy8iU=
 =c5gW
 -----END PGP SIGNATURE-----

Merge tag 'v3.6-rc6' into x86/mce

Merge Linux v3.6-rc6, to refresh this tree.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-19 17:01:25 +02:00
Chen Gong
55babd8f41 x86/mce: Add CMCI poll mode
On Intel systems corrected machine check interrupts (CMCI) may be sent to
multiple logical processors; possibly to all processors on the affected
socket (SDM Volume 3B "15.5.1 CMCI Local APIC Interface").  This means
that a persistent error (such as a stuck bit in ECC memory) may cause
a storm of interrupts that greatly hinders or prevents forward progress
(probably on many processors).

To solve this we keep track of the rate at which each processor sees
CMCI. If we exceed a threshold, we disable CMCI delivery and switch to
polling the machine check banks. If the storm subsides (none of the
affected processors see any more errors for a complete poll interval) we
re-enable CMCI.

[Tony: Added console messages when storm begins/ends and increased storm
threshold from 5 to 15 so we have a few more logged entries before we
disable interrupts and start dropping reports]

Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2012-08-09 11:44:51 -07:00
Thomas Gleixner
1a65f970d1 x86: mce: Remove the frozen cases in the hotplug code
No point in having double cases if we can simply mask the FROZEN bit
out.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2012-08-03 11:46:50 -07:00