linux-uconsole/arch/x86/kernel/cpu/mcheck
Fenghua Yu 85f07ccc53 x86, mce, therm_throt: Don't report power limit and package level thermal throttle events in mcelog
commit 29e9bf1841 upstream.

Thermal throttle and power limit events are not defined as MCE errors in the
x86 architecture and should not generate MCE errors in mcelog.

The current kernel generates fake, software-defined MCE errors for these
events. This may confuse users, who may think the machine has real MCE errors
when only thermal throttle or power limit events have actually occurred.

To make it worse, buggy firmware on some platforms may falsely generate
the events. The kernel then reports MCE errors, which users take for real
hardware errors. The firmware bugs should be fixed, but the kernel should not
report these events as MCE errors in any case.

So mcelog is not a good mechanism to report these events. Instead, we count
them in the respective counters (core_power_limit_count,
package_power_limit_count, core_throttle_count, and package_throttle_count) in
/sys/devices/system/cpu/cpu#/thermal_throttle/. Users can check the counters
for each event on each CPU. Please note that all CPUs in one package report
duplicate counters. It is the user application's responsibility to retrieve a
package level counter only once per package.
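
As an illustration only (not part of this patch), a user-space program could
read one of these counters roughly as follows. This is a minimal sketch: the
helper name read_throttle_counter is hypothetical, the caller is assumed to
pass the thermal_throttle directory of one CPU with the CPU number already
filled in, and each counter file is assumed to hold a single decimal integer.

```c
#include <stdio.h>

/*
 * Hypothetical helper, for illustration only.
 * Reads the counter file "name" inside directory "dir" and stores the
 * value in *out. Returns 0 on success, -1 if the file cannot be opened
 * or does not start with an unsigned decimal number.
 */
static int read_throttle_counter(const char *dir, const char *name,
                                 unsigned long *out)
{
    char path[512];
    FILE *f;
    int n, ok;

    /* Build "<dir>/<name>" and reject paths that do not fit. */
    n = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (n < 0 || n >= (int)sizeof(path))
        return -1;

    f = fopen(path, "r");
    if (!f)
        return -1;

    /* Assumption: the file contains one decimal counter value. */
    ok = (fscanf(f, "%lu", out) == 1);
    fclose(f);

    return ok ? 0 : -1;
}
```

For example, calling read_throttle_counter() with the directory
"/sys/devices/system/cpu/cpu0/thermal_throttle" and the name
"package_throttle_count" would fetch the package level thermal throttle
count as seen from CPU 0. Because CPUs in the same package report the same
package level value, an application should take it from one CPU per package
rather than summing over all CPUs.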

With this patch, package level power limit, core level power limit, and
package level thermal throttle events are no longer reported in mcelog. When
these events happen, they are reported only in the respective sysfs counters.

Since core level thermal throttle reporting has been legacy code in the kernel
for a while and users have accepted it as an MCE error in mcelog, core level
thermal throttle is still reported in mcelog. The event is also counted in a
sysfs counter.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Borislav Petkov <bp@amd64.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20111215001945.GA21009@linux-os.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-12-03 12:59:16 -08:00
Makefile ACPI, APEI, Generic Hardware Error Source memory error support 2010-05-19 22:41:16 -04:00
mce-apei.c ACPI, APEI, Add ERST record ID cache 2011-03-21 22:59:06 -04:00
mce-inject.c x86: Fix common misspellings 2011-03-18 10:39:30 +01:00
mce-internal.h ACPI, APEI, Use ERST for persistent storage of MCE 2010-05-19 22:41:40 -04:00
mce-severity.c x86/mce: Fix check for processor context when machine check was taken. 2012-06-01 15:13:00 +08:00
mce.c MCE: Fix vm86 handling for 32bit mce handler 2012-10-02 09:47:55 -07:00
mce_amd.c x86: Simplify code by removing a !SMP #ifdefs from 'struct cpuinfo_x86' 2012-08-15 12:04:09 -07:00
mce_intel.c x86: Replace uses of current_cpu_data with this_cpu ops 2010-12-30 12:22:03 +01:00
p5.c x86, mce: make mce_disabled boolean 2009-06-16 16:56:07 -07:00
therm_throt.c x86, mce, therm_throt: Don't report power limit and package level thermal throttle events in mcelog 2012-12-03 12:59:16 -08:00
threshold.c x86, mce: enable MCE_INTEL for 32bit new MCE 2009-05-28 09:24:13 -07:00
winchip.c x86, mce: unify mce.h 2009-06-16 16:56:07 -07:00