Commit graph

599141 commits

Author SHA1 Message Date
Ingo Molnar
5319e5ad0c Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 10:14:45 +02:00
Andrey Ryabinin
6d6f2833bf perf/x86: Fix undefined shift on 32-bit kernels
Jim reported:

	UBSAN: Undefined behaviour in arch/x86/events/intel/core.c:3708:12
	shift exponent 35 is too large for 32-bit type 'long unsigned int'

The use of 'unsigned long' type obviously is not correct here, make it
'unsigned long long' instead.

Reported-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Imre Palik <imrep@amazon.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 2c33645d36 ("perf/x86: Honor the architectural performance monitoring version")
Link: http://lkml.kernel.org/r/1462974711-10037-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 10:14:31 +02:00
Peter Zijlstra
3c3116b745 perf/x86/msr: Fix SMI overflow
We compute 'delta' and properly sign extend it and then ignore it and
recompute the raw value, loosing the sign extention.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: kan.liang@intel.com
Cc: linux-kernel@vger.kernel.org
Cc: luto@kernel.org
Cc: ray.huang@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 10:14:30 +02:00
hchrzani
ec336c879c perf/x86/intel/uncore: Fix CHA registers configuration procedure for Knights Landing platform
CHA events in Knights Landing platform require programming filter registers properly.
Remote node, local node and NonNearMemCachable bits should be set to 1 at all times.

Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Lawrence F Meadows <lawrence.f.meadows@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: bp@suse.de
Cc: harish.chegondi@intel.com
Cc: hpa@zytor.com
Cc: izumi.taku@jp.fujitsu.com
Cc: kan.liang@intel.com
Cc: lukasz.anaczkowski@intel.com
Cc: vthakkar1994@gmail.com
Fixes: 77af0037de ('perf/x86/intel/uncore: Add Knights Landing uncore PMU support')
Link: http://lkml.kernel.org/r/1462779419-17115-2-git-send-email-hubert.chrzaniuk@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 10:14:30 +02:00
Thomas Gleixner
50605ffbda sched/core: Provide a tsk_nr_cpus_allowed() helper
tsk_nr_cpus_allowed() is an accessor for task->nr_cpus_allowed which allows
us to change the representation of ->nr_cpus_allowed if required.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1462969411-17735-2-git-send-email-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:55:36 +02:00
Thomas Gleixner
ade42e092b sched/core: Use tsk_cpus_allowed() instead of accessing ->cpus_allowed
Use the future-safe accessor for struct task_struct's.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1462969411-17735-1-git-send-email-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:55:35 +02:00
Vik Heyndrickx
20878232c5 sched/loadavg: Fix loadavg artifacts on fully idle and on fully loaded systems
Systems show a minimal load average of 0.00, 0.01, 0.05 even when they
have no load at all.

Uptime and /proc/loadavg on all systems with kernels released during the
last five years up until kernel version 4.6-rc5, show a 5- and 15-minute
minimum loadavg of 0.01 and 0.05 respectively. This should be 0.00 on
idle systems, but the way the kernel calculates this value prevents it
from getting lower than the mentioned values.

Likewise but not as obviously noticeable, a fully loaded system with no
processes waiting, shows a maximum 1/5/15 loadavg of 1.00, 0.99, 0.95
(multiplied by number of cores).

Once the (old) load becomes 93 or higher, it mathematically can never
get lower than 93, even when the active (load) remains 0 forever.
This results in the strange 0.00, 0.01, 0.05 uptime values on idle
systems.  Note: 93/2048 = 0.0454..., which rounds up to 0.05.

It is not correct to add a 0.5 rounding (=1024/2048) here, since the
result from this function is fed back into the next iteration again,
so the result of that +0.5 rounding value then gets multiplied by
(2048-2037), and then rounded again, so there is a virtual "ghost"
load created, next to the old and active load terms.

By changing the way the internally kept value is rounded, that internal
value equivalent now can reach 0.00 on idle, and 1.00 on full load. Upon
increasing load, the internally kept load value is rounded up, when the
load is decreasing, the load value is rounded down.

The modified code was tested on nohz=off and nohz kernels. It was tested
on vanilla kernel 4.6-rc5 and on centos 7.1 kernel 3.10.0-327. It was
tested on single, dual, and octal cores system. It was tested on virtual
hosts and bare hardware. No unwanted effects have been observed, and the
problems that the patch intended to fix were indeed gone.

Tested-by: Damien Wyart <damien.wyart@free.fr>
Signed-off-by: Vik Heyndrickx <vik.heyndrickx@veribox.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Doug Smythies <dsmythies@telus.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 0f004f5a69 ("sched: Cure more NO_HZ load average woes")
Link: http://lkml.kernel.org/r/e8d32bff-d544-7748-72b5-3c86cc71f09f@veribox.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:55:34 +02:00
Morten Rasmussen
cfa1033431 sched/fair: Correct unit of load_above_capacity
In calculate_imbalance() load_above_capacity currently has the unit
[capacity] while it is used as being [load/capacity]. Not only is it
wrong it also makes it unlikely that load_above_capacity is ever used
as the subsequent code picks the smaller of load_above_capacity and
the avg_load

This patch ensures that load_above_capacity has the right unit
[load/capacity].

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
[ Changed changelog to note it was in capacity unit; +rebase. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1461958364-675-4-git-send-email-dietmar.eggemann@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:55:33 +02:00
Peter Zijlstra
1be0eb2a97 sched/fair: Clean up scale confusion
Wanpeng noted that the scale_load_down() in calculate_imbalance() was
weird. I agree, it should be SCHED_CAPACITY_SCALE, since we're going
to compare against busiest->group_capacity, which is in [capacity]
units.

Reported-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yuyang Du <yuyang.du@intel.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:55:33 +02:00
Wanpeng Li
444969223c sched/nohz: Fix affine unpinned timers mess
The following commit:

  9642d18eee ("nohz: Affine unpinned timers to housekeepers")'

intended to affine unpinned timers to housekeepers:

  unpinned timers(full dynaticks, idle)   =>   nearest busy housekeepers(otherwise, fallback to any housekeepers)
  unpinned timers(full dynaticks, busy)   =>   nearest busy housekeepers(otherwise, fallback to any housekeepers)
  unpinned timers(houserkeepers, idle)    =>   nearest busy housekeepers(otherwise, fallback to itself)

However, the !idle_cpu(i) && is_housekeeping_cpu(cpu) check modified the
intention to:

  unpinned timers(full dynaticks, idle)   =>   any housekeepers(no mattter cpu topology)
  unpinned timers(full dynaticks, busy)   =>   any housekeepers(no mattter cpu topology)
  unpinned timers(housekeepers, idle)     =>   any busy cpus(otherwise, fallback to any housekeepers)

This patch fixes it by checking if there are busy housekeepers nearby,
otherwise falls to any housekeepers/itself. After the patch:

  unpinned timers(full dynaticks, idle)   =>   nearest busy housekeepers(otherwise, fallback to any housekeepers)
  unpinned timers(full dynaticks, busy)   =>   nearest busy housekeepers(otherwise, fallback to any housekeepers)
  unpinned timers(housekeepers, idle)     =>   nearest busy housekeepers(otherwise, fallback to itself)

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[ Fixed the changelog. ]
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: 'commit 9642d18eee ("nohz: Affine unpinned timers to housekeepers")'
Link: http://lkml.kernel.org/r/1462344334-8303-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:55:32 +02:00
Peter Zijlstra
2f950354e6 sched/fair: Fix fairness issue on migration
Pavan reported that in the presence of very light tasks (or cgroups)
the placement of migrated tasks can cause severe fairness issues.

The problem is that enqueue_entity() places the task before it updates
time, thereby it can place the task far in the past (remember that
light tasks will shoot virtual time forward at a high speed, so in
relation to the pre-existing light task, we can land far in the past).

This is done because update_curr() needs the current task, and we
might be placing the current task.

The obvious solution is to differentiate between the current and any
other task; placing the current before we update time, and placing any
other task after, such that !curr tasks end up at the current moment
in time, and not in the past.

This commit re-introduces the previously reverted commit:

  3a47d5124a ("sched/fair: Fix fairness issue on migration")

... which is now safe to do, after we've also fixed another
underlying bug first, in:

  sched/fair: Prepare to fix fairness problems on migration

and cleaned up other details in the migration code:

  sched/core: Kill sched_class::task_waking

Reported-by: Pavan Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:55:32 +02:00
Peter Zijlstra
59efa0bac9 sched/core: Kill sched_class::task_waking to clean up the migration logic
With sched_class::task_waking being called only when we do
set_task_cpu(), we can make sched_class::migrate_task_rq() do the work
and eliminate sched_class::task_waking entirely.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Hunter <ahh@google.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Pavan Kondeti <pkondeti@codeaurora.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: byungchul.park@lge.com
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:55:31 +02:00
Peter Zijlstra
b5179ac70d sched/fair: Prepare to fix fairness problems on migration
Mike reported that our recent attempt to fix migration problems:

  3a47d5124a ("sched/fair: Fix fairness issue on migration")

broke interactivity and the signal starve test. We reverted that
commit and now let's try it again more carefully, with some other
underlying problems fixed first.

One problem is that I assumed ENQUEUE_WAKING was only set when we do a
cross-cpu wakeup (migration), which isn't true. This means we now
destroy the vruntime history of tasks and wakeup-preemption suffers.

Cure this by making my assumption true, only call
sched_class::task_waking() when we do a cross-cpu wakeup. This avoids
the indirect call in the case we do a local wakeup.

Reported-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Hunter <ahh@google.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Pavan Kondeti <pkondeti@codeaurora.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: byungchul.park@lge.com
Cc: linux-kernel@vger.kernel.org
Fixes: 3a47d5124a ("sched/fair: Fix fairness issue on migration")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:55:31 +02:00
Peter Zijlstra
c58d25f371 sched/fair: Move record_wakee()
Since I want to make ->task_woken() conditional on the task getting
migrated, we cannot use it to call record_wakee().

Move it to select_task_rq_fair(), which gets called in almost all the
same conditions. The only exception is if the woken task (@p) is
CPU-bound (as per the nr_cpus_allowed test in select_task_rq()).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Hunter <ahh@google.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Pavan Kondeti <pkondeti@codeaurora.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: byungchul.park@lge.com
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:55:30 +02:00
Ingo Molnar
4eb8676517 Merge branch 'smp/hotplug' into sched/core, to resolve conflicts
Conflicts:
	kernel/sched/core.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:51:36 +02:00
Ingo Molnar
eb60b3e5e8 Merge branch 'sched/urgent' into sched/core to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:18:13 +02:00
Yazen Ghannam
754a923059 x86/RAS: Add SMCA support to AMD Error Injector
Use SMCA MSRs when writing to MCA_{STATUS,ADDR,MISC} and
MCA_DE{STAT,ADDR} when injecting Deferred Errors on SMCA platforms.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462971509-3856-8-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:08:23 +02:00
Yazen Ghannam
a348ed83d9 EDAC, mce_amd: Detect SMCA using X86_FEATURE_SMCA
Use X86_FEATURE_SMCA when detecting if SMCA is available instead of
directly using CPUID 0x80000007_EBX.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462971509-3856-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:08:23 +02:00
Yazen Ghannam
14cddfd530 x86/mce: Update AMD mcheck init to use cpu_has() facilities
Use cpu_has() facilities to find available RAS features rather than
directly reading CPUID 0x80000007_EBX.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
[ Use the struct cpuinfo_x86 ptr instead. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462971509-3856-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:08:22 +02:00
Yazen Ghannam
71faad4306 x86/cpu: Add detection of AMD RAS Capabilities
Add a new CPUID leaf to hold the contents of CPUID 0x80000007_EBX (RasCap).

Define bits that are currently in use:

 Bit 0: McaOverflowRecov
 Bit 1: SUCCOR
 Bit 3: ScalableMca

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
[ Shorten comment. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462971509-3856-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:08:22 +02:00
Borislav Petkov
e128b4f483 x86/mce/AMD: Save an indentation level in prepare_threshold_block()
Do the !SMCA work first and then save us an indentation level for the
SMCA code.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462971509-3856-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:08:21 +02:00
Yazen Ghannam
32544f0603 x86/mce/AMD: Disable LogDeferredInMcaStat for SMCA systems
Disable Deferred Error logging in MCA_{STATUS,ADDR} additionally for
SMCA systems as this information will retrieved from MCA_DE{STAT,ADDR}
on those systems.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
[ Simplify, drop SMCA_MCAX_EN_OFF define too. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462971509-3856-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:08:20 +02:00
Yazen Ghannam
3410200958 x86/mce/AMD: Log Deferred Errors using SMCA MCA_DE{STAT,ADDR} registers
Scalable MCA provides new registers for all banks for logging deferred
errors: MCA_DESTAT and MCA_DEADDR. Deferred errors are always logged to
these registers.

Update the AMD deferred error handler to use these registers, if
available.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
[ Sanity-check __log_error() args, massage a bit. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462971509-3856-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:08:19 +02:00
Ingo Molnar
9ab975a029 perf/core improvements and fixes:
User visible:
 
 - Fix symbol insertion and callchain behavior in db-export (Chris Phlipot)
 
 Infrastructure:
 
 - Add libunwind build test (feature query), working towards supporting
   cross-platform DWARF callchains, starting with arm/arm64 (He Kuang)
 
 - Use lsdir() more extensively (Masami Hiramatsu)
 
 - Use SBUILD_ID_SIZE in places where the equivalent expression was
   being used (Masami Hiramatsu)
 
 - Split some more 'perf trace' syscall arg beautifiers (Arnaldo Carvalho de Melo)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJXM1oVAAoJENZQFvNTUqpAHpkP/1k2Ciin/zvyH2fVAO+R3dsW
 yPwdZ6FI+O7gS/4X2ZM5qzbbwb4jNq3obJZrRL3xyY4/eoo4H2vdubwpCCJJMuSe
 NRLqmf2rU7W6fuVjdZAobnOYuUNzZrGwAcInrPhZ/YQTHvsvjDX+xjBFB3nbSClM
 m7a9UbOPw38p5irfdTZFARThovYYoUWHHECly4/QKRFB8v0aMOHpj8ai+aD03+3U
 i737l+MAKD9p0toz20RcStq0Wt0/2OsLlaQZ7D9bu278LZ0GccVQsug8J1cO9J+v
 0ZCbirA0FMdeQfyjsWrogQf7kTFBG3SI/6KXo4wcBqxsE4HaX340kPuuGubQQJFA
 1a7KXbyIry5qjZXXp2gFkiu/w3xiG7REmkRqRvXJ4+jeOiHnZvXwSXG+ciaC9gnP
 2yCWbsNkTN2CSkSqsgKxwN1Qjofffb8/FBQlLh7VkPVvIib6N9a8Ei8Vwz2CYezN
 fV2pL9nx57X3TX4uIOTqhwN+hau4vtwz0n59U+aaBGToAFeteMQhFvE1/+6ose5V
 dyqhwh8IhG8qjNBf6TOnXWBo2xKw9EVdQ1IXiHXplEcNrpH6DwhdKAhgxCo60dmQ
 1zNHE4g+iZOJcQ4mzfjD5KoiYi9OEz3tFckFS9wGKTvDso5DhIt/CV2tyAnAA0iG
 ab/j/kPg5FxPUP3GAtoG
 =9OqC
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo-20160511' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

User visible changes:

- Fix symbol insertion and callchain behavior in db-export (Chris Phlipot)

Infrastructure changes:

- Add libunwind build test (feature query), working towards supporting
  cross-platform DWARF callchains, starting with arm/arm64 (He Kuang)

- Use lsdir() more extensively (Masami Hiramatsu)

- Use SBUILD_ID_SIZE in places where the equivalent expression was
  being used (Masami Hiramatsu)

- Split some more 'perf trace' syscall arg beautifiers (Arnaldo Carvalho de Melo)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 08:57:52 +02:00
Paul Mackerras
b1a4286b8f KVM: PPC: Book3S HV: Re-enable XICS fast path for irqfd-generated interrupts
Commit c9a5eccac1 ("kvm/eventfd: add arch-specific set_irq",
2015-10-16) added the possibility for architecture-specific code
to handle the generation of virtual interrupts in atomic context
where possible, without having to schedule a work function.

Since we can easily generate virtual interrupts on XICS without
having to do anything worse than take a spinlock, we define a
kvm_arch_set_irq_inatomic() for XICS.  We also remove kvm_set_msi()
since it is not used any more.

The one slightly tricky thing is that with the new interface, we
don't get told whether the interrupt is an MSI (or other edge
sensitive interrupt) vs. level-sensitive.  The difference as far
as interrupt generation is concerned is that for LSIs we have to
set the asserted flag so it will continue to fire until it is
explicitly cleared.

In fact the XICS code gets told which interrupts are LSIs by userspace
when it configures the interrupt via the KVM_DEV_XICS_GRP_SOURCES
attribute group on the XICS device.  To store this information, we add
a new "lsi" field to struct ics_irq_state.  With that we can also do a
better job of returning accurate values when reading the attribute
group.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-05-12 16:40:55 +10:00
Kedareswara rao Appana
07b0e7d49c dmaengine: vdma: Add Support for Xilinx AXI Central Direct Memory Access Engine
This patch adds support for the AXI Central Direct Memory Access
(AXI CDMA) core to the existing vdma driver, AXI CDMA is a
soft Xilinx IP core that provides high-bandwidth
Direct Memory Access(DMA) between a memory-mapped
source address and a memory-mapped destination address.

Signed-off-by: Kedareswara rao Appana <appanad@xilinx.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2016-05-12 11:58:58 +05:30
Kedareswara rao Appana
3843dc282e Documentation: DT: vdma: update binding doc for AXI CDMA
This patch updates the device-tree binding doc for
adding support for AXI CDMA.

Signed-off-by: Kedareswara rao Appana <appanad@xilinx.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2016-05-12 11:58:41 +05:30
Kedareswara rao Appana
c0bba3a99f dmaengine: vdma: Add Support for Xilinx AXI Direct Memory Access Engine
This patch adds support for the AXI Direct Memory Access (AXI DMA)
core in the existing vdma driver, AXI DMA Core is a
soft Xilinx IP core that provides high-bandwidth
direct memory access between memory and AXI4-Stream
type target peripherals.

Signed-off-by: Kedareswara rao Appana <appanad@xilinx.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2016-05-12 11:58:30 +05:30
Kedareswara rao Appana
7e4cda70c0 Documentation: DT: vdma: update binding doc for AXI DMA
This patch updates the device-tree binding doc for
adding support for AXI DMA.
Also this patch differentiates required properties b/w
DMA and VDMA.

Signed-off-by: Kedareswara rao Appana <appanad@xilinx.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2016-05-12 11:58:30 +05:30
Kedareswara rao Appana
42c1a2ede4 dmaengine: vdma: Rename xilinx_vdma_ prefix to xilinx_dma
This patch renames the xilinx_vdma_ prefix to xilinx_dma
for the API's and masks that will be shared b/w three DMA
IP cores.

Signed-off-by: Kedareswara rao Appana <appanad@xilinx.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2016-05-12 11:58:30 +05:30
Andy Shevchenko
dd4e91d538 dmaengine: slave means at least one of DMA_SLAVE, DMA_CYCLIC
When check for capabilities recognize slave support by either DMA_SLAVE or
DMA_CYCLIC bit set. If we don't do that the user can't get a normally worked
DMA support for engines that doesn't have one of the mentioned bits set.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2016-05-12 11:14:56 +05:30
David S. Miller
488992916c Merge branch 'qed-sriov'
Yuval Mintz says:

====================
qed*: Add SR-IOV support

This patch adds SR-IOV support to qed/qede drivers, adding a new PCI
device ID for a VF that is shared between all the various PFs that
support IOV.

This is quite a massive series - the first 7 parts of the series add
the infrastructure of supporting vfs in qed - mainly adding support in a
HW-based vf<->pf channel, as well as diverging all existing configuration
flows based on the pf/vf decision. I.e., while PF-originated requests
head directly to HW/FW, the VF requests first have to traverse to the PF
which will perform the configuration.

The 8th patch is the one that adds the support for the VF device in qede.

The remaining 6 patches each adds some user-based API support related to
VFs that can be used over the PF - forcing mac/vlan, changing speed, etc.

Dave,

Sorry in advance for the length of the series. Most of the bulk here is in
the infrastructure patches that have to go together [or at least, it makes
little sense to try splitting them up].

Please consider applying this to `net-next'.

Thanks,
Yuval

Changes from previous revision:
------------------------------
 - V2 - Replace aligned_u64 with regular u64; This was possible as the
        shared structures [between PF and VF] were already sufficiently
        padded as-is in the API, making this redundant.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-12 00:04:08 -04:00
Yuval Mintz
831bfb0e88 qed*: Tx-switching configuration
Device should be configured by default to VEB once VFs are active.
This changes the configuration of both PFs' and VFs' vports into enabling
tx-switching once sriov is enabled.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-12 00:04:08 -04:00
Yuval Mintz
73390ac9d8 qed*: support ndo_get_vf_config
Allows the user to view the VF configuration by observing the PF's
device.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-12 00:04:08 -04:00
Yuval Mintz
6ddc760825 qed*: IOV support spoof-checking
Add support in `ndo_set_vf_spoofchk' for allowing PF control over
its VF spoof-checking configuration.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-12 00:04:08 -04:00
Yuval Mintz
733def6a04 qed*: IOV link control
This adds support in 2 ndo that allow PF to tweak the VF's view of the
link - `ndo_set_vf_link_state' to allow it a view independent of the PF's,
and `ndo_set_vf_rate' which would allow the PF to limit the VF speed.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-12 00:04:08 -04:00
Yuval Mintz
eff169608c qed*: Support forced MAC
Allows the PF to enforce the VF's mac.
i.e., by using `ip link ... vf <x> mac <value>'.

While a MAC is forced, PF would prevent the VF from configuring any other
MAC.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-12 00:04:08 -04:00
Yuval Mintz
08feecd7fc qed*: Support PVID configuration
This adds support for PF control over the VF vlan configuration.
I.e., `ip link ... vf <x> vlan <vid>' should now be supported.

 1. <vid> != 0 => VF receives [unknowingly] only traffic tagged by
    <vid> and tags all outgoing traffic sent by VF with <vid>.
 2. <vid> == 0 ==> Remove the pvid configuration, reverting to previous.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-12 00:04:07 -04:00
Yuval Mintz
fefb0202cc qede: Add VF support
Adding a PCI callback for `sriov_configure' and a new PCI device id for
the VF [+ Some minor changes to accomodate differences between PF and VF
at the qede].
Following this, VF creation should be possible and the entire subset of
existing PF functionality that's allow to VFs should be supported.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-12 00:04:07 -04:00
Yuval Mintz
17b235c145 qed: Align TLVs
As the VF infrastructure is supposed to offer backward/forward
compatibility, the various types associated with VF<->PF communication
should be aligned across all various platforms that support IOV
on our family of adapters.

This adds a couple of currently missing values, specifically aligning
the enum for the various TLVs possible in the communication between them.

It then adds the PF implementation for some of those missing VF requests.
This support isn't really necessary for the Linux VF as those VFs aren't
requiring it [at least today], but are required by VFs running on other
OSes. LRO is an example of one such configuration.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-12 00:04:07 -04:00
Yuval Mintz
36558c3d77 qed: Bulletin and Link
Up to this point, VF and PF communication always originates from VF.
As a result, VF cannot be notified of any async changes, and specifically
cannot be informed of the current link state.

This introduces the bulletin board, the mechanism through which the PF
is going to communicate async notifications back to the VF. basically,
it's a well-defined structure agreed by both PF and VF which the VF would
continuously poll and into which the PF would DMA messages when needed.
[Bulletin board is actually allocated and communicated in previous patches
but never before used]

Based on the bulletin infrastructure, the VF can query its link status
and receive said async carrier changes.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-12 00:04:07 -04:00
Yuval Mintz
dacd88d6f6 qed: IOV l2 functionality
This adds sufficient changes to allow VFs l2-configuration flows to work.

While the fastpath of the VF and the PF are meant to be exactly the same,
the configuration of the VF is done by the PF.
This diverges all VF-related configuration flows that originate from a VF,
making them pass through the VF->PF channel and adding sufficient logic
on the PF side to support them.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-12 00:04:07 -04:00
Yuval Mintz
0b55e27d56 qed: IOV configure and FLR
While previous patches have already added the necessary logic to probe
VFs as well as enabling them in the HW, this patch adds the ability to
support VF FLR & SRIOV disable.

It then wraps both flows together into the first IOV callback to be
provided to the protocol driver - `configure'. This would later to be used
to enable and disable SRIOV in the adapter.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-12 00:04:07 -04:00
Yuval Mintz
1408cc1fa4 qed: Introduce VFs
This adds the qed VFs for the first time -
The vfs are limited functions, with a very different PCI bar structure
[when compared with PFs] to better impose the related security demands
associated with them.

This patch includes the logic neccesary to allow VFs to successfully probe
[without actually adding the ability to enable iov].
This includes diverging all the flows that would occur as part of the pci
probe of the driver, preventing VF from accessing registers/memories it
can't and instead utilize the VF->PF channel to query the PF for needed
information.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-12 00:04:07 -04:00
Yuval Mintz
37bff2b9c6 qed: Add VF->PF channel infrastructure
Communication between VF and PF is based on a dedicated HW channel;
VF will prepare a messge, and by signaling the HW the PF would get a
notification of that message existance. The PF would then copy the
message, process it and DMA an answer back to the VF as a response.

The messages themselves are TLV-based - allowing easier backward/forward
compatibility.

This patch adds the infrastructure of the channel on the PF side -
starting with the arrival of the notification and ending with DMAing
the response back to the VF.

It also adds a dummy-response as reference, as it only lays the
groundwork of the communication; it doesn't really add support of any
actual messages.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-12 00:04:07 -04:00
Yuval Mintz
32a47e72c9 qed: Add CONFIG_QED_SRIOV
Add support for a new Kconfig option for qed* driver which would allow
[eventually] the support in VFs.

This patch adds the necessary logic in the PF to learn about the possible
VFs it will have to support [Based on PCI configuration space and HW],
and prepare a database with an entry per-VF as infrastructure for future
interaction with said VFs.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-12 00:04:06 -04:00
David S. Miller
1b7cc307a8 Merge branch 'bnxt_en-fixes'
Michael Chan says:

====================
bnxt_en: Add workaround to detect bad opaque in rx completion.

2-part workaround for this hardware bug.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-11 23:46:09 -04:00
Michael Chan
fa7e28127a bnxt_en: Add workaround to detect bad opaque in rx completion (part 2)
Add detection and recovery code when the hardware returned opaque value
does not match the expected consumer index.  Once the issue is detected,
we skip the processing of all RX and LRO/GRO packets.  These completion
entries are discarded without sending the SKB to the stack and without
producing new buffers.  The function will be reset from a workqueue.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-11 23:46:09 -04:00
Michael Chan
376a5b8647 bnxt_en: Add workaround to detect bad opaque in rx completion (part 1)
There is a rare hardware bug that can cause a bad opaque value in the RX
or TPA completion.  When this happens, the hardware may have used the
same buffer twice for 2 rx packets.  In addition, the driver will also
crash later using the bad opaque as the index into the ring.

The rx opaque value is predictable and is always monotonically increasing.
The workaround is to keep track of the expected next opaque value and
compare it with the one returned by hardware during RX and TPA start
completions.  If they miscompare, we will not process any more RX and
TPA completions and exit NAPI.  We will then schedule a workqueue to
reset the function.

This patch adds the logic to keep track of the next rx consumer index.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-11 23:46:09 -04:00
Dan Carpenter
5f46feab87 qlcnic: potential NULL dereference in qlcnic_83xx_get_minidump_template()
If qlcnic_fw_cmd_get_minidump_temp() fails then "fw_dump->tmpl_hdr" is
NULL or possibly freed.  It can lead to an oops later.

Fixes: d01a6d3c8a ('qlcnic: Add support to enable capability to extend minidump for iSCSI')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-11 23:44:56 -04:00