Commit graph

10573 commits

Author SHA1 Message Date
Thomas Gleixner
12e189d3cf x86/x2apic: Add proper state tracking
Having 3 different variables to track the state is just silly and
error prone. Add a proper state tracking variable which covers the
three possible states: ON/OFF/DISABLED.

We cannot use x2apic_mode for this as this would require to change all
users of x2apic_mode with explicit comparisons for a state value
instead of treating it as boolean.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20150115211702.955392443@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 15:10:55 +01:00
Thomas Gleixner
62e61633da x86/x2apic: Clarify remapping mode for x2apic enablement
Rename the argument of try_to_enable_x2apic() so the purpose becomes
more clear.

Make the pr_warning more consistent and avoid the double print of
"disabling".

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20150115211702.876012628@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 15:10:55 +01:00
Thomas Gleixner
55eae7de72 x86/x2apic: Move code in conditional region
No point in having try_to_enable_x2apic() outside of the
CONFIG_X86_X2APIC section and having inline functions and more ifdefs
to deal with it. Move the code into the existing ifdef section and
remove the inline cruft.

Fixup the printk about not enabling interrupt remapping as suggested
by Boris.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20150115211702.795388613@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 15:10:54 +01:00
Thomas Gleixner
d524165cb8 x86/apic: Check x2apic early
No point in delaying the x2apic detection for the CONFIG_X86_X2APIC=n
case to enable_IR_x2apic(). We rather detect that before we try to
setup anything there.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20150115211702.702479404@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 15:10:54 +01:00
Thomas Gleixner
9aa1636527 x86/apic: Make disable x2apic work really
If x2apic_preenabled is not enabled, then disable_x2apic() is not
called from various places which results in x2apic_disabled not being
set. So other code pathes can happily reenable the x2apic.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20150115211702.621431109@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 15:10:54 +01:00
Thomas Gleixner
2ca5b40479 x86/ioapic: Check x2apic really
The x2apic_preenabled flag is just a horrible hack and if X2APIC
support is disabled it does not reflect the actual hardware
state. Check the hardware instead.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20150115211702.541280622@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 15:10:54 +01:00
Thomas Gleixner
bfb0507029 x86/apic: Move x2apic code to one place
Having several disjunct pieces of code for x2apic support makes
reading the code unnecessarily hard. Move it to one ifdeffed section.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20150115211702.445212133@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 15:10:54 +01:00
Thomas Gleixner
81a46dd824 x86/apic: Make x2apic_mode depend on CONFIG_X86_X2APIC
No point in having a static variable around which is always 0. Let the
compiler optimize code out if disabled.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20150115211702.363274310@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 15:10:54 +01:00
Thomas Gleixner
8d80696060 x86/apic: Avoid open coded x2apic detection
enable_IR_x2apic() grew a open coded x2apic detection. Implement a
proper helper function which shares the code with the already existing
x2apic_enabled().

Made it use rdmsrl_safe as suggested by Boris.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20150115211702.285038186@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 15:10:54 +01:00
K. Y. Srinivasan
32c6590d12 x86, hyperv: Mark the Hyper-V clocksource as being continuous
The Hyper-V clocksource is continuous; mark it accordingly.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Acked-by: jasowang@redhat.com
Cc: gregkh@linuxfoundation.org
Cc: devel@linuxdriverproject.org
Cc: olaf@aepfle.de
Cc: apw@canonical.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1421108762-3331-1-git-send-email-kys@microsoft.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-20 14:36:25 +01:00
Oleg Nesterov
7575637ab2 x86, fpu: Fix math_state_restore() race with kernel_fpu_begin()
math_state_restore() can race with kernel_fpu_begin() if irq comes
right after __thread_fpu_begin(), __save_init_fpu() will overwrite
fpu->state we are going to restore.

Add 2 simple helpers, kernel_fpu_disable() and kernel_fpu_enable()
which simply set/clear in_kernel_fpu, and change math_state_restore()
to exclude kernel_fpu_begin() in between.

Alternatively we could use local_irq_save/restore, but probably these
new helpers can have more users.

Perhaps they should disable/enable preemption themselves, in this case
we can remove preempt_disable() in __restore_xstate_sig().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: matt.fleming@intel.com
Cc: bp@suse.de
Cc: pbonzini@redhat.com
Cc: luto@amacapital.net
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Link: http://lkml.kernel.org/r/20150115192028.GD27332@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-20 13:53:07 +01:00
Oleg Nesterov
33a3ebdc07 x86, fpu: Don't abuse has_fpu in __kernel_fpu_begin/end()
Now that we have in_kernel_fpu we can remove __thread_clear_has_fpu()
in __kernel_fpu_begin(). And this allows to replace the asymmetrical
and nontrivial use_eager_fpu + tsk_used_math check in kernel_fpu_end()
with the same __thread_has_fpu() check.

The logic becomes really simple; if _begin() does save() then _end()
needs restore(), this is controlled by __thread_has_fpu(). Otherwise
they do clts/stts unless use_eager_fpu().

Not only this makes begin/end symmetrical and imo more understandable,
potentially this allows to change irq_fpu_usable() to avoid all other
checks except "in_kernel_fpu".

Also, with this patch __kernel_fpu_end() does restore_fpu_checking()
and WARNs if it fails instead of math_state_restore(). I think this
looks better because we no longer need __thread_fpu_begin(), and it
would be better to report the failure in this case.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: matt.fleming@intel.com
Cc: bp@suse.de
Cc: pbonzini@redhat.com
Cc: luto@amacapital.net
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Link: http://lkml.kernel.org/r/20150115192005.GC27332@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-20 13:53:07 +01:00
Oleg Nesterov
14e153ef75 x86, fpu: Introduce per-cpu in_kernel_fpu state
interrupted_kernel_fpu_idle() tries to detect if kernel_fpu_begin()
is safe or not. In particular it should obviously deny the nested
kernel_fpu_begin() and this logic looks very confusing.

If use_eager_fpu() == T we rely on a) __thread_has_fpu() check in
interrupted_kernel_fpu_idle(), and b) on the fact that _begin() does
__thread_clear_has_fpu().

Otherwise we demand that the interrupted task has no FPU if it is in
kernel mode, this works because __kernel_fpu_begin() does clts() and
interrupted_kernel_fpu_idle() checks X86_CR0_TS.

Add the per-cpu "bool in_kernel_fpu" variable, and change this code
to check/set/clear it. This allows to do more cleanups and fixes, see
the next changes.

The patch also moves WARN_ON_ONCE() under preempt_disable() just to
make this_cpu_read() look better, this is not really needed. And in
fact I think we should move it into __kernel_fpu_begin().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: matt.fleming@intel.com
Cc: bp@suse.de
Cc: pbonzini@redhat.com
Cc: luto@amacapital.net
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Link: http://lkml.kernel.org/r/20150115191943.GB27332@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-20 13:53:07 +01:00
Andy Shevchenko
0e1540208e x86: pmc_atom: Expose contents of PSS
The PSS register reflects the power state of each island on SoC. It would be
useful to know which of the islands is on or off at the momemnt.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Aubrey Li <aubrey.li@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Kumar P. Mahesh <mahesh.kumar.p@intel.com>
Link: http://lkml.kernel.org/r/1421253575-22509-6-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-20 12:50:14 +01:00
Andy Shevchenko
4b25f42a37 x86: pmc_atom: Clean up init function
There is no need to use err variable.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Aubrey Li <aubrey.li@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Kumar P. Mahesh <mahesh.kumar.p@intel.com>
Link: http://lkml.kernel.org/r/1421253575-22509-5-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-20 12:50:14 +01:00
Andy Shevchenko
4922b9ce89 x86: pmc-atom: Remove unused macro
DRIVER_NAME seems unused. This patch just removes it. There is no functional
change.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Aubrey Li <aubrey.li@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Kumar P. Mahesh <mahesh.kumar.p@intel.com>
Link: http://lkml.kernel.org/r/1421253575-22509-4-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-20 12:50:14 +01:00
Andy Shevchenko
d5df8fe34b x86: pmc_atom: don%27t check for NULL twice
debugfs_remove_recursive() is NULL-aware, thus, we may safely remove the check
here. There is no need to assing NULL to variable since it will be not used
anywhere.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Aubrey Li <aubrey.li@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Kumar P. Mahesh <mahesh.kumar.p@intel.com>
Link: http://lkml.kernel.org/r/1421253575-22509-3-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-20 12:50:14 +01:00
Andy Shevchenko
1b43d7125f x86: pmc-atom: Assign debugfs node as soon as possible
pmc_dbgfs_unregister() will be called when pmc->dbgfs_dir is unconditionally
NULL on error path in pmc_dbgfs_register(). To prevent this we move the
assignment to where is should be.

Fixes: f855911c1f (x86/pmc_atom: Expose PMC device state and platform sleep state)
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Aubrey Li <aubrey.li@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Kumar P. Mahesh <mahesh.kumar.p@intel.com>
Link: http://lkml.kernel.org/r/1421253575-22509-2-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-20 12:50:14 +01:00
Jan Beulich
4a0d3107d6 x86, irq: Properly tag virtualization entry in /proc/interrupts
The mis-naming likely was a copy-and-paste effect.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/54B9408B0200007800055E8B@mail.emea.novell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-20 12:37:23 +01:00
Jiang Liu
b568b8601f x86/xen: Treat SCI interrupt as normal GSI interrupt
Currently Xen Domain0 has special treatment for ACPI SCI interrupt,
that is initialize irq for ACPI SCI at early stage in a special way as:
xen_init_IRQ()
	->pci_xen_initial_domain()
		->xen_setup_acpi_sci()
			Allocate and initialize irq for ACPI SCI

Function xen_setup_acpi_sci() calls acpi_gsi_to_irq() to get an irq
number for ACPI SCI. But unfortunately acpi_gsi_to_irq() depends on
IOAPIC irqdomains through following path
acpi_gsi_to_irq()
	->mp_map_gsi_to_irq()
		->mp_map_pin_to_irq()
			->check IOAPIC irqdomain

For PV domains, it uses Xen event based interrupt manangement and
doesn't make uses of native IOAPIC, so no irqdomains created for IOAPIC.
This causes Xen domain0 fail to install interrupt handler for ACPI SCI
and all ACPI events will be lost. Please refer to:
https://lkml.org/lkml/2014/12/19/178

So the fix is to get rid of special treatment for ACPI SCI, just treat
ACPI SCI as normal GSI interrupt as:
acpi_gsi_to_irq()
	->acpi_register_gsi()
		->acpi_register_gsi_xen()
			->xen_register_gsi()

With above change, there's no need for xen_setup_acpi_sci() anymore.
The above change also works with bare metal kernel too.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Cc: Tony Luck <tony.luck@intel.com>
Cc: xen-devel@lists.xenproject.org
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/1421720467-7709-2-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-20 11:44:40 +01:00
Rusty Russell
be1f221c04 module: remove mod arg from module_free, rename module_memfree().
Nothing needs the module pointer any more, and the next patch will
call it from RCU, where the module itself might no longer exist.
Removing the arg is the safest approach.

This just codifies the use of the module_alloc/module_free pattern
which ftrace and bpf use.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: x86@kernel.org
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: linux-cris-kernel@axis.com
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: nios2-dev@lists.rocketboards.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Cc: netdev@vger.kernel.org
2015-01-20 11:38:33 +10:30
Linus Torvalds
59b2858f57 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Mostly tooling fixes, but also two PMU driver fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools powerpc: Use dwfl_report_elf() instead of offline.
  perf tools: Fix segfault for symbol annotation on TUI
  perf test: Fix dwarf unwind using libunwind.
  perf tools: Avoid build splat for syscall numbers with uclibc
  perf tools: Elide strlcpy warning with uclibc
  perf tools: Fix statfs.f_type data type mismatch build error with uclibc
  tools: Remove bitops/hweight usage of bits in tools/perf
  perf machine: Fix __machine__findnew_thread() error path
  perf tools: Fix building error in x86_64 when dwarf unwind is on
  perf probe: Propagate error code when write(2) failed
  perf/x86/intel: Fix bug for "cycles:p" and "cycles:pp" on SLM
  perf/rapl: Fix sysfs_show() initialization for RAPL PMU
2015-01-18 06:24:30 +12:00
Andy Lutomirski
0fcedc8631 x86_64 entry: Fix RCX for ptraced syscalls
The int_ret_from_sys_call and syscall tracing code disagrees
with the sysret path as to the value of RCX.

The Intel SDM, the AMD APM, and my laptop all agree that sysret
returns with RCX == RIP.  The syscall tracing code does not
respect this property.

For example, this program:

int main()
{
	extern const char syscall_rip[];
	unsigned long rcx = 1;
	unsigned long orig_rcx = rcx;
	asm ("mov $-1, %%eax\n\t"
	     "syscall\n\t"
	     "syscall_rip:"
	     : "+c" (rcx) : : "r11");
	printf("syscall: RCX = %lX  RIP = %lX  orig RCX = %lx\n",
	       rcx, (unsigned long)syscall_rip, orig_rcx);
	return 0;
}

prints:

  syscall: RCX = 400556  RIP = 400556  orig RCX = 1

Running it under strace gives this instead:

  syscall: RCX = FFFFFFFFFFFFFFFF  RIP = 400556  orig RCX = 1

This changes FIXUP_TOP_OF_STACK to match sysret, causing the
test to show RCX == RIP even under strace.

It looks like this is a partial revert of:
88e4bc32686e ("[PATCH] x86-64 architecture specific sync for 2.5.8")
from the historic git tree.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/c9a418c3dc3993cb88bb7773800225fd318a4c67.1421453410.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-17 11:02:53 +01:00
Linus Torvalds
23aa4b416a This holds a few fixes to the ftrace infrastructure as well as
the mixture of function graph tracing and kprobes.
 
 When jprobes and function graph tracing is enabled at the same time
 it will crash the system.
 
   # modprobe jprobe_example
   # echo function_graph > /sys/kernel/debug/tracing/current_tracer
 
 After the first fork (jprobe_example probes it), the system will crash.
 This is due to the way jprobes copies the stack frame and does not
 do a normal function return. This messes up with the function graph
 tracing accounting which hijacks the return address from the stack
 and replaces it with a hook function. It saves the return addresses in
 a separate stack to put back the correct return address when done.
 But because the jprobe functions do not do a normal return, their
 stack addresses are not put back until the function they probe is called,
 which means that the probed function will get the return address of
 the jprobe handler instead of its own.
 
 The simple fix here was to disable function graph tracing while the
 jprobe handler is being called.
 
 While debugging this I found two minor bugs with the function graph
 tracing.
 
 The first was about the function graph tracer sharing its function hash
 with the function tracer (they both get filtered by the same input).
 The changing of the set_ftrace_filter would not sync the function recording
 records after a change if the function tracer was disabled but the
 function graph tracer was enabled. This was due to the update only checking
 one of the ops instead of the shared ops to see if they were enabled and
 should perform the sync. This caused the ftrace accounting to break and
 a ftrace_bug() would be triggered, disabling ftrace until a reboot.
 
 The second was that the check to update records only checked one of the
 filter hashes. It needs to test both the "filter" and "notrace" hashes.
 The "filter" hash determines what functions to trace where as the "notrace"
 hash determines what functions not to trace (trace all but these).
 Both hashes need to be passed to the update code to find out what change
 is being done during the update. This also broke the ftrace record
 accounting and triggered a ftrace_bug().
 
 This patch set also include two more fixes that were reported separately
 from the kprobe issue.
 
 One was that init_ftrace_syscalls() was called twice at boot up.
 This is not a major bug, but that call performed a rather large kmalloc
 (NR_syscalls * sizeof(*syscalls_metadata)). The second call made the first
 one a memory leak, and wastes memory.
 
 The other fix is a regression caused by an update in the v3.19 merge window.
 The moving to enable events early, moved the enabling before PID 1 was
 created. The syscall events require setting the TIF_SYSCALL_TRACEPOINT
 for all tasks. But for_each_process_thread() does not include the swapper
 task (PID 0), and ended up being a nop. A suggested fix was to add
 the init_task() to have its flag set, but I didn't really want to mess
 with PID 0 for this minor bug. Instead I disable and re-enable events again
 at early_initcall() where it use to be enabled. This also handles any other
 event that might have its own reg function that could break at early
 boot up.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUt9vmAAoJEEjnJuOKh9ldLHEIAJ9XrPW2xMIY5yI69jT1F7pv
 PkSRqENnOK0l4UulD52SvIBecQTTBcEEjao4yVGkc7DCJBOws/1LZ5gW8OfNlKjq
 rMB8yaosL1tXJ1ARVPMjcQVy+228zkgTXznwEZCjku1g7LuScQ28qyXsXO7B6yiK
 xKoHqKjygmM/a2aVn+8tdiVKiDp6jdmkbYicbaFT4xP7XB5DaMmIiXRHxdvW6xdR
 azKrVfYiMyJqTZNt/EVSWUk2WjeaYhoXyNtvgPx515wTo/llCnzhjcsocXBtH2P/
 YOtwl+1L7Z89ukV9oXqrtrUJZ6Ps7+g7I1flJuL7/1FlNGnklcP9JojD+t6HeT8=
 =vkec
 -----END PGP SIGNATURE-----

Merge tag 'trace-fixes-v3.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull ftrace fixes from Steven Rostedt:
 "This holds a few fixes to the ftrace infrastructure as well as the
  mixture of function graph tracing and kprobes.

  When jprobes and function graph tracing is enabled at the same time it
  will crash the system:

      # modprobe jprobe_example
      # echo function_graph > /sys/kernel/debug/tracing/current_tracer

  After the first fork (jprobe_example probes it), the system will
  crash.

  This is due to the way jprobes copies the stack frame and does not do
  a normal function return.  This messes up with the function graph
  tracing accounting which hijacks the return address from the stack and
  replaces it with a hook function.  It saves the return addresses in a
  separate stack to put back the correct return address when done.  But
  because the jprobe functions do not do a normal return, their stack
  addresses are not put back until the function they probe is called,
  which means that the probed function will get the return address of
  the jprobe handler instead of its own.

  The simple fix here was to disable function graph tracing while the
  jprobe handler is being called.

  While debugging this I found two minor bugs with the function graph
  tracing.

  The first was about the function graph tracer sharing its function
  hash with the function tracer (they both get filtered by the same
  input).  The changing of the set_ftrace_filter would not sync the
  function recording records after a change if the function tracer was
  disabled but the function graph tracer was enabled.  This was due to
  the update only checking one of the ops instead of the shared ops to
  see if they were enabled and should perform the sync.  This caused the
  ftrace accounting to break and a ftrace_bug() would be triggered,
  disabling ftrace until a reboot.

  The second was that the check to update records only checked one of
  the filter hashes.  It needs to test both the "filter" and "notrace"
  hashes.  The "filter" hash determines what functions to trace where as
  the "notrace" hash determines what functions not to trace (trace all
  but these).  Both hashes need to be passed to the update code to find
  out what change is being done during the update.  This also broke the
  ftrace record accounting and triggered a ftrace_bug().

  This patch set also include two more fixes that were reported
  separately from the kprobe issue.

  One was that init_ftrace_syscalls() was called twice at boot up.  This
  is not a major bug, but that call performed a rather large kmalloc
  (NR_syscalls * sizeof(*syscalls_metadata)).  The second call made the
  first one a memory leak, and wastes memory.

  The other fix is a regression caused by an update in the v3.19 merge
  window.  The moving to enable events early, moved the enabling before
  PID 1 was created.  The syscall events require setting the
  TIF_SYSCALL_TRACEPOINT for all tasks.  But for_each_process_thread()
  does not include the swapper task (PID 0), and ended up being a nop.

  A suggested fix was to add the init_task() to have its flag set, but I
  didn't really want to mess with PID 0 for this minor bug.  Instead I
  disable and re-enable events again at early_initcall() where it use to
  be enabled.  This also handles any other event that might have its own
  reg function that could break at early boot up"

* tag 'trace-fixes-v3.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix enabling of syscall events on the command line
  tracing: Remove extra call to init_ftrace_syscalls()
  ftrace/jprobes/x86: Fix conflict between jprobes and function graph tracing
  ftrace: Check both notrace and filter for old hash
  ftrace: Fix updating of filters for shared global_ops filters
2015-01-17 07:55:52 +13:00
Kan Liang
33636732dc perf/x86/intel: Fix bug for "cycles:p" and "cycles:pp" on SLM
cycles:p and cycles:pp do not work on SLM since commit:

   86a04461a9 ("perf/x86: Revamp PEBS event selection")

UOPS_RETIRED.ALL is not a PEBS capable event, so it should not be used
to count cycle number.

Actually SLM calls intel_pebs_aliases_core2() which uses INST_RETIRED.ANY_P
to count the number of cycles. It's a PEBS capable event. But inv and
cmask must be set to count cycles.

Considering SLM allows all events as PEBS with no flags, only
INST_RETIRED.ANY_P, inv=1, cmask=16 needs to handled specially.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1421084541-31639-1-git-send-email-kan.liang@intel.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-16 09:06:59 +01:00
Stephane Eranian
433678bdc6 perf/rapl: Fix sysfs_show() initialization for RAPL PMU
This patch fixes a problem with the initialization of the
sysfs_show() routine for the RAPL PMU.

The current code was wrongly relying on the EVENT_ATTR_STR()
macro which uses the events_sysfs_show() function in the x86
PMU code. That function itself was relying on the x86_pmu data
structure. Yet RAPL and the core PMU (x86_pmu) have nothing to
do with each other. They should therefore not interact with
each other.

The x86_pmu structure is initialized at boot time based on
the host CPU model. When the host CPU is not supported, the
x86_pmu remains uninitialized and some of the callbacks it
contains are NULL.

The false dependency with x86_pmu could potentially cause crashes
in case the x86_pmu is not initialized while the RAPL PMU is. This
may, for instance, be the case in virtualized environments.

This patch fixes the problem by using a private sysfs_show()
routine for exporting the RAPL PMU events.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20150113225953.GA21525@thinkpad
Cc: vincent.weaver@maine.edu
Cc: jolsa@redhat.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-16 09:06:58 +01:00
Steven Rostedt (Red Hat)
237d28db03 ftrace/jprobes/x86: Fix conflict between jprobes and function graph tracing
If the function graph tracer traces a jprobe callback, the system will
crash. This can easily be demonstrated by compiling the jprobe
sample module that is in the kernel tree, loading it and running the
function graph tracer.

 # modprobe jprobe_example.ko
 # echo function_graph > /sys/kernel/debug/tracing/current_tracer
 # ls

The first two commands end up in a nice crash after the first fork.
(do_fork has a jprobe attached to it, so "ls" just triggers that fork)

The problem is caused by the jprobe_return() that all jprobe callbacks
must end with. The way jprobes works is that the function a jprobe
is attached to has a breakpoint placed at the start of it (or it uses
ftrace if fentry is supported). The breakpoint handler (or ftrace callback)
will copy the stack frame and change the ip address to return to the
jprobe handler instead of the function. The jprobe handler must end
with jprobe_return() which swaps the stack and does an int3 (breakpoint).
This breakpoint handler will then put back the saved stack frame,
simulate the instruction at the beginning of the function it added
a breakpoint to, and then continue on.

For function tracing to work, it hijakes the return address from the
stack frame, and replaces it with a hook function that will trace
the end of the call. This hook function will restore the return
address of the function call.

If the function tracer traces the jprobe handler, the hook function
for that handler will not be called, and its saved return address
will be used for the next function. This will result in a kernel crash.

To solve this, pause function tracing before the jprobe handler is called
and unpause it before it returns back to the function it probed.

Some other updates:

Used a variable "saved_sp" to hold kcb->jprobe_saved_sp. This makes the
code look a bit cleaner and easier to understand (various tries to fix
this bug required this change).

Note, if fentry is being used, jprobes will change the ip address before
the function graph tracer runs and it will not be able to trace the
function that the jprobe is probing.

Link: http://lkml.kernel.org/r/20150114154329.552437962@goodmis.org

Cc: stable@vger.kernel.org # 2.6.30+
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-01-15 09:39:18 -05:00
Ingo Molnar
2372673c64 Minor cleanups.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUtScFAAoJEBLB8Bhh3lVKhUsP/37W3Sm+0kpQ/fo7h0BmiP6d
 U+3w8wvIRfqwp3e9id+YXaTpYBfD81ejcgqP/HV1HYYYPqpzpJY/cXfDpU9WEkKS
 rZZKQlw5KB+pA4nbs6GgroYjeoqBpW07Mz3FOHpIVyZ6W9wAyTtL76JK5nNIcBb4
 db9H9cQOEs8mzaLMEDu36QGbnN/fQr4R3ULSAZCYBMrA7eDdpfd2mGJR57eOSutL
 o3XsQaIaIvnDKfjuGbBcLKkqdfwAgSZfVulKxgcBiSjdH6kN2blC9HHkQ+8vuEZp
 t+ouxzNZw4Ml1CbpzGU0hi9K3DkxXbhml7bMo9yZVlhjPglqyZXqVZU35rgIgEaB
 NxoenKSVybe2hi0K3S5kNtwig1GwadxUmK5S9M3HFbugu1OtKpgPvBp7GSy+fF28
 aphe3pSh8o58rLmp6npigv0YTyIRkGKw1XYHKsP7cClvU2UbRmJrJpD6CGyMEBKC
 Npss2Sfon1+Ig4iP13VkUbJjxYsf/obbTSaLXsJ8mEWkv1nfNDeGmBaHWwlA1aMP
 i4toba9H0Ax264aApIQ4FAvict/Qvmh0Hh9sghG6ERpeeuXtzQMnfKy3Ts7tPc1s
 ZfmBq6IWs2ZkB2tIOUz2Caiw1ybWd2CSQdtbeu2B9wt0KzQW1xm3uTNGl27cmwR0
 2MjjnO/uh0fZ1hKWFIr4
 =Ut02
 -----END PGP SIGNATURE-----

Merge tag 'x86_queue' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/cleanups

Pull minor x86 cleanups from Borislav Petkov.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-15 11:38:51 +01:00
Ingo Molnar
37e4d3b951 Nothing special this time, just an error messages improvement from Andy
and a cleanup from me.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUtQlAAAoJEBLB8Bhh3lVKB5wQAIgnT9hZazgGtlvns8UIA1SY
 ZTM/4yEBGo/a5iIaD37ahfUm1Eb8Y/u3ITlKu8/DFyS7bEq9uEf+IGV9ZqeH8F9t
 QWe079ObsURsxUFz3jg9iW7Er4uWlvHoyaeWoGS8MCKpTB8oYLr70vZafNLLQ0Qy
 QYCV6SSFO51WZ2J3x7PBRXNANvvVPe8AhNx8rb3VOaP7aDlBXk8rzu3MVJwUJjOb
 /tgRz7uChp3HAW6PADehM+ELNDINMAW8wJB1XwbfHnwGJYTVrBdWpBnF2h/qULoN
 p/KU+zpuTZpopfkaHiNb7cgwR1B+Ig5DVvXHMTMkJCHe0y978ch4kwSy6nLGXvZ6
 ig5h8yi83K1cXGdl6/HwRidge83Y97nceOi8hyqiVsOfGuOQMYbGw3lbzyLGPlVT
 RzRyaWToS7UlRtlr2qDvqzmLGujDt1bpSLhoLxNaQ3BKJ2tPZJ/TyH9BfvmE+Ed8
 1zITTL88B7bXxJLIdyS8pgQxHeuoRuDutd0uRh0Uolm6Z6PHxOIt5euly99zhA9u
 s5Xl/7dv66VA8PmY7VoZlmCuxlU8uY/RUct/v7bwIspYO8NvUe7A7cSoFycge286
 MON6rVXkvWqeJIxqXEs8K6tbtF+D6LhBUxUKGqXHbJvmslse0FwrwHoq0e4jmfth
 BgYBxhd/nH+CQO4OYBBF
 =r9gb
 -----END PGP SIGNATURE-----

Merge tag 'ras_for_3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/ras

Pull RAS updates from Borislav Petkov:

  "Nothing special this time, just an error messages improvement from Andy
   and a cleanup from me."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-15 11:29:49 +01:00
Jiang Liu
c392f56c94 iommu/irq_remapping: Kill function irq_remapping_supported() and related code
Simplify irq_remapping code by killing irq_remapping_supported() and
related interfaces.

Joerg posted a similar patch at https://lkml.org/lkml/2014/12/15/490,
so assume an signed-off from Joerg.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Rientjes <rientjes@google.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Oren Twaig <oren@scalemp.com>
Link: http://lkml.kernel.org/r/1420615903-28253-14-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-15 11:24:23 +01:00
Jiang Liu
5fcee53ce7 x86/apic: Only disable CPU x2apic mode when necessary
When interrupt remapping hardware is not in X2APIC, CPU X2APIC mode
will be disabled if:
1) Maximum CPU APIC ID is bigger than 255
2) hypervisior doesn't support x2apic mode.

But we should only check whether hypervisor supports X2APIC mode when
hypervisor(CONFIG_HYPERVISOR_GUEST) is enabled, otherwise X2APIC will
always be disabled when CONFIG_HYPERVISOR_GUEST is disabled and IR
doesn't work in X2APIC mode.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Rientjes <rientjes@google.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Oren Twaig <oren@scalemp.com>
Link: http://lkml.kernel.org/r/1420615903-28253-12-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-15 11:24:23 +01:00
Jiang Liu
ef1b2b8ad1 x86/apic: Handle XAPIC remap mode proper.
If remapping is in XAPIC mode, the setup code just skips X2APIC
initialization without checking max CPU APIC ID in system, which may
cause problem if system has a CPU with APIC ID bigger than 255.

Handle IR in XAPIC mode the same way as if remapping is disabled.

[ tglx: Split out from previous patch ]

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Rientjes <rientjes@google.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Oren Twaig <oren@scalemp.com>
Link: http://lkml.kernel.org/r/1420615903-28253-8-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-15 11:24:23 +01:00
Jiang Liu
07806c50bd x86/apic: Refine enable_IR_x2apic() and related functions
Refine enable_IR_x2apic() and related functions for better readability.

[ tglx: Removed the XAPIC mode change and split it out into a seperate
  	patch. Added comments. ]

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Rientjes <rientjes@google.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Oren Twaig <oren@scalemp.com>
Link: http://lkml.kernel.org/r/1420615903-28253-8-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-15 11:24:23 +01:00
Jiang Liu
89356cf20e x86/apic: Correctly detect X2APIC status in function enable_IR()
X2APIC will be disabled if user specifies "nox2apic" on kernel command
line, even when x2apic_preenabled is true. So correctly detect X2APIC
status by using x2apic_enabled() instead of x2apic_preenabled.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Rientjes <rientjes@google.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Oren Twaig <oren@scalemp.com>
Link: http://lkml.kernel.org/r/1420615903-28253-7-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-15 11:24:23 +01:00
Jiang Liu
7f530a2771 x86/apic: Kill useless variable x2apic_enabled in function enable_IR_x2apic()
Local variable x2apic_enabled has been assigned to but never referred,
so kill it.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Rientjes <rientjes@google.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Oren Twaig <oren@scalemp.com>
Link: http://lkml.kernel.org/r/1420615903-28253-6-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-15 11:24:22 +01:00
Jiang Liu
2599094f6e x86/apic: Panic if kernel doesn't support x2apic but BIOS has enabled x2apic
When kernel doesn't support X2APIC but BIOS has enabled X2APIC, system
may panic or hang without useful messages. On the other hand, it's
hard to dynamically disable X2APIC when CONFIG_X86_X2APIC is disabled.
So panic with a clear message in such a case.

Now system panics as below when X2APIC is disabled and interrupt remapping
is enabled:
[    0.316118] LAPIC pending interrupts after 512 EOI
[    0.322126] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.368655] Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC
[    0.378300] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.18.0+ #340
[    0.385300] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRIVTIN1.86B.0051.L05.1406240953 06/24/2014
[    0.396997]  ffff88046dc03000 ffff88046c307dd8 ffffffff8179dada 00000000000043f2
[    0.405629]  ffffffff81a92158 ffff88046c307e58 ffffffff8179b757 0000000000000002
[    0.414261]  0000000000000008 ffff88046c307e68 ffff88046c307e08 ffffffff813ad82b
[    0.422890] Call Trace:
[    0.425711]  [<ffffffff8179dada>] dump_stack+0x45/0x57
[    0.431533]  [<ffffffff8179b757>] panic+0xc1/0x1f5
[    0.436978]  [<ffffffff813ad82b>] ? delay_tsc+0x3b/0x70
[    0.442910]  [<ffffffff8166fa2c>] panic_if_irq_remap+0x1c/0x20
[    0.449524]  [<ffffffff81d73645>] setup_IO_APIC+0x405/0x82e
[    0.464979]  [<ffffffff81d6fcc2>] native_smp_prepare_cpus+0x2d9/0x31c
[    0.472274]  [<ffffffff81d5d0ac>] kernel_init_freeable+0xd6/0x223
[    0.479170]  [<ffffffff81792ad0>] ? rest_init+0x80/0x80
[    0.485099]  [<ffffffff81792ade>] kernel_init+0xe/0xf0
[    0.490932]  [<ffffffff817a537c>] ret_from_fork+0x7c/0xb0
[    0.497054]  [<ffffffff81792ad0>] ? rest_init+0x80/0x80
[    0.502983] ---[ end Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC

System hangs as below when X2APIC and interrupt remapping are both disabled:
[    1.102782] pci 0000:00:02.0: System wakeup disabled by ACPI
[    1.109351] pci 0000:00:03.0: System wakeup disabled by ACPI
[    1.115915] pci 0000:00:03.2: System wakeup disabled by ACPI
[    1.122479] pci 0000:00:03.3: System wakeup disabled by ACPI
[    1.132274] pci 0000:00:1c.0: Enabling MPC IRBNCE
[    1.137620] pci 0000:00:1c.0: Intel PCH root port ACS workaround enabled
[    1.145239] pci 0000:00:1c.0: System wakeup disabled by ACPI
[    1.151790] pci 0000:00:1c.7: Enabling MPC IRBNCE
[    1.157128] pci 0000:00:1c.7: Intel PCH root port ACS workaround enabled
[    1.164748] pci 0000:00:1c.7: System wakeup disabled by ACPI
[    1.171447] pci 0000:00:1e.0: System wakeup disabled by ACPI
[    1.178612] acpiphp: Slot [8] registered
[    1.183095] pci 0000:00:02.0: PCI bridge to [bus 01]
[    1.188867] acpiphp: Slot [2] registered

With this patch applied, the system panics in both cases with a proper
panic message.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Rientjes <rientjes@google.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Oren Twaig <oren@scalemp.com>
Link: http://lkml.kernel.org/r/1420615903-28253-5-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-15 11:24:22 +01:00
Thomas Gleixner
f7ccadac2d x86/apic: Clear stale x2apic mode
If x2apic got disabled on the kernel command line, then the following
issue can happen:

enable_IR_x2apic()
   ....
   x2apic_mode = 1;
   enable_x2apic();

     if (x2apic_disabled) {
	__disable_x2apic();
	return;
     }

That leaves X2APIC disabled in hardware, but x2apic_mode stays 1. So
all other code which checks x2apic_mode gets the wrong information.

Set x2apic_mode to 0 after disabling it in hardware.

This is just a hotfix. The proper solution is to rework this code so
it has seperate functions for the initial setup on the boot processor
and the secondary cpus, but that's beyond the scope of this fix.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Rientjes <rientjes@google.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Oren Twaig <oren@scalemp.com>
2015-01-15 11:24:22 +01:00
Thomas Gleixner
a1dafe857d iommu, x86: Restructure setup of the irq remapping feature
enable_IR_x2apic() calls setup_irq_remapping_ops() which by default
installs the intel dmar remapping ops and then calls the amd iommu irq
remapping prepare callback to figure out whether we are running on an
AMD machine with irq remapping hardware.

Right after that it calls irq_remapping_prepare() which pointlessly
checks:
	if (!remap_ops || !remap_ops->prepare)
               return -ENODEV;
and then calls

    remap_ops->prepare()

which is silly in the AMD case as it got called from
setup_irq_remapping_ops() already a few microseconds ago.

Simplify this and just collapse everything into
irq_remapping_prepare().

The irq_remapping_prepare() remains still silly as it assigns blindly
the intel ops, but that's not scope of this patch.

The scope here is to move the preperatory work, i.e. memory
allocations out of the atomic section which is required to enable irq
remapping.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Acked-and-tested-by: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: Joerg Roedel <jroedel@suse.de>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Oren Twaig <oren@scalemp.com>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20141205084147.232633738@linutronix.de
Link: http://lkml.kernel.org/r/1420615903-28253-2-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-15 11:24:22 +01:00
Denys Vlasenko
f6f64681d9 x86: entry_64.S: fold SAVE_ARGS_IRQ macro into its sole user
No code changes.

This is a preparatory patch for change in "struct pt_regs" handling.

CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: X86 ML <x86@kernel.org>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: linux-kernel@vger.kernel.org
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
2015-01-13 14:18:08 -08:00
Denys Vlasenko
af9cfe270d x86: entry_64.S: delete unused code
A define, two macros and an unreferenced bit of assembly are gone.

Acked-by: Borislav Petkov <bp@suse.de>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: X86 ML <x86@kernel.org>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: linux-kernel@vger.kernel.org
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
2015-01-13 14:00:33 -08:00
Masami Hiramatsu
cbf6ab52ad kprobes: Pass the original kprobe for preparing optimized kprobe
Pass the original kprobe for preparing an optimized kprobe arch-dep
part, since for some architecture (e.g. ARM32) requires the information
in original kprobe.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
2015-01-13 16:10:16 +00:00
Linus Torvalds
505569d208 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes: two vdso fixes, two kbuild fixes and a boot failure fix
  with certain odd memory mappings"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, vdso: Use asm volatile in __getcpu
  x86/build: Clean auto-generated processor feature files
  x86: Fix mkcapflags.sh bash-ism
  x86: Fix step size adjustment during initial memory mapping
  x86_64, vdso: Fix the vdso address randomization algorithm
2015-01-11 11:53:46 -08:00
Linus Torvalds
ddb321a8dd Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Mostly tooling fixes, but also some kernel side fixes: uncore PMU
  driver fix, user regs sampling fix and an instruction decoder fix that
  unbreaks PEBS precise sampling"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/uncore/hsw-ep: Handle systems with only two SBOXes
  perf/x86_64: Improve user regs sampling
  perf: Move task_pt_regs sampling into arch code
  x86: Fix off-by-one in instruction decoder
  perf hists browser: Fix segfault when showing callchain
  perf callchain: Free callchains when hist entries are deleted
  perf hists: Fix children sort key behavior
  perf diff: Fix to sort by baseline field by default
  perf list: Fix --raw-dump option
  perf probe: Fix crash in dwarf_getcfi_elf
  perf probe: Fix to fall back to find probe point in symbols
  perf callchain: Append callchains only when requested
  perf ui/tui: Print backtrace symbols when segfault occurs
  perf report: Show progress bar for output resorting
2015-01-11 11:47:45 -08:00
Steven Honeyman
f94fe119f2 x86, CPU: Fix trivial printk formatting issues with dmesg
dmesg (from util-linux) currently has two methods for reading the kernel
message ring buffer: /dev/kmsg and syslog(2). Since kernel 3.5.0 kmsg
has been the default, which escapes control characters (e.g. new lines)
before they are shown.

This change means that when dmesg is using /dev/kmsg, a 2 line printk
makes the output messy, because the second line does not get a
timestamp.

For example:

[    0.012863] CPU0: Thermal monitoring enabled (TM1)
[    0.012869] Last level iTLB entries: 4KB 1024, 2MB 1024, 4MB 1024
Last level dTLB entries: 4KB 1024, 2MB 1024, 4MB 1024, 1GB 4
[    0.012958] Freeing SMP alternatives memory: 28K (ffffffff81d86000 - ffffffff81d8d000)
[    0.014961] dmar: Host address width 39

Because printk.c intentionally escapes control characters, they should
not be there in the first place. This patch fixes two occurrences of
this.

Signed-off-by: Steven Honeyman <stevenhoneyman@gmail.com>
Link: https://lkml.kernel.org/r/1414856696-8094-1-git-send-email-stevenhoneyman@gmail.com
[ Boris: make cpu_detect_tlb() static, while at it. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-01-11 01:54:54 +01:00
Andi Kleen
5306c31c57 perf/x86/uncore/hsw-ep: Handle systems with only two SBOXes
There was another report of a boot failure with a #GP fault in the
uncore SBOX initialization. The earlier work around was not enough
for this system.

The boot was failing while trying to initialize the third SBOX.

This patch detects parts with only two SBOXes and limits the number
of SBOX units to two there.

Stable material, as it affects boot problems on 3.18.

Tested-by: Andreas Oehler <andreas@oehler-net.de>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yan, Zheng <zheng.z.yan@intel.com>
Link: http://lkml.kernel.org/r/1420583675-9163-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09 11:12:30 +01:00
Andy Lutomirski
86c269fea3 perf/x86_64: Improve user regs sampling
Perf reports user regs for kernel-mode samples so that samples can
be backtraced through user code.  The old code was very broken in
syscall context, resulting in useless backtraces.

The new code, in contrast, is still dangerously racy, but it should
at least work most of the time.

Tested-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: chenggang.qcg@taobao.com
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/243560c26ff0f739978e2459e203f6515367634d.1420396372.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09 11:12:29 +01:00
Andy Lutomirski
88a7c26af8 perf: Move task_pt_regs sampling into arch code
On x86_64, at least, task_pt_regs may be only partially initialized
in many contexts, so x86_64 should not use it without extra care
from interrupt context, let alone NMI context.

This will allow x86_64 to override the logic and will supply some
scratch space to use to make a cleaner copy of user regs.

Tested-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: chenggang.qcg@taobao.com
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/e431cd4c18c2e1c44c774f10758527fb2d1025c4.1420396372.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09 11:12:28 +01:00
Luck, Tony
d4812e169d x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricks
We now switch to the kernel stack when a machine check interrupts
during user mode.  This means that we can perform recovery actions
in the tail of do_machine_check()

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
2015-01-07 07:47:42 -08:00
Hanjun Guo
d02dc27db0 ACPI / processor: Rename acpi_(un)map_lsapic() to acpi_(un)map_cpu()
acpi_map_lsapic() will allocate a logical CPU number and map it to
physical CPU id (such as APIC id) for the hot-added CPU, it will also
do some mapping for NUMA node id and etc, acpi_unmap_lsapic() will
do the reverse.

We can see that the name of the function is a little bit confusing and
arch (IA64) dependent so rename them as acpi_(un)map_cpu() to make arch
agnostic and explicit.

Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-05 23:34:26 +01:00
Andy Lutomirski
bced35b65a x86, traps: Add ist_begin_non_atomic and ist_end_non_atomic
In some IST handlers, if the interrupt came from user mode,
we can safely enable preemption.  Add helpers to do it safely.

This is intended to be used my the memory failure code in
do_machine_check.

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
2015-01-02 10:22:46 -08:00