Commit graph

22,795 commits

Author SHA1 Message Date
Ingo Molnar
f320ead76a Merge branch 'x86/asm' into locking/core
Upcoming changes to static keys is interacting/conflicting with the following
pending TSC commits in tip:x86/asm:

  4ea1636b04 x86/asm/tsc: Rename native_read_tsc() to rdtsc()
  ...

So merge it into the locking tree to have a smoother resolution.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-03 11:04:00 +02:00
Andrey Konovalov
76695af20c locking, arch: use WRITE_ONCE()/READ_ONCE() in smp_store_release()/smp_load_acquire()
Replace ACCESS_ONCE() macro in smp_store_release() and smp_load_acquire()
with WRITE_ONCE() and READ_ONCE() on x86, arm, arm64, ia64, metag, mips,
powerpc, s390, sparc and asm-generic since ACCESS_ONCE() does not work
reliably on non-scalar types.

WRITE_ONCE() and READ_ONCE() were introduced in the following commits:

  230fa253df ("kernel: Provide READ_ONCE and ASSIGN_ONCE")
  43239cbe79 ("kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val)")

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Alexander Duyck <alexander.h.duyck@redhat.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arch@vger.kernel.org
Link: http://lkml.kernel.org/r/1438528264-714-1-git-send-email-andreyknvl@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-03 10:59:30 +02:00
Ingo Molnar
3a7651e683 Linux 4.2-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVvsVPAAoJEHm+PkMAQRiGkMAH/AqQUjzltvIwDq39vlbrwpc1
 DIgpm15zaIHKVThQRA69cBIDOprckk6pChFhA/aZVhRBsVva/Z3k8vIjaAzW7eDs
 OK3zE1VsQ0QSK9FYo/8DJoy8844DF5beVwZVE4/xc8TFbabA6BgWawAgVxdpgzVQ
 LQb6jMHQPGGpAQrdPJJcfkeQRi9GBpyXLX6x7nO4jKQAPQGVUqT1QLFN/XYMNp7n
 xmdWogyNfis+c/Vx2OIQUmS/kFO5oyGaSWB1pK2MKeTG5XJ7AITzeHOGfRPmVinn
 x9ozeMLPjTMNFlzPYYrTL+xnqdCPHzKW7KP2LBvNb9PRl7j1vtvNKNNzcD8cAbI=
 =Zmjn
 -----END PGP SIGNATURE-----

Merge branch 'locking/urgent', tag 'v4.2-rc5' into locking/core, to pick up fixes before applying new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-03 10:52:25 +02:00
Linus Torvalds
51d2e09b94 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Fallout from the recent NMI fixes: make x86 LDT handling more robust.

  Also some EFI fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ldt: Make modify_ldt synchronous
  x86/xen: Probe target addresses in set_aliased_prot() before the hypercall
  x86/irq: Use the caller provided polarity setting in mp_check_pin_attr()
  efi: Check for NULL efi kernel parameters
  x86/efi: Use all 64 bit of efi_memmap in setup_e820()
2015-08-01 09:16:33 -07:00
David S. Miller
5510b3c2a1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	arch/s390/net/bpf_jit_comp.c
	drivers/net/ethernet/ti/netcp_ethss.c
	net/bridge/br_multicast.c
	net/ipv4/ip_fragment.c

All four conflicts were cases of simple overlapping
changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31 23:52:20 -07:00
Linus Torvalds
7c764cec37 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Must teardown SR-IOV before unregistering netdev in igb driver, from
    Alex Williamson.

 2) Fix ipv6 route unreachable crash in IPVS, from Alex Gartrell.

 3) Default route selection in ipv4 should take the prefix length, table
    ID, and TOS into account, from Julian Anastasov.

 4) sch_plug must have a reset method in order to purge all buffered
    packets when the qdisc is reset, likewise for sch_choke, from WANG
    Cong.

 5) Fix deadlock and races in slave_changelink/br_setport in bridging.
    From Nikolay Aleksandrov.

 6) mlx4 bug fixes (wrong index in port even propagation to VFs,
    overzealous BUG_ON assertion, etc.) from Ido Shamay, Jack
    Morgenstein, and Or Gerlitz.

 7) Turn off klog message about SCTP userspace interface compat that
    makes no sense at all, from Daniel Borkmann.

 8) Fix unbounded restarts of inet frag eviction process, causing NMI
    watchdog soft lockup messages, from Florian Westphal.

 9) Suspend/resume fixes for r8152 from Hayes Wang.

10) Fix busy loop when MSG_WAITALL|MSG_PEEK is used in TCP recv, from
    Sabrina Dubroca.

11) Fix performance regression when removing a lot of routes from the
    ipv4 routing tables, from Alexander Duyck.

12) Fix device leak in AF_PACKET, from Lars Westerhoff.

13) AF_PACKET also has a header length comparison bug due to signedness,
    from Alexander Drozdov.

14) Fix bug in EBPF tail call generation on x86, from Daniel Borkmann.

15) Memory leaks, TSO stats, watchdog timeout and other fixes to
    thunderx driver from Sunil Goutham and Thanneeru Srinivasulu.

16) act_bpf can leak memory when replacing programs, from Daniel
    Borkmann.

17) WOL packet fixes in gianfar driver, from Claudiu Manoil.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (79 commits)
  stmmac: fix missing MODULE_LICENSE in stmmac_platform
  gianfar: Enable device wakeup when appropriate
  gianfar: Fix suspend/resume for wol magic packet
  gianfar: Fix warning when CONFIG_PM off
  act_pedit: check binding before calling tcf_hash_release()
  net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket
  net: sched: fix refcount imbalance in actions
  r8152: reset device when tx timeout
  r8152: add pre_reset and post_reset
  qlcnic: Fix corruption while copying
  act_bpf: fix memory leaks when replacing bpf programs
  net: thunderx: Fix for crash while BGX teardown
  net: thunderx: Add PCI driver shutdown routine
  net: thunderx: Fix crash when changing rss with mutliple traffic flows
  net: thunderx: Set watchdog timeout value
  net: thunderx: Wakeup TXQ only if CQE_TX are processed
  net: thunderx: Suppress alloc_pages() failure warnings
  net: thunderx: Fix TSO packet statistic
  net: thunderx: Fix memory leak when changing queue count
  net: thunderx: Fix RQ_DROP miscalculation
  ...
2015-07-31 17:10:56 -07:00
Brian Gerst
decd275e62 x86/vm86: Rename vm86->v86flags and v86mask
Rename v86flags to veflags, and v86mask to veflags_mask.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1438148483-11932-9-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 13:31:11 +02:00
Brian Gerst
1342635638 x86/vm86: Rename vm86->vm86_info to user_vm86
Make it clearer that this is the pointer to the userspace vm86
state area.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1438148483-11932-8-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 13:31:11 +02:00
Brian Gerst
ba3e127ec1 x86/vm86: Clean up vm86.h includes
vm86.h was being implicitly included in alot of places via
processor.h, which in turn got it from math_emu.h.  Break that
chain and explicitly include vm86.h in all files that need it.
Also remove unused vm86 field from math_emu_info.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1438148483-11932-7-git-send-email-brgerst@gmail.com
[ Fixed build failure. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 13:31:10 +02:00
Ingo Molnar
af3e565a85 x86/vm86: Move the vm86 IRQ definitions to vm86.h
Move vm86 specific definitions from irq_vectors.h to vm86.h.

Based on patch from Brian Gerst.

Originally-from: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1438148483-11932-6-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 13:31:10 +02:00
Brian Gerst
5ed92a8ab7 x86/vm86: Use the normal pt_regs area for vm86
Change to use the normal pt_regs area to enter and exit vm86
mode.  This is done by increasing the padding at the top of the
stack to make room for the extra vm86 segment slots in the IRET
frame.  It then saves the 32-bit regs in the off-stack vm86
data, and copies in the vm86 regs.  Exiting back to 32-bit mode
does the reverse.  This allows removing the hacks to jump
directly into the exit asm code due to having to change the
stack pointer.  Returning normally from the vm86 syscall and the
exception handlers allows things like ptrace and auditing to work properly.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1438148483-11932-5-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 13:31:09 +02:00
Brian Gerst
90c6085a24 x86/vm86: Eliminate 'struct kernel_vm86_struct'
Now there is no vm86-specific data left on the kernel stack
while in userspace, except for the 32-bit regs.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1438148483-11932-4-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 13:31:08 +02:00
Brian Gerst
d4ce0f26c7 x86/vm86: Move fields from 'struct kernel_vm86_struct' to 'struct vm86'
Move the non-regs fields to the off-stack data.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1438148483-11932-3-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 13:31:08 +02:00
Brian Gerst
9fda6a0681 x86/vm86: Move vm86 fields out of 'thread_struct'
Allocate a separate structure for the vm86 fields.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1438148483-11932-2-git-send-email-brgerst@gmail.com
[ Build fixes. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 13:31:07 +02:00
Andy Lutomirski
a5b9e5a2f1 x86/ldt: Make modify_ldt() optional
The modify_ldt syscall exposes a large attack surface and is
unnecessary for modern userspace.  Make it optional.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org <security@kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/a605166a771c343fd64802dece77a903507333bd.1438291540.git.luto@kernel.org
[ Made MATH_EMULATION dependent on MODIFY_LDT_SYSCALL. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 13:30:45 +02:00
Oleg Nesterov
db087ef69a uprobes/x86: Make arch_uretprobe_is_alive(RP_CHECK_CALL) more clever
The previous change documents that cleanup_return_instances()
can't always detect the dead frames, the stack can grow. But
there is one special case which imho worth fixing:
arch_uretprobe_is_alive() can return true when the stack didn't
actually grow, but the next "call" insn uses the already
invalidated frame.

Test-case:

	#include <stdio.h>
	#include <setjmp.h>

	jmp_buf jmp;
	int nr = 1024;

	void func_2(void)
	{
		if (--nr == 0)
			return;
		longjmp(jmp, 1);
	}

	void func_1(void)
	{
		setjmp(jmp);
		func_2();
	}

	int main(void)
	{
		func_1();
		return 0;
	}

If you ret-probe func_1() and func_2() prepare_uretprobe() hits
the MAX_URETPROBE_DEPTH limit and "return" from func_2() is not
reported.

When we know that the new call is not chained, we can do the
more strict check. In this case "sp" points to the new ret-addr,
so every frame which uses the same "sp" must be dead. The only
complication is that arch_uretprobe_is_alive() needs to know was
it chained or not, so we add the new RP_CHECK_CHAIN_CALL enum
and change prepare_uretprobe() to pass RP_CHECK_CALL only if
!chained.

Note: arch_uretprobe_is_alive() could also re-read *sp and check
if this word is still trampoline_vaddr. This could obviously
improve the logic, but I would like to avoid another
copy_from_user() especially in the case when we can't avoid the
false "alive == T" positives.

Tested-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Anton Arapov <arapov@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150721134028.GA4786@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 10:38:06 +02:00
Oleg Nesterov
86dcb702e7 uprobes: Add the "enum rp_check ctx" arg to arch_uretprobe_is_alive()
arch/x86 doesn't care (so far), but as Pratyush Anand pointed
out other architectures might want why arch_uretprobe_is_alive()
was called and use different checks depending on the context.
Add the new argument to distinguish 2 callers.

Tested-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Anton Arapov <arapov@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150721134026.GA4779@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 10:38:06 +02:00
Oleg Nesterov
7b868e4802 uprobes/x86: Reimplement arch_uretprobe_is_alive()
Add the x86 specific version of arch_uretprobe_is_alive()
helper. It returns true if the stack frame mangled by
prepare_uretprobe() is still on stack. So if it returns false,
we know that the probed function has already returned.

We add the new return_instance->stack member and change the
generic code to initialize it in prepare_uretprobe, but it
should be equally useful for other architectures.

TODO: this assumes that the probed application can't use
      multiple stacks (say sigaltstack). We will try to improve
      this logic later.

Tested-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Anton Arapov <arapov@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150721134018.GA4766@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 10:38:05 +02:00
Mathias Krause
b1c599b8ff x86/cpufeature: Add feature bit for Intel's Silicon Debug CPUID bit
Add a CPUID feature bit for the SDBG (Silicon Debug) CPU feature
found on recent Intel systems starting with Haswell.

Using the IA32_DEBUG_INTERFACE MSR (index C80H) one can at least
detect if SDBG has been enabled by the firmware and if it has
been used or not.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dirk Brandewie <dirk.j.brandewie@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437330403-12102-1-git-send-email-minipli@googlemail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 10:34:07 +02:00
Ingo Molnar
5b929bd11d Merge branch 'x86/urgent' into x86/asm, before applying dependent patches
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 10:23:35 +02:00
Andy Lutomirski
37868fe113 x86/ldt: Make modify_ldt synchronous
modify_ldt() has questionable locking and does not synchronize
threads.  Improve it: redesign the locking and synchronize all
threads' LDTs using an IPI on all modifications.

This will dramatically slow down modify_ldt in multithreaded
programs, but there shouldn't be any multithreaded programs that
care about modify_ldt's performance in the first place.

This fixes some fallout from the CVE-2015-5157 fixes.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org <security@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/4c6978476782160600471bd865b318db34c7b628.1438291540.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 10:23:23 +02:00
Andy Lutomirski
aa1acff356 x86/xen: Probe target addresses in set_aliased_prot() before the hypercall
The update_va_mapping hypercall can fail if the VA isn't present
in the guest's page tables.  Under certain loads, this can
result in an OOPS when the target address is in unpopulated vmap
space.

While we're at it, add comments to help explain what's going on.

This isn't a great long-term fix.  This code should probably be
changed to use something like set_memory_ro.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <dvrabel@cantab.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org <security@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 10:23:22 +02:00
Ingo Molnar
1adb9123f9 * Fix an EFI boot issue preventing a Parallels virtual machine from
booting because the upper 32-bits of the EFI memmap pointer were
    being discarded in setup_e820() - Dmitry Skorodumov
 
  * Validate that the "efi" kernel parameter gets used with an argument,
    otherwise we will oops - Ricardo Neri
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVupCGAAoJEC84WcCNIz1VkmMP/jcfvKrgeNjO3qV83qeCSjWL
 BwkQ0BhWCvReQUrdGVTP0gxCIZsZI/RbFueSUekONuZYj9fJiWyTA88uj74SD35G
 Or/dOMhcLbhI0j/zaoZCxTN582+bPdE016/R8pq0uvnsTKJu95dFaNs0XPD5OzGz
 p3we3l6O2BY7NQO0rku+RUJmKRN74q89sAaB+/2v7WCbcONJhiAj0OVQhH1BbyX7
 QAiqxetubgNadLdxc8h2Dqcj3YAUD2yVancP6x4RAEwAcZfjEPuXiHyEH+xGOsfU
 F6r9T/YHHnOyjKUMZP03WV2fXr9ACX/hDj5p5NUkMgQK1hAKY2KtXNUNJIyRSKL5
 alKNX40EG0I2WllA5wYZuIPaGvWRmajfz9YgBivaEMEif0ix0BEeQ/Q0qJGbUDTB
 pSCvkOoJJyqfzXj4ZWp3zUNmJk5zQKw+rHsjthy34QAPEHId32rGwI8Whcdszzgi
 Ytqy6jK/vEnbD3O7KvGCnJNTu+xzfsYX/0wlAiwQs7x+TO4m2MZ0vhC+C1/tDlz4
 YnUqFTnscAZW+nPoNXk+emlvojgcqbII/ziDh8R7WdEBt14e32uHt6Bzxhb10evg
 MEDT86Ur4zffs8hBkKANK+RO5TM4aAIFQk2oROUd7CYjrTeoyyX7QUH1/9t6m8am
 +2nLy5vN//C4QGPB/46g
 =kohF
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent

Pull EFI fixes from Matt Fleming:

 * Fix an EFI boot issue preventing a Parallels virtual machine from
   booting because the upper 32-bits of the EFI memmap pointer were
   being discarded in setup_e820(). (Dmitry Skorodumov)

 * Validate that the "efi" kernel parameter gets used with an argument,
   otherwise we will oops. (Ricardo Neri)

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 09:55:26 +02:00
Viresh Kumar
c8b5db7de6 x86/hpet: Migrate to new set_state interface
Migrate hpet driver to the new 'set-state' interface provided by
clockevents core, the earlier 'set-mode' interface is marked obsolete
now.

This also enables us to implement callbacks for new states of clockevent
devices, for example: ONESHOT_STOPPED.

Forward definition of 'hpet_clockevent' wasn't required and so it is
placed after all the callback are defined, to avoid forward declaring
all the callbacks.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Link: http://lkml.kernel.org/r/8cc9864b6d6342dfac28f270cf69f4cba46fffae.1437042675.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-30 21:28:25 +02:00
Viresh Kumar
955381dd65 x86/xen/time: Migrate to new set-state interface
Migrate xen driver to the new 'set-state' interface provided by
clockevents core, the earlier 'set-mode' interface is marked obsolete
now.

This also enables us to implement callbacks for new states of clockevent
devices, for example: ONESHOT_STOPPED.

Callbacks aren't implemented for modes where we weren't doing anything.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: xen-devel@lists.xenproject.org (moderated list:XEN HYPERVISOR INTERFACE)
Link: http://lkml.kernel.org/r/881eea6e1a3d483cd33e044cd34827cce26a57fd.1437042675.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-30 21:25:38 +02:00
Viresh Kumar
ca53d434f7 x86/uv/time: Migrate to new set-state interface
Migrate uv driver to the new 'set-state' interface provided by
clockevents core, the earlier 'set-mode' interface is marked obsolete
now.

This also enables us to implement callbacks for new states of clockevent
devices, for example: ONESHOT_STOPPED.

We weren't doing anything while switching modes other than in shutdown
mode and so those are not implemented.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: Christoph Lameter <cl@linux.com>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/52e04139746222a2e82a96d13953cbc306cfb59b.1437042675.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-30 21:25:38 +02:00
Viresh Kumar
c2e13cc2ce x86/lguest/timer: Migrate to new set-state interface
Migrate lguest driver to the new 'set-state' interface provided by
clockevents core, the earlier 'set-mode' interface is marked obsolete
now.

This also enables us to implement callbacks for new states of clockevent
devices, for example: ONESHOT_STOPPED.

We weren't doing anything while switching modes other than in shutdown
mode and so those are not implemented.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-and-tested-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: linaro-kernel@lists.linaro.org
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: lguest@lists.ozlabs.org (open list:LGUEST)
Link: http://lkml.kernel.org/r/b96f1c308f4523255c5394a4e6e13f2b67685402.1437042675.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-30 21:25:38 +02:00
Jiang Liu
646c4b7549 x86/irq: Use the caller provided polarity setting in mp_check_pin_attr()
Commit d32932d02e ("x86/irq: Convert IOAPIC to use hierarchical
irqdomain interfaces") introduced a regression which causes
malfunction of interrupt lines.

The reason is that the conversion of mp_check_pin_attr() missed to
update the polarity selection of the interrupt pin with the caller
provided setting and instead uses a stale attribute value. That in
turn results in chosing the wrong interrupt flow handler.

Use the caller supplied setting to configure the pin correctly which
also choses the correct interrupt flow handler.

This restores the original behaviour and on the affected
machine/driver (Surface Pro 3, i2c controller) all IOAPIC IRQ
configuration are identical to v4.1.

Fixes: d32932d02e ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
Reported-and-tested-by: Matt Fleming <matt@codeblueprint.co.uk>
Reported-and-tested-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Chen Yu <yu.c.chen@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1438242695-23531-1-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-30 21:15:29 +02:00
Jiang Liu
811a4e6fce PCI: Add helpers to manage pci_dev->irq and pci_dev->irq_managed
Add pci_has_managed_irq(), pci_set_managed_irq(), and
pci_reset_managed_irq() to simplify code.  No functional change.

[bhelgaas: changelog]
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-30 14:13:20 -05:00
Jiang Liu
991de2e590 PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()
To support IOAPIC hotplug, we need to allocate PCI IRQ resources on demand
and free them when not used anymore.

Implement pcibios_alloc_irq() and pcibios_free_irq() to dynamically
allocate and free PCI IRQs.

Remove mp_should_keep_irq(), which is no longer used.

[bhelgaas: changelog]
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-30 14:05:57 -05:00
Daniel Borkmann
485d6511e7 bpf, x86/sparc: show actual number of passes in bpf_jit_dump
When bpf_jit_compile() got split into two functions via commit
f3c2af7ba1 ("net: filter: x86: split bpf_jit_compile()"), bpf_jit_dump()
was changed to always show 0 as number of compiler passes. Change it to
dump the actual number. Also on sparc, we count passes starting from 0,
so add 1 for the debug dump as well.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30 11:13:21 -07:00
Ricardo Neri
9115c7589b efi: Check for NULL efi kernel parameters
Even though it is documented how to specifiy efi parameters, it is
possible to cause a kernel panic due to a dereference of a NULL pointer when
parsing such parameters if "efi" alone is given:

PANIC: early exception 0e rip 10:ffffffff812fb361 error 0 cr2 0
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.2.0-rc1+ #450
[ 0.000000]  ffffffff81fe20a9 ffffffff81e03d50 ffffffff8184bb0f 00000000000003f8
[ 0.000000]  0000000000000000 ffffffff81e03e08 ffffffff81f371a1 64656c62616e6520
[ 0.000000]  0000000000000069 000000000000005f 0000000000000000 0000000000000000
[ 0.000000] Call Trace:
[ 0.000000]  [<ffffffff8184bb0f>] dump_stack+0x45/0x57
[ 0.000000]  [<ffffffff81f371a1>] early_idt_handler_common+0x81/0xae
[ 0.000000]  [<ffffffff812fb361>] ? parse_option_str+0x11/0x90
[ 0.000000]  [<ffffffff81f4dd69>] arch_parse_efi_cmdline+0x15/0x42
[ 0.000000]  [<ffffffff81f376e1>] do_early_param+0x50/0x8a
[ 0.000000]  [<ffffffff8106b1b3>] parse_args+0x1e3/0x400
[ 0.000000]  [<ffffffff81f37a43>] parse_early_options+0x24/0x28
[ 0.000000]  [<ffffffff81f37691>] ? loglevel+0x31/0x31
[ 0.000000]  [<ffffffff81f37a78>] parse_early_param+0x31/0x3d
[ 0.000000]  [<ffffffff81f3ae98>] setup_arch+0x2de/0xc08
[ 0.000000]  [<ffffffff8109629a>] ? vprintk_default+0x1a/0x20
[ 0.000000]  [<ffffffff81f37b20>] start_kernel+0x90/0x423
[ 0.000000]  [<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c
[ 0.000000]  [<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef
[ 0.000000] RIP 0xffffffff81ba2efc

This panic is not reproducible with "efi=" as this will result in a non-NULL
zero-length string.

Thus, verify that the pointer to the parameter string is not NULL. This is
consistent with other parameter-parsing functions which check for NULL pointers.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2015-07-30 18:07:11 +01:00
Dmitry Skorodumov
7cc03e4896 x86/efi: Use all 64 bit of efi_memmap in setup_e820()
The efi_info structure stores low 32 bits of memory map
in efi_memmap and high 32 bits in efi_memmap_hi.

While constructing pointer in the setup_e820(), need
to take into account all 64 bit of the pointer.

It is because on 64bit machine the function
efi_get_memory_map() may return full 64bit pointer and before
the patch that pointer was truncated.

The issue is triggered on Parallles virtual machine and
fixed with this patch.

Signed-off-by: Dmitry Skorodumov <sdmitry@parallels.com>
Cc: Denis V. Lunev <den@openvz.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2015-07-30 18:07:10 +01:00
Paolo Bonzini
71ba994c94 KVM: x86: clean/fix memory barriers in irqchip_in_kernel
The memory barriers are trying to protect against concurrent RCU-based
interrupt injection, but the IRQ routing table is not valid at the time
kvm->arch.vpic is written.  Fix this by writing kvm->arch.vpic last.
kvm_destroy_pic then need not set kvm->arch.vpic to NULL; modify it
to take a struct kvm_pic* and reuse it if the IOAPIC creation fails.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-30 16:02:56 +02:00
Daniel Borkmann
2482abb93e ebpf, x86: fix general protection fault when tail call is invoked
With eBPF JIT compiler enabled on x86_64, I was able to reliably trigger
the following general protection fault out of an eBPF program with a simple
tail call, f.e. tracex5 (or a stripped down version of it):

  [  927.097918] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
  [...]
  [  927.100870] task: ffff8801f228b780 ti: ffff880016a64000 task.ti: ffff880016a64000
  [  927.102096] RIP: 0010:[<ffffffffa002440d>]  [<ffffffffa002440d>] 0xffffffffa002440d
  [  927.103390] RSP: 0018:ffff880016a67a68  EFLAGS: 00010006
  [  927.104683] RAX: 5a5a5a5a5a5a5a5a RBX: 0000000000000000 RCX: 0000000000000001
  [  927.105921] RDX: 0000000000000000 RSI: ffff88014e438000 RDI: ffff880016a67e00
  [  927.107137] RBP: ffff880016a67c90 R08: 0000000000000000 R09: 0000000000000001
  [  927.108351] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880016a67e00
  [  927.109567] R13: 0000000000000000 R14: ffff88026500e460 R15: ffff880220a81520
  [  927.110787] FS:  00007fe7d5c1f740(0000) GS:ffff880265000000(0000) knlGS:0000000000000000
  [  927.112021] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [  927.113255] CR2: 0000003e7bbb91a0 CR3: 000000006e04b000 CR4: 00000000001407e0
  [  927.114500] Stack:
  [  927.115737]  ffffc90008cdb000 ffff880016a67e00 ffff88026500e460 ffff880220a81520
  [  927.117005]  0000000100000000 000000000000001b ffff880016a67aa8 ffffffff8106c548
  [  927.118276]  00007ffcdaf22e58 0000000000000000 0000000000000000 ffff880016a67ff0
  [  927.119543] Call Trace:
  [  927.120797]  [<ffffffff8106c548>] ? lookup_address+0x28/0x30
  [  927.122058]  [<ffffffff8113d176>] ? __module_text_address+0x16/0x70
  [  927.123314]  [<ffffffff8117bf0e>] ? is_ftrace_trampoline+0x3e/0x70
  [  927.124562]  [<ffffffff810c1a0f>] ? __kernel_text_address+0x5f/0x80
  [  927.125806]  [<ffffffff8102086f>] ? print_context_stack+0x7f/0xf0
  [  927.127033]  [<ffffffff810f7852>] ? __lock_acquire+0x572/0x2050
  [  927.128254]  [<ffffffff810f7852>] ? __lock_acquire+0x572/0x2050
  [  927.129461]  [<ffffffff8119edfa>] ? trace_call_bpf+0x3a/0x140
  [  927.130654]  [<ffffffff8119ee4a>] trace_call_bpf+0x8a/0x140
  [  927.131837]  [<ffffffff8119edfa>] ? trace_call_bpf+0x3a/0x140
  [  927.133015]  [<ffffffff8119f008>] kprobe_perf_func+0x28/0x220
  [  927.134195]  [<ffffffff811a1668>] kprobe_dispatcher+0x38/0x60
  [  927.135367]  [<ffffffff81174b91>] ? seccomp_phase1+0x1/0x230
  [  927.136523]  [<ffffffff81061400>] kprobe_ftrace_handler+0xf0/0x150
  [  927.137666]  [<ffffffff81174b95>] ? seccomp_phase1+0x5/0x230
  [  927.138802]  [<ffffffff8117950c>] ftrace_ops_recurs_func+0x5c/0xb0
  [  927.139934]  [<ffffffffa022b0d5>] 0xffffffffa022b0d5
  [  927.141066]  [<ffffffff81174b91>] ? seccomp_phase1+0x1/0x230
  [  927.142199]  [<ffffffff81174b95>] seccomp_phase1+0x5/0x230
  [  927.143323]  [<ffffffff8102c0a4>] syscall_trace_enter_phase1+0xc4/0x150
  [  927.144450]  [<ffffffff81174b95>] ? seccomp_phase1+0x5/0x230
  [  927.145572]  [<ffffffff8102c0a4>] ? syscall_trace_enter_phase1+0xc4/0x150
  [  927.146666]  [<ffffffff817f9a9f>] tracesys+0xd/0x44
  [  927.147723] Code: 48 8b 46 10 48 39 d0 76 2c 8b 85 fc fd ff ff 83 f8 20 77 21 83
                       c0 01 89 85 fc fd ff ff 48 8d 44 d6 80 48 8b 00 48 83 f8 00 74
                       0a <48> 8b 40 20 48 83 c0 33 ff e0 48 89 d8 48 8b 9d d8 fd ff
                       ff 4c
  [  927.150046] RIP  [<ffffffffa002440d>] 0xffffffffa002440d

The code section with the instructions that traps points into the eBPF JIT
image of the root program (the one invoking the tail call instruction).

Using bpf_jit_disasm -o on the eBPF root program image:

  [...]
  4e:   mov    -0x204(%rbp),%eax
        8b 85 fc fd ff ff
  54:   cmp    $0x20,%eax               <--- if (tail_call_cnt > MAX_TAIL_CALL_CNT)
        83 f8 20
  57:   ja     0x000000000000007a
        77 21
  59:   add    $0x1,%eax                <--- tail_call_cnt++
        83 c0 01
  5c:   mov    %eax,-0x204(%rbp)
        89 85 fc fd ff ff
  62:   lea    -0x80(%rsi,%rdx,8),%rax  <--- prog = array->prog[index]
        48 8d 44 d6 80
  67:   mov    (%rax),%rax
        48 8b 00
  6a:   cmp    $0x0,%rax                <--- check for NULL
        48 83 f8 00
  6e:   je     0x000000000000007a
        74 0a
  70:   mov    0x20(%rax),%rax          <--- GPF triggered here! fetch of bpf_func
        48 8b 40 20                              [ matches <48> 8b 40 20 ... from above ]
  74:   add    $0x33,%rax               <--- prologue skip of new prog
        48 83 c0 33
  78:   jmpq   *%rax                    <--- jump to new prog insns
        ff e0
  [...]

The problem is that rax has 5a5a5a5a5a5a5a5a, which suggests a tail call
jump to map slot 0 is pointing to a poisoned page. The issue is the following:

lea instruction has a wrong offset, i.e. it should be ...

  lea    0x80(%rsi,%rdx,8),%rax

... but it actually seems to be ...

  lea   -0x80(%rsi,%rdx,8),%rax

... where 0x80 is offsetof(struct bpf_array, prog), thus the offset needs
to be positive instead of negative. Disassembling the interpreter, we btw
similarly do:

  [...]
  c88:  lea     0x80(%rax,%rdx,8),%rax  <--- prog = array->prog[index]
        48 8d 84 d0 80 00 00 00
  c90:  add     $0x1,%r13d
        41 83 c5 01
  c94:  mov     (%rax),%rax
        48 8b 00
  [...]

Now the other interesting fact is that this panic triggers only when things
like CONFIG_LOCKDEP are being used. In that case offsetof(struct bpf_array,
prog) starts at offset 0x80 and in non-CONFIG_LOCKDEP case at offset 0x50.
Reason is that the work_struct inside struct bpf_map grows by 48 bytes in my
case due to the lockdep_map member (which also has CONFIG_LOCK_STAT enabled
members).

Changing the emitter to always use the 4 byte displacement in the lea
instruction fixes the panic on my side. It increases the tail call instruction
emission by 3 more byte, but it should cover us from various combinations
(and perhaps other future increases on related structures).

After patch, disassembly:

  [...]
  9e:   lea    0x80(%rsi,%rdx,8),%rax   <--- CONFIG_LOCKDEP/CONFIG_LOCK_STAT
        48 8d 84 d6 80 00 00 00
  a6:   mov    (%rax),%rax
        48 8b 00
  [...]

  [...]
  9e:   lea    0x50(%rsi,%rdx,8),%rax   <--- No CONFIG_LOCKDEP
        48 8d 84 d6 50 00 00 00
  a6:   mov    (%rax),%rax
        48 8b 00
  [...]

Fixes: b52f00e6a7 ("x86: bpf_jit: implement bpf_tail_call() helper")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 17:02:19 -07:00
Thomas Gleixner
c948c26048 x86/apic: Drop local_irq_save/restore in timer callbacks
These callbacks are called with interrupts disabled from the core
code. Fixup the local caller to disable interrupts.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
2015-07-30 00:51:47 +02:00
Viresh Kumar
b23d8e5278 x86/apic: Migrate apic timer to new set_state interface
Migrate apic driver to the new 'set-state' interface provided by
clockevents core, the earlier 'set-mode' interface is marked obsolete
now.

This also enables us to implement callbacks for new states of clockevent
devices, for example: ONESHOT_STOPPED.

We weren't doing anything while switching to resume mode and so that
callback isn't implemented.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Bandan Das <bsd@redhat.com>
Link: http://lkml.kernel.org/r/1896ac5989d27f2ac37f4786af9bd537e1921b83.1437042675.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-30 00:51:47 +02:00
Thomas Gleixner
4b979e4c61 Merge branch 'linus' into irq/core
Pull in upstream fixes before applying conflicting changes
2015-07-30 00:13:24 +02:00
Thomas Gleixner
5054e1e639 x86/pci/intel_mid_pci: Use proper constants for irq polarity
polarity = 0 means active high. Not really intuitive, so people add
comments to it instead of just using a self explaining constant.

Use the IOAPIC_POL_ constants and get rid of those horrible to read
tail comments.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
2015-07-29 21:23:50 +02:00
Andy Shevchenko
0abbdea1e9 x86/pci/intel_mid_pci: Make intel_mid_pci_ops static
This fixes the following sparse warning.
arch/x86/pci/intel_mid_pci.c:265:16: warning: symbol 'intel_mid_pci_ops' was not declared. Should it be static?

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/1438161409-4671-4-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-29 21:23:50 +02:00
Andy Shevchenko
2a61c8eaf1 x86/pci/intel_mid_pci: Propagate actual return code
mp_map_gsi_to_irq() returns different codes if it fails.
intel_mid_pci_irq_enable() hides this under -EBUSY.

Return the actual failure code.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/1438161409-4671-3-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-29 21:23:50 +02:00
Andy Shevchenko
39d9b77b8d x86/pci/intel_mid_pci: Work around for IRQ0 assignment
On Intel Tangier the MMC host controller is wired up to irq 0. But
several other devices have irq 0 associated as well due to a bogus PCI
configuration.

The first initialized driver will acquire irq 0 and make it
unavailable for other devices. If the sdhci driver is not the first
one it will fail to acquire the interrupt and therefor be non
functional.

Add a quirk to the pci irq enable function which denies irq 0 to
anything else than the MMC host controller driver on Tangier
platforms.

Fixes: 90b9aacf91 (serial: 8250_pci: add Intel Tangier support)
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/1438161409-4671-2-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-29 21:23:50 +02:00
Frederic Weisbecker
031a7f456a apm32: Fix cputime == jiffies assumption
That code wrongly assumes that cputime_t wraps jiffies_t. Lets use
the correct accessors/mutators.

No real harm now as that code can't be used with full dynticks.

Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc; John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2015-07-29 15:44:58 +02:00
Paolo Bonzini
c847fe8895 KVM: x86: remove unnecessary memory barriers for shared MSRs
There is no smp_rmb matching the smp_wmb.  shared_msr_update is called from
hardware_enable, which in turn is called via on_each_cpu.  on_each_cpu
and must imply a read memory barrier (on x86 the rmb is achieved simply
through asm volatile in native_apic_mem_write).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-29 14:27:23 +02:00
Paolo Bonzini
d71ba78834 KVM: move code related to KVM_SET_BOOT_CPU_ID to x86
This is another remnant of ia64 support.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-29 14:27:21 +02:00
Peter Zijlstra
de9e432cb5 atomic: Collapse all atomic_{set,clear}_mask definitions
Move the now generic definitions of atomic_{set,clear}_mask() into
linux/atomic.h to avoid endless and pointless repetition.

Also, provide an atomic_andnot() wrapper for those few archs that can
implement that.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-27 14:06:24 +02:00
Peter Zijlstra
e6942b7de2 atomic: Provide atomic_{or,xor,and}
Implement atomic logic ops -- atomic_{or,xor,and}.

These will replace the atomic_{set,clear}_mask functions that are
available on some archs.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-27 14:06:24 +02:00
Peter Zijlstra
7fc1845dd4 x86: Provide atomic_{or,xor,and}
Implement atomic logic ops -- atomic_{or,xor,and}.

These will replace the atomic_{set,clear}_mask functions that are
available on some archs.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-27 14:06:23 +02:00
Linus Torvalds
2579d019ad Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Thomas Gleixner:
 "A single fix for the intel cqm perf facility to prevent IPIs from
  interrupt context"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/cqm: Return cached counter value from IRQ context
2015-07-26 11:46:32 -07:00
Linus Torvalds
2800348613 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "This update contains:

   - the manual revert of the SYSCALL32 changes which caused a
     regression

   - a fix for the MPX vma handling

   - three fixes for the ioremap 'is ram' checks.

   - PAT warning fixes

   - a trivial fix for the size calculation of TLB tracepoints

   - handle old EFI structures gracefully

  This also contains a PAT fix from Jan plus a revert thereof.  Toshi
  explained why the code is correct"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm/pat: Revert 'Adjust default caching mode translation tables'
  x86/asm/entry/32: Revert 'Do not use R9 in SYSCALL32' commit
  x86/mm: Fix newly introduced printk format warnings
  mm: Fix bugs in region_is_ram()
  x86/mm: Remove region_is_ram() call from ioremap
  x86/mm: Move warning from __ioremap_check_ram() to the call site
  x86/mm/pat, drivers/media/ivtv: Move the PAT warning and replace WARN() with pr_warn()
  x86/mm/pat, drivers/infiniband/ipath: Replace WARN() with pr_warn()
  x86/mm/pat: Adjust default caching mode translation tables
  x86/fpu: Disable dependent CPU features on "noxsave"
  x86/mpx: Do not set ->vm_ops on MPX VMAs
  x86/mm: Add parenthesis for TLB tracepoint size calculation
  efi: Handle memory error structures produced based on old versions of standard
2015-07-26 11:14:04 -07:00