Commit graph

14,571 commits

Author SHA1 Message Date
Igor Mammedov
d198d49914 xen: x86_32: do not enable iterrupts when returning from exception in interrupt context
If vmalloc page_fault happens inside of interrupt handler with interrupts
disabled then on exit path from exception handler when there is no pending
interrupts, the following code (arch/x86/xen/xen-asm_32.S:112):

	cmpw $0x0001, XEN_vcpu_info_pending(%eax)
	sete XEN_vcpu_info_mask(%eax)

will enable interrupts even if they has been previously disabled according to
eflags from the bounce frame (arch/x86/xen/xen-asm_32.S:99)

	testb $X86_EFLAGS_IF>>8, 8+1+ESP_OFFSET(%esp)
	setz XEN_vcpu_info_mask(%eax)

Solution is in setting XEN_vcpu_info_mask only when it should be set
according to
	cmpw $0x0001, XEN_vcpu_info_pending(%eax)
but not clearing it if there isn't any pending events.

Reproducer for bug is attached to RHBZ 707552

CC: stable@kernel.org
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-09-01 12:54:42 -04:00
David Vrabel
d312ae878b xen: use maximum reservation to limit amount of usable RAM
Use the domain's maximum reservation to limit the amount of extra RAM
for the memory balloon. This reduces the size of the pages tables and
the amount of reserved low memory (which defaults to about 1/32 of the
total RAM).

On a system with 8 GiB of RAM with the domain limited to 1 GiB the
kernel reports:

Before:

Memory: 627792k/4472000k available

After:

Memory: 549740k/11132224k available

A increase of about 76 MiB (~1.5% of the unused 7 GiB).  The reserved
low memory is also reduced from 253 MiB to 32 MiB.  The total
additional usable RAM is 329 MiB.

For dom0, this requires at patch to Xen ('x86: use 'dom0_mem' to limit
the number of pages for dom0') (c/s 23790)

CC: stable@kernel.org
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-09-01 09:41:40 -04:00
Andrey Vagin
20afc60f89 x86, perf: Check that current->mm is alive before getting user callchain
An event may occur when an mm is already released.

I added an event in dequeue_entity() and caught a panic with
the following backtrace:

[  434.421110] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[  434.421258] IP: [<ffffffff810464ac>] __get_user_pages_fast+0x9c/0x120
...
[  434.421258] Call Trace:
[  434.421258]  [<ffffffff8101ae81>] copy_from_user_nmi+0x51/0xf0
[  434.421258]  [<ffffffff8109a0d5>] ? sched_clock_local+0x25/0x90
[  434.421258]  [<ffffffff8101b048>] perf_callchain_user+0x128/0x170
[  434.421258]  [<ffffffff811154cd>] ? __perf_event_header__init_id+0xed/0x100
[  434.421258]  [<ffffffff81116690>] perf_prepare_sample+0x200/0x280
[  434.421258]  [<ffffffff81118da8>] __perf_event_overflow+0x1b8/0x290
[  434.421258]  [<ffffffff81065240>] ? tg_shares_up+0x0/0x670
[  434.421258]  [<ffffffff8104fe1a>] ? walk_tg_tree+0x6a/0xb0
[  434.421258]  [<ffffffff81118f44>] perf_swevent_overflow+0xc4/0xf0
[  434.421258]  [<ffffffff81119150>] do_perf_sw_event+0x1e0/0x250
[  434.421258]  [<ffffffff81119204>] perf_tp_event+0x44/0x70
[  434.421258]  [<ffffffff8105701f>] ftrace_profile_sched_block+0xdf/0x110
[  434.421258]  [<ffffffff8106121d>] dequeue_entity+0x2ad/0x2d0
[  434.421258]  [<ffffffff810614ec>] dequeue_task_fair+0x1c/0x60
[  434.421258]  [<ffffffff8105818a>] dequeue_task+0x9a/0xb0
[  434.421258]  [<ffffffff810581e2>] deactivate_task+0x42/0xe0
[  434.421258]  [<ffffffff814bc019>] thread_return+0x191/0x808
[  434.421258]  [<ffffffff81098a44>] ? switch_task_namespaces+0x24/0x60
[  434.421258]  [<ffffffff8106f4c4>] do_exit+0x464/0x910
[  434.421258]  [<ffffffff8106f9c8>] do_group_exit+0x58/0xd0
[  434.421258]  [<ffffffff8106fa57>] sys_exit_group+0x17/0x20
[  434.421258]  [<ffffffff8100b202>] system_call_fastpath+0x16/0x1b

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1314693156-24131-1-git-send-email-avagin@openvz.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-08-31 15:56:31 +02:00
Duncan Sands
3b217116ed KVM: Fix instruction size issue in pvclock scaling
Commit de2d1a524e ("KVM: Fix register corruption in pvclock_scale_delta")
introduced a mul instruction that may have only a memory operand; the
assembler therefore cannot select the correct size:

   pvclock.s:229: Error: no instruction mnemonic suffix given and no register
operands; can't size instruction

In this example the assembler is:

         #APP
         mul -48(%rbp) ; shrd $32, %rdx, %rax
         #NO_APP

A simple solution is to use mulq.

Signed-off-by: Duncan Sands <baldrick@free.fr>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-08-30 14:42:30 +03:00
Jeremy Fitzhardinge
61e2cd0acc x86, cmpxchg: Use __compiletime_error() to make usage messages a bit nicer
Use __compiletime_error() to produce a compile-time error rather than
link-time, where available.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Link: http://lkml.kernel.org/r/4E5BCC40.3030501@goop.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-08-29 17:20:40 -07:00
Jeremy Fitzhardinge
229855d6f3 x86, ticketlock: Make __ticket_spin_trylock common
Make trylock code common regardless of ticket size.

(Also, rename arch_spinlock.slock to head_tail.)

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Link: http://lkml.kernel.org/r/4E5BCC40.3030501@goop.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-08-29 13:46:34 -07:00
Jeremy Fitzhardinge
2994488fe5 x86, ticketlock: Convert __ticket_spin_lock to use xadd()
Convert the two variants of __ticket_spin_lock() to use xadd(), which
has the effect of making them identical, so remove the duplicate function.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Link: http://lkml.kernel.org/r/4E5BCC40.3030501@goop.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-08-29 13:46:07 -07:00
Jeremy Fitzhardinge
c576a3ea90 x86, ticketlock: Convert spin loop to C
The inner loop of __ticket_spin_lock isn't doing anything very special,
so reimplement it in C.

For the 8 bit ticket lock variant, we use a register union to get direct
access to the lower and upper bytes in the tickets, but unfortunately gcc
won't generate a direct comparison between the two halves of the register,
so the generated asm isn't quite as pretty as the hand-coded version.
However benchmarking shows that this is actually a small improvement in
runtime performance on some benchmarks, and never a slowdown.

We also need to make sure there's a barrier at the end of the lock loop
to make sure that the compiler doesn't move any instructions from within
the locked region into the region where we don't yet own the lock.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Link: http://lkml.kernel.org/r/4E5BCC40.3030501@goop.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-08-29 13:45:43 -07:00
Jeremy Fitzhardinge
84eb950db1 x86, ticketlock: Clean up types and accessors
A few cleanups to the way spinlocks are defined and accessed:
 - define __ticket_t which is the size of a spinlock ticket (ie, enough
   bits to hold all the cpus)
 - Define struct arch_spinlock as a union containing plain slock and
   the head and tail tickets
 - Use head and tail to implement some of the spinlock predicates.
 - Make all ticket variables unsigned.
 - Use TICKET_SHIFT to form constants

Most of this will be used in later patches.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Link: http://lkml.kernel.org/r/4E5BCC40.3030501@goop.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-08-29 13:45:19 -07:00
Jeremy Fitzhardinge
8b8bc2f731 x86: Use xadd helper more widely
This covers the trivial cases from open-coded xadd to the xadd macros.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Link: http://lkml.kernel.org/r/4E5BCC40.3030501@goop.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-08-29 13:44:12 -07:00
Jeremy Fitzhardinge
433b352061 x86: Add xadd helper macro
Add a common xadd implementation.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Link: http://lkml.kernel.org/r/4E5BCC40.3030501@goop.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-08-29 13:42:20 -07:00
Jeremy Fitzhardinge
e9826380d8 x86, cmpxchg: Unify cmpxchg into cmpxchg.h
Everything that's actually common between 32 and 64-bit is moved into
cmpxchg.h.

xchg/cmpxchg will fail with a link error if they're passed an
unsupported size (which includes 64-bit args on 32-bit systems).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Link: http://lkml.kernel.org/r/4E5BCC40.3030501@goop.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-08-29 13:42:10 -07:00
Jeremy Fitzhardinge
00a41546e8 x86, cmpxchg: Move 64-bit set64_bit() to match 32-bit
Reduce arbitrary differences between 32 and 64 bits.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Link: http://lkml.kernel.org/r/4E5BCC40.3030501@goop.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-08-29 13:41:40 -07:00
Jeremy Fitzhardinge
416185bd5a x86, cmpxchg: Move 32-bit __cmpxchg_wrong_size to match 64 bit.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Link: http://lkml.kernel.org/r/4E5BCC40.3030501@goop.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-08-29 13:41:17 -07:00
Jeremy Fitzhardinge
4009338d62 x86, cmpxchg: <linux/alternative.h> has LOCK_PREFIX
Not <linux/bitops.h>.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Link: http://lkml.kernel.org/r/4E5BCC40.3030501@goop.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-08-29 13:40:25 -07:00
Greg Kroah-Hartman
6eafa4604c Merge 3.1-rc4 into staging-next
This resolves a conflict with:
	drivers/staging/brcm80211/brcmsmac/types.h

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-29 08:47:46 -07:00
NeilBrown
f5b9409973 All Arch: remove linkage for sys_nfsservctl system call
The nfsservctl system call is now gone, so we should remove all
linkage for it.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-26 15:09:58 -07:00
Feng Tang
efe3ed9837 x86/mrst: Add platform data for Max3110 devices
Those info will be used when spi controller driver setup
max3110 as a slave device

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-26 11:01:14 -07:00
Kirill A. Shutemov
a94cc4e6c0 sfi: table irq 0xFF means 'no interrupt'
According to the SFI specification irq number 0xFF means device has no
interrupt or interrupt attached via GPIO.

Currently, we don't handle this special case and set irq field in
*_board_info structs to 255.  It leads to confusion in some drivers.
Accelerometer driver tries to register interrupt 255, fails and prints
"Cannot get IRQ" to dmesg.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-26 09:03:29 -07:00
K. Y. Srinivasan
5289d3d160 Staging: hv: vmbus: Retry vmbus_post_msg() before giving up
The function hv_post_msg() can fail because of transient resource
conditions. It may be useful to retry the operation.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-25 15:23:19 -07:00
Andy Lutomirski
b4ca46e4e8 x86-32: Fix boot with CONFIG_X86_INVD_BUG
entry_32.S contained a hardcoded alternative instruction entry, and the
format changed in commit 59e97e4d6f ("x86: Make alternative
instruction pointers relative").

Replace the hardcoded entry with the altinstruction_entry macro.  This
fixes the 32-bit boot with CONFIG_X86_INVD_BUG=y.

Reported-and-tested-by: Arnaud Lacombe <lacombar@gmail.com>
Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-25 13:27:14 -07:00
Tejun Heo
cbbfa38fcb mtrr: fix UP breakage caused during switch to stop_machine
While removing custom rendezvous code and switching to stop_machine,
commit 192d885742 ("x86, mtrr: use stop_machine APIs for doing MTRR
rendezvous") completely dropped mtrr setting code on !CONFIG_SMP
breaking MTRR settting on UP.

Fix it by removing the incorrect CONFIG_SMP.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Anders Eriksson <aeriksson@fastmail.fm>
Tested-and-acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-25 11:02:29 -07:00
Andy Lutomirski
7e49b1c8c6 x86-64, unistd: Remove bogus __IGNORE_getcpu
The change:

commit fce8dc0642
Author: Andy Lutomirski <luto@mit.edu>
Date:   Wed Aug 10 11:15:31 2011 -0400

    x86-64: Wire up getcpu syscall

added getcpu as a real syscall, so we shouldn't ignore it any more.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/b4cb60ef45db3a675a0e2b9d51bcb022b0a9ab9c.1314195481.git.luto@mit.edu
Reported-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-08-24 10:23:51 -07:00
Jeremy Fitzhardinge
f1c39625d6 xen: use non-tracing preempt in xen_clocksource_read()
The tracing code used sched_clock() to get tracing timestamps, which
ends up calling xen_clocksource_read().  xen_clocksource_read() must
disable preemption, but if preemption tracing is enabled, this results
in infinite recursion.

I've only noticed this when boot-time tracing tests are enabled, but it
seems like a generic bug.  It looks like it would also affect
kvm_clocksource_read().

Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
2011-08-24 09:54:24 -07:00
Linus Torvalds
14c62e78dc Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-32, vdso: On system call restart after SYSENTER, use int $0x80
  x86, UV: Remove UV delay in starting slave cpus
  x86, olpc: Wait for last byte of EC command to be accepted
2011-08-23 18:09:08 -07:00
H. Peter Anvin
7ca0758cdb x86-32, vdso: On system call restart after SYSENTER, use int $0x80
When we enter a 32-bit system call via SYSENTER or SYSCALL, we shuffle
the arguments to match the int $0x80 calling convention.  This was
probably a design mistake, but it's what it is now.  This causes
errors if the system call as to be restarted.

For SYSENTER, we have to invoke the instruction from the vdso as the
return address is hardcoded.  Accordingly, we can simply replace the
jump in the vdso with an int $0x80 instruction and use the slower
entry point for a post-restart.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/CA%2B55aFztZ=r5wa0x26KJQxvZOaQq8s2v3u50wCyJcA-Sc4g8gQ@mail.gmail.com
Cc: <stable@kernel.org>
2011-08-23 16:20:10 -07:00
Zhao Jin
c812d8f7ad x86, mm, trivial: Remove unnecessary get_order() in free_thread_info()
Because THREAD_SIZE is defined as PAGE_SIZE << THREAD_ORDER on x86, the
call of get_order(THREAD_SIZE) can be replaced with THREAD_ORDER.

Signed-off-by: Zhao Jin <cronozhj@gmail.com>
Link: http://lkml.kernel.org/r/4E4FB5A9.700@gmail.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-08-23 21:49:35 +02:00
Jesper Juhl
7a4e8beabc x86, cleanup: Remove unneeded version.h include from arch/x86/
It was pointed out by 'make versioncheck' that the include of
linux/version.h is not needed in arch/x86/mm/mmio-mod.c .
This patch removes it.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1108012305570.31999@swampdragon.chaosbits.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-08-22 15:35:43 -07:00
Arun Thomas
be604e695f x86, cpu: Add cpufeature flag for PCIDs
This patch add a flag for Process-Context Identifiers (PCIDs) aka
Address Space Identifiers (ASIDs) aka Tagged TLB support.

Signed-off-by: Arun Thomas <arun.thomas@gmail.com>
Link: http://lkml.kernel.org/r/1313782943-3898-1-git-send-email-arun.thomas@gmail.com
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-08-22 13:52:32 -07:00
Linus Torvalds
4762e252f4 Merge branch 'stable/bug.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/bug.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/tracing: Fix tracing config option properly
  xen: Do not enable PV IPIs when vector callback not present
  xen/x86: replace order-based range checking of M2P table by linear one
  xen: xen-selfballoon.c needs more header files
2011-08-22 11:25:44 -07:00
Jeremy Fitzhardinge
60c5f08e15 xen/tracing: Fix tracing config option properly
Steven Rostedt says we should use CONFIG_EVENT_TRACING.

Cc:Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-08-22 11:28:33 -04:00
Stefano Stabellini
3c05c4bed4 xen: Do not enable PV IPIs when vector callback not present
Fix regression for HVM case on older (<4.1.1) hypervisors caused by

  commit 99bbb3a84a
  Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
  Date:   Thu Dec 2 17:55:10 2010 +0000

    xen: PV on HVM: support PV spinlocks and IPIs

This change replaced the SMP operations with event based handlers without
taking into account that this only works when the hypervisor supports
callback vectors. This causes unexplainable hangs early on boot for
HVM guests with more than one CPU.

BugLink: http://bugs.launchpad.net/bugs/791850

CC: stable@kernel.org
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Tested-and-Reported-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-08-22 11:28:09 -04:00
Kevin Winchester
2f3aa7a06f x86: jump_label: arch_jump_label_text_poke_early: add missing __init
arch_jump_label_text_poke_early calls text_poke_early, which is
an __init function.  Thus arch_jump_label_text_poke_early should
be the same.

Signed-off-by: Kevin Winchester <kjwinchester@gmail.com>
Cc: jbaron@redhat.com
Cc: tglx@linutronix.de
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1313539478-30303-1-git-send-email-kjwinchester@gmail.com

[ Use __init_or_module instead of __init ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-08-19 14:35:04 -04:00
Linus Torvalds
0c3bef6128 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: OF: Don't crash when bridge parent is NULL.
  PCI: export pcie_bus_configure_settings symbol
  PCI: code and comments cleanup
  PCI: make cardbus-bridge resources optional
  PCI: make SRIOV resources optional
  PCI : ability to relocate assigned pci-resources
  PCI: honor child buses add_size in hot plug configuration
  PCI: Set PCI-E Max Payload Size on fabric
2011-08-19 10:02:37 -07:00
Ingo Molnar
8bc84f8731 Merge branch 'perf/urgent' into perf/core
Merge reason: add the latest fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-08-18 21:56:47 +02:00
Jan Beulich
ccbcdf7cf1 xen/x86: replace order-based range checking of M2P table by linear one
The order-based approach is not only less efficient (requiring a shift
and a compare, typical generated code looking like this

	mov	eax, [machine_to_phys_order]
	mov	ecx, eax
	shr	ebx, cl
	test	ebx, ebx
	jnz	...

whereas a direct check requires just a compare, like in

	cmp	ebx, [machine_to_phys_nr]
	jae	...

), but also slightly dangerous in the 32-on-64 case - the element
address calculation can wrap if the next power of two boundary is
sufficiently far away from the actual upper limit of the table, and
hence can result in user space addresses being accessed (with it being
unknown what may actually be mapped there).

Additionally, the elimination of the mistaken use of fls() here (should
have been __fls()) fixes a latent issue on x86-64 that would trigger
if the code was run on a system with memory extending beyond the 44-bit
boundary.

CC: stable@kernel.org
Signed-off-by: Jan Beulich <jbeulich@novell.com>
[v1: Based on Jeremy's feedback]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-08-17 10:26:48 -04:00
Robert Richter
298557db42 oprofile, x86: Fix overflow and warning (commit 1d12d35)
Following fixes for:

 1d12d35 oprofile, x86: Convert memory allocation to static array

Fix potential buffer overflow.

Fix the following warning:

 arch/x86/oprofile/op_model_ppro.c: In function ‘ppro_check_ctrs’:
 arch/x86/oprofile/op_model_ppro.c:143: warning: label ‘out’ defined but not used

Cc: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2011-08-16 23:51:00 +02:00
Linus Torvalds
ab7e2dbf9b Merge branch 'kvm-updates/3.1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/3.1' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: uses TASKSTATS, depends on NET
  KVM: fix TASK_DELAY_ACCT kconfig warning
2011-08-16 11:14:44 -07:00
Randy Dunlap
df3d8ae1f8 KVM: uses TASKSTATS, depends on NET
CONFIG_TASKSTATS just had a change to use netlink, including
a change to "depends on NET".  Since "select" does not follow
dependencies, KVM also needs to depend on NET to prevent build
errors when CONFIG_NET is not enabled.

Sample of the reported "undefined reference" build errors:

taskstats.c:(.text+0x8f686): undefined reference to `nla_put'
taskstats.c:(.text+0x8f721): undefined reference to `nla_reserve'
taskstats.c:(.text+0x8f8fb): undefined reference to `init_net'
taskstats.c:(.text+0x8f905): undefined reference to `netlink_unicast'
taskstats.c:(.text+0x8f934): undefined reference to `kfree_skb'
taskstats.c:(.text+0x8f9e9): undefined reference to `skb_clone'
taskstats.c:(.text+0x90060): undefined reference to `__alloc_skb'
taskstats.c:(.text+0x901e9): undefined reference to `skb_put'
taskstats.c:(.init.text+0x4665): undefined reference to `genl_register_family'
taskstats.c:(.init.text+0x4699): undefined reference to `genl_register_ops'
taskstats.c:(.init.text+0x4710): undefined reference to `genl_unregister_ops'
taskstats.c:(.init.text+0x471c): undefined reference to `genl_unregister_family'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-08-16 19:00:41 +03:00
H. Peter Anvin
fab1167c46 x86, vsyscall: Add missing <asm/fixmap.h> to arch/x86/mm/fault.c
arch/x86/mm/fault.c now depend on having the symbol VSYSCALL_START
defined, which is best handled by including <asm/fixmap.h> (it isn't
unreasonable we may want other fixed addresses in this file in the
future, and so it is cleaner than including <asm/vsyscall.h>
directly.)

This addresses an x86-64 allnoconfig build failure.  On other
configurations it was masked by an indirect path:

<asm/smp.h> -> <asm/apic.h> -> <asm/fixmap.h> -> <asm/vsyscall.h>

... however, the first such include is conditional on CONFIG_X86_LOCAL_APIC.

Originally-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/CA%2B55aFxsOMc9=p02r8-QhJ=h=Mqwckk4_Pnx9LQt5%2BfqMp_exQ@mail.gmail.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-08-16 08:04:02 -07:00
Randy Dunlap
cedf03bd9a x86: fix mm/fault.c build
arch/x86/mm/fault.c needs to include asm/vsyscall.h to fix a
build error:

  arch/x86/mm/fault.c: In function '__bad_area_nosemaphore':
  arch/x86/mm/fault.c:728: error: 'VSYSCALL_START' undeclared (first use in this function)

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-15 19:10:50 -07:00
Peter Zijlstra
7fdba1ca10 perf, x86: Avoid kfree() in CPU_STARTING
On -rt kfree() can schedule, but CPU_STARTING is before the CPU is
fully up and running. These are contradictory, so avoid it. Instead
push the kfree() to CPU_ONLINE where we're free to schedule.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-kwd4j6ayld5thrscvaxgjquv@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-08-14 11:53:02 +02:00
Linus Torvalds
06e727d2a5 Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-tip
* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-tip:
  x86-64: Rework vsyscall emulation and add vsyscall= parameter
  x86-64: Wire up getcpu syscall
  x86: Remove unnecessary compile flag tweaks for vsyscall code
  x86-64: Add vsyscall:emulate_vsyscall trace event
  x86-64: Add user_64bit_mode paravirt op
  x86-64, xen: Enable the vvar mapping
  x86-64: Work around gold bug 13023
  x86-64: Move the "user" vsyscall segment out of the data segment.
  x86-64: Pad vDSO to a page boundary
2011-08-12 20:46:24 -07:00
Linus Torvalds
1d229d54db Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf symbols: Check '/tmp/perf-' symbol file ownership
  perf sched: Usage leftover from trace -> script rename
  perf sched: Do not delete session object prematurely
  perf tools: Check $HOME/.perfconfig ownership
  perf, x86: Add model 45 SandyBridge support
  perf tools: Add support to install perf python extension
  perf tools: do not look at ./config for configuration
  perf tools: Make clean leaves some files
  perf lock: Dropping unsupported ':r' modifier
  perf probe: Fix coredump introduced by probe module option
  jump label: Reduce the cycle count by changing the link order
  perf report: Use ui__warning in some more places
  perf python: Add PERF_RECORD_{LOST,READ,SAMPLE} routine tables
  perf evlist: Introduce 'disable' method
  trace events: Update version number reference to new 3.x scheme for EVENT_POWER_TRACING_DEPRECATED
  perf buildid-cache: Zero out buffer of filenames when adding/removing buildid
2011-08-11 09:03:48 -07:00
Andy Lutomirski
3ae36655b9 x86-64: Rework vsyscall emulation and add vsyscall= parameter
There are three choices:

vsyscall=native: Vsyscalls are native code that issues the
corresponding syscalls.

vsyscall=emulate (default): Vsyscalls are emulated by instruction
fault traps, tested in the bad_area path.  The actual contents of
the vsyscall page is the same as the vsyscall=native case except
that it's marked NX.  This way programs that make assumptions about
what the code in the page does will not be confused when they read
that code.

vsyscall=none: Trying to execute a vsyscall will segfault.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/8449fb3abf89851fd6b2260972666a6f82542284.1312988155.git.luto@mit.edu
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-08-10 19:26:46 -05:00
Andy Lutomirski
fce8dc0642 x86-64: Wire up getcpu syscall
getcpu is available as a vdso entry and an emulated vsyscall.
Programs that for some reason don't want to use the vdso should
still be able to call getcpu without relying on the slow emulated
vsyscall.  It costs almost nothing to expose it as a real syscall.

We also need this for the following patch in vsyscall=native mode.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/6b19f55bdb06a0c32c2fa6dba9b6f222e1fde999.1312988155.git.luto@mit.edu
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-08-10 19:26:46 -05:00
Andy Lutomirski
f3fb5b7bb7 x86: Remove unnecessary compile flag tweaks for vsyscall code
As of commit 98d0ac38ca
Author: Andy Lutomirski <luto@mit.edu>
Date:   Thu Jul 14 06:47:22 2011 -0400

    x86-64: Move vread_tsc and vread_hpet into the vDSO

user code no longer directly calls into code in arch/x86/kernel/, so
we don't need compile flag hacks to make it safe.  All vdso code is
in the vdso directory now.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/835cd05a4c7740544d09723d6ba48f4406f9826c.1312988155.git.luto@mit.edu
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-08-10 18:55:29 -05:00
Mathias Krause
66be895158 crypto: sha1 - SSSE3 based SHA1 implementation for x86-64
This is an assembler implementation of the SHA1 algorithm using the
Supplemental SSE3 (SSSE3) instructions or, when available, the
Advanced Vector Extensions (AVX).

Testing with the tcrypt module shows the raw hash performance is up to
2.3 times faster than the C implementation, using 8k data blocks on a
Core 2 Duo T5500. For the smalest data set (16 byte) it is still 25%
faster.

Since this implementation uses SSE/YMM registers it cannot safely be
used in every situation, e.g. while an IRQ interrupts a kernel thread.
The implementation falls back to the generic SHA1 variant, if using
the SSE/YMM registers is not possible.

With this algorithm I was able to increase the throughput of a single
IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using
the SSSE3 variant -- a speedup of +34.8%.

Saving and restoring SSE/YMM state might make the actual throughput
fluctuate when there are FPU intensive userland applications running.
For example, meassuring the performance using iperf2 directly on the
machine under test gives wobbling numbers because iperf2 uses the FPU
for each packet to check if the reporting interval has expired (in the
above test I got min/max/avg: 402/484/464 MBit/s).

Using this algorithm on a IPsec gateway gives much more reasonable and
stable numbers, albeit not as high as in the directly connected case.
Here is the result from an RFC 2544 test run with a EXFO Packet Blazer
FTB-8510:

 frame size    sha1-generic     sha1-ssse3    delta
    64 byte     37.5 MBit/s    37.5 MBit/s     0.0%
   128 byte     56.3 MBit/s    62.5 MBit/s   +11.0%
   256 byte     87.5 MBit/s   100.0 MBit/s   +14.3%
   512 byte    131.3 MBit/s   150.0 MBit/s   +14.2%
  1024 byte    162.5 MBit/s   193.8 MBit/s   +19.3%
  1280 byte    175.0 MBit/s   212.5 MBit/s   +21.4%
  1420 byte    175.0 MBit/s   218.7 MBit/s   +25.0%
  1518 byte    150.0 MBit/s   181.2 MBit/s   +20.8%

The throughput for the largest frame size is lower than for the
previous size because the IP packets need to be fragmented in this
case to make there way through the IPsec tunnel.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Maxim Locktyukhin <maxim.locktyukhin@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2011-08-10 19:00:29 +08:00
Stephen Rothwell
5cdd174fea x86, amd: Include elf.h explicitly, prepare the code for the module.h split
When the moduleu.h splitting tree is merged to the latest
tip:x86/cpu tree, the x86_64 allmodconfig build fails like this:

 arch/x86/kernel/cpu/amd.c: In function 'bsp_init_amd':
 arch/x86/kernel/cpu/amd.c:437:3: error: 'va_align' undeclared (first use in this function)
 arch/x86/kernel/cpu/amd.c:438:23: error: 'ALIGN_VA_32' undeclared (first use in this function)
 arch/x86/kernel/cpu/amd.c:438:37: error: 'ALIGN_VA_64' undeclared (first use in this function)

This is caused by the module.h split up intreacting with commit
dfb09f9b7a ("x86, amd: Avoid cache aliasing penalties on AMD
family 15h") from the tip:x86/cpu tree.

I have added the following patch for today (this, or something
similar, could be applied to the tip tree directly - the
export.h include below was added by the module.h splitup).

So include elf.h to use va_align and remove this implicit
dependency on module.h doing it for us.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Link: http://lkml.kernel.org/r/20110810114956.238d66772883636e3040d29f@canb.auug.org.au
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-08-10 10:14:09 +02:00
Konrad Rzeszutek Wilk
10fe570fc1 Revert "xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set."
We don' use it anymore and there are more false positives.

This reverts commit fc25151d9a.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-08-09 13:04:08 -04:00