WARNING: arch/x86/mm/built-in.o(.text+0x3a1): Section mismatch in
reference from the function set_pte_phys() to the function
.init.text:spp_getpage()
The function set_pte_phys() references
the function __init spp_getpage().
This is often because set_pte_phys lacks a __init
annotation or the annotation of spp_getpage is wrong.
arch/x86/mm/init_64.c: In function 'early_memtest':
arch/x86/mm/init_64.c:520: warning: passing argument 2 of
'find_e820_area_size' from incompatible pointer type
Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
Cc: "Linus Torvalds" <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It's not even passed on to smp_call_function() anymore, since that
was removed. So kill it.
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
It's never used and the comments refer to nonatomic and retry
interchangably. So get rid of it.
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This converts x86, x86-64, and xen to use the new helpers for
smp_call_function() and friends, and adds support for
smp_call_function_single().
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Some Xen hypercalls accept an array of operations to work on. In
general this is because its more efficient for the hypercall to the
work all at once rather than as separate hypercalls (even batched as a
multicall).
This patch adds a mechanism (xen_mc_extend_args()) to allocate more
argument space to the last-issued multicall, in order to extend its
argument list.
The user of this mechanism is xen/mmu.c, which uses it to extend the
args array of mmu_update. This is particularly valuable when doing
the update for a large mprotect, which goes via
ptep_modify_prot_commit(), but it also manages to batch updates to
pgd/pmds as well.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Xen has a pte update function which will update a pte while preserving
its accessed and dirty bits. This means that ptep_modify_prot_start() can be
implemented as a simple read of the pte value. The hardware may
update the pte in the meantime, but ptep_modify_prot_commit() updates it while
preserving any changes that may have happened in the meantime.
The updates in ptep_modify_prot_commit() are batched if we're currently in lazy
mmu mode.
The mmu_update hypercall can take a batch of updates to perform, but
this code doesn't make particular use of that feature, in favour of
using generic multicall batching to get them all into the hypervisor.
The net effect of this is that each mprotect pte update turns from two
expensive trap-and-emulate faults into they hypervisor into a single
hypercall whose cost is amortized in a batched multicall.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds paravirt-ops hooks in pv_mmu_ops for ptep_modify_prot_start and
ptep_modify_prot_commit. This allows the hypervisor-specific backends to
implement these in some more efficient way.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
KVM: Remove now unused structs from kvm_para.h
x86: KVM guest: Use the paravirt clocksource structs and functions
KVM: Make kvm host use the paravirt clocksource structs
x86: Make xen use the paravirt clocksource structs and functions
x86: Add structs and functions for paravirt clocksource
KVM: VMX: Fix host msr corruption with preemption enabled
KVM: ioapic: fix lost interrupt when changing a device's irq
KVM: MMU: Fix oops on guest userspace access to guest pagetable
KVM: MMU: large page update_pte issue with non-PAE 32-bit guests (resend)
KVM: MMU: Fix rmap_write_protect() hugepage iteration bug
KVM: close timer injection race window in __vcpu_run
KVM: Fix race between timer migration and vcpu migration
This patch updates the kvm host code to use the pvclock structs
and functions, thereby making it compatible with Xen.
The patch also fixes an initialization bug: on SMP systems the
per-cpu has two different locations early at boot and after CPU
bringup. kvmclock must take that in account when registering the
physical address within the host.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch updates the kvm host code to use the pvclock structs.
It also makes the paravirt clock compatible with Xen.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch updates the xen guest to use the pvclock structs
and helper functions.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch adds structs for the paravirt clocksource ABI
used by both xen and kvm (pvclock-abi.h).
It also adds some helper functions to read system time and
wall clock time from a paravirtual clocksource (pvclock.[ch]).
They are based on the xen code. They are enabled using
CONFIG_PARAVIRT_CLOCK.
Subsequent patches of this series will put the code in use.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Non-PAE operation has been deprecated in Xen for a while, and is
rarely tested or used. xen-unstable has now officially dropped
non-PAE support. Since Xen/pvops' non-PAE support has also been
broken for a while, we may as well completely drop it altogether.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
As suggested by Ingo, remove all references to tsc from init/calibrate.c
TSC is x86 specific, and using tsc in variable names in a generic file should
be avoided. lpj_tsc is now called lpj_fine, since it is related to fine tuning
of lpj value. Also tsc_rate_* is called timer_rate_*
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Daniel Hecht <dhecht@vmware.com>
Cc: Tim Mann <mann@vmware.com>
Cc: Zach Amsden <zach@vmware.com>
Cc: Sahil Rihan <srihan@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fix:
BUG: using smp_processor_id() in preemptible [00000000] code: oprofiled/27301
caller is nmi_shutdown+0x11/0x60
Pid: 27301, comm: oprofiled Not tainted 2.6.26-rc7 #25
[<c028a90d>] debug_smp_processor_id+0xbd/0xc0
[<c045fba1>] nmi_shutdown+0x11/0x60
[<c045dd4a>] oprofile_shutdown+0x2a/0x60
Note that we don't need this for the other functions, since they are all
called with on_each_cpu() (which disables preemption for us anyway).
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Philippe Elie <phil.el@wanadoo.fr>
Cc: oprofile-list@lists.sf.net
Cc: Johannes Weiner <hannes@saeurebad.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
... to strip down loop body in reserve_memtype.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh B Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
... and move last debug message out of locked section.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh B Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- add BUG statement to catch invalid start and end parameters
- No need to track the actual type in both req_type and actual_type --
keep req_type unchanged.
- removed (IMHO) superfluous comments
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh B Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- parse/ml => entry (within list_for_each and friends)
- new_entry => new
- ret_type => new_type (to avoid confusion with req_type)
(... to make it more readable...)
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh B Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Switching msrs can occur either synchronously as a result of calls to
the msr management functions (usually in response to the guest touching
virtualized msrs), or asynchronously when preempting a kvm thread that has
guest state loaded. If we're unlucky enough to have the two at the same
time, host msrs are corrupted and the machine goes kaput on the next syscall.
Most easily triggered by Windows Server 2008, as it does a lot of msr
switching during bootup.
Signed-off-by: Avi Kivity <avi@qumranet.com>
KVM has a heuristic to unshadow guest pagetables when userspace accesses
them, on the assumption that most guests do not allow userspace to access
pagetables directly. Unfortunately, in addition to unshadowing the pagetables,
it also oopses.
This never triggers on ordinary guests since sane OSes will clear the
pagetables before assigning them to userspace, which will trigger the flood
heuristic, unshadowing the pagetables before the first userspace access. One
particular guest, though (Xenner) will run the kernel in userspace, triggering
the oops. Since the heuristic is incorrect in this case, we can simply
remove it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
kvm_mmu_pte_write() does not handle 32-bit non-PAE large page backed
guests properly. It will instantiate two 2MB sptes pointing to the same
physical 2MB page when a guest large pte update is trapped.
Instead of duplicating code to handle this, disallow directory level
updates to happen through kvm_mmu_pte_write(), so the two 2MB sptes
emulating one guest 4MB pte can be correctly created by the page fault
handling path.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
rmap_next() does not work correctly after rmap_remove(), as it expects
the rmap chains not to change during iteration. Fix (for now) by restarting
iteration from the beginning.
Signed-off-by: Avi Kivity <avi@qumranet.com>
If a timer fires after kvm_inject_pending_timer_irqs() but before
local_irq_disable() the code will enter guest mode and only inject such
timer interrupt the next time an unrelated event causes an exit.
It would be simpler if the timer->pending irq conversion could be done
with IRQ's disabled, so that the above problem cannot happen.
For now introduce a new vcpu requests bit to cancel guest entry.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
A guest vcpu instance can be scheduled to a different physical CPU
between the test for KVM_REQ_MIGRATE_TIMER and local_irq_disable().
If that happens, the timer will only be migrated to the current pCPU on
the next exit, meaning that guest LAPIC timer event can be delayed until
a host interrupt is triggered.
Fix it by cancelling guest entry if any vcpu request is pending. This
has the side effect of nicely consolidating vcpu->requests checks.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
fix:
arch/x86/kernel/tsc_32.c: In function ‘tsc_init':
arch/x86/kernel/tsc_32.c:421: error: ‘lpj_tsc' undeclared (first use in this function)
arch/x86/kernel/tsc_32.c:421: error: (Each undeclared identifier is reported only once
arch/x86/kernel/tsc_32.c:421: error: for each function it appears in.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On the x86 platform we can use the value of tsc_khz computed during tsc
calibration to calculate the loops_per_jiffy value. Its very important
to keep the error in lpj values to minimum as any error in that may
result in kernel panic in check_timer. In virtualization environment, On
a highly overloaded host the guest delay calibration may sometimes
result in errors beyond the ~50% that timer_irq_works can handle,
resulting in the guest panicking.
Does some formating changes to lpj_setup code to now have a single
printk to print the bogomips value.
We do this only for the boot processor because the AP's can have
different base frequencies or the BIOS might boot a AP at a different
frequency.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Daniel Hecht <dhecht@vmware.com>
Cc: Tim Mann <mann@vmware.com>
Cc: Zach Amsden <zach@vmware.com>
Cc: Sahil Rihan <srihan@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>