Commit graph

602118 commits

Author SHA1 Message Date
Paolo Bonzini
add6a0cd1c KVM: MMU: try to fix up page faults before giving up
The vGPU folks would like to trap the first access to a BAR by setting
vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault handler
then can use remap_pfn_range to place some non-reserved pages in the VMA.

This kind of VM_PFNMAP mapping is not handled by KVM, but follow_pfn
and fixup_user_fault together help supporting it.  The patch also supports
VM_MIXEDMAP vmas where the pfns are not reserved and thus subject to
reference counting.

Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Tested-by: Neo Jia <cjia@nvidia.com>
Reported-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-05 14:41:26 +02:00
Paolo Bonzini
92176a8ede KVM: MMU: prepare to support mapping of VM_IO and VM_PFNMAP frames
Handle VM_IO like VM_PFNMAP, as is common in the rest of Linux; extract
the formula to convert hva->pfn into a new function, which will soon
gain more capabilities.

Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-05 14:41:03 +02:00
David Hildenbrand
5ffe466cd3 KVM: s390: inject PER i-fetch events on applicable icpts
In case we have to emuluate an instruction or part of it (instruction,
partial instruction, operation exception), we have to inject a PER
instruction-fetching event for that instruction, if hardware told us to do
so.

In case we retry an instruction, we must not inject the PER event.

Please note that we don't filter the events properly yet, so guest
debugging will be visible for the guest.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-07-05 12:02:56 +02:00
Marc Zyngier
6c41a413fd arm/arm64: Get rid of KERN_TO_HYP
We have both KERN_TO_HYP and kern_hyp_va, which do the exact same
thing. Let's standardize on the latter.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03 23:41:27 +02:00
Marc Zyngier
eac378a9ce arm/arm64: KVM: Check that IDMAP doesn't intersect with VA range
This is more of a safety measure than anything else: If we end-up
with an idmap page that intersect with the range picked for the
the HYP VA space, abort the KVM setup, as it is unsafe to go
further.

I cannot imagine it happening on 64bit (we have a mechanism to
work around it), but could potentially occur on a 32bit system with
the kernel loaded high enough in memory so that in conflicts with
the kernel VA.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03 23:41:27 +02:00
Marc Zyngier
f7bec68d2f arm/arm64: KVM: Prune unused #defines
We can now remove a number of dead #defines, thanks to the trampoline
code being gone.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03 23:41:27 +02:00
Marc Zyngier
e537ecd7ef arm: KVM: Allow hyp teardown
So far, KVM was getting in the way of kexec on 32bit (and the arm64
kexec hackers couldn't be bothered to fix it on 32bit...).

With simpler page tables, tearing KVM down becomes very easy, so
let's just do it.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03 23:41:27 +02:00
Marc Zyngier
cd602a37e8 arm: KVM: Simplify HYP init
Just like for arm64, we can now make the HYP setup a lot simpler,
and we can now initialise it in one go (instead of the two
phases we currently have).

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03 23:41:27 +02:00
Marc Zyngier
26781f9ce1 arm/arm64: KVM: Kill free_boot_hyp_pgd
There is no way to free the boot PGD, because it doesn't exist
anymore as a standalone entity.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03 23:41:27 +02:00
Marc Zyngier
12fda8123d arm/arm64: KVM: Drop boot_pgd
Since we now only have one set of page tables, the concept of
boot_pgd is useless and can be removed. We still keep it as
an element of the "extended idmap" thing.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03 23:41:27 +02:00
Marc Zyngier
3421e9d88d arm64: KVM: Simplify HYP init/teardown
Now that we only have the "merged page tables" case to deal with,
there is a bunch of things we can simplify in the HYP code (both
at init and teardown time).

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03 23:41:27 +02:00
Marc Zyngier
0535a3e2b2 arm/arm64: KVM: Always have merged page tables
We're in a position where we can now always have "merged" page
tables, where both the runtime mapping and the idmap coexist.

This results in some code being removed, but there is more to come.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03 23:41:27 +02:00
Marc Zyngier
d174591016 arm64: KVM: Runtime detection of lower HYP offset
Add the code that enables the switch to the lower HYP VA range.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03 23:41:27 +02:00
Marc Zyngier
1df3e2347a arm/arm64: KVM: Export __hyp_text_start/end symbols
Declare the __hyp_text_start/end symbols in asm/virt.h so that
they can be reused without having to declare them locally.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03 23:41:27 +02:00
Marc Zyngier
fd81e6bf39 arm64: KVM: Refactor kern_hyp_va to deal with multiple offsets
As we move towards a selectable HYP VA range, it is obvious that
we don't want to test a variable to find out if we need to use
the bottom VA range, the top VA range, or use the address as is
(for VHE).

Instead, we can expand our current helper to generate the right
mask or nop with code patching. We default to using the top VA
space, with alternatives to switch to the bottom one or to nop
out the instructions.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03 23:41:27 +02:00
Marc Zyngier
d53d9bc652 arm64: KVM: Define HYP offset masks
Define the two possible HYP VA regions in terms of VA_BITS,
and keep HYP_PAGE_OFFSET_MASK as a temporary compatibility
definition.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03 23:41:27 +02:00
Marc Zyngier
853c3b21ff arm64: Add ARM64_HYP_OFFSET_LOW capability
As we need to indicate to the rest of the kernel which region of
the HYP VA space is safe to use, add a capability that will
indicate that KVM should use the [VA_BITS-2:0] range.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03 23:41:27 +02:00
Marc Zyngier
fd16fe6820 arm64: KVM: Kill HYP_PAGE_OFFSET
HYP_PAGE_OFFSET is not massively useful. And the way we use it
in KERN_HYP_VA is inconsistent with the equivalent operation in
EL2, where we use a mask instead.

Let's replace the uses of HYP_PAGE_OFFSET with HYP_PAGE_OFFSET_MASK,
and get rid of the pointless macro.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03 23:41:27 +02:00
Marc Zyngier
3f0f8830d4 arm/arm64: KVM: Remove hyp_kern_va helper
hyp_kern_va is now completely unused, so let's remove it entirely.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03 23:41:27 +02:00
Marc Zyngier
cf7df13d3c arm64: KVM: Always reference __hyp_panic_string via its kernel VA
__hyp_panic_string is passed via the HYP panic code to the panic
function, and is being "upgraded" to a kernel address, as it is
referenced by the HYP code (in a PC-relative way).

This is a bit silly, and we'd be better off obtaining the kernel
address and not mess with it at all. This patch implements this
with a tiny bit of asm glue, by forcing the string pointer to be
read from the literal pool.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03 23:41:27 +02:00
Marc Zyngier
82a81bff90 arm64: KVM: Merged page tables documentation
Since dealing with VA ranges tends to hurt my brain badly, let's
start with a bit of documentation that will hopefully help
understanding what comes next...

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03 23:41:27 +02:00
Marc Zyngier
50926d82fa KVM: arm/arm64: The GIC is dead, long live the GIC
I don't think any single piece of the KVM/ARM code ever generated
as much hatred as the GIC emulation.

It was written by someone who had zero experience in modeling
hardware (me), was riddled with design flaws, should have been
scrapped and rewritten from scratch long before having a remote
chance of reaching mainline, and yet we supported it for a good
three years. No need to mention the names of those who suffered,
the git log is singing their praises.

Thankfully, we now have a much more maintainable implementation,
and we can safely put the grumpy old GIC to rest.

Fellow hackers, please raise your glass in memory of the GIC:

	The GIC is dead, long live the GIC!

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03 23:09:37 +02:00
Wanpeng Li
196f20ca52 KVM: vmx: fix missed cancellation of TSC deadline timer
INFO: rcu_sched detected stalls on CPUs/tasks:
 1-...: (11800 GPs behind) idle=45d/140000000000000/0 softirq=0/0 fqs=21663
 (detected by 0, t=65016 jiffies, g=11500, c=11499, q=719)
Task dump for CPU 1:
qemu-system-x86 R  running task        0  3529   3525 0x00080808
 ffff8802021791a0 ffff880212895040 0000000000000001 00007f1c2c00db40
 ffff8801dd20fcd3 ffffc90002b98000 ffff8801dd20fc88 ffff8801dd20fcf8
 0000000000000286 ffff8801dd2ac538 ffff8801dd20fcc0 ffffffffc06949c9
Call Trace:
? kvm_write_guest_cached+0xb9/0x160 [kvm]
? __delay+0xf/0x20
? wait_lapic_expire+0x14a/0x200 [kvm]
? kvm_arch_vcpu_ioctl_run+0xcbe/0x1b00 [kvm]
? kvm_arch_vcpu_ioctl_run+0xe34/0x1b00 [kvm]
? kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
? __fget+0x5/0x210
? do_vfs_ioctl+0x96/0x6a0
? __fget_light+0x2a/0x90
? SyS_ioctl+0x79/0x90
? do_syscall_64+0x7c/0x1e0
? entry_SYSCALL64_slow_path+0x25/0x25

This can be reproduced readily by running a full dynticks guest(since hrtimer
in guest is heavily used) w/ lapic_timer_advance disabled.

If fail to program hardware preemption timer, we will fallback to hrtimer based
method, however, a previous programmed preemption timer miss to cancel in this
scenario which results in one hardware preemption timer and one hrtimer emulated
tsc deadline timer run simultaneously. So sometimes the target guest deadline
tsc is earlier than guest tsc, which leads to the computation in vmx_set_hv_timer
can underflow and cause delta_tsc to be set a huge value, then host soft lockup
as above.

This patch fix it by cancelling the previous programmed preemption timer if there
is once we failed to program the new preemption timer and fallback to hrtimer
based method.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-01 11:03:42 +02:00
Wanpeng Li
bd97ad0e7e KVM: x86: introduce cancel_hv_tscdeadline
Introduce cancel_hv_tscdeadline() to encapsulate preemption
timer cancel stuff.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-01 11:03:41 +02:00
Paolo Bonzini
9175d2e97b KVM: vmx: fix underflow in TSC deadline calculation
If the TSC deadline timer is programmed really close to the deadline or
even in the past, the computation in vmx_set_hv_timer can underflow and
cause delta_tsc to be set to a huge value.  This generally results
in vmx_set_hv_timer returning -ERANGE, but we can fix it by limiting
delta_tsc to be positive or zero.

Reported-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-01 11:03:39 +02:00
Paolo Bonzini
f2485b3e0c KVM: x86: use guest_exit_irqoff
This gains a few clock cycles per vmexit.  On Intel there is no need
anymore to enable the interrupts in vmx_handle_external_intr, since
we are using the "acknowledge interrupt on exit" feature.  AMD
needs to do that, and must be careful to avoid the interrupt shadow.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-01 11:03:38 +02:00
Paolo Bonzini
91fa0f8e9e KVM: x86: always use "acknowledge interrupt on exit"
This is necessary to simplify handle_external_intr in the next patch.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-01 11:03:36 +02:00
Paolo Bonzini
6edaa5307f KVM: remove kvm_guest_enter/exit wrappers
Use the functions from context_tracking.h directly.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-01 11:03:21 +02:00
Marc Zyngier
0996353f8e arm/arm64: KVM: Make default HYP mappings non-excutable
Structures that can be generally written to don't have any requirement
to be executable (quite the opposite). This includes the kvm and vcpu
structures, as well as the stacks.

Let's change the default to incorporate the XN flag.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-06-29 14:01:34 +02:00
Marc Zyngier
5900270550 arm/arm64: KVM: Map the HYP text as read-only
There should be no reason for mapping the HYP text read/write.

As such, let's have a new set of flags (PAGE_HYP_EXEC) that allows
execution, but makes the page as read-only, and update the two call
sites that deal with mapping code.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-06-29 14:01:34 +02:00
Marc Zyngier
74a6b8885f arm/arm64: KVM: Enforce HYP read-only mapping of the kernel's rodata section
In order to be able to use C code in HYP, we're now mapping the kernel's
rodata in HYP. It works absolutely fine, except that we're mapping it RWX,
which is not what it should be.

Add a new HYP_PAGE_RO protection, and pass it as the protection flags
when mapping the rodata section.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-06-29 13:59:14 +02:00
Marc Zyngier
1166f3fe6a arm64: Add PTE_HYP_XN page table flag
EL2 page tables can be configured to deny code from being
executed, which is done by setting bit 54 in the page descriptor.

It is the same bit as PTE_UXN, but the "USER" reference felt odd
in the hypervisor code.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-06-29 13:59:14 +02:00
Marc Zyngier
c8dddecdeb arm/arm64: KVM: Add a protection parameter to create_hyp_mappings
Currently, create_hyp_mappings applies a "one size fits all" page
protection (PAGE_HYP). As we're heading towards separate protections
for different sections, let's make this protection a parameter, and
let the callers pass their prefered protection (PAGE_HYP for everyone
for the time being).

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-06-29 13:59:14 +02:00
Paolo Bonzini
ebaac17362 context_tracking: move rcu_virt_note_context_switch out of kvm_host.h
Make kvm_guest_{enter,exit} and __kvm_guest_{enter,exit} trivial wrappers
around the code in context_tracking.h.  Name the context_tracking.h functions
consistently with what those for kernel<->user switch.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-28 14:15:25 +02:00
James Hogan
fb6cec1492 MIPS: KVM: Combine entry trace events into class
Combine the kvm_enter, kvm_reenter and kvm_out trace events into a
single kvm_transition event class to reduce duplication and bloat.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Fixes: 93258604ab ("MIPS: KVM: Add guest mode switch trace events")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-23 19:17:30 +02:00
Arnd Bergmann
87aeb54f1b kvm: x86: use getboottime64
KVM reads the current boottime value as a struct timespec in order to
calculate the guest wallclock time, resulting in an overflow in 2038
on 32-bit systems.

The data then gets passed as an unsigned 32-bit number to the guest,
and that in turn overflows in 2106.

We cannot do much about the second overflow, which affects both 32-bit
and 64-bit hosts, but we can ensure that they both behave the same
way and don't overflow until 2106, by using getboottime64() to read
a timespec64 value.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-23 19:17:30 +02:00
Ashok Raj
c45dcc71b7 KVM: VMX: enable guest access to LMCE related MSRs
On Intel platforms, this patch adds LMCE to KVM MCE supported
capabilities and handles guest access to LMCE related MSRs.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
[Haozhong: macro KVM_MCE_CAP_SUPPORTED => variable kvm_mce_cap_supported
           Only enable LMCE on Intel platform
           Check MSR_IA32_FEATURE_CONTROL when handling guest
             access to MSR_IA32_MCG_EXT_CTL]
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-23 19:17:29 +02:00
Haozhong Zhang
37e4c997da KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL
KVM currently does not check the value written to guest
MSR_IA32_FEATURE_CONTROL, though bits corresponding to disabled features
may be set. This patch makes KVM to validate individual bits written to
guest MSR_IA32_FEATURE_CONTROL according to enabled features.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-23 19:17:29 +02:00
Haozhong Zhang
3b84080b95 KVM: VMX: move msr_ia32_feature_control to vcpu_vmx
msr_ia32_feature_control will be used for LMCE and not depend only on
nested anymore, so move it from struct nested_vmx to struct vcpu_vmx.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-23 19:17:28 +02:00
Paolo Bonzini
8ff7b95647 KVM: s390: vSIE (nested virtualization) feature for 4.8 (kvm/next)
With an updated QEMU this allows to create nested KVM guests
 (KVM under KVM) on s390.
 
 s390 memory management changes from Martin Schwidefsky or
 acked by Martin. One common code memory management change (pageref)
 acked by Andrew Morton.
 
 The feature has to be enabled with the nested medule parameter.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJXaS0XAAoJEBF7vIC1phx8rqMP/3EUiQbD/seIbkc2xqdApbZX
 xEDu6215f84N3KsLlZC/z/PhoxZ9z+GrHu+p2DqgnoMyDMj3C32s0OVecdxrE2+H
 pdaEY2ju8rjnY2P4hHzAfdWhdZtwsgdMIX8qDesQjcz4Z1QdOBXymEUpV7d35E/O
 WYqZw6EY6aOB338XKPaD03ITC+us/dW75ctPvqRzh5EKjjIMeOATonyGx0oHi14s
 qXRK1tfq8OegJGzioW5hOIlUkawycNu5+uIRjyoxzLuEPjn68Bomfi6W7Mak0BIc
 v8kEylFgMvSHKy0mWXCZsxj3hD6kNUzlJ+gRYFCiVWvdw6ywFzQiHP2nwLApg7Op
 2a5S4wzQ3FIhA8u0k7JQBUncJggRZByitm4DerezHCZzKI9QimYrcKp2HWBQtI+0
 12VC9/A0qyy2mKvTlgaDS4SQnPV2P6ztrLgORgK6F9TPq4w8mMYxG9Nz6neea6Po
 cotjEtqZivKNno1MOUzAg3jsrTMy4O4ktS0e8JSog2pDRHIrMiqCOcq1B0cRGK6P
 8UmcpkK6yiEgCCbtqEU3QhcIMY6srTC6xArCjuieTY9v6EtBWAioRXuHRkVhs8xR
 HkEdlWlYTXeGpqcR2cCrUzEzUKdVmyTmnIVtoy/FfRfKEY02fHhiEIS3orqXoc4y
 +/yPsoGVxt8lMFs3/E8Y
 =ubb/
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: vSIE (nested virtualization) feature for 4.8 (kvm/next)

With an updated QEMU this allows to create nested KVM guests
(KVM under KVM) on s390.

s390 memory management changes from Martin Schwidefsky or
acked by Martin. One common code memory management change (pageref)
acked by Andrew Morton.

The feature has to be enabled with the nested medule parameter.
2016-06-21 15:21:51 +02:00
David Hildenbrand
a411edf132 KVM: s390: vsie: add module parameter "nested"
Let's be careful first and allow nested virtualization only if enabled
by the system administrator. In addition, user space still has to
explicitly enable it via SCLP features for it to work.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:47 +02:00
David Hildenbrand
5d3876a8bf KVM: s390: vsie: add indication for future features
We have certain SIE features that we cannot support for now.
Let's add these features, so user space can directly prepare to enable
them, so we don't have to update yet another component.

In addition, add a comment block, telling why it is for now not possible to
forward/enable these features.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:47 +02:00
David Hildenbrand
91473b487d KVM: s390: vsie: correctly set and handle guest TOD
Guest 2 sets up the epoch of guest 3 from his point of view. Therefore,
we have to add the guest 2 epoch to the guest 3 epoch. We also have to take
care of guest 2 epoch changes on STP syncs. This will work just fine by
also updating the guest 3 epoch when a vsie_block has been set for a VCPU.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:46 +02:00
David Hildenbrand
b917ae573f KVM: s390: vsie: speed up VCPU external calls
Whenever a SIGP external call is injected via the SIGP external call
interpretation facility, the VCPU is not kicked. When a VCPU is currently
in the VSIE, the external call might not be processed immediately.

Therefore we have to provoke partial execution exceptions, which leads to a
kick of the VCPU and therefore also kick out of VSIE. This is done by
simulating the WAIT state. This bit has no other side effects.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:46 +02:00
David Hildenbrand
94a15de8fb KVM: s390: don't use CPUSTAT_WAIT to detect if a VCPU is idle
As we want to make use of CPUSTAT_WAIT also when a VCPU is not idle but
to force interception of external calls, let's check in the bitmap instead.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:45 +02:00
David Hildenbrand
adbf16985c KVM: s390: vsie: speed up VCPU irq delivery when handling vsie
Whenever we want to wake up a VCPU (e.g. when injecting an IRQ), we
have to kick it out of vsie, so the request will be handled faster.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:44 +02:00
David Hildenbrand
1b7029bec1 KVM: s390: vsie: try to refault after a reported fault to g2
We can avoid one unneeded SIE entry after we reported a fault to g2.
Theoretically, g2 resolves the fault and we can create the shadow mapping
directly, instead of failing again when entering the SIE.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:44 +02:00
David Hildenbrand
7fd7f39daa KVM: s390: vsie: support IBS interpretation
We can easily enable ibs for guest 2, so he can use it for guest 3.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:43 +02:00
David Hildenbrand
13ee3f678b KVM: s390: vsie: support conditional-external-interception
We can easily enable cei for guest 2, so he can use it for guest 3.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:42 +02:00
David Hildenbrand
5630a8e82b KVM: s390: vsie: support intervention-bypass
We can easily enable intervention bypass for guest 2, so it can use it
for guest 3.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:42 +02:00