As Nish suggested, it makes more sense to init the numa node informatiion
for present cpus at boottime, which could also avoid WARN_ON(1) in
numa_setup_cpu().
With this change, we also need to change the smp_prepare_cpus() to set up
numa information only on present cpus.
For those possible, but not present cpus, their numa information
will be set up after they are started, as the original code did before commit
2fabf084b6.
Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Tested-by: Cyril Bur <cyril.bur@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
With commit 2fabf084b6 ("powerpc: reorder per-cpu NUMA information's
initialization"), during boottime, cpu_numa_callback() is called
earlier(before their online) for each cpu, and verify_cpu_node_mapping()
uses cpu_to_node() to check whether siblings are in the same node.
It skips the checking for siblings that are not online yet. So the only
check done here is for the bootcpu, which is online at that time. But
the per-cpu numa_node cpu_to_node() uses hasn't been set up yet (which
will be set up in smp_prepare_cpus()).
So I saw something like following reported:
[ 0.000000] CPU thread siblings 1/2/3 and 0 don't belong to the same
node!
As we don't actually do the checking during this early stage, so maybe
we could directly call numa_setup_cpu() in do_init_bootmem().
Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The size field of the op.type word is now the total number of bytes
to be loaded or stored.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This extends the instruction emulation done by analyse_instr() and
emulate_step() to handle a few more instructions that are found in
the kernel.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This splits out the instruction analysis part of emulate_step() into
a separate analyse_instr() function, which decodes the instruction,
but doesn't execute any load or store instructions. It does execute
integer instructions and branches which can be executed purely by
updating register values in the pt_regs struct. For other instructions,
it returns the instruction type and other details in a new
instruction_op struct. emulate_step() then uses that information
to execute loads, stores, cache operations, mfmsr, mtmsr[d], and
(on 64-bit) sc instructions.
The reason for doing this is so that the KVM code can use it instead
of having its own separate instruction emulation code. Possibly the
alignment interrupt handler could also use this.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In commit e6a6928c3e "of/fdt: Convert FDT functions to use libfdt",
the kernel stopped supporting old flat device tree formats. The minimum
supported version is now 0x10.
There was a checking function added, early_init_dt_verify(), but it's
not called on powerpc.
The result is, if you boot with an old flat device tree, the kernel will
fail to parse it correctly, think you have no memory etc. and hilarity
ensues.
We can't really fix it, but we can at least catch the fact that the
device tree is in an unsupported format and panic(). We can't call
BUG(), it's too early.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On PowerNV platforms, when a CPU is offline, we put it into nap mode.
It's possible that the CPU wakes up from nap mode while it is still
offline due to a stray IPI. A misdirected device interrupt could also
potentially cause it to wake up. In that circumstance, we need to clear
the interrupt so that the CPU can go back to nap mode.
In the past the clearing of the interrupt was accomplished by briefly
enabling interrupts and allowing the normal interrupt handling code
(do_IRQ() etc.) to handle the interrupt. This has the problem that
this code calls irq_enter() and irq_exit(), which call functions such
as account_system_vtime() which use RCU internally. Use of RCU is not
permitted on offline CPUs and will trigger errors if RCU checking is
enabled.
To avoid calling into any generic code which might use RCU, we adopt
a different method of clearing interrupts on offline CPUs. Since we
are on the PowerNV platform, we know that the system interrupt
controller is a XICS being driven directly (i.e. not via hcalls) by
the kernel. Hence this adds a new icp_native_flush_interrupt()
function to the native-mode XICS driver and arranges to call that
when an offline CPU is woken from nap. This new function reads the
interrupt from the XICS. If it is an IPI, it clears the IPI; if it
is a device interrupt, it prints a warning and disables the source.
Then it does the end-of-interrupt processing for the interrupt.
The other thing that briefly enabling interrupts did was to check and
clear the irq_happened flag in this CPU's PACA. Therefore, after
flushing the interrupt from the XICS, we also clear all bits except
the PACA_IRQ_HARD_DIS (interrupts are hard disabled) bit from the
irq_happened flag. The PACA_IRQ_HARD_DIS flag is set by power7_nap()
and is left set to indicate that interrupts are hard disabled. This
means we then have to ignore that flag in power7_nap(), which is
reasonable since it doesn't indicate that any interrupt event needs
servicing.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
I ran some tests to compare hash_64 using shifts and multiplies.
The results:
POWER6: ~2x slower
POWER7: ~2x faster
POWER8: ~2x faster
Now we have a proper config option, select
CONFIG_ARCH_HAS_FAST_MULTIPLIER on POWER7 and POWER8.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This allows the user to build a kernel targeted at POWER8
(ie gcc -mcpu=power8).
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When removing a cpu, this patch makes sure that values
gotten from or passed to firmware are in the correct
endian format.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The ibm,ppc-interrupt-server#s property is in big endian format.
These values need to be converted when used by little endian
architectures.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
of_device_ids (i.e. compatible strings and the respective data) are not
supposed to change at runtime. All functions working with of_device_ids
provided by <linux/of.h> work with const of_device_ids. This allows to
mark all struct of_device_id const, too.
While touching these line also put the __init annotation at the right
position where necessary.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
CONFIG_JUMP_LABEL doesn't ensure HAVE_JUMP_LABEL, if it
is not the case use maintainers's own mutex to guard
the modification of global values.
Signed-off-by: Zhouyi Zhou <yizhouzhou@ict.ac.cn>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
A recent patch added a function prototype for htab_remove_mapping in
c code. Fix it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
There were a number of prototypes for functions that no longer
exist. Remove them.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Fix a number of places where global functions were not including
their prototype. This ensures the prototype and the function match.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Simplify things considerably by moving all the ppc32 specific
symbol exports into its own file.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Move the lib symbol exports closer to their function definitions
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Check that the OPAL_DUMP_READ token exists before initalising the elog
infrastructure.
This avoids littering the OPAL console with:
"OPAL: Called with bad token 91"
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Check that the OPAL_ELOG_READ token exists before initalising the elog
infrastructure.
This avoids littering the OPAL console with:
"OPAL: Called with bad token 74"
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Check that the OPAL_RTC_READ token exists before we use the OPAL RTC.
Refactors the code a little to merge error paths.
This avoids littering the OPAL console with:
"OPAL: Called with bad token 3".
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently there is no way to generically check if an OPAL call exists or not
from the host kernel.
This adds an OPAL call opal_check_token() which tells you if the given token is
present in OPAL or not.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Fix ppc 32 build failure as reported here:
http://kisskb.ellerman.id.au/kisskb/buildresult/11663513/
The error is as follows:
arch/powerpc/include/asm/floppy.h:142:20: error: 'isa_bridge_pcidev' undeclared
(first use in this function)
This is happening since floppy.o is enabled by BLK_DEV_FD which depends on
ARCH_MAY_HAVE_PC_FDC which is in-turn enabled if PPC_PSERIES=n.
The following commit changes the dependency so that ARCH_MAY_HAVE_PC_FDC is
dependent exclusively on PCI since otherwise it will not compile.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
CC: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
in commit 29f1aff2c (powerpc: Copy bootable images in the default
install script) we changed to copying all the built boot targets based
on the assumption that it's backwards compatible. It turns out that
debian devived installkernel scripts will barf if not given exactly 4
args.
This change reverts make install to just install the vmlinux (we can
change the dfault in a seperate patch) and introduces a new make
zInstall which works with a more flexible installkernel script.
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Presently we only support initiating Service Processor dump from host.
Hence update sysfs message. Also update couple of other error/info
messages.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
New awesome things in this release:
- E500: e6500 core support
- E500: guest and remote debug support
- Book3S: remote sw breakpoint support
- Book3S: HV: Minor bugfixes
Alexander Graf (1):
KVM: PPC: Pass enum to kvmppc_get_last_inst
Bharat Bhushan (8):
KVM: PPC: BOOKE: allow debug interrupt at "debug level"
KVM: PPC: BOOKE : Emulate rfdi instruction
KVM: PPC: BOOKE: Allow guest to change MSR_DE
KVM: PPC: BOOKE: Clear guest dbsr in userspace exit KVM_EXIT_DEBUG
KVM: PPC: BOOKE: Guest and hardware visible debug registers are same
KVM: PPC: BOOKE: Add one reg interface for DBSR
KVM: PPC: BOOKE: Add one_reg documentation of SPRG9 and DBSR
KVM: PPC: BOOKE: Emulate debug registers and exception
Madhavan Srinivasan (2):
powerpc/kvm: support to handle sw breakpoint
powerpc/kvm: common sw breakpoint instr across ppc
Michael Neuling (1):
KVM: PPC: Book3S HV: Add register name when loading toc
Mihai Caraman (10):
powerpc/booke: Restrict SPE exception handlers to e200/e500 cores
powerpc/booke: Revert SPE/AltiVec common defines for interrupt numbers
KVM: PPC: Book3E: Increase FPU laziness
KVM: PPC: Book3e: Add AltiVec support
KVM: PPC: Make ONE_REG powerpc generic
KVM: PPC: Move ONE_REG AltiVec support to powerpc
KVM: PPC: Remove the tasklet used by the hrtimer
KVM: PPC: Remove shared defines for SPE and AltiVec interrupts
KVM: PPC: e500mc: Add support for single threaded vcpus on e6500 core
KVM: PPC: Book3E: Enable e6500 core
Paul Mackerras (2):
KVM: PPC: Book3S HV: Increase timeout for grabbing secondary threads
KVM: PPC: Book3S HV: Only accept host PVR value for guest PVR
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJUIyyEAAoJECszeR4D/txgiV8P/AnSRcjxrlW+ITsimZezDaj5
MfFv2ZyQKlVjp4cfzfCTW5otQT/K2rSfJzB/V6l1xGcM/UEO+snmPddokvFLMsp9
dLvPjZI6ivZu/rjRZ8eqnTQIAwid0K5Yss870Y8YWfRBByKVDs7rRx75gj6q8kek
jG3wLQQxDYEapkGXiaIcX2Mbf6GAZKNhGf6M5Khn/v3RE0+mNg9J+nffBZXOxEYo
WDe20KNSuDqDEnWIc82uibTbH1Wnxmetc5jf21DWaquLs9VGbON1X9Myl+aBNQuP
wDt6D04rgtBZbwyHKsSO/0poK0eIms+5jiW8c+XPO2QOLXQwwNKBNmRKePyk1bt5
gRxd+u9OGzRGHKwIS1vqHLKCdr5HiTN0uE+nZ+oDWjXVJQRMc8HCx0tWxzZg46yd
kIIRuDrIQQUH3j2L/PnY3Nx3yKNhg97Ysek0ToIsxlkqczrAUewnXuOj9Ijf+/Cz
Y3cVsQEhepcO3xyz5uyWJQwmFZkwJVOclzGaNgXKeKl5fkpXwLPxc6vmI2K+hnU9
TRFoQgbknPxQe2qv9cXeMBFhZwNRKpcYW7w3G81ko/7foVmwP3CjnNulXMKiNuVH
i8pVd8zxiJuTWVQSksGWuWCxueLmc86L4khSF5YBzg9pid7ajmxcfEDWCQGdN+Fe
Oh4HUW0860IJYOQRIKJv
=CR/Z
-----END PGP SIGNATURE-----
Merge tag 'signed-kvm-ppc-next' of git://github.com/agraf/linux-2.6 into kvm-next
Patch queue for ppc - 2014-09-24
New awesome things in this release:
- E500: e6500 core support
- E500: guest and remote debug support
- Book3S: remote sw breakpoint support
- Book3S: HV: Minor bugfixes
Alexander Graf (1):
KVM: PPC: Pass enum to kvmppc_get_last_inst
Bharat Bhushan (8):
KVM: PPC: BOOKE: allow debug interrupt at "debug level"
KVM: PPC: BOOKE : Emulate rfdi instruction
KVM: PPC: BOOKE: Allow guest to change MSR_DE
KVM: PPC: BOOKE: Clear guest dbsr in userspace exit KVM_EXIT_DEBUG
KVM: PPC: BOOKE: Guest and hardware visible debug registers are same
KVM: PPC: BOOKE: Add one reg interface for DBSR
KVM: PPC: BOOKE: Add one_reg documentation of SPRG9 and DBSR
KVM: PPC: BOOKE: Emulate debug registers and exception
Madhavan Srinivasan (2):
powerpc/kvm: support to handle sw breakpoint
powerpc/kvm: common sw breakpoint instr across ppc
Michael Neuling (1):
KVM: PPC: Book3S HV: Add register name when loading toc
Mihai Caraman (10):
powerpc/booke: Restrict SPE exception handlers to e200/e500 cores
powerpc/booke: Revert SPE/AltiVec common defines for interrupt numbers
KVM: PPC: Book3E: Increase FPU laziness
KVM: PPC: Book3e: Add AltiVec support
KVM: PPC: Make ONE_REG powerpc generic
KVM: PPC: Move ONE_REG AltiVec support to powerpc
KVM: PPC: Remove the tasklet used by the hrtimer
KVM: PPC: Remove shared defines for SPE and AltiVec interrupts
KVM: PPC: e500mc: Add support for single threaded vcpus on e6500 core
KVM: PPC: Book3E: Enable e6500 core
Paul Mackerras (2):
KVM: PPC: Book3S HV: Increase timeout for grabbing secondary threads
KVM: PPC: Book3S HV: Only accept host PVR value for guest PVR
Commit df568d8e ("scsi: Use 'depends' with LIBFC instead of
'select'.") removed what happened to be the only instance of 'select
NET'. Defconfigs that were relying on the select now lack networking
support.
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 5d6be6a5 ("scsi_netlink : Make SCSI_NETLINK dependent on NET
instead of selecting NET") removed what happened to be the only instance
of 'select NET'. Defconfigs that were relying on the select now lack
networking support.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
This will be used to let the guest run while the APIC access page is
not pinned. Because subsequent patches will fill in the function
for x86, place the (still empty) x86 implementation in the x86.c file
instead of adding an inline function in kvm_host.h.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
1. We were calling clear_flush_young_notify in unmap_one, but we are
within an mmu notifier invalidate range scope. The spte exists no more
(due to range_start) and the accessed bit info has already been
propagated (due to kvm_pfn_set_accessed). Simply call
clear_flush_young.
2. We clear_flush_young on a primary MMU PMD, but this may be mapped
as a collection of PTEs by the secondary MMU (e.g. during log-dirty).
This required expanding the interface of the clear_flush_young mmu
notifier, so a lot of code has been trivially touched.
3. In the absence of shadow_accessed_mask (e.g. EPT A bit), we emulate
the access bit by blowing the spte. This requires proper synchronizing
with MMU notifier consumers, like every other removal of spte's does.
Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Include linux/thread_info.h so we can use is_32_bit_task() cleanly.
Then just simplify syscall_get_arch() since is_32_bit_task() works for
all configuration options.
Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Eric Paris <eparis@redhat.com>
Conflicts:
arch/mips/net/bpf_jit.c
drivers/net/can/flexcan.c
Both the flexcan and MIPS bpf_jit conflicts were cases of simple
overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Continue is not needed at the bottom of a loop.
The Coccinelle semantic patch implementing this change is:
@@
@@
for (...;...;...) {
...
if (...) {
...
- continue;
}
}
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The kvmppc_get_last_inst function recently received a facelift that allowed
us to pass an enum of the type of instruction we want to read into it rather
than an unreadable boolean.
Unfortunately, not all callers ended up passing the enum. This wasn't really
an issue as "true" and "false" happen to match the two enum values we have,
but it's still hard to read.
Update all callers of kvmppc_get_last_inst() to follow the new calling
convention.
Signed-off-by: Alexander Graf <agraf@suse.de>
This patch extends the use of illegal instruction as software
breakpoint instruction across the ppc platform. Patch extends
booke program interrupt code to support software breakpoint.
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
[agraf: Fix bookehv]
Signed-off-by: Alexander Graf <agraf@suse.de>
This patch adds kernel side support for software breakpoint.
Design is that, by using an illegal instruction, we trap to hypervisor
via Emulation Assistance interrupt, where we check for the illegal instruction
and accordingly we return to Host or Guest. Patch also adds support for
software breakpoint in PR KVM.
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Now that AltiVec and hardware thread support is in place enable e6500 core.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
ePAPR represents hardware threads as cpu node properties in device tree.
So with existing QEMU, hardware threads are simply exposed as vcpus with
one hardware thread.
The e6500 core shares TLBs between hardware threads. Without tlb write
conditional instruction, the Linux kernel uses per core mechanisms to
protect against duplicate TLB entries.
The guest is unable to detect real siblings threads, so it can't use the
TLB protection mechanism. An alternative solution is to use the hypervisor
to allocate different lpids to guest's vcpus that runs simultaneous on real
siblings threads. On systems with two threads per core this patch halves
the size of the lpid pool that the allocator sees and use two lpids per VM.
Use even numbers to speedup vcpu lpid computation with consecutive lpids
per VM: vm1 will use lpids 2 and 3, vm2 lpids 4 and 5, and so on.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
[agraf: fix spelling]
Signed-off-by: Alexander Graf <agraf@suse.de>
Since the guest can read the machine's PVR (Processor Version Register)
directly and see the real value, we should disallow userspace from
setting any value for the guest's PVR other than the real host value.
Therefore this makes kvm_arch_vcpu_set_sregs_hv() check the supplied
PVR value and return an error if it is different from the host value,
which has been put into vcpu->arch.pvr at vcpu creation time.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Occasional failures have been seen with split-core mode and migration
where the message "KVM: couldn't grab cpu" appears. This increases
the length of time that we wait from 1ms to 10ms, which seems to
work around the issue.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>