Commit graph

10,319 commits

Author SHA1 Message Date
Al Viro
9d401279d6 powerpc: don't bother with zero-extending arguments in sys_clone()
... since the syscall glue had been doing that for 9 years already.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-21 22:25:53 -04:00
Al Viro
53b50f9483 powerpc: take dereferencing to ret_from_kernel_thread()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-21 22:25:11 -04:00
Deepthi Dharwar
83dac59409 cpuidle/powerpc: Fix snooze state problem in the cpuidle design on pseries.
Earlier without cpuidle framework on pseries, the native arch
idle routine comprised of both snooze and nap
states.  smt_snooze_delay variable was used to delay
the idle process entry to deeper idle state like  nap.
With the coming of cpuidle, this arch specific idle was replaced
by two different idle routines, one for supporting snooze and other
for nap. This enabled addition of more
low level idle states on pseries in the future.

On adopting the generic cpuidle framework for POWER systems,
the decision of which idle state to choose from,  given a predicted
idle time is taken by the menu governor based on
target_residency and  exit_latency of the idle states.
target_residency is the minimum time to be resident in that idle state.
Exit_latency is time taken to exit out of idle state.
Deeper the idle state, both the target residency and exit latency
would be higher.

In the current design, smt_snooze_delay is used as target_residency
for the  snooze state which is incorrect, as it is not the
minimum but the maximum duration to be in snooze state.
This would  result in the governor in taking bad decision,
as presently target_residency of nap < target_residency of snooze
inspite of nap being deeper idle state.

This patch aims to fix this problem by replacing the smt_snooze_delay loop
in snooze state, with the need_resched()  as the governor is aware of
entry and exit of various idle transitions based on which
next idle time prediction.

The governor is intelligent enough to determine the idle state the needs to
be transitioned to and maintains a whole of heuristics including
io load, previous idle states predictions etc for the same, based on
which idle state entry decision is taken.

With this fix, of setting target_residency of snooze to 0
					     nap to smt_snooze_delay
if the predicted idle time is less
than smt_snooze_delay (target_residency of nap)
value governor would pick snooze state, else nap. This adhers to the
previous native idle design.

Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-10-18 10:57:25 +11:00
Deepthi Dharwar
8ea959a17f cpuidle/powerpc: Fix smt_snooze_delay functionality.
smt_snooze_delay was designed to  delay idle loop's nap entry
in the native idle code before it got  ported over to use as part of
the cpuidle framework.

A -ve value  assigned to smt_snooze_delay should result in
busy looping, in other words disabling the entry to nap state.

	- https://lists.ozlabs.org/pipermail/linuxppc-dev/2010-May/082450.html

This particular functionality can be achieved currently by
echo 1 > /sys/devices/system/cpu/cpu*/state1/disable
but it is broken when one assigns -ve value to  the smt_snooze_delay
variable either via sysfs entry or ppc64_cpu util.

This patch aims to fix this, by disabling nap state when smt_snooze_delay
variable is set to -ve value.

Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-10-18 10:57:24 +11:00
Deepthi Dharwar
817deb05df cpuidle/powerpc: Fix target residency initialisation in pseries cpuidle
Remove the redundant target residency initialisation in pseries_cpuidle_driver_init().
This is currently over-writing the residency time updated as part of the static
table, resulting in  all the idle states having the same target
residency of 100us which is incorrect. This may result in the menu governor making
wrong state decisions.

Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-10-18 10:57:24 +11:00
Aneesh Kumar K.V
ce236ab576 powerpc: Build fix for powerpc KVM
Fix build failure for powerpc KVM by adding missing VPN_SHIFT definition
and the ';'

arch/powerpc/kvm/book3s_32_mmu_host.c: In function 'kvmppc_mmu_map_page':
arch/powerpc/kvm/book3s_32_mmu_host.c:176: error: 'VPN_SHIFT' undeclared (first use in this function)
arch/powerpc/kvm/book3s_32_mmu_host.c:176: error: (Each undeclared identifier is reported only once
arch/powerpc/kvm/book3s_32_mmu_host.c:176: error: for each function it appears in.)
arch/powerpc/kvm/book3s_32_mmu_host.c:178: error: expected ';' before 'next_pteg'
arch/powerpc/kvm/book3s_32_mmu_host.c:190: error: label 'next_pteg' used but not defined
make[1]: *** [arch/powerpc/kvm/book3s_32_mmu_host.o] Error 1

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-10-18 10:37:52 +11:00
Benjamin Herrenschmidt
72523d8082 Revert "powerpc/perf: Use pmc_overflow() to detect rolled back events"
This reverts commit 813312110b.

This revert was requested by the author of the patch as it seems
to cause system hangs with some low frequency events
2012-10-18 10:36:11 +11:00
Al Viro
40792104b2 powerpc: don't mess with r2 in copy_thread() and friends
kernel_thread() callbacks are *not* in modules and are not going to
be there.  And it's not even read in ppc32 ret_from_kernel_thread(),
so no need to bother with it there either.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-14 19:35:52 -04:00
Al Viro
138d1ce80e powerpc: switch to saner kernel_execve() semantics
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-14 19:35:44 -04:00
Linus Torvalds
d25282d1c9 Merge branch 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module signing support from Rusty Russell:
 "module signing is the highlight, but it's an all-over David Howells frenzy..."

Hmm "Magrathea: Glacier signing key". Somebody has been reading too much HHGTTG.

* 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (37 commits)
  X.509: Fix indefinite length element skip error handling
  X.509: Convert some printk calls to pr_devel
  asymmetric keys: fix printk format warning
  MODSIGN: Fix 32-bit overflow in X.509 certificate validity date checking
  MODSIGN: Make mrproper should remove generated files.
  MODSIGN: Use utf8 strings in signer's name in autogenerated X.509 certs
  MODSIGN: Use the same digest for the autogen key sig as for the module sig
  MODSIGN: Sign modules during the build process
  MODSIGN: Provide a script for generating a key ID from an X.509 cert
  MODSIGN: Implement module signature checking
  MODSIGN: Provide module signing public keys to the kernel
  MODSIGN: Automatically generate module signing keys if missing
  MODSIGN: Provide Kconfig options
  MODSIGN: Provide gitignore and make clean rules for extra files
  MODSIGN: Add FIPS policy
  module: signature checking hook
  X.509: Add a crypto key parser for binary (DER) X.509 certificates
  MPILIB: Provide a function to read raw data into an MPI
  X.509: Add an ASN.1 decoder
  X.509: Add simple ASN.1 grammar compiler
  ...
2012-10-14 13:39:34 -07:00
Linus Torvalds
b6897130f0 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc uapi disintegration from Benjamin Herrenschmidt.

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  UAPI: (Scripted) Disintegrate arch/powerpc/include/asm
2012-10-13 11:21:15 +09:00
Linus Torvalds
4e21fc138b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull third pile of kernel_execve() patches from Al Viro:
 "The last bits of infrastructure for kernel_thread() et.al., with
  alpha/arm/x86 use of those.  Plus sanitizing the asm glue and
  do_notify_resume() on alpha, fixing the "disabled irq while running
  task_work stuff" breakage there.

  At that point the rest of kernel_thread/kernel_execve/sys_execve work
  can be done independently for different architectures.  The only
  pending bits that do depend on having all architectures converted are
  restrictred to fs/* and kernel/* - that'll obviously have to wait for
  the next cycle.

  I thought we'd have to wait for all of them done before we start
  eliminating the longjump-style insanity in kernel_execve(), but it
  turned out there's a very simple way to do that without flagday-style
  changes."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
  alpha: switch to saner kernel_execve() semantics
  arm: switch to saner kernel_execve() semantics
  x86, um: convert to saner kernel_execve() semantics
  infrastructure for saner ret_from_kernel_thread semantics
  make sure that kernel_thread() callbacks call do_exit() themselves
  make sure that we always have a return path from kernel_execve()
  ppc: eeh_event should just use kthread_run()
  don't bother with kernel_thread/kernel_execve for launching linuxrc
  alpha: get rid of switch_stack argument of do_work_pending()
  alpha: don't bother passing switch_stack separately from regs
  alpha: take SIGPENDING/NOTIFY_RESUME loop into signal.c
  alpha: simplify TIF_NEED_RESCHED handling
2012-10-13 10:05:52 +09:00
Linus Torvalds
03d3602a83 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer core update from Thomas Gleixner:
 - Bug fixes (one for a longstanding dead loop issue)
 - Rework of time related vsyscalls
 - Alarm timer updates
 - Jiffies updates to remove compile time dependencies

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timekeeping: Cast raw_interval to u64 to avoid shift overflow
  timers: Fix endless looping between cascade() and internal_add_timer()
  time/jiffies: bring back unconditional LATCH definition
  time: Convert x86_64 to using new update_vsyscall
  time: Only do nanosecond rounding on GENERIC_TIME_VSYSCALL_OLD systems
  time: Introduce new GENERIC_TIME_VSYSCALL
  time: Convert CONFIG_GENERIC_TIME_VSYSCALL to CONFIG_GENERIC_TIME_VSYSCALL_OLD
  time: Move update_vsyscall definitions to timekeeper_internal.h
  time: Move timekeeper structure to timekeeper_internal.h for vsyscall changes
  jiffies: Remove compile time assumptions about CLOCK_TICK_RATE
  jiffies: Kill unused TICK_USEC_TO_NSEC
  alarmtimer: Rename alarmtimer_remove to alarmtimer_dequeue
  alarmtimer: Remove unused helpers & defines
  alarmtimer: Use hrtimer per-alarm instead of per-base
  alarmtimer: Implement minimum alarm interval for allowing suspend
2012-10-12 22:17:48 +09:00
Linus Torvalds
8213a2f3ee Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull pile 2 of execve and kernel_thread unification work from Al Viro:
 "Stuff in there: kernel_thread/kernel_execve/sys_execve conversions for
  several more architectures plus assorted signal fixes and cleanups.

  There'll be more (in particular, real fixes for the alpha
  do_notify_resume() irq mess)..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (43 commits)
  alpha: don't open-code trace_report_syscall_{enter,exit}
  Uninclude linux/freezer.h
  m32r: trim masks
  avr32: trim masks
  tile: don't bother with SIGTRAP in setup_frame
  microblaze: don't bother with SIGTRAP in setup_rt_frame()
  mn10300: don't bother with SIGTRAP in setup_frame()
  frv: no need to raise SIGTRAP in setup_frame()
  x86: get rid of duplicate code in case of CONFIG_VM86
  unicore32: remove pointless test
  h8300: trim _TIF_WORK_MASK
  parisc: decide whether to go to slow path (tracesys) based on thread flags
  parisc: don't bother looping in do_signal()
  parisc: fix double restarts
  bury the rest of TIF_IRET
  sanitize tsk_is_polling()
  bury _TIF_RESTORE_SIGMASK
  unicore32: unobfuscate _TIF_WORK_MASK
  mips: NOTIFY_RESUME is not needed in TIF masks
  mips: merge the identical "return from syscall" per-ABI code
  ...

Conflicts:
	arch/arm/include/asm/thread_info.h
2012-10-12 10:49:08 +09:00
Al Viro
ecf89e581a ppc: eeh_event should just use kthread_run()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-11 21:40:31 -04:00
Linus Torvalds
14ffe009ca Merge branch 'akpm' (Fixups from Andrew)
Merge misc fixes from Andrew Morton:
 "Followups, fixes and some random stuff I found on the internet."

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (11 patches)
  perf: fix duplicate header inclusion
  memcg, kmem: fix build error when CONFIG_INET is disabled
  rtc: kconfig: fix RTC_INTF defaults connected to RTC_CLASS
  rapidio: fix comment
  lib/kasprintf.c: use kmalloc_track_caller() to get accurate traces for kvasprintf
  rapidio: update for destination ID allocation
  rapidio: update asynchronous discovery initialization
  rapidio: use msleep in discovery wait
  mm: compaction: fix bit ranges in {get,clear,set}_pageblock_skip()
  arch/powerpc/platforms/pseries/hotplug-memory.c: section removal cleanups
  arch/powerpc/platforms/pseries/hotplug-memory.c: fix section handling code
2012-10-11 10:14:16 +09:00
Yasuaki Ishimatsu
1633dbbacb arch/powerpc/platforms/pseries/hotplug-memory.c: section removal cleanups
Followups to d760afd4d2 ("memory-hotplug: suppress "Trying to free
nonexistent resource <XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY>" warning").

 - use unsigned long type, as overflows are conceivable

 - rename `i' to the less-misleading and more informative `section'

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-11 08:50:14 +09:00
Andrew Morton
158544b165 arch/powerpc/platforms/pseries/hotplug-memory.c: fix section handling code
Fix

  arch/powerpc/platforms/pseries/hotplug-memory.c: In function 'pseries_remove_memblock':
  arch/powerpc/platforms/pseries/hotplug-memory.c:103:17: error: unused variable 'pfn' [-Werror=unused-variable]

Caused by commit d760afd4d2 ("memory-hotplug: suppress "Trying to free
nonexistent resource <XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY>" warning").

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-11 08:50:13 +09:00
Marcelo Tosatti
03604b3114 Merge branch 'for-upstream' of http://github.com/agraf/linux-2.6 into queue
* 'for-upstream' of http://github.com/agraf/linux-2.6: (56 commits)
  arch/powerpc/kvm/e500_tlb.c: fix error return code
  KVM: PPC: Book3S HV: Provide a way for userspace to get/set per-vCPU areas
  KVM: PPC: Book3S: Get/set guest FP regs using the GET/SET_ONE_REG interface
  KVM: PPC: Book3S: Get/set guest SPRs using the GET/SET_ONE_REG interface
  KVM: PPC: set IN_GUEST_MODE before checking requests
  KVM: PPC: e500: MMU API: fix leak of shared_tlb_pages
  KVM: PPC: e500: fix allocation size error on g2h_tlb1_map
  KVM: PPC: Book3S HV: Fix calculation of guest phys address for MMIO emulation
  KVM: PPC: Book3S HV: Remove bogus update of physical thread IDs
  KVM: PPC: Book3S HV: Fix updates of vcpu->cpu
  KVM: Move some PPC ioctl definitions to the correct place
  KVM: PPC: Book3S HV: Handle memory slot deletion and modification correctly
  KVM: PPC: Move kvm->arch.slot_phys into memslot.arch
  KVM: PPC: Book3S HV: Take the SRCU read lock before looking up memslots
  KVM: PPC: bookehv: Allow duplicate calls of DO_KVM macro
  KVM: PPC: BookE: Support FPU on non-hv systems
  KVM: PPC: 440: Implement mfdcrx
  KVM: PPC: 440: Implement mtdcrx
  Document IACx/DACx registers access using ONE_REG API
  KVM: PPC: E500: Remove E500_TLB_DIRTY flag
  ...
2012-10-10 19:03:54 -03:00
David Woodhouse
ffe3150125 UAPI Disintegration 2012-10-09
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAUHPmWhOxKuMESys7AQLCZRAAsZAuAK0MxZ4iuq/+fmy7Uxb1jrzLOYSb
 3UgbTgXAjR0WAUHNegVZLX1Xc+12KxvMCj/8sO62Ai+wtgHeDAuUl2T0FbSZjlGK
 qqx/qQqTFHUfJRbm3Lu9iarZ2K49v1kTDk4C+nC8J9mEEW4WFlVPD10n90j+4hxr
 ZCEYril7qOQQV65oor3BT2V64+X1WDHriTLugH1o8RziRF9jh6Z2hgZAWnThcGxu
 lPsmXF2e7jDqGcM3gWtxZWu/yTBPxw549R+JUg4aVKho9WI5ClyjNAKnE7wtd3iW
 HyrylRH+ch2oeYFa5+xoyopRARUUPmujKaHU+ZI1o++eNzuw5JYiwuMlZBLyUc9I
 foWMSUw31U7695exyf66HiH7GEKI1PVpgJVNu41eJvl0iWSWCpKCB6Gs8Sw4xnp2
 auUCYSniXHNTFhFktjNdIUAn0+1X/b/SEfb/id4GvLp1K98QGOfe8dMCC8hEnXiF
 4iIViM8Sv1GB1us5huSjbMeRPbZ3x/loqEpApfgcaqcyrUR29FTE/lFQ4fj9xviL
 JjckPLMMZb4Ho5wrkCi5NtXJ16mx1qKzbBGDdqzmqaNdN+08rNF//kA9m9hCwgD8
 XfAV286DKDC0SllZIG+Uz7YLnSZjNAUhjvWN3ipV+SdT5DGybL3uSW5tYiSAzI2E
 3cayGTWINMg=
 =U9Qq
 -----END PGP SIGNATURE-----

Merge tag 'disintegrate-mtd-20121009' of git://git.infradead.org/users/dhowells/linux-headers

UAPI Disintegration 2012-10-09

Conflicts:
	MAINTAINERS
	arch/arm/configs/bcmring_defconfig
	arch/arm/mach-imx/clk-imx51-imx53.c
	drivers/mtd/nand/Kconfig
	drivers/mtd/nand/bcm_umi_nand.c
	drivers/mtd/nand/nand_bcm_umi.h
	drivers/mtd/nand/orion_nand.c
2012-10-09 15:04:25 +01:00
David Howells
c3617f7203 UAPI: (Scripted) Disintegrate arch/powerpc/include/asm
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>
2012-10-09 09:47:26 +01:00
Linus Torvalds
9e2d8656f5 Merge branch 'akpm' (Andrew's patch-bomb)
Merge patches from Andrew Morton:
 "A few misc things and very nearly all of the MM tree.  A tremendous
  amount of stuff (again), including a significant rbtree library
  rework."

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (160 commits)
  sparc64: Support transparent huge pages.
  mm: thp: Use more portable PMD clearing sequenece in zap_huge_pmd().
  mm: Add and use update_mmu_cache_pmd() in transparent huge page code.
  sparc64: Document PGD and PMD layout.
  sparc64: Eliminate PTE table memory wastage.
  sparc64: Halve the size of PTE tables
  sparc64: Only support 4MB huge pages and 8KB base pages.
  memory-hotplug: suppress "Trying to free nonexistent resource <XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY>" warning
  mm: memcg: clean up mm_match_cgroup() signature
  mm: document PageHuge somewhat
  mm: use %pK for /proc/vmallocinfo
  mm, thp: fix mlock statistics
  mm, thp: fix mapped pages avoiding unevictable list on mlock
  memory-hotplug: update memory block's state and notify userspace
  memory-hotplug: preparation to notify memory block's state at memory hot remove
  mm: avoid section mismatch warning for memblock_type_name
  make GFP_NOTRACK definition unconditional
  cma: decrease cc.nr_migratepages after reclaiming pagelist
  CMA: migrate mlocked pages
  kpageflags: fix wrong KPF_THP on non-huge compound pages
  ...
2012-10-09 16:23:15 +09:00
Yasuaki Ishimatsu
d760afd4d2 memory-hotplug: suppress "Trying to free nonexistent resource <XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY>" warning
When our x86 box calls __remove_pages(), release_mem_region() shows many
warnings.  And x86 box cannot unregister iomem_resource.

  "Trying to free nonexistent resource <XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY>"

release_mem_region() has been changed to be called in each
PAGES_PER_SECTION by commit de7f0cba96 ("memory hotplug: release
memory regions in PAGES_PER_SECTION chunks").  Because powerpc registers
iomem_resource in each PAGES_PER_SECTION chunk.  But when I hot add
memory on x86 box, iomem_resource is register in each _CRS not
PAGES_PER_SECTION chunk.  So x86 box unregisters iomem_resource.

The patch fixes the problem.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nfont@austin.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:23:04 +09:00
Shaohua Li
45cac65b0f readahead: fault retry breaks mmap file read random detection
.fault now can retry.  The retry can break state machine of .fault.  In
filemap_fault, if page is miss, ra->mmap_miss is increased.  In the second
try, since the page is in page cache now, ra->mmap_miss is decreased.  And
these are done in one fault, so we can't detect random mmap file access.

Add a new flag to indicate .fault is tried once.  In the second try, skip
ra->mmap_miss decreasing.  The filemap_fault state machine is ok with it.

I only tested x86, didn't test other archs, but looks the change for other
archs is obvious, but who knows :)

Signed-off-by: Shaohua Li <shaohua.li@fusionio.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:47 +09:00
Shaohua Li
e79bee24fd atomic: implement generic atomic_dec_if_positive()
The x86 implementation of atomic_dec_if_positive is quite generic, so make
it available to all architectures.

This is needed for "swap: add a simple detector for inappropriate swapin
readahead".

[akpm@linux-foundation.org: do the "#define foo foo" trick in the conventional manner]
Signed-off-by: Shaohua Li <shli@fusionio.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michal Simek <monstr@monstr.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:46 +09:00
Will Deacon
5d3a551c28 mm: hugetlb: add arch hook for clearing page flags before entering pool
The core page allocator ensures that page flags are zeroed when freeing
pages via free_pages_check.  A number of architectures (ARM, PPC, MIPS)
rely on this property to treat new pages as dirty with respect to the data
cache and perform the appropriate flushing before mapping the pages into
userspace.

This can lead to cache synchronisation problems when using hugepages,
since the allocator keeps its own pool of pages above the usual page
allocator and does not reset the page flags when freeing a page into the
pool.

This patch adds a new architecture hook, arch_clear_hugepage_flags, so
that architectures which rely on the page flags being in a particular
state for fresh allocations can adjust the flags accordingly when a page
is freed into the pool.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:24 +09:00
Konstantin Khlebnikov
314e51b985 mm: kill vma flag VM_RESERVED and mm->reserved_vm counter
A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA,
currently it lost original meaning but still has some effects:

 | effect                 | alternative flags
-+------------------------+---------------------------------------------
1| account as reserved_vm | VM_IO
2| skip in core dump      | VM_IO, VM_DONTDUMP
3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
4| do not mlock           | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP

This patch removes reserved_vm counter from mm_struct.  Seems like nobody
cares about it, it does not exported into userspace directly, it only
reduces total_vm showed in proc.

Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP.

remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP.

[akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:19 +09:00
Konstantin Khlebnikov
2dd8ad81e3 mm: use mm->exe_file instead of first VM_EXECUTABLE vma->vm_file
Some security modules and oprofile still uses VM_EXECUTABLE for retrieving
a task's executable file.  After this patch they will use mm->exe_file
directly.  mm->exe_file is protected with mm->mmap_sem, so locking stays
the same.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>			[arch/tile]
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>	[tomoyo]
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:18 +09:00
Catalin Marinas
7ac57a89de Kconfig: clean up the "#if defined(arch)" list for exception-trace sysctl entry
Introduce SYSCTL_EXCEPTION_TRACE config option and selec it in the
architectures requiring support for the "exception-trace" debug_table
entry in kernel/sysctl.c.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:14 +09:00
Catalin Marinas
b69ec42b1b Kconfig: clean up the long arch list for the DEBUG_KMEMLEAK config option
Introduce HAVE_DEBUG_KMEMLEAK config option and select it in corresponding
architecture Kconfig files.  DEBUG_KMEMLEAK now only depends on
HAVE_DEBUG_KMEMLEAK.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:14 +09:00
Linus Torvalds
50e0d10232 This has three changes for asm-generic that did not really fit into any
other branch as normal asm-generic changes do. One is a fix for a
 build warning, the other two are more interesting:
 
 * A patch from Mark Brown to allow using the common clock infrastructure
 on all architectures, so we can use the clock API in architecture
 independent device drivers.
 
 * The UAPI split patches from David Howells for the asm-generic files.
 There are other architecture specific series that are going through
 the arch maintainer tree and that depend on this one.
 
 There may be a few small merge conflicts between Mark's patch and
 the following arch header file split patches. In each case the solution
 will be to keep the new "generic-y += clkdev.h" line, even if it
 ends up being the only line in the Kbuild file.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIVAwUAUHLuO2CrR//JCVInAQLsKxAAoa+oSP3KGuQbLHq2wvUxAdXWDFcZgKo+
 qMRejSJPI0sreJ9GJHpUjHtJ7W2gujeo9upmUIJzoWY9vrmjkhCDkaWliaQI8SmY
 CKB9zI2xCB9iFzHtWxocfnJzU7NvzjJm+jnIYrqkaO9HGMxL99tsv9TsBYXK/08j
 QmlGP5fHdGU3zZxVt5r1GL8/nfX4zn3/YEll9nJ7vqXZltIBbaksxmgPoa0QkkH8
 LMeMAlgRR2DHWt58gXHyGB7Afx3QEnZBDaQpYxA446P+2gtvIhFYOnpuX14pZb7t
 m4IM0vOO6WzARQR6DJlRHfYJevojgGHu4Y8wkEzuWE+Hr2BqmiVct7UKqGJdqTY5
 7+I7wwaJmdd3zE61LxRS9UOjJDwMh1gmsNU4+42RArQ5eLcikNR5zfYzDRLCTmnk
 qKZvbiaxgme2YvWazxbBT6EqmIVU6lfHHIoMLr8U0j40Cl0GCmN7EBbe7/r2Jhjs
 6VnCOJ6vb4RCOJGGAcLRMQu7xEtqcCe0Zht839wl13QXewxS3QRgwg6Bjy/fwA9r
 jij5gf+R25J/fQW7yZv4LwcMowRE1xvpu0ebwkK3LLR8jcon71scd6f3PW/bUUpj
 j4tgFuJbXzOxQ4LFgBzvdVgx3wDzsQhqb/6p2l6ROdcw7xXFDdFZ4zq3h0A25wXZ
 J6WDO387tpg=
 =Aaki
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic updates from Arnd Bergmann:
 "This has three changes for asm-generic that did not really fit into
  any other branch as normal asm-generic changes do.  One is a fix for a
  build warning, the other two are more interesting:

   * A patch from Mark Brown to allow using the common clock
     infrastructure on all architectures, so we can use the clock API in
     architecture independent device drivers.

   * The UAPI split patches from David Howells for the asm-generic
     files.  There are other architecture specific series that are going
     through the arch maintainer tree and that depend on this one.

  There may be a few small merge conflicts between Mark's patch and the
  following arch header file split patches.  In each case the solution
  will be to keep the new "generic-y += clkdev.h" line, even if it ends
  up being the only line in the Kbuild file."

* tag 'asm-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  UAPI: (Scripted) Disintegrate include/asm-generic
  asm-generic: Add default clkdev.h
  asm-generic: xor: mark static functions as __maybe_unused
2012-10-09 15:58:38 +09:00
Julia Lawall
12ecd9570d arch/powerpc/kvm/e500_tlb.c: fix error return code
Convert a 0 error return code to a negative one, as returned elsewhere in the
function.

A new label is also added to avoid freeing things that are known to not yet
be allocated.

A simplified version of the semantic match that finds the first problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier ret;
expression e,e1,e2,e3,e4,x;
@@

(
if (\(ret != 0\|ret < 0\) || ...) { ... return ...; }
|
ret = 0
)
... when != ret = e1
*x = \(kmalloc\|kzalloc\|kcalloc\|devm_kzalloc\|ioremap\|ioremap_nocache\|devm_ioremap\|devm_ioremap_nocache\)(...);
... when != x = e2
    when != ret = e3
*if (x == NULL || ...)
{
  ... when != ret = e4
*  return ret;
}
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:55 +02:00
Paul Mackerras
55b665b026 KVM: PPC: Book3S HV: Provide a way for userspace to get/set per-vCPU areas
The PAPR paravirtualization interface lets guests register three
different types of per-vCPU buffer areas in its memory for communication
with the hypervisor.  These are called virtual processor areas (VPAs).
Currently the hypercalls to register and unregister VPAs are handled
by KVM in the kernel, and userspace has no way to know about or save
and restore these registrations across a migration.

This adds "register" codes for these three areas that userspace can
use with the KVM_GET/SET_ONE_REG ioctls to see what addresses have
been registered, and to register or unregister them.  This will be
needed for guest hibernation and migration, and is also needed so
that userspace can unregister them on reset (otherwise we corrupt
guest memory after reboot by writing to the VPAs registered by the
previous kernel).

The "register" for the VPA is a 64-bit value containing the address,
since the length of the VPA is fixed.  The "registers" for the SLB
shadow buffer and dispatch trace log (DTL) are 128 bits long,
consisting of the guest physical address in the high (first) 64 bits
and the length in the low 64 bits.

This also fixes a bug where we were calling init_vpa unconditionally,
leading to an oops when unregistering the VPA.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:55 +02:00
Paul Mackerras
a8bd19ef4d KVM: PPC: Book3S: Get/set guest FP regs using the GET/SET_ONE_REG interface
This enables userspace to get and set all the guest floating-point
state using the KVM_[GS]ET_ONE_REG ioctls.  The floating-point state
includes all of the traditional floating-point registers and the
FPSCR (floating point status/control register), all the VMX/Altivec
vector registers and the VSCR (vector status/control register), and
on POWER7, the vector-scalar registers (note that each FP register
is the high-order half of the corresponding VSR).

Most of these are implemented in common Book 3S code, except for VSX
on POWER7.  Because HV and PR differ in how they store the FP and VSX
registers on POWER7, the code for these cases is not common.  On POWER7,
the FP registers are the upper halves of the VSX registers vsr0 - vsr31.
PR KVM stores vsr0 - vsr31 in two halves, with the upper halves in the
arch.fpr[] array and the lower halves in the arch.vsr[] array, whereas
HV KVM on POWER7 stores the whole VSX register in arch.vsr[].

Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix whitespace, vsx compilation]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:54 +02:00
Paul Mackerras
a136a8bdc0 KVM: PPC: Book3S: Get/set guest SPRs using the GET/SET_ONE_REG interface
This enables userspace to get and set various SPRs (special-purpose
registers) using the KVM_[GS]ET_ONE_REG ioctls.  With this, userspace
can get and set all the SPRs that are part of the guest state, either
through the KVM_[GS]ET_REGS ioctls, the KVM_[GS]ET_SREGS ioctls, or
the KVM_[GS]ET_ONE_REG ioctls.

The SPRs that are added here are:

- DABR:  Data address breakpoint register
- DSCR:  Data stream control register
- PURR:  Processor utilization of resources register
- SPURR: Scaled PURR
- DAR:   Data address register
- DSISR: Data storage interrupt status register
- AMR:   Authority mask register
- UAMOR: User authority mask override register
- MMCR0, MMCR1, MMCRA: Performance monitor unit control registers
- PMC1..PMC8: Performance monitor unit counter registers

In order to reduce code duplication between PR and HV KVM code, this
moves the kvm_vcpu_ioctl_[gs]et_one_reg functions into book3s.c and
centralizes the copying between user and kernel space there.  The
registers that are handled differently between PR and HV, and those
that exist only in one flavor, are handled in kvmppc_[gs]et_one_reg()
functions that are specific to each flavor.

Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: minimal style fixes]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:54 +02:00
Scott Wood
5bd1cf1185 KVM: PPC: set IN_GUEST_MODE before checking requests
Avoid a race as described in the code comment.

Also remove a related smp_wmb() from booke's kvmppc_prepare_to_enter().
I can't see any reason for it, and the book3s_pr version doesn't have it.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:54 +02:00
Scott Wood
adbb48a854 KVM: PPC: e500: MMU API: fix leak of shared_tlb_pages
This was found by kmemleak.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:53 +02:00
Scott Wood
e400e72f25 KVM: PPC: e500: fix allocation size error on g2h_tlb1_map
We were only allocating half the bytes we need, which was made more
obvious by a recent fix to the memset in  clear_tlb1_bitmap().

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Cc: stable@vger.kernel.org
2012-10-05 23:38:53 +02:00
Paul Mackerras
70bddfefbd KVM: PPC: Book3S HV: Fix calculation of guest phys address for MMIO emulation
In the case where the host kernel is using a 64kB base page size and
the guest uses a 4k HPTE (hashed page table entry) to map an emulated
MMIO device, we were calculating the guest physical address wrongly.
We were calculating a gfn as the guest physical address shifted right
16 bits (PAGE_SHIFT) but then only adding back in 12 bits from the
effective address, since the HPTE had a 4k page size.  Thus the gpa
reported to userspace was missing 4 bits.

Instead, we now compute the guest physical address from the HPTE
without reference to the host page size, and then compute the gfn
by shifting the gpa right PAGE_SHIFT bits.

Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:53 +02:00
Paul Mackerras
964ee98ccd KVM: PPC: Book3S HV: Remove bogus update of physical thread IDs
When making a vcpu non-runnable we incorrectly changed the
thread IDs of all other threads on the core, just remove that
code.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:52 +02:00
Paul Mackerras
a47d72f361 KVM: PPC: Book3S HV: Fix updates of vcpu->cpu
This removes the powerpc "generic" updates of vcpu->cpu in load and
put, and moves them to the various backends.

The reason is that "HV" KVM does its own sauce with that field
and the generic updates might corrupt it. The field contains the
CPU# of the -first- HW CPU of the core always for all the VCPU
threads of a core (the one that's online from a host Linux
perspective).

However, the preempt notifiers are going to be called on the
threads VCPUs when they are running (due to them sleeping on our
private waitqueue) causing unload to be called, potentially
clobbering the value.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:52 +02:00
Paul Mackerras
dfe49dbd1f KVM: PPC: Book3S HV: Handle memory slot deletion and modification correctly
This adds an implementation of kvm_arch_flush_shadow_memslot for
Book3S HV, and arranges for kvmppc_core_commit_memory_region to
flush the dirty log when modifying an existing slot.  With this,
we can handle deletion and modification of memory slots.

kvm_arch_flush_shadow_memslot calls kvmppc_core_flush_memslot, which
on Book3S HV now traverses the reverse map chains to remove any HPT
(hashed page table) entries referring to pages in the memslot.  This
gets called by generic code whenever deleting a memslot or changing
the guest physical address for a memslot.

We flush the dirty log in kvmppc_core_commit_memory_region for
consistency with what x86 does.  We only need to flush when an
existing memslot is being modified, because for a new memslot the
rmap array (which stores the dirty bits) is all zero, meaning that
every page is considered clean already, and when deleting a memslot
we obviously don't care about the dirty bits any more.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:51 +02:00
Paul Mackerras
a66b48c3a3 KVM: PPC: Move kvm->arch.slot_phys into memslot.arch
Now that we have an architecture-specific field in the kvm_memory_slot
structure, we can use it to store the array of page physical addresses
that we need for Book3S HV KVM on PPC970 processors.  This reduces the
size of struct kvm_arch for Book3S HV, and also reduces the size of
struct kvm_arch_memory_slot for other PPC KVM variants since the fields
in it are now only compiled in for Book3S HV.

This necessitates making the kvm_arch_create_memslot and
kvm_arch_free_memslot operations specific to each PPC KVM variant.
That in turn means that we now don't allocate the rmap arrays on
Book3S PR and Book E.

Since we now unpin pages and free the slot_phys array in
kvmppc_core_free_memslot, we no longer need to do it in
kvmppc_core_destroy_vm, since the generic code takes care to free
all the memslots when destroying a VM.

We now need the new memslot to be passed in to
kvmppc_core_prepare_memory_region, since we need to initialize its
arch.slot_phys member on Book3S HV.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:51 +02:00
Paul Mackerras
2c9097e4c1 KVM: PPC: Book3S HV: Take the SRCU read lock before looking up memslots
The generic KVM code uses SRCU (sleeping RCU) to protect accesses
to the memslots data structures against updates due to userspace
adding, modifying or removing memory slots.  We need to do that too,
both to avoid accessing stale copies of the memslots and to avoid
lockdep warnings.  This therefore adds srcu_read_lock/unlock pairs
around code that accesses and uses memslots.

Since the real-mode handlers for H_ENTER, H_REMOVE and H_BULK_REMOVE
need to access the memslots, and we don't want to call the SRCU code
in real mode (since we have no assurance that it would only access
the linear mapping), we hold the SRCU read lock for the VM while
in the guest.  This does mean that adding or removing memory slots
while some vcpus are executing in the guest will block for up to
two jiffies.  This tradeoff is acceptable since adding/removing
memory slots only happens rarely, while H_ENTER/H_REMOVE/H_BULK_REMOVE
are performance-critical hot paths.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:51 +02:00
Mihai Caraman
d61966fc08 KVM: PPC: bookehv: Allow duplicate calls of DO_KVM macro
The current form of DO_KVM macro restricts its use to one call per input
parameter set. This is caused by kvmppc_resume_\intno\()_\srr1 symbol
definition.
Duplicate calls of DO_KVM are required by distinct implementations of
exeption handlers which are delegated at runtime. Use a rare label number
to avoid conflicts with the calling contexts.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:50 +02:00
Alexander Graf
7a08c2740f KVM: PPC: BookE: Support FPU on non-hv systems
When running on HV aware hosts, we can not trap when the guest sets the FP
bit, so we just let it do so when it wants to, because it has full access to
MSR.

For non-HV aware hosts with an FPU (like 440), we need to also adjust the
shadow MSR though. Otherwise the guest gets an FP unavailable trap even when
it really enabled the FP bit in MSR.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:50 +02:00
Alexander Graf
ceb985f9d1 KVM: PPC: 440: Implement mfdcrx
We need mfdcrx to execute properly on 460 cores.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:49 +02:00
Alexander Graf
e4dcfe88fb KVM: PPC: 440: Implement mtdcrx
We need mtdcrx to execute properly on 460 cores.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:49 +02:00
Alexander Graf
430c7ff52f KVM: PPC: E500: Remove E500_TLB_DIRTY flag
Since we always mark pages as dirty immediately when mapping them read/write
now, there's no need for the dirty flag in our cache.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:48 +02:00
Alexander Graf
166a2b7000 KVM: PPC: Use symbols for exit trace
Exit traces are a lot easier to read when you don't have to remember
cryptic numbers for guest exit reasons. Symbolify them in our trace
output.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:48 +02:00