Do a global flush tlb after splitting the large page and before we do the
actual change page attribute in the PTE.
With out this, we violate the TLB application note, which says
"The TLBs may contain both ordinary and large-page translations for
a 4-KByte range of linear addresses. This may occur if software
modifies the paging structures so that the page size used for the
address range changes. If the two translations differ with respect
to page frame or attributes (e.g., permissions), processor behavior
is undefined and may be implementation-specific."
And also serialize cpa() (for !DEBUG_PAGEALLOC which uses large identity
mappings) using cpa_lock. So that we don't allow any other cpu, with stale
large tlb entries change the page attribute in parallel to some other cpu
splitting a large page entry along with changing the attribute.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Interrupt context no longer splits large page in cpa(). So we can do away
with cpa memory pool code.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
No alias checking needed for setting present/not-present mapping. Otherwise,
we may need to break large pages for 64-bit kernel text mappings (this adds to
complexity if we want to do this from atomic context especially, for ex:
with CONFIG_DEBUG_PAGEALLOC). Let's keep it simple!
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Don't use large pages for kernel identity mapping with DEBUG_PAGEALLOC.
This will remove the need to split the large page for the
allocated kernel page in the interrupt context.
This will simplify cpa code(as we don't do the split any more from the
interrupt context). cpa code simplication in the subsequent patches.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In the first pass, kernel physical mapping will be setup using large or
small pages but uses the same PTE attributes as that of the early
PTE attributes setup by early boot code in head_[32|64].S
After flushing TLB's, we go through the second pass, which setups the
direct mapped PTE's with the appropriate attributes (like NX, GLOBAL etc)
which are runtime detectable.
This two pass mechanism conforms to the TLB app note which says:
"Software should not write to a paging-structure entry in a way that would
change, for any linear address, both the page size and either the page frame
or attributes."
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This merges phase 1 of the x86 tree, which is a collection of branches:
x86/alternatives, x86/cleanups, x86/commandline, x86/crashdump,
x86/debug, x86/defconfig, x86/doc, x86/exports, x86/fpu, x86/gart,
x86/idle, x86/mm, x86/mtrr, x86/nmi-watchdog, x86/oprofile,
x86/paravirt, x86/reboot, x86/sparse-fixes, x86/tsc, x86/urgent and
x86/vmalloc
and as Ingo says: "these are the easiest, purely independent x86 topics
with no conflicts, in one nice Octopus merge".
* 'x86-v28-for-linus-phase1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (147 commits)
x86: mtrr_cleanup: treat WRPROT as UNCACHEABLE
x86: mtrr_cleanup: first 1M may be covered in var mtrrs
x86: mtrr_cleanup: print out correct type v2
x86: trivial printk fix in efi.c
x86, debug: mtrr_cleanup print out var mtrr before change it
x86: mtrr_cleanup try gran_size to less than 1M, v3
x86: mtrr_cleanup try gran_size to less than 1M, cleanup
x86: change MTRR_SANITIZER to def_bool y
x86, debug printouts: IOMMU setup failures should not be KERN_ERR
x86: export set_memory_ro and set_memory_rw
x86: mtrr_cleanup try gran_size to less than 1M
x86: mtrr_cleanup prepare to make gran_size to less 1M
x86: mtrr_cleanup safe to get more spare regs now
x86_64: be less annoying on boot, v2
x86: mtrr_cleanup hole size should be less than half of chunk_size, v2
x86: add mtrr_cleanup_debug command line
x86: mtrr_cleanup optimization, v2
x86: don't need to go to chunksize to 4G
x86_64: be less annoying on boot
x86, olpc: fix endian bug in openfirmware workaround
...
Write the name of the unknown vendor_id to output instead of just
"unknown".
Tag changed to 'vendor_id' as used in /proc/cpuinfo
Signed-off-by: Hans Schou <linux@schou.dk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When reserving space for the hypervisor the Xen paravirt backend adds
an extra two pages (this was carried forward from the 2.6.18-xen tree
which had them "for safety"). Depending on various CONFIG options this
can cause the boot time fixmaps to span multiple PMDs which is not
supported and triggers a WARN in early_ioremap_init().
This was exposed by 2216d199b1 which
moved the dmi table parsing earlier.
x86: fix CONFIG_X86_RESERVE_LOW_64K=y
The bad_bios_dmi_table() quirk never triggered because we do DMI setup
too late. Move it a bit earlier.
There is no real reason to reserve these two extra pages and the
fixmap already incorporates FIX_HOLE which serves the same
purpose. None of the other callers of reserve_top_address do this.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add a cpu parameter to __cpufreq_driver_getavg(). This is needed for software
cpufreq coordination where policy->cpu may not be same as the CPU on which we
want to getavg frequency.
A follow-on patch will use this parameter to getavg freq from all cpus
in policy->cpus.
Change since last patch. Fix the offline/online and suspend/resume
oops reported by Youquan Song <youquan.song@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
add error handling for cpufreq_register_driver() error
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: cpufreq@lists.linux.org.uk
Signed-off-by: Dave Jones <davej@redhat.com>
Replace the no longer working links and email address in the
documentation and in source code.
Signed-off-by: Márton Németh <nm127@freemail.hu>
Signed-off-by: Dave Jones <davej@redhat.com>
When pinning/unpinning a pagetable with split pte locks, we can end up
holding multiple pte locks at once (we need to hold the locks while
there's a pending batched hypercall affecting the pte page). Because
all the pte locks are in the same lock class, lockdep thinks that
we're potentially taking a lock recursively.
This warning is spurious because we always take the pte locks while
holding mm->page_table_lock. lockdep now has spin_lock_nest_lock to
express this kind of dominant lock use, so use it here so that lockdep
knows what's going on.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If a processor implementation discern that a processor state component is in
its initialized state, it may modify the corresponding bit in the
xsave header.xstate_bv as '0'. State in the memory layout setup by 'xsave'
will be consistent with the bit values in the header.
During signal handling, legacy applications may change the FP/SSE bits
in the sigcontext memory layout without touching the FP/SSE header bits
in the xsave header. So always set FP/SSE bits in the xsave header
while saving the sigcontext state to the user space. During signal return,
this will enable the kernel to capture any changes to the FP/SSE bits by the
legacy applications which don't touch xsave headers.
xsave aware apps can change the xstate_bv in the xsave header aswell
as change any contents in the memory layout. xrestor as part of sigreturn
will capture all the changes.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
This PCI ID based quick should be a full solution for the IRQ0 override
related slowdown problem on SB450 based systems:
33fb0e4: x86: SB450: skip IRQ0 override if it is not routed to INT2 of IOAPIC
Emit a warning in those cases where the DMI quirk triggers but
the PCI ID based quirk didnt.
If this warning does not trigger then we can phase out the DMI quirks.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On some HP nx6... laptops (e.g. nx6325) BIOS reports an IRQ0 override
but the SB450 chipset is configured such that timer interrupts goe to
INT0 of IOAPIC.
Check IRQ0 routing and if it is routed to INT0 of IOAPIC skip the
timer override.
[ This more generic PCI ID based quirk should alleviate the need for
dmi_ignore_irq0_timer_override DMI quirks. ]
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: "Maciej W. Rozycki" <macro@linux-mips.org>
Tested-by: Dmitry Torokhov <dtor@mail.ru>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: gart iommu have direct mapping when agp is present too
Stress-testing KVM's latest NMI support with kgdbts inside an SMP guest,
I came across spurious unhandled NMIs while running the singlestep test.
Looking closer at the code path each NMI takes when KGDB is enabled, I
noticed that kgdb_nmicallback is called twice per event: One time via
DIE_NMI_IPI notification, the second time on DIE_NMI. Removing the first
invocation cures the unhandled NMIs here.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
There is a bug in the BIOSes of some HP boxes with AMD Turions which
connects IO-APIC pins with ACPI thermal trip points in such a way that
if the state of the IO-APIC is not as expected by the (buggy) BIOS, the
thermal trip points are set to insanely low values (usually all of them
become 16 degrees Celsius). As a result, thermal throttling kicks in
and knock the system down to its shoes.
Unfortunately some of the recent IO-APIC changes made the bug show up.
To prevent this from happening, blacklist machines that are known to be
affected (nx6115 and 6715b in this particular case).
This fixes http://bugzilla.kernel.org/show_bug.cgi?id=11516 listed as
a regression from 2.6.26.
On my box it was caused by:
commit 691874fa96
Author: Maciej W. Rozycki <macro@linux-mips.org>
Date: Tue May 27 21:19:51 2008 +0100
x86: I/O APIC: timer through 8259A second-chance
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
and the whole story is described in this (huge) thread:
http://marc.info/?l=linux-kernel&m=121358440508410&w=4
Matthew Garrett told us about that happening on the nx6125:
http://marc.info/?l=linux-kernel&m=121396307411930&w=4
and then Maciej analysed the breakage on the basis of a DSDT from the
nx6325:
http://marc.info/?l=linux-kernel&m=121401068718826&w=4
As far as the Dmitry's and Jason's boxes are concerned, I recognized the
symptoms and asked them to verify that the blacklisting helped.
It appears that the buggy BIOS code has been copy-pasted to the entire
range of machines, for no good reason.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Tested-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Tested-by: Jason Vas Dias <jason.vas.dias@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace a magic number with a named constant in the VESA boot code.
Signed-off-by: Michal Januszewski <spock@gentoo.org>
Cc: linux-fbdev-devel@lists.sourceforge.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
move init_memory_mapping() out of init_k8_gatt.
for: http://bugzilla.kernel.org/show_bug.cgi?id=11676
2.6.27-rc2 to rc8, apgart fails, iommu=soft works, regression
This is needed because we need to map the GART aperture even
if the GATT is not initialized.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For the purpose of MTRR canonicalization, treat WRPROT as UNCACHEABLE.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The first 1M is don't care when it comes to the variables MTRRs.
Cover it as WB as a heuristic approximation; this is generally what we
want to minimize the number of registers.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Print out the correct type when the Write Protected (WP) type is seen.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
After commit 968de4f ("i386: Relocatable kernel support") IMAGE_OFFSET wasn't
actually used anymore in the (current) X86 build system. Now remove its last
traces.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: segfault on build of a 32-bit relocatable kernel
When converting arch/x86/boot/compressed/relocs.c to support unlimited
sections, the computation of sym_strtab in walk_relocs() was done
incorrectly. This causes a segfault for some people when building the
relocatable 32-bit kernel.
Pointed out by Anonymous <pageexec@freemail.hu>.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Some BIOSes do not indicate error when trying to read from non-
existing device. Zero buffer before reading and check that we
possibly have valid MBR by looking for MBR magic.
This was fixed in different way for edd.S in
http://marc.info/?l=linux-kernel&m=114087765422490&w=2, but lost
again when edd.S was rewritten in C.
Signed-off-by: Andrey Borzenkov < arvidjaar@mail.ru>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
[patch] x86: Trivial printk fix in efi.c
The following line is lacking a space between "memdesc" and "doesn't".
"Kernel-defined memdescdoesn't match the one from EFI!"
Fixed the printk by adding a space.
Signed-off-by: Russ Anderson <rja@sgi.com>
Cc: Russ Anderson <rja@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove braces and indent for flags and fpstate in restore_sigcontext().
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There are a couple of Xen features which rely on directly accessing
per-cpu data via a segment register, which is not yet available on
x86-64. In the meantime, just disable direct access to the vcpu info
structure; this leaves some of the code as dead, but it will come to
life in time, and the warnings are suppressed.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>