Beschorner Daniel reported:
> hwinfo problem since 2.6.28, showing this in the oops:
> Corrupted page table at address 7fd04de3ec00
Also, PaX Team reported a regression with this commit:
> commit 9542ada803
> Author: Suresh Siddha <suresh.b.siddha@intel.com>
> Date: Wed Sep 24 08:53:33 2008 -0700
>
> x86: track memtype for RAM in page struct
This commit breaks mapping any RAM page through /dev/mem, as the
reserve_memtype() was not initializing the return attribute type and as such
corrupting the PTE entry that was setup with the return attribute type.
Because of this bug, application mapping this RAM page through /dev/mem
will die with "Corrupted page table at address xxxx" message in the kernel
log and also the kernel identity mapping which maps the underlying RAM
page gets converted to UC.
Fix this by initializing the return attribute type before calling
reserve_ram_pages_type()
Reported-by: PaX Team <pageexec@freemail.hu>
Reported-and-tested-by: Beschorner Daniel <Daniel.Beschorner@facton.com>
Tested-and-Acked-by: PaX Team <pageexec@freemail.hu>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make X86 SGI Ultraviolet support configurable. Saves about 13K of text size
on my modest config.
text data bss dec hex filename
6770537 1158680 694356 8623573 8395d5 vmlinux
6757492 1157664 694228 8609384 835e68 vmlinux.nouv
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
while looking at:
http://bugzilla.kernel.org/show_bug.cgi?id=11541
I realized that the mtrr.show param cannot work, because
the code is processed much too early.
This patch:
- Declares mtrr.show as early_param
- Stays consistent with the previous param (which I doubt
that it ever worked), so mtrr.show=1 would still work
- Declares mtrr_show as initdata
Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix sporadic slowdowns and warning messages
This patch fixes a performance issue reported by Linus on his
Nehalem system. While Linus reverted the PAT patch (commit
58dab916df) which exposed the issue,
existing cpa() code can potentially still cause wrong(page attribute
corruption) behavior.
This patch also fixes the "WARNING: at arch/x86/mm/pageattr.c:560" that
various people reported.
In 64bit kernel, kernel identity mapping might have holes depending
on the available memory and how e820 reports the address range
covering the RAM, ACPI, PCI reserved regions. If there is a 2MB/1GB hole
in the address range that is not listed by e820 entries, kernel identity
mapping will have a corresponding hole in its 1-1 identity mapping.
If cpa() happens on the kernel identity mapping which falls into these holes,
existing code fails like this:
__change_page_attr_set_clr()
__change_page_attr()
returns 0 because of if (!kpte). But doesn't
set cpa->numpages and cpa->pfn.
cpa_process_alias()
uses uninitialized cpa->pfn (random value)
which can potentially lead to changing the page
attribute of kernel text/data, kernel identity
mapping of RAM pages etc. oops!
This bug was easily exposed by another PAT patch which was doing
cpa() more often on kernel identity mapping holes (physical range between
max_low_pfn_mapped and 4GB), where in here it was setting the
cache disable attribute(PCD) for kernel identity mappings aswell.
Fix cpa() to handle the kernel identity mapping holes. Retain
the WARN() for cpa() calls to other not present address ranges
(kernel-text/data, ioremap() addresses)
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix:
arch/x86/mm/tlb.c:47: error: ‘CONFIG_X86_INTERNODE_CACHE_BYTES’ undeclared here (not in a function)
The CONFIG_X86_INTERNODE_CACHE_BYTES symbol is only defined on 64-bit,
because vsmp support is 64-bit only. Define it on 32-bit too - where it
will always be equal to X86_L1_CACHE_BYTES.
Also move the default of X86_L1_CACHE_BYTES (which is separate from the
more commonly used L1_CACHE_SHIFT kconfig symbol) from 128 bytes to
64 bytes.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix:
arch/x86/mm/srat_64.c: In function ‘acpi_numa_processor_affinity_init’:
arch/x86/mm/srat_64.c:141: error: implicit declaration of function ‘get_uv_system_type’
arch/x86/mm/srat_64.c:141: error: ‘UV_X2APIC’ undeclared (first use in this function)
arch/x86/mm/srat_64.c:141: error: (Each undeclared identifier is reported only once
arch/x86/mm/srat_64.c:141: error: for each function it appears in.)
A couple of UV definitions were moved to asm/uv/uv.h, but srat_64.c did
not include that header. Add it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Now that it's unified, move the (SMP) TLB flushing code from arch/x86/kernel/
to arch/x86/mm/, where it belongs logically.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
arch/x86/kernel/tlb_32.c
Merge it here because both the cpumask changes and the ongoing percpu
work is touching the TLB code. The percpu changes take precedence, as
they eliminate tlb_32.c altogether.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This reverts commit 4217458daf.
Justin Madru bisected this commit, it was causing weird Firefox
crashes.
The reason is that GCC mis-optimizes (re-uses) the on-stack parameters of
the calling frame, which corrupts the syscall return pt_regs state and
thus corrupts user-space register state.
So we go back to the slightly less clean but more optimization-safe
method of getting to pt_regs. Also add a comment to explain this.
Resolves: http://bugzilla.kernel.org/show_bug.cgi?id=12505
Reported-and-bisected-by: Justin Madru <jdm64@gawab.com>
Tested-by: Justin Madru <jdm64@gawab.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix rare (but currently harmless) miscompile with certain configs and gcc versions
Hugh Dickins noticed that strncpy_from_user() was miscompiled
in some circumstances with gcc 4.3.
Thanks to Hugh's excellent analysis it was easy to track down.
Hugh writes:
> Try building an x86_64 defconfig 2.6.29-rc1 kernel tree,
> except not quite defconfig, switch CONFIG_PREEMPT_NONE=y
> and CONFIG_PREEMPT_VOLUNTARY off (because it expands a
> might_fault() there, which hides the issue): using a
> gcc 4.3.2 (I've checked both openSUSE 11.1 and Fedora 10).
>
> It generates the following:
>
> 0000000000000000 <__strncpy_from_user>:
> 0: 48 89 d1 mov %rdx,%rcx
> 3: 48 85 c9 test %rcx,%rcx
> 6: 74 0e je 16 <__strncpy_from_user+0x16>
> 8: ac lods %ds:(%rsi),%al
> 9: aa stos %al,%es:(%rdi)
> a: 84 c0 test %al,%al
> c: 74 05 je 13 <__strncpy_from_user+0x13>
> e: 48 ff c9 dec %rcx
> 11: 75 f5 jne 8 <__strncpy_from_user+0x8>
> 13: 48 29 c9 sub %rcx,%rcx
> 16: 48 89 c8 mov %rcx,%rax
> 19: c3 retq
>
> Observe that "sub %rcx,%rcx; mov %rcx,%rax", whereas gcc 4.2.1
> (and many other configs) say "sub %rcx,%rdx; mov %rdx,%rax".
> Isn't it returning 0 when it ought to be returning strlen?
The asm constraints for the strncpy_from_user() result were missing an
early clobber, which tells gcc that the last output arguments
are written before all input arguments are read.
Also add more early clobbers in the rest of the file and fix 32-bit
usercopy.c in the same way.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
[ since this API is rarely used and no in-kernel user relies on a 'len'
return value (they only rely on negative return values) this miscompile
was never noticed in the field. But it's worth fixing it nevertheless. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: less contention when issuing invalidate IPI, cleanup
Make x86_32 use the same tlb code as 64bit. The 64bit code uses
multiple IPI vectors for tlb shootdown to reduce contention. This
patch makes x86_32 allocate the same 8 IPIs as x86_64 and share the
code paths.
Note that the usage of asmlinkage is inconsistent for x86_32 and 64
and calls for further cleanup. This has been noted with a FIXME
comment in tlb_64.c.
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: clean up, ipi vector number reordering for x86_32
Make the following changes to prepare for tlb merge.
* reorder x86_32 ip vectors
* adjust tlb_32.c and tlb_64.c such that their logics coincide exactly
- on spurious invalidate ipi, tlb_32 acks the irq
- tlb_64 now has proper memory barriers around clearing
flush_cpumask (no change in generated code)
* unexport flush_tlb_page from tlb_32.c, there's no user
* use unsigned int for cpu id
* drop unnecessary includes from tlb_64.c
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
Make the following uv related cleanups.
* collect visible uv related definitions and interfaces into uv/uv.h
and use it. this cleans up the messy situation where on 64bit, uv
is defined properly, on 32bit generic it's dummy and on the rest
undefined. after this clean up, uv is defined on 64 and dummy on
32.
* update uv_flush_tlb_others() such that it takes cpumask of
to-be-flushed cpus as argument, instead of that minus self, and
returns yet-to-be-flushed cpumask, instead of modifying the passed
in parameter. this interface change will ease dummy implementation
of uv_flush_tlb_others() and makes uv tlb flush related stuff
defined in tlb_uv proper.
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup, better irq_regs code generation for x86_64
Make 64-bit use the same optimizations as 32-bit.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
tj: * changed cpu to unsigned as was done on mmu_context_64.h as cpu
id is officially unsigned int
* added missing ';' to 32bit version of deactivate_mm()
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
%fs is currently set to __KERNEL_DS at boot, and conditionally
switched to __KERNEL_PERCPU for secondary cpus. Instead, initialize
GDT_ENTRY_PERCPU to the same attributes as GDT_ENTRY_KERNEL_DS and
set %fs to __KERNEL_PERCPU unconditionally.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: slightly better code generation for percpu_to_op()
The processor will sign-extend 32-bit immediate values in 64-bit
operations. Use the 'e' constraint ("32-bit signed integer constant,
or a symbolic reference known to fit that range") for 64-bit constants.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup && more compact percpu area layout with future changes
Move 64-bit GDT to page-aligned section and clean up comment
formatting.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
In switch_to(), instead of taking offset to irq_stack_union.stack,
make it a proper percpu access using __percpu_arg() and per_cpu_var().
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
Remove byte locks implementation, which was introduced by Jeremy in
8efcbab6 ("paravirt: introduce a "lock-byte" spinlock implementation"),
but turned out to be dead code that is not used by any in-kernel
virtualization guest (Xen uses its own variant of spinlocks implementation
and KVM is not planning to move to byte locks).
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, restructure code to improve assembly
gcc isn't _all_ that smart about spilling registers to stack or reusing
stack slots, even with branch annotations. do_page_fault contained a lot
of functionality, so split unlikely paths into their own functions, and
mark them as noinline just to be sure. I consider this actually to be
somewhat of a cleanup too: the main function now contains about half
the number of lines so the normal path is easier to read, while the error
cases are also nicely split away.
Also, ensure the order of arguments to functions is always the same: regs,
addr, error_code. This can reduce code size a tiny bit, and just looks neater
too.
And add a couple of branch annotations.
Before:
do_page_fault:
subq $360, %rsp #,
After:
do_page_fault:
subq $56, %rsp #,
bloat-o-meter:
add/remove: 8/0 grow/shrink: 0/1 up/down: 2222/-1680 (542)
function old new delta
__bad_area_nosemaphore - 506 +506
no_context - 474 +474
vmalloc_fault - 424 +424
spurious_fault - 358 +358
mm_fault_error - 272 +272
bad_area_access_error - 89 +89
bad_area - 89 +89
bad_area_nosemaphore - 10 +10
do_page_fault 2464 784 -1680
Yes, the total size increases by 542 bytes, due to the extra function calls.
But these will very rarely be called (except for vmalloc_fault) in a normal
workload. Importantly, do_page_fault is less than 1/3rd it's original size,
and touches far less stack.
Existing gotos and branch hints did move a lot of the infrequently used text
out of the fastpath, but that's even further improved after this patch.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cleanup the cpuid check for DS configuration.
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Dump the branch trace on an oops (based on ftrace_dump_on_oops).
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: x86_64 percpu area layout change, irq_stack now at the beginning
Now that the PDA is empty except for the stack canary, it can be removed.
The irqstack is moved to the start of the per-cpu section. If the stack
protector is enabled, the canary overlaps the bottom 48 bytes of the irqstack.
tj: * updated subject
* dropped asm relocation of irq_stack_ptr
* updated comments a bit
* rebased on top of stack canary changes
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
Use cpu_number to determine if the adjustment is necessary.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
Copy the code to cpu_init() to satisfy the requirement that the cpu
be reinitialized. Remove all other calls, since the segments are
already initialized in head_64.S.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: no unnecessary stack canary swapping during context switch
There's no point in moving stack_canary around during context switch
if it's not enabled. Conditionalize it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
Make the following cleanups.
* remove duplicate comment from boot_init_stack_canary() which fits
better in the other place - cpu_idle().
* move stack_canary offset check from __switch_to() to
boot_init_stack_canary().
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: fix crash with memory hotplug enabled
kernel_physical_mapping_init() is called during memory hotplug
so it does not belong in the init section.
If the kernel is built with CONFIG_DEBUG_SECTION_MISMATCH=y on
the make command line, arch/x86/mm/init_64.c is compiled with
the -fno-inline-functions-called-once gcc option defeating
inlining of kernel_physical_mapping_init() within init_memory_mapping().
When kernel_physical_mapping_init() is not inlined it is placed
in the .init.text section according to the __init in it's current
declaration. A later call to kernel_physical_mapping_init() during
a memory hotplug operation encounters an int3 trap because the
.init.text section memory has been freed.
This patch eliminates the crash caused by the int3 trap by moving the
non-inlined kernel_physical_mapping_init() from .init.text to .meminit.text.
Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-tip testing found this crash:
> [ 35.258515] calling acpi_cpufreq_init+0x0/0x127 @ 1
> [ 35.264127] BUG: unable to handle kernel NULL pointer dereference at (null)
> [ 35.267554] IP: [<ffffffff80478092>] __bitmap_intersects+0x48/0x73
> [ 35.267554] PGD 0
> [ 35.267554] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c is still broken: there's no
allocation of the variable mask, so we pass in an uninitialized cmd.mask
field to drv_read(), which then passes it to the scheduler which then
crashes ...
Switch it over to the much simpler constant-cpumask-pointers approach.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: use new work_on_cpu function to reduce stack usage
Replace the saving of current->cpus_allowed and set_cpus_allowed_ptr() with
a work_on_cpu function for drv_read() and drv_write().
Basically converts do_drv_{read,write} into "work_on_cpu" functions that
are now called by drv_read and drv_write.
Note: This patch basically reverts 50c668d6 which reverted 7503bfba, now
that the work_on_cpu() function is more stable.
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: Dieter Ries <clip2@gmx.de>
Tested-by: Maciej Rutecki <maciej.rutecki@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: <cpufreq@vger.kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since commit 75c46fa, "x64, x2apic/intr-remap: MSI and MSI-X
support for interrupt remapping infrastructure", x86 has had an
implementation of arch_setup_msi_irqs().
That implementation does not call arch_setup_msi_irq(), instead it calls
setup_irq(). No other x86 code calls arch_setup_msi_irq().
That leaves only arch_setup_msi_irqs() in drivers/pci/msi.c, but that
routine is overridden by the x86 version of arch_setup_msi_irqs().
So arch_setup_msi_irq() is dead code, remove it.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The function setup_cpu_local_masks() has been marked __init, in
order to remove the following section mismatch messages:
WARNING: vmlinux.o(.text+0x3c2c7): Section mismatch in reference from the function setup_cpu_local_masks() to the function .init.text:alloc_bootmem_cpumask_var()
The function setup_cpu_local_masks() references
the function __init alloc_bootmem_cpumask_var().
This is often because setup_cpu_local_masks lacks a __init
annotation or the annotation of alloc_bootmem_cpumask_var is wrong.
WARNING: vmlinux.o(.text+0x3c2d3): Section mismatch in reference from the function setup_cpu_local_masks() to the function .init.text:alloc_bootmem_cpumask_var()
The function setup_cpu_local_masks() references
the function __init alloc_bootmem_cpumask_var().
This is often because setup_cpu_local_masks lacks a __init
annotation or the annotation of alloc_bootmem_cpumask_var is wrong.
WARNING: vmlinux.o(.text+0x3c2df): Section mismatch in reference from the function setup_cpu_local_masks() to the function .init.text:alloc_bootmem_cpumask_var()
The function setup_cpu_local_masks() references
the function __init alloc_bootmem_cpumask_var().
This is often because setup_cpu_local_masks lacks a __init
annotation or the annotation of alloc_bootmem_cpumask_var is wrong.
WARNING: vmlinux.o(.text+0x3c2eb): Section mismatch in reference from the function setup_cpu_local_masks() to the function .init.text:alloc_bootmem_cpumask_var()
The function setup_cpu_local_masks() references
the function __init alloc_bootmem_cpumask_var().
This is often because setup_cpu_local_masks lacks a __init
annotation or the annotation of alloc_bootmem_cpumask_var is wrong.
Signed-off-by: Leonardo Potenza <lpotenza@inwind.it>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: add debug warning
Fire off one message if two apic's discovered with different
apic versions. (this code is only called during CPU init)
The goal of this is to pave the way of the removal of the apic_version[]
array. We dont expect any apic version incompatibilities in the x86
landscape of systems [if so we dont handle them very well and probably
never will handle deep apic version assymetries well], but it's prudent
to have a debug check for one kernel cycle nevertheless.
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Move/remove leftover RDC321 files. Now that it's not a subarch anymore,
arch/x86/mach-rdc321x and arch/x86/include/asm/mach-rdc321x/ are not
needed.
One include file was still in use: rdc321x_defs.h, move that to the
generic x86 asm header directory.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
arch/x86/include/asm/pda.h
We merge tip/core/percpu into tip/perfcounters/core because of a
semantic and contextual conflict: the former eliminates the PDA,
while the latter extends it with apic_perf_irqs field.
Resolve the conflict by moving the new field to the irq_cpustat
structure on 64-bit too.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Accessing memory through %gs should not use rip-relative addressing.
Adding a P prefix for the argument tells gcc to not add (%rip) to
the memory references.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>