Impact: remove 32-bit optimization to prepare unification
x86-32 and -64 differ in the way they context-switch tasks
with io permission bitmaps. x86-64 simply copies the next
tasks io bitmap into place (if any) on context switch. x86-32
invalidates the bitmap on context switch, so that the next
IO instruction will fault; at that point it installs the
appropriate IO bitmap.
This makes context switching IO-bitmap-using tasks a bit more
less expensive, at the cost of making the next IO instruction
slower due to the extra fault. This tradeoff only makes sense
if IO-bitmap-using processes are relatively common, but they
don't actually use IO instructions very often.
However, in a typical desktop system, the only process likely
to be using IO bitmaps is the X server, and nothing at all on
a server. Therefore the lazy context switch doesn't really win
all that much, and its just a gratuitious difference from
64-bit code.
This patch removes the lazy context switch, with a view to
unifying this code in a later change.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: standardize IO on cached ops
On modern CPUs it is almost always a bad idea to use non-temporal stores,
as the regression in this commit has shown it:
30d697f: x86: fix performance regression in write() syscall
The kernel simply has no good information about whether using non-temporal
stores is a good idea or not - and trying to add heuristics only increases
complexity and inserts fragility.
The regression on cached write()s took very long to be found - over two
years. So dont take any chances and let the hardware decide how it makes
use of its caches.
The only exception is drivers/gpu/drm/i915/i915_gem.c: there were we are
absolutely sure that another entity (the GPU) will pick up the dirty
data immediately and that the CPU will not touch that data before the
GPU will.
Also, keep the _nocache() primitives to make it easier for people to
experiment with these details. There may be more clear-cut cases where
non-cached copies can be used, outside of filemap.c.
Cc: Salman Qazi <sqazi@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This reverts commit 17581ad812.
Sitsofe Wheeler reported that /dev/dri/card0 is MIA on his EeePC 900
and bisected it to this commit.
Graphics card is an i915 in an EeePC 900:
00:02.0 VGA compatible controller [0300]:
Intel Corporation Mobile 915GM/GMS/910GML
Express Graphics Controller [8086:2592] (rev 04)
( Most likely the ioremap() of the driver failed and hence the card
did not initialize. )
Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Bisected-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix new breakages introduced by previous fix
Commit c132937556 tried to clean up
bootmem arch wrapper but it wasn't quite correct. Before the commit,
the followings were broken.
* Low level interface functions prefixed with __ ignored arch
preference.
* reserve_bootmem(...) can't be mapped into
reserve_bootmem_node(NODE_DATA(0)->bdata, ...) because the node is
not preference here. The region specified MUST fall into the
specified region; otherwise, it will panic.
After the commit,
* If allocation fails for the arch preferred node, it should fallback
to whatever is available. Instead, it simply failed allocation.
There are too many internal details to allow generic wrapping and
still keep things simple for archs. Plus, all that arch wants is a
way to prefer certain node over another.
This patch drops the generic wrapping around alloc_bootmem_core() and
add alloc_bootmem_core() instead. If necessary, arch can define
bootmem_arch_referred_node() macro or function which takes all
allocation information and returns the preferred node. bootmem
generic code will always try the preferred node first and then
fallback to other nodes as usual.
Breakages noted and changes reviewed by Johannes Weiner.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Impact: unification
This patch unify fixmap_32.h and fixmap_64.h into fixmap.h.
Things that we can't merge now are using CONFIG_X86_{32,64}
(e.g.:vsyscall and EFI)
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Acked-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: cleanup
Just prepare fixmap for later mechanic unification.
No real modification on code.
text data bss dec hex filename
3831152 353188 372736 4557076 458914 vmlinux-32.after
3831152 353188 372736 4557076 458914 vmlinux-32.before
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Acked-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: cleanup
Just prepare fixmap for later mechanic unification.
No real modification on code.
text data bss dec hex filename
4312362 527192 421924 5261478 5048a6 vmlinux-64.after
4312362 527192 421924 5261478 5048a6 vmlinux-64.before
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Acked-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: new fixmap allocation
FIX_EFI_IO_MAP_FIRST_PAGE is used only when EFI is enabled.
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Acked-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: New fixmap allocations
Add CONFIG_X86_{LOCAL,IO}_APIC to enum fixed_address.
FIX_APIC_BASE is used only when CONFIG_X86_LOCAL_APIC is
enabled and FIX_IO_APIC_BASE_* are used only when
CONFIG_X86_IO_APIC is enabled.
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Acked-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: new interface (not yet use)
Define reserve_top_address for x86_64; only for later x86 integration.
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Acked-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: new interface, not yet used
Now, with these macros, x86_64 code can know where start the
permanent and non-permanent fixed mapped address.
This patch make these macros equal fixmap_32.h for future
x86 integration.
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Acked-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: rename
Rename __FIXADDR_SIZE to FIXADDR_SIZE
and __FIXADDR_BOOT_SIZE to FIXADDR_BOOT_SIZE.
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Acked-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: cleanup
- rename apic->wakeup_cpu to apic->wakeup_secondary_cpu, to
make it apparent that this is an SMP-only method
- handle NULL ->wakeup_secondary_cpus to mean the default INIT
wakeup sequence - this allows simplification of the APIC
driver templates.
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: build fix
wakeup_secondary_cpu_via_init(), the default platform method for
booting a secondary CPU, is always used on UP due to probe_32.c,
if CONFIG_X86_LOCAL_APIC is enabled but SMP is off.
So provide a UP wrapper inline as well.
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
that is only needed when CONFIG_X86_VSMP is defined with 64bit
also remove dead code about PCI, because CONFIG_X86_VSMP depends on PCI
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
x86_quirks->update_apic() calling looks crazy. so try to remove it:
1. every apic take wakeup_cpu member directly
2. separate es7000_apic to es7000_apic_cluster
3. use uv_wakeup_cpu directly
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make io_mapping_create_wc and io_mapping_free go through PAT to make sure
that there are no memory type aliases.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Eric Anholt <eric@anholt.net>
Cc: Keith Packard <keithp@keithp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
io_mapping_create_wc should take a resource_size_t parameter in place of
unsigned long. With unsigned long, there will be no way to map greater than 4GB
address in i386/32 bit.
On x86, greater than 4GB addresses cannot be mapped on i386 without PAE. Return
error for such a case.
Patch also adds a structure for io_mapping, that saves the base, size and
type on HAVE_ATOMIC_IOMAP archs, that can be used to verify the offset on
io_mapping_map calls.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Eric Anholt <eric@anholt.net>
Cc: Keith Packard <keithp@keithp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add a function to check and keep identity maps in sync, when changing
any memory type. One of the follow on patches will also use this
routine.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Eric Anholt <eric@anholt.net>
Cc: Keith Packard <keithp@keithp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: make more types of copies non-temporal
This change makes the following simple fix:
30d697f: x86: fix performance regression in write() syscall
A bit more sophisticated: we check the 'total' number of bytes
written to decide whether to copy in a cached or a non-temporal
way.
This will for example cause the tail (modulo 4096 bytes) chunk
of a large write() to be non-temporal too - not just the page-sized
chunks.
Cc: Salman Qazi <sqazi@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, enable future change
Add a 'total bytes copied' parameter to __copy_from_user_*nocache(),
and update all the callsites.
The parameter is not used yet - architecture code can use it to
more intelligently decide whether the copy should be cached or
non-temporal.
Cc: Salman Qazi <sqazi@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
http://bugzilla.kernel.org/show_bug.cgi?id=10968
[ Updated for current tree, and fixed compile failure
when p4-clockmod was built modular -- davej]
From: Matthias-Christian Ott <ott@mirix.org>
Signed-off-by: Dominik Brodowski <linux@brodo.de>
Signed-off-by: Dave Jones <davej@redhat.com>
Impact: cleanup
Unused macro parameters cause spurious unused variable warnings.
Convert all cacheflush macros to inline functions to avoid the
warnings and achieve better type checking.
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: Major new feature
Intel CMCI (Corrected Machine Check Interrupt) is a new
feature on Nehalem CPUs. It allows the CPU to trigger
interrupts on corrected events, which allows faster
reaction to them instead of with the traditional
polling timer.
Also use CMCI to discover shared banks. Machine check banks
can be shared by CPU threads or even cores. Using the CMCI enable
bit it is possible to detect the fact that another CPU already
saw a specific bank. Use this to assign shared banks only
to one CPU to avoid reporting duplicated events.
On CPU hot unplug bank sharing is re discovered. This is done
using a thread that cycles through all the CPUs.
To avoid races between the poller and CMCI we only poll
for banks that are not CMCI capable and only check CMCI
owned banks on a interrupt.
The shared banks ownership information is currently only used for
CMCI interrupts, not polled banks.
The sharing discovery code follows the algorithm recommended in the
IA32 SDM Vol3a 14.5.2.1
The CMCI interrupt handler just calls the machine check poller to
pick up the machine check event that caused the interrupt.
I decided not to implement a separate threshold event like
the AMD version has, because the threshold is always one currently
and adding another event didn't seem to add any value.
Some code inspired by Yunhong Jiang's Xen implementation,
which was in term inspired by a earlier CMCI implementation
by me.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: New register definitions only
CMCI means support for raising an interrupt on a corrected machine
check event instead of having to poll for it. It's a new feature in
Intel Nehalem CPUs available on some machine check banks.
For details see the IA32 SDM Vol3a 14.5
Define the registers for it as a preparation for further patches.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Define a per cpu bitmap that contains the banks polled by the machine
check poller. This is needed for the CMCI code in the next patches
to be able to disable polling on specific banks.
The bank by default contains all banks, so there is no behaviour
change. Only future code will remove some banks from the polling
set.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: cleanup; preparation for feature
The mce_amd_64 code has an own private MC threshold vector with an own
interrupt handler. Since Intel needs a similar handler
it makes sense to share the vector because both can not
be active at the same time.
I factored the common APIC handler code into a separate file which can
be used by both the Intel or AMD MC code.
This is needed for the next patch which adds an Intel specific
CMCI handler.
This patch should be a nop for AMD, it just moves some code
around.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: Cleanup (code movement)
Move MAX_NR_BANKS into mce.h because it's needed there
for followup patches.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
While the introduction of __copy_from_user_nocache (see commit:
0812a579c9) may have been an improvement
for sufficiently large writes, there is evidence to show that it is
deterimental for small writes. Unixbench's fstime test gives the
following results for 256 byte writes with MAX_BLOCK of 2000:
2.6.29-rc6 ( 5 samples, each in KB/sec ):
283750, 295200, 294500, 293000, 293300
2.6.29-rc6 + this patch (5 samples, each in KB/sec):
313050, 3106750, 293350, 306300, 307900
2.6.18
395700, 342000, 399100, 366050, 359850
See w_test() in src/fstime.c in unixbench version 4.1.0. Basically, the above test
consists of counting how much we can write in this manner:
alarm(10);
while (!sigalarm) {
for (f_blocks = 0; f_blocks < 2000; ++f_blocks) {
write(f, buf, 256);
}
lseek(f, 0L, 0);
}
Note, there are other components to the write syscall regression
that are not addressed here.
Signed-off-by: Salman Qazi <sqazi@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: minor change to populate_extra_pte() and addition of pmd flavor
Update populate_extra_pte() to return pointer to the pte_t for the
specified address and add populate_extra_pmd() which only populates
till the pmd and returns pointer to the pmd entry for the address.
For 64bit, pud/pmd/pte fill functions are separated out from
set_pte_vaddr[_pud]() and used for set_pte_vaddr[_pud]() and
populate_extra_{pte|pmd}().
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleaner and consistent bootmem wrapping
By setting CONFIG_HAVE_ARCH_BOOTMEM_NODE, archs can define
arch-specific wrappers for bootmem allocation. However, this is done
a bit strangely in that only the high level convenience macros can be
changed while lower level, but still exported, interface functions
can't be wrapped. This not only is messy but also leads to strange
situation where alloc_bootmem() does what the arch wants it to do but
the equivalent __alloc_bootmem() call doesn't although they should be
able to be used interchangeably.
This patch updates bootmem such that archs can override / wrap the
backend function - alloc_bootmem_core() instead of the highlevel
interface functions to allow simpler and consistent wrapping. Also,
HAVE_ARCH_BOOTMEM_NODE is renamed to HAVE_ARCH_BOOTMEM.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@saeurebad.de>
Impact: cleanup
Make x86_quirks support more transparent. The highlevel
methods are now named:
extern void x86_quirk_pre_intr_init(void);
extern void x86_quirk_intr_init(void);
extern void x86_quirk_trap_init(void);
extern void x86_quirk_pre_time_init(void);
extern void x86_quirk_time_init(void);
This makes it clear that if some platform extension has to
do something here that it is considered ... weird, and is
discouraged.
Also remove arch_hooks.h and move it into setup.h (and other
header files where appropriate).
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: remove dead code
Remove:
- pre_setup_arch_hook()
- mca_nmi_hook()
If needed they can be added back via an x86_quirk handler.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: remove unused/broken code
The Voyager subarch last built successfully on the v2.6.26 kernel
and has been stale since then and does not build on the v2.6.27,
v2.6.28 and v2.6.29-rc5 kernels.
No actual users beyond the maintainer reported this breakage.
Patches were sent and most of the fixes were accepted but the
discussion around how to do a few remaining issues cleanly
fizzled out with no resolution and the code remained broken.
In the v2.6.30 x86 tree development cycle 32-bit subarch support
has been reworked and removed - and the Voyager code, beyond the
build problems already known, needs serious and significant
changes and probably a rewrite to support it.
CONFIG_X86_VOYAGER has been marked BROKEN then. The maintainer has
been notified but no patches have been sent so far to fix it.
While all other subarchs have been converted to the new scheme,
voyager is still broken. We'd prefer to receive patches which
clean up the current situation in a constructive way, but even in
case of removal there is no obstacle to add that support back
after the issues have been sorted out in a mutually acceptable
fashion.
So remove this inactive code for now.
Signed-off-by: Ingo Molnar <mingo@elte.hu>