linux-pinenote

Author	SHA1	Message	Date
Maciej W. Rozycki	49a66a0bce	x86: I/O APIC: Always report how the timer has been set up Following recent (and less so) issues with the 8254 timer when routed through the I/O or local APIC, always report which configurations have been tried and which one has been set up eventually. This is so that logs posted by people for some other reason can be used as a cross-reference when investigating any possible future problems. The change unifies messages printed on 32-bit and 64-bit platforms and adds trailing newlines (removes leading ones), so that proper log level annotation can be used and any possible interspersed output will not cause a mess. I have chosen to use apic_printk(APIC_QUIET, ...) rather than printk(...) so that the distinction of these messages is maintained making possible future decisions about changes in this area easier. A change posted separately making apic_verbosity unsigned removes any extra code that would otherwise be generated as a result of this design decision. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 14:27:47 +02:00
Maciej W. Rozycki	baa1318841	x86: APIC: Make apic_verbosity unsigned As a microoptimisation, make apic_verbosity unsigned. This will make apic_printk(APIC_QUIET, ...) expand into just printk(...) with the surrounding condition and a reference to apic_verbosity removed. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 14:27:43 +02:00
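The effect described in baa1318841 can be sketched in user-space C. This is only an illustration, assuming APIC_QUIET is 0 and using printf() in place of printk(); the helper apic_would_print() is hypothetical and exists only to make the condition visible.

```c
#include <stdio.h>

/* Verbosity levels; APIC_QUIET being 0 is an assumption of this sketch. */
#define APIC_QUIET   0
#define APIC_VERBOSE 1
#define APIC_DEBUG   2

/* Unsigned, as the commit proposes. */
static unsigned int apic_verbosity = APIC_QUIET;

/*
 * With an unsigned apic_verbosity, "APIC_QUIET <= apic_verbosity" is
 * "0 <= unsigned", which is always true, so the compiler can drop the
 * test and the load of apic_verbosity for APIC_QUIET messages.
 */
#define apic_printk(v, ...)                     \
    do {                                        \
        if ((v) <= apic_verbosity)              \
            printf(__VA_ARGS__);                \
    } while (0)

/* Hypothetical helper: reports whether a message at level v would print. */
static int apic_would_print(unsigned int v)
{
    return v <= apic_verbosity;
}
```

With a signed apic_verbosity the same comparison could be false for negative values, so the condition could not be folded away.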
Maciej W. Rozycki	17c44697f2	x86: I/O APIC: Include <asm/i8259.h> required by some code Include <asm/i8259.h> for i8259A_lock used in print_PIC() -- #if-0-ed out by default. The 32-bit version gets it right already. The plan is to enable this code with "apic=debug" eventually. This will aid with debugging strange problems without the need to ask people to apply patches. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 14:27:38 +02:00
Cyrill Gorcunov	836c129de9	x86: apic_32 - introduce calibrate_APIC_clock Introduce calibrate_APIC_clock so it could help in further 32/64bit apic code merging. Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: macro@linux-mips.org Cc: yhlu.kernel@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 14:17:30 +02:00
Cyrill Gorcunov	89b3b1f41b	x86: apic_64 - make calibrate_APIC_clock to return error code Make calibration_result to return error and check calibration_result to be sufficient inside calibrate_APIC_clock. Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: macro@linux-mips.org Cc: yhlu.kernel@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 14:17:29 +02:00
Yinghai Lu	caadbdce24	x86: enable memory tester support on 32-bit only supports memory below max_low_pfn. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 14:11:58 +02:00
Yinghai Lu	1f067167a8	x86: seperate memtest from init_64.c it's separate functionality that deserves its own file. This also prepares 32-bit memtest support. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 14:10:27 +02:00
Hiroshi Shimamoto	fbdb7da91b	x86_64: ia32_signal.c: use macro instead of immediate Make and use macro FIX_EFLAGS, instead of immediate value 0x40DD5 in ia32_restore_sigcontext(). Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 13:54:08 +02:00
Ingo Molnar	cdbfc557c4	Merge branch 'linus' into x86/cleanups	2008-07-18 13:53:16 +02:00
Jeremy Fitzhardinge	95c7c23b06	xen: report hypervisor version Various versions of the hypervisor have differences in what ABIs and features they support. Print some details into the boot log to help with remote debugging. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 13:50:42 +02:00
Ingo Molnar	2fb5e1e101	Merge branch 'linus' into x86/paravirt-spinlocks Conflicts: arch/x86/kernel/Makefile Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 13:41:27 +02:00
Maciej W. Rozycki	593f4a788e	x86: APIC: remove apic_write_around(); use alternatives Use alternatives to select the workaround for the 11AP Pentium erratum for the affected steppings on the fly rather than build time. Remove the X86_GOOD_APIC configuration option and replace all the calls to apic_write_around() with plain apic_write(), protecting accesses to the ESR as appropriate due to the 3AP Pentium erratum. Remove apic_read_around() and all its invocations altogether as not needed. Remove apic_write_atomic() and all its implementing backends. The use of ASM_OUTPUT2() is not strictly needed for input constraints, but I have used it for readability's sake. I had the feeling no one else was brave enough to do it, so I went ahead and here it is. Verified by checking the generated assembly and tested with both a 32-bit and a 64-bit configuration, also with the 11AP "feature" forced on and verified with gdb on /proc/kcore to work as expected (as an 11AP machines are quite hard to get hands on these days). Some script complained about the use of "volatile", but apic_write() needs it for the same reason and is effectively a replacement for writel(), so I have disregarded it. I am not sure what the policy wrt defconfig files is, they are generated and there is risk of a conflict resulting from an unrelated change, so I have left changes to them out. The option will get removed from them at the next run. Some testing with machines other than mine will be needed to avoid some stupid mistake, but despite its volume, the change is not really that intrusive, so I am fairly confident that because it works for me, it will everywhere. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 12:51:21 +02:00
Yinghai Lu	29cbeb0e17	x86: use cpu_clear in remove_cpu_from_maps Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 12:20:28 +02:00
Ingo Molnar	cd569ef5d6	Merge branch 'linus' into x86/urgent	2008-07-18 12:20:23 +02:00
Ingo Molnar	48ae744434	Merge branch 'linus' into x86/step	2008-07-18 10:14:56 +02:00
Ingo Molnar	6879827f4e	x86: remove arch/x86/kernel/smpcommon_32.c Yinghai Lu noticed that arch/x86/kernel/smpcommon_32.c got renamed to arch/x86/kernel/smpcommon.c but the old almost-empty file stayed around. Zap it. Reported-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 01:21:53 +02:00
Ingo Molnar	64d206d896	x86: rename CONFIG_NONPROMISC_DEVMEM to CONFIG_PROMISC_DEVMEM Linus observed: > The real bug is that we shouldn't have "double negatives", and > certainly not negative config options. Making that "promiscuous > /dev/mem" option a negated thing as a config option was bad. right ... lets rename this option. There should never be a negation in config options. [ that reminds me of CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER, but that is for another commit ;-) ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 00:28:57 +02:00
Ingo Molnar	393d81aa02	Merge branch 'linus' into xen-64bit	2008-07-17 23:57:20 +02:00
H. Peter Anvin	4fdf08b5bf	x86: unify and correct the GDT_ENTRY() macro Merge the GDT_ENTRY() macro between arch/x86/boot/pm.c and arch/x86/kernel/acpi/sleep.c and put the new one in <asm-x86/segment.h>. While we're at it, correct the bitmasks for the limit and flags. The new version relies on using ULL constants in order to cause type promotion rather than explicit casts; this avoids having to include <linux/types.h> in <asm-x86/segments.h>. Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-07-17 11:29:24 -07:00
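A macro in the style described by 4fdf08b5bf might look like the following. This is a sketch written from the standard x86 segment descriptor layout, not a copy of the patch; the exact mask values are assumptions. The ULL suffixes on the masks promote each term to 64 bits, so no casts and no <linux/types.h> are needed.

```c
#include <stdint.h>

/*
 * Pack an 8-byte x86 segment descriptor.
 *   flags: access byte in bits 0-7, granularity/size nibble in bits 12-15
 *   base:  32-bit segment base
 *   limit: 20-bit segment limit
 * The ULL masks force 64-bit arithmetic before the shifts.
 */
#define GDT_ENTRY(flags, base, limit)               \
    ((((base)  & 0xff000000ULL) << (56 - 24)) |     \
     (((flags) & 0x0000f0ffULL) << 40) |            \
     (((limit) & 0x000f0000ULL) << (48 - 16)) |     \
     (((base)  & 0x00ffffffULL) << 16) |            \
     (((limit) & 0x0000ffffULL)))
```

For example, GDT_ENTRY(0xc09b, 0, 0xfffff) describes a flat 4 GiB 32-bit code segment.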
Linus Torvalds	2b04be7e8a	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: fix asm/e820.h for userspace inclusion x86: fix numaq_tsc_disable x86: fix kernel_physical_mapping_init() for large x86 systems	2008-07-17 10:38:59 -07:00
Linus Torvalds	bdec6cace4	Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: ftrace: do not trace library functions ftrace: do not trace scheduler functions ftrace: fix lockup with MAXSMP ftrace: fix merge buglet	2008-07-17 10:37:10 -07:00
Yinghai Lu	9354094a95	x86: fix numaq_tsc_disable fix: arch/x86/kernel/numaq_32.c: In function ‘numaq_tsc_disable’: arch/x86/kernel/numaq_32.c:99: warning: ‘return’ with a value, in function returning void Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-17 19:27:08 +02:00
Jeremy Fitzhardinge	93a0886e23	x86, xen, power: fix up config dependencies on PM Xen save/restore needs bits of code enabled by PM_SLEEP, and PM_SLEEP depends on PM. So make XEN_SAVE_RESTORE depend on PM and PM_SLEEP depend on XEN_SAVE_RESTORE. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-17 19:25:20 +02:00
Ingo Molnar	fab3b58d3b	x86 reboot quirks: add Dell Precision WorkStation T5400 as reported in: "reboot=bios is mandatory on Dell T5400 server." http://bugzilla.kernel.org/show_bug.cgi?id=11108 add a DMI reboot quirk. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org>	2008-07-17 13:56:15 +02:00
Ingo Molnar	8e9509c827	ftrace: fix merge buglet -tip testing found a bootup hang here: initcall anon_inode_init+0x0/0x130 returned 0 after 0 msecs calling acpi_event_init+0x0/0x57 the bootup should have continued with: initcall acpi_event_init+0x0/0x57 returned 0 after 45 msecs but it hung hard there instead. bisection led to this commit: | commit `5806b81ac1` | Merge: d14c8a6... 6712e29... | Author: Ingo Molnar <mingo@elte.hu> | Date: Mon Jul 14 16:11:52 2008 +0200 | Merge branch 'auto-ftrace-next' into tracing/for-linus turns out that i made this mistake in the merge: ifdef CONFIG_FTRACE # Do not profile debug utilities CFLAGS_REMOVE_tsc_64.o = -pg CFLAGS_REMOVE_tsc_32.o = -pg those two files got unified meanwhile - so the dont-profile annotation got lost. The proper rule is: CFLAGS_REMOVE_tsc.o = -pg i guess this could have been caught sooner if the CFLAGS_REMOVE* kbuild rule aborted the build if it met a target that does not exist anymore? Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-17 13:26:50 +02:00
Linus Torvalds	dc7c65db28	Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (72 commits) Revert "x86/PCI: ACPI based PCI gap calculation" PCI: remove unnecessary volatile in PCIe hotplug struct controller x86/PCI: ACPI based PCI gap calculation PCI: include linux/pm_wakeup.h for device_set_wakeup_capable PCI PM: Fix pci_prepare_to_sleep x86/PCI: Fix PCI config space for domains > 0 Fix acpi_pm_device_sleep_wake() by providing a stub for CONFIG_PM_SLEEP=n PCI: Simplify PCI device PM code PCI PM: Introduce pci_prepare_to_sleep and pci_back_from_sleep PCI ACPI: Rework PCI handling of wake-up ACPI: Introduce new device wakeup flag 'prepared' ACPI: Introduce acpi_device_sleep_wake function PCI: rework pci_set_power_state function to call platform first PCI: Introduce platform_pci_power_manageable function ACPI: Introduce acpi_bus_power_manageable function PCI: make pci_name use dev_name PCI: handle pci_name() being const PCI: add stub for pci_set_consistent_dma_mask() PCI: remove unused arch pcibios_update_resource() functions PCI: fix pci_setup_device()'s sprinting into a const buffer ... Fixed up conflicts in various files (arch/x86/kernel/setup_64.c, arch/x86/pci/irq.c, arch/x86/pci/pci.h, drivers/acpi/sleep/main.c, drivers/pci/pci.c, drivers/pci/pci.h, include/acpi/acpi_bus.h) from x86 and ACPI updates manually.	2008-07-16 17:25:46 -07:00
Jesse Barnes	58b6e55384	Revert "x86/PCI: ACPI based PCI gap calculation" This reverts commit `809d9a8f93`. This one isn't quite ready for prime time. It needs more testing and additional feedback from the ACPI guys.	2008-07-16 16:21:47 -07:00
Zhao Yakui	da5e09a1b3	ACPI : Create "idle=nomwait" bootparam "idle=nomwait" disables the use of the MWAIT instruction from both C1 (C1_FFH) and deeper (C2C3_FFH) C-states. When MWAIT is unavailable, the BIOS and OS generally negotiate to use the HALT instruction for C1, and use IO accesses for deeper C-states. This option is useful for power and performance comparisons, and also to work around BIOS bugs where broken MWAIT support is advertised. http://bugzilla.kernel.org/show_bug.cgi?id=10807 http://bugzilla.kernel.org/show_bug.cgi?id=10914 Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> Signed-off-by: Li Shaohua <shaohua.li@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2008-07-16 23:27:05 +02:00
Zhao Yakui	c1e3b377ad	ACPI: Create "idle=halt" bootparam "idle=halt" limits the idle loop to using the halt instruction. No MWAIT, no IO accesses, no C-states deeper than C1. If something is broken in the idle code, "idle=halt" is a less severe workaround than "idle=poll" which disables all power savings. Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2008-07-16 23:27:05 +02:00
Zhao Yakui	5b53496a5a	ACPI: Disable the C2C3_FFH access mode HW has no MWAIT support `991528d734` (ACPI: Processor native C-states using MWAIT) started passing C2C3_FFH to _PDC to tell the BIOS that Linux supports MWAIT for deep C-states. However, we should first double check with the hardware that it actually supports MWAIT before potentially exposing a BIOS bug of an MWAIT _CST on HW that doesn't support MWAIT. Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> Signed-off-by: Li Shaohua <shaohua.li@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2008-07-16 23:27:04 +02:00
Bob Moore	19d0cfe9dd	ACPICA: Update DMAR and SRAT table definitions Synchronized tables with current specifications. Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2008-07-16 23:27:04 +02:00
Roland McGrath	380fdd7585	x86 ptrace: user-sets-TF nits This closes some arcane holes in single-step handling that can arise only when user programs set TF directly (via popf or sigreturn) and then use vDSO (syscall/sysenter) system call entry. In those entry paths, the clear_TF_reenable case hits and we must check TIF_SINGLESTEP to be sure our bookkeeping stays correct wrt the user's view of TF. Signed-off-by: Roland McGrath <roland@redhat.com>	2008-07-16 12:15:17 -07:00
Roland McGrath	d4d6715016	x86 ptrace: unify syscall tracing This unifies and cleans up the syscall tracing code on i386 and x86_64. Using a single function for entry and exit tracing on 32-bit made the do_syscall_trace() into some terrible spaghetti. The logic is clear and simple using separate syscall_trace_enter() and syscall_trace_leave() functions as on 64-bit. The unification adds PTRACE_SYSEMU and PTRACE_SYSEMU_SINGLESTEP support on x86_64, for 32-bit ptrace() callers and for 64-bit ptrace() callers tracing either 32-bit or 64-bit tasks. It behaves just like 32-bit. Changing syscall_trace_enter() to return the syscall number shortens all the assembly paths, while adding the SYSEMU feature in a simple way. Signed-off-by: Roland McGrath <roland@redhat.com>	2008-07-16 12:15:17 -07:00
Roland McGrath	64f0973319	x86 ptrace: unify TIF_SINGLESTEP This unifies the treatment of TIF_SINGLESTEP on i386 and x86_64. The bit is now excluded from _TIF_WORK_MASK on i386 as it has been on x86_64. This means the do_notify_resume() path using it is never used, so TIF_SINGLESTEP is not cleared on returning to user mode. Both now leave TIF_SINGLESTEP set when returning to user, so that it's already set on an int $0x80 system call entry. This removes the need for testing TF on the system_call path. Doing it this way fixes the regression for PTRACE_SINGLESTEP into a sigreturn syscall, introduced by commit `1e2e99f0e4`. The clear_TF_reenable case that sets TIF_SINGLESTEP can only happen on a non-exception kernel entry, i.e. sysenter/syscall instruction. That will always get to the syscall exit tracing path. Signed-off-by: Roland McGrath <roland@redhat.com>	2008-07-16 12:15:16 -07:00
Roland McGrath	6718d0d6da	x86 ptrace: block-step fix The enable_single_step() logic bails out early if TF is already set. That skips some of the bookkeeping that keeps things straight. This makes PTRACE_SINGLEBLOCK break the behavior of a user task that was already setting TF itself in user mode. Fix the bookkeeping to notice the old TF setting as it should. Test case at: http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/step-jump-cont-strict.c?cvsroot=systemtap Signed-off-by: Roland McGrath <roland@redhat.com>	2008-07-16 12:15:16 -07:00
Jack Steiner	e22146e610	x86: fix kernel_physical_mapping_init() for large x86 systems Fix bug in kernel_physical_mapping_init() that causes kernel page table to be built incorrectly for systems with greater than 512GB of memory. Signed-off-by: Jack Steiner <steiner@sgi.com> Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 18:27:36 +02:00
Ingo Molnar	77e442461c	Merge branch 'linus' into x86/kprobes	2008-07-16 13:11:29 +02:00
Ingo Molnar	34646bca47	x86, paravirt-spinlocks: fix boot hang the paravirt-spinlock patches caused a boot hang with this config: http://redhat.com/~mingo/misc/config-Wed_Jul__9_14_47_04_CEST_2008.bad i have bisected it down to: | commit e17b58c2e85bc2ad2afc07fb8d898017c2b75ed1 | Author: Jeremy Fitzhardinge <jeremy@goop.org> | Date: Mon Jul 7 12:07:53 2008 -0700 | | xen: implement Xen-specific spinlocks i.e. applying that patch alone causes the hang. The hang happens in the ftrace self-test: initcall utsname_sysctl_init+0x0/0x19 returned 0 after 0 msecs calling init_sched_switch_trace+0x0/0x4c Testing tracer sched_switch: PASSED initcall init_sched_switch_trace+0x0/0x4c returned 0 after 167 msecs calling init_function_trace+0x0/0x12 Testing tracer ftrace: [hard hang] it should have continued like this: Testing tracer ftrace: PASSED initcall init_function_trace+0x0/0x12 returned 0 after 198 msecs calling init_irqsoff_tracer+0x0/0x14 Testing tracer irqsoff: PASSED initcall init_irqsoff_tracer+0x0/0x14 returned 0 after 3 msecs calling init_mmio_trace+0x0/0x12 initcall init_mmio_trace+0x0/0x12 returned 0 after 0 msecs the problem is that such lowlevel primitives as spinlocks should never be built with -pg (which ftrace does). Marking paravirt.o as non-pg and marking all spinlock ops as always-inline solve the hang. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:15:53 +02:00
Ingo Molnar	9af98578d6	x86: paravirt spinlocks, modular build fix fix: MODPOST 408 modules ERROR: "pv_lock_ops" [net/dccp/dccp.ko] undefined! ERROR: "pv_lock_ops" [fs/jbd2/jbd2.ko] undefined! ERROR: "pv_lock_ops" [drivers/media/common/saa7146_vv.ko] undefined! Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:15:53 +02:00
Ingo Molnar	4bb689eee1	x86: paravirt spinlocks, !CONFIG_SMP build fixes Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:15:53 +02:00
Jeremy Fitzhardinge	2d9e1e2f58	xen: implement Xen-specific spinlocks The standard ticket spinlocks are very expensive in a virtual environment, because their performance depends on Xen's scheduler giving vcpus time in the order that they're supposed to take the spinlock. This implements a Xen-specific spinlock, which should be much more efficient. The fast-path is essentially the old Linux-x86 locks, using a single lock byte. The locker decrements the byte; if the result is 0, then they have the lock. If the lock is negative, then locker must spin until the lock is positive again. When there's contention, the locker spin for 2^16[*] iterations waiting to get the lock. If it fails to get the lock in that time, it adds itself to the contention count in the lock and blocks on a per-cpu event channel. When unlocking the spinlock, the locker looks to see if there's anyone blocked waiting for the lock by checking for a non-zero waiter count. If there's a waiter, it traverses the per-cpu "lock_spinners" variable, which contains which lock each CPU is waiting on. It picks one CPU waiting on the lock and sends it an event to wake it up. This allows efficient fast-path spinlock operation, while allowing spinning vcpus to give up their processor time while waiting for a contended lock. [*] 2^16 iterations is threshold at which 98% locks have been taken according to Thomas Friebel's Xen Summit talk "Preventing Guests from Spinning Around". Therefore, we'd expect the lock and unlock slow paths will only be entered 2% of the time. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Lameter <clameter@linux-foundation.org> Cc: Petr Tesarik <ptesarik@suse.cz> Cc: Virtualization <virtualization@lists.linux-foundation.org> Cc: Xen devel <xen-devel@lists.xensource.com> Cc: Thomas Friebel <thomas.friebel@amd.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:15:53 +02:00
Jeremy Fitzhardinge	56397f8dad	xen: use lock-byte spinlock implementation Switch to using the lock-byte spinlock implementation, to avoid the worst of the performance hit from ticket locks. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Lameter <clameter@linux-foundation.org> Cc: Petr Tesarik <ptesarik@suse.cz> Cc: Virtualization <virtualization@lists.linux-foundation.org> Cc: Xen devel <xen-devel@lists.xensource.com> Cc: Thomas Friebel <thomas.friebel@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:15:53 +02:00
Jeremy Fitzhardinge	8efcbab674	paravirt: introduce a "lock-byte" spinlock implementation Implement a version of the old spinlock algorithm, in which everyone spins waiting for a lock byte. In order to be compatible with the ticket-lock's use of a zero initializer, this uses the convention of '0' for unlocked and '1' for locked. This algorithm is much better than ticket locks in a virtual envionment, because it doesn't interact badly with the vcpu scheduler. If there are multiple vcpus spinning on a lock and the lock is released, the next vcpu to be scheduled will take the lock, rather than cycling around until the next ticketed vcpu gets it. To use this, you must call paravirt_use_bytelocks() very early, before any spinlocks have been taken. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Lameter <clameter@linux-foundation.org> Cc: Petr Tesarik <ptesarik@suse.cz> Cc: Virtualization <virtualization@lists.linux-foundation.org> Cc: Xen devel <xen-devel@lists.xensource.com> Cc: Thomas Friebel <thomas.friebel@amd.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:15:53 +02:00
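The lock-byte convention from 8efcbab674 (0 for unlocked, 1 for locked, everyone spinning on the same byte) can be sketched in portable C11. This is a user-space illustration, not the kernel code: the struct and function names are hypothetical, and C11 atomics stand in for the x86 instructions the kernel would use.

```c
#include <stdatomic.h>

/* Hypothetical lock type: a single byte, 0 = unlocked, 1 = locked. */
struct byte_lock {
    atomic_uchar locked;
};

/* Try once: swap in 1; we own the lock if the old value was 0. */
static int byte_lock_try(struct byte_lock *l)
{
    return atomic_exchange_explicit(&l->locked, 1,
                                    memory_order_acquire) == 0;
}

/* Spin until the lock is taken; whoever runs next after a release wins. */
static void byte_lock_acquire(struct byte_lock *l)
{
    while (!byte_lock_try(l)) {
        /* Read-only wait so contenders do not keep writing the byte. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;
    }
}

/* Release by storing 0 back into the lock byte. */
static void byte_lock_release(struct byte_lock *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

Because no ordering is recorded among waiters, whichever vcpu is scheduled first after a release can take the lock, which is the property the commit message contrasts with ticket locks. A zero-initialized struct starts out unlocked.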
Jeremy Fitzhardinge	74d4affde8	x86/paravirt: add hooks for spinlock operations Ticket spinlocks have absolutely ghastly worst-case performance characteristics in a virtual environment. If there is any contention for physical CPUs (ie, there are more runnable vcpus than cpus), then ticket locks can cause the system to end up spending 90+% of its time spinning. The problem is that (v)cpus waiting on a ticket spinlock will be granted access to the lock in strict order they got their tickets. If the hypervisor scheduler doesn't give the vcpus time in that order, they will burn timeslices waiting for the scheduler to give the right vcpu some time. In the worst case it could take O(n^2) vcpu scheduler timeslices for everyone waiting on the lock to get it, not counting new cpus trying to take the lock while the log-jam is sorted out. These hooks allow a paravirt backend to replace the spinlock implementation. At the very least, this could revert the implementation back to the old lock algorithm, which allows the next scheduled vcpu to take the lock, and has basically fairly good performance. It also allows the spinlocks to take advantages of the hypervisor features to make locks more efficient (spin and block, for example). The cost to native execution is an extra direct call when using a spinlock function. There's no overhead if CONFIG_PARAVIRT is turned off. The lock structure is fixed at a single "unsigned int", initialized to zero, but the spinlock implementation can use it as it wishes. Thanks to Thomas Friebel's Xen Summit talk "Preventing Guests from Spinning Around" for pointing out this problem. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Lameter <clameter@linux-foundation.org> Cc: Petr Tesarik <ptesarik@suse.cz> Cc: Virtualization <virtualization@lists.linux-foundation.org> Cc: Xen devel <xen-devel@lists.xensource.com> Cc: Thomas Friebel <thomas.friebel@amd.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:15:52 +02:00
Jeremy Fitzhardinge	094029479b	x86_64: adjust exception frame on paranoid exceptions Exceptions using paranoidentry need to have their exception frames adjusted explicitly. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-07-16 11:08:58 +02:00
Jeremy Fitzhardinge	d5303b811b	x86: xen: no need to disable vdso32 Now that the vdso32 code can cope with both syscall and sysenter missing for 32-bit compat processes, just disable the features without disabling vdso altogether. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-07-16 11:08:44 +02:00
Jeremy Fitzhardinge	6a52e4b1cd	x86_64: further cleanup of 32-bit compat syscall mechanisms AMD only supports "syscall" from 32-bit compat usermode. Intel and Centaur(?) only support "sysenter" from 32-bit compat usermode. Set the X86 feature bits accordingly, and set up the vdso in accordance with those bits. On the offchance we run on in a 64-bit environment which supports neither syscall nor sysenter from 32-bit mode, then fall back to the int $0x80 vdso. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-07-16 11:08:27 +02:00
Ingo Molnar	71415c6a08	x86, xen, vdso: fix build error fix: arch/x86/xen/built-in.o: In function `xen_enable_syscall': (.cpuinit.text+0xdb): undefined reference to `sysctl_vsyscall32' Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:07:58 +02:00
Jeremy Fitzhardinge	62541c3766	xen64: disable 32-bit syscall/sysenter if not supported. Old versions of Xen (3.1 and before) don't support sysenter or syscall from 32-bit compat userspaces. If we can't set the appropriate syscall callback, then disable the corresponding feature bit, which will cause the vdso32 setup to fall back appropriately. Linux assumes that syscall is always available to 32-bit userspace, and installs it by default if sysenter isn't available. In that case, we just disable vdso altogether, forcing userspace libc to fall back to int $0x80. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:07:44 +02:00
Ingo Molnar	6596f24223	Revert "x86_64: there's no need to preallocate level1_fixmap_pgt" This reverts commit 033786969d1d1b5af12a32a19d3a760314d05329. Suresh Siddha reported that this broke booting on his 2GB testbox. Reported-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:07:30 +02:00


3984 commits