Commit graph

3984 commits

Author SHA1 Message Date
Vegard Nossum
4461145ef1 x86, lockdep: fix "WARNING: at kernel/lockdep.c:2658 check_flags+0x4c/0x128()"
Alessandro Suardi reported:
> Recently upgraded my FC6 desktop to Fedora 9; with the
>  latest nautilus RPM updates my VNC session went nuts
>  with nautilus pegging the CPU for everything that breathed.
>
> I now reverted to an earlier nautilus package, but during
>  the peak CPU period my kernel spat this:
>
> [314185.623294] ------------[ cut here ]------------
> [314185.623414] WARNING: at kernel/lockdep.c:2658 check_flags+0x4c/0x128()
> [314185.623514] Modules linked in: iptable_filter ip_tables x_tables
> sunrpc ipv6 fuse snd_via82xx snd_ac97_codec ac97_bus snd_mpu401_uart
> snd_rawmidi via686a hwmon parport_pc sg parport uhci_hcd ehci_hcd
> [314185.623924] Pid: 12314, comm: nautilus Not tainted 2.6.26-rc5-git2 #4
> [314185.624021]  [<c0115b95>] warn_on_slowpath+0x41/0x7b
> [314185.624021]  [<c010de70>] ? do_page_fault+0x2c1/0x5fd
> [314185.624021]  [<c0128396>] ? up_read+0x16/0x28
> [314185.624021]  [<c010de70>] ? do_page_fault+0x2c1/0x5fd
> [314185.624021]  [<c012fa33>] ? __lock_acquire+0xbb4/0xbc3
> [314185.624021]  [<c012d0a0>] check_flags+0x4c/0x128
> [314185.624021]  [<c012fa73>] lock_acquire+0x31/0x7d
> [314185.624021]  [<c0128cf6>] __atomic_notifier_call_chain+0x30/0x80
> [314185.624021]  [<c0128cc6>] ? __atomic_notifier_call_chain+0x0/0x80
> [314185.624021]  [<c0128d52>] atomic_notifier_call_chain+0xc/0xe
> [314185.624021]  [<c0128d81>] notify_die+0x2d/0x2f
> [314185.624021]  [<c01043b0>] do_int3+0x1f/0x4d
> [314185.624021]  [<c02f2d3b>] int3+0x27/0x2c
> [314185.624021]  =======================
> [314185.624021] ---[ end trace 1923f65a2d7bb246 ]---
> [314185.624021] possible reason: unannotated irqs-off.
> [314185.624021] irq event stamp: 488879
> [314185.624021] hardirqs last  enabled at (488879): [<c0102d67>]
> restore_nocheck+0x12/0x15
> [314185.624021] hardirqs last disabled at (488878): [<c0102dca>]
> work_resched+0x19/0x30
> [314185.624021] softirqs last  enabled at (488876): [<c011a1ba>]
> __do_softirq+0xa6/0xac
> [314185.624021] softirqs last disabled at (488865): [<c010476e>]
> do_softirq+0x57/0xa6
>
> I didn't seem to find it with some googling, so here it is.
>
> I was incidentally ltracing that process to try and find out
>  what was gulping down that much CPU (sorry, no idea
>  whether ltrace and the WARNING happened at the same
>  time or which came first) and:

Yeah, this is extremely likely to be the source of the warning.

The warning should be harmless, however.

> Box is my trusty noname K7-800, 512MB RAM; if there's
>  anything else useful I might be able to provide, just ask.

It would be interesting to see where the int3 comes from.  Too bad,
lockdep doesn't provide the register dump. The stacktrace also doesn't
go further than the int3(), I wonder if this int3 came from userspace?
The ltrace readme says "software breakpoints, like gdb", so I guess
this is the case. Yep, seems like it.

This looks relevant:

| commit fb1dac909d
| Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
| Date:   Wed Jan 16 09:51:59 2008 +0100
|
|     lockdep: more hardirq annotations for notify_die()

I'm attaching a similarly-looking patch for this case (DO_VM86_ERROR),
though I suspect it might be missing for the other cases
(DO_ERROR/DO_ERROR_INFO) as well.

Reported-by: Alessandro Suardi <alessandro.suardi@gmail.com>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:27:19 +02:00
David Howells
eb53e9f3ea x86: fix an incompatible pointer type warning on 64-bit compilations
Fix an incompatible pointer type warning on x86_64 compilations.
early_memtest() is passing a u64* to find_e820_area_size() which is expecting
an unsigned long.  Change t_start and t_size to unsigned long as those are
also 64-bit types on x88_64.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:27:15 +02:00
Peter Zijlstra
e32e58a96d x86: fix lockdep warning during suspend-to-ram
Andrew Morton wrote:

> I've been seeing the below for a long time during suspend-to-ram on the Vaio.
>
>
> PM: Syncing filesystems ... done.
> PM: Preparing system for mem sleep
> Freezing user space processes ... <4>------------[ cut here ]------------
> WARNING: at kernel/lockdep.c:2658 check_flags+0x4c/0x127()
> Modules linked in: i915 drm ipw2200 sonypi ipv6 autofs4 hidp l2cap bluetooth sunrpc nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables acpi_cpufreq nvram ohci1394 ieee1394 ehci_hcd uhci_hcd sg joydev snd_hda_intel snd_seq_dummy sr_mod snd_seq_oss cdrom snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss ieee80211 pcspkr ieee80211_crypt snd_pcm i2c_i801 snd_timer i2c_core ide_pci_generic piix snd soundcore snd_page_alloc button ext3 jbd ide_disk ide_core [last unloaded: ipw2200]
> Pid: 3250, comm: zsh Not tainted 2.6.26-rc5 #1
>  [<c011c5f5>] warn_on_slowpath+0x41/0x6d
>  [<c01080e6>] ? native_sched_clock+0x82/0x96
>  [<c013789c>] ? mark_held_locks+0x41/0x5c
>  [<c0315688>] ? _spin_unlock_irqrestore+0x36/0x58
>  [<c0137a29>] ? trace_hardirqs_on+0xe6/0x10d
>  [<c0138637>] ? __lock_acquire+0xae3/0xb2b
>  [<c0313413>] ? schedule+0x39b/0x3b4
>  [<c0135596>] check_flags+0x4c/0x127
>  [<c01386b9>] lock_acquire+0x3a/0x86
>  [<c0315075>] _spin_lock+0x26/0x53
>  [<c0140660>] ? refrigerator+0x13/0xc3
>  [<c0140660>] refrigerator+0x13/0xc3
>  [<c012684a>] get_signal_to_deliver+0x3c/0x31e
>  [<c0102fe7>] do_notify_resume+0x91/0x6ee
>  [<c01359fd>] ? lock_release_holdtime+0x50/0x56
>  [<c0315688>] ? _spin_unlock_irqrestore+0x36/0x58
>  [<c0235d24>] ? read_chan+0x0/0x58c
>  [<c0137a29>] ? trace_hardirqs_on+0xe6/0x10d
>  [<c0315694>] ? _spin_unlock_irqrestore+0x42/0x58
>  [<c0230afa>] ? tty_ldisc_deref+0x5c/0x63
>  [<c0233104>] ? tty_read+0x66/0x98
>  [<c014b3f0>] ? audit_syscall_exit+0x2aa/0x2c5
>  [<c0109430>] ? do_syscall_trace+0x6b/0x16f
>  [<c0103a9c>] work_notifysig+0x13/0x1b
>  =======================
> ---[ end trace 25b49fe59a25afa5 ]---
> possible reason: unannotated irqs-off.
> irq event stamp: 58919
> hardirqs last  enabled at (58919): [<c0103afd>] syscall_exit_work+0x11/0x26

Joy - I so love entry.S

Best I can make of it:

syscall_exit_work
  resume_userspace
    DISABLE_INTERRUPTS
    (no TRACE_IRQS_OFF)
      work_pending
        work_notifysig
          do_notify_resume()
            do_signal()
              get_signal_to_deliver()
                try_to_freeze()
                  refrigerator()
                    task_lock() -> check_flags() -> BANG

The normal path is:

syscall_exit_work
  resume_userspace
    DISABLE_INTERRUPTS
    restore_all
      TRACE_IRQS_IRET
      iret

No idea why that would not warn..

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:27:09 +02:00
Manish Katiyar
52aaa12fbe x86: fix unused variable 'loops' warning in arch/x86/boot/a20.c
Following patch fixes the below warning message :
arch/x86/boot/a20.c:118: warning: unused variable 'loops'

Signed-off-by : Manish Katiyar <mkatiyar@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:27:05 +02:00
Ingo Molnar
0b6a39f7eb Revert "x86: fix ioapic bug again"
This reverts commit 6e908947b4.

Németh Márton reported:

| there is a problem in 2.6.26-rc3 which was not there in case of
| 2.6.25: the CPU wakes up ~90,000 times per sec instead of ~60 per sec.
|
| I also "git bisected" the problem, the result is:
|
| 6e908947b4 is first bad commit
| commit 6e908947b4
| Author: Ingo Molnar <mingo@elte.hu>
| Date:   Fri Mar 21 14:32:36 2008 +0100
|
|     x86: fix ioapic bug again

the original problem is fixed by Maciej W. Rozycki in the tip/x86/apic
branch (confirmed by Márton), but those changes are too intrusive for
v2.6.26 so we'll go for the less intrusive (repeated) revert now.

Reported-and-bisected-by: Németh Márton <nm127@freemail.hu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:26:28 +02:00
Joe Korty
86b2b70e15 x86: fix asm warning in head_32.S
On Mon, May 19, 2008 at 04:10:02PM -0700, Linus Torvalds wrote:
> It also causes these warnings on 32-bit PAE:
>
> 	  AS      arch/x86/kernel/head_32.o
> 	arch/x86/kernel/head_32.S: Assembler messages:
> 	arch/x86/kernel/head_32.S:225: Warning: left operand is a bignum; integer 0 assumed
> 	arch/x86/kernel/head_32.S:609: Warning: left operand is a bignum; integer 0 assumed
>
> and I do not see why (the end result seems to be identical).

Fix head_32.S gcc bignum warnings when CONFIG_PAE=y.

    arch/x86/kernel/head_32.S: Assembler messages:
    arch/x86/kernel/head_32.S:225: Warning: left operand is a bignum; integer 0 assumed
    arch/x86/kernel/head_32.S:609: Warning: left operand is a bignum; integer 0 assumed

The assembler was stumbling over the 64-bit constant 0x100000000 in the
KPMDS #define.

Testing: a cmp(1) on head_32.o before and after shows the binary is unchanged.

Signed-off-by: Joe Korty <joe.korty@ccur.com
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: Gabriel C <nix.or.die@googlemail.com>
Cc: Keith Packard <keithp@keithp.com>
Cc: "Pallipadi Venkatesh" <venkatesh.pallipadi@intel.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: "Siddha Suresh B" <suresh.b.siddha@intel.com>
Cc: bugme-daemon@bugzilla.kernel.org
Cc: airlied@linux.ie
Cc: "Barnes Jesse" <jesse.barnes@intel.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:26:12 +02:00
Henry Nestler
b29c701dea x86: fix endless page faults in mount_block_root for Linux 2.6
Page faults in kernel address space between PAGE_OFFSET up to
VMALLOC_START should not try to map as vmalloc.

Fix rarely endless page faults inside mount_block_root for root
filesystem at boot time.

All 32bit kernels up to 2.6.25 can fail into this hole.
I can not present this under native linux kernel. I see, that the 64bit
has fixed the problem. I copied the same lines into 32bit part.

Recorded debugs are from coLinux kernel 2.6.22.18 (virtualisation):
20080410-antinx/bug16-recursive-page-fault-endless.txt
The physicaly memory was trimmed down to 192MB to better catch the bug.
More memory gets the bug more rarely.

Details, how every x86 32bit system can fail:

Start from "mount_block_root",
http://lxr.linux.no/linux/init/do_mounts.c#L297
There the variable "fs_names" got one memory page with 4096 bytes.
Variable "p" walks through the existing file system types. The first
string is no problem.
But, with the second loop in mount_block_root the offset of "p" is not
at beginning of page, the offset is for example +9, if "reiserfs" is the
first in list.
Than calls do_mount_root, and lands in sys_mount.
Remember: Variable "type_page" contains now "fs_type+9" and not contains
a full page.
The sys_mount copies 4096 bytes with function "exact_copy_from_user()":
http://lxr.linux.no/linux/fs/namespace.c#L1540

Mostly exist pages after the buffer "fs_names+4096+9" and the page fault
handler was not called. No problem.

In the case, if the page after "fs_names+4096" is not mapped, the page
fault handler was called from http://lxr.linux.no/linux/fs/namespace.c#L1320

The do_page_fault gots an address 0xc03b4000.
It's kernel address, address >= TASK_SIZE, but not from vmalloc! It's
from "__getname()" alias "kmem_cache_alloc".
The "error_code" is 0. "vmalloc_fault" will be call:
http://lxr.linux.no/linux/arch/i386/mm/fault.c#L332

"vmalloc_fault" tryed to find the physical page for a non existing
virtual memory area. The macro "pte_present" in vmalloc_fault()
got a next page fault for 0xc0000ed0 at:
http://lxr.linux.no/linux/arch/i386/mm/fault.c#L282

No PTE exist for such virtual address. The page fault handler was trying
to sync the physical page for the PTE lockup.

This called vmalloc_fault() again for address 0xc000000, and that also
was not existing. The endless began...

In normal case the cpu would still loop with disabled interrrupts. Under
coLinux this was catched by a stack overflow inside printk debugs.

Signed-off-by: Henry Nestler <henry.nestler@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-12 21:26:07 +02:00
Ingo Molnar
3703f39965 geode: fix modular build
-tip testing found this build bug:

 MODPOST 331 modules
 ERROR: "geode_mfgpt_toggle_event" [drivers/watchdog/geodewdt.ko] undefined!
 ERROR: "geode_mfgpt_alloc_timer" [drivers/watchdog/geodewdt.ko] undefined!
 make[1]: *** [__modpost] Error 1
 make: *** [modules] Error 2

with this config:

  http://redhat.com/~mingo/misc/config-Wed_Jun__4_18_01_59_CEST_2008.bad

export those symbols.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:25:51 +02:00
Linus Torvalds
dc10885d68 Merge branch 'core/iter-div' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core/iter-div' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  always_inline timespec_add_ns
  add an inlined version of iter_div_u64_rem
  common implementation of iterative div/mod
2008-06-12 07:47:44 -07:00
Cyrill Gorcunov
f781b03c4b x86: touch_nmi_watchdog(): reset alert counters for supported nmi_watchdog modes only
The checking 'if nmi_watchdog > 0' (ie NMI_NONE) is quite fast but it
has a side effect - it's taken even if nmi_watchdog = NMI_DISABLED.

Nowadays nmi_watchdog is set up to NMI_NONE by default so this condition
is properly taken most the time but we better show this explicitly.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 15:14:54 +02:00
Rafael J. Wysocki
6703f6d10d x86, gart: add resume handling
If GART IOMMU is used on an AMD64 system, the northbridge registers
related to it should be restored during resume so that memory is not
corrupted.  Make gart_resume() handle that as appropriate.

Ref. http://lkml.org/lkml/2008/5/25/96 and the following thread.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 14:11:25 +02:00
Ingo Molnar
bb6dfb32f9 Merge branch 'linus' into x86/gart 2008-06-12 11:27:22 +02:00
Jeremy Fitzhardinge
f595ec964d common implementation of iterative div/mod
We have a few instances of the open-coded iterative div/mod loop, used
when we don't expcet the dividend to be much bigger than the divisor.
Unfortunately modern gcc's have the tendency to strength "reduce" this
into a full mod operation, which isn't necessarily any faster, and
even if it were, doesn't exist if gcc implements it in libgcc.

The workaround is to put a dummy asm statement in the loop to prevent
gcc from performing the transformation.

This patch creates a single implementation of this loop, and uses it
to replace the open-coded versions I know about.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Robert Hancock <hancockr@shaw.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 10:47:56 +02:00
Andreas Herrmann
499f8f84b8 x86: rename pat_wc_enabled to pat_enabled
BTW, what does pat_wc_enabled stand for? Does it mean
"write-combining"?

Currently it is used to globally switch on or off PAT support.
Thus I renamed it to pat_enabled.
I think this increases readability (and hope that I didn't miss
something).

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 10:14:27 +02:00
Andreas Herrmann
cd7a4e936d x86: PAT: fixed checkpatch errors (and whitespaces)
x86: PAT: fixed checkpatch errors (and whitespaces)

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 10:14:24 +02:00
Andreas Herrmann
97cfab6ac4 x86: PAT: fix ambiguous paranoia check in pat_init()
Starting with commit 8d4a430085 (x86:
cleanup PAT cpu validation) the PAT CPU feature flag is not cleared
anymore. Now the error message

  "PAT enabled, but CPU feature cleared"

in pat_init() is misleading.

Furthermore the current code does not check for existence of the PAT
CPU feature flag if a CPU is whitelisted in validate_pat_support.

This patch clears pat_wc_enabled if boot CPU has no PAT feature flag
and adapts the paranoia check.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 10:14:22 +02:00
Andreas Herrmann
ee863ba7ab x86: unconditionally enable PAT for AMD CPUs
If PAT support is advertised it should just work. No errata known.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 10:14:20 +02:00
Venki Pallipadi
c26421d019 x86: fix Xorg crash with xf86MapVidMem error
Clarify the usage of mtrr_lookup() in PAT code, and to make PAT code
resilient to mtrr lookup problems.

Specifically, pat_x_mtrr_type() is restructured to highlight, under what
conditions we look for mtrr hint. pat_x_mtrr_type() uses a default type
when there are any errors in mtrr lookup (still maintaining the pat
consistency). And, reserve_memtype() highlights its usage ot mtrr_lookup
for request type of '-1' and also defaults in a sane way on any mtrr
lookup failure.

pat.c looks at mtrr type of a range to get a hint on what mapping type
to request when user/API: (1) hasn't specified any type (/dev/mem
mapping) and we do not want to take performance hit by always mapping
UC_MINUS. This will be the case for /dev/mem mappings used to map BIOS
area or ACPI region which are WB'able. In this case, as long as MTRR is
not WB, PAT will request UC_MINUS for such mappings.

(2) user/API requests WB mapping while in reality MTRR may have UC or
WC. In this case, PAT can map as WB (without checking MTRR) and still
effective type will be UC or WC. But, a subsequent request to map same
region as UC or WC may fail, as the region will get trackked as WB in
PAT list. Looking at MTRR hint helps us to track based on effective type
rather than what user requested. Again, here mtrr_lookup is only used as
hint and we fallback to WB mapping (as requested by user) as default.

In both cases, after using the mtrr hint, we still go through the
memtype list to make sure there are no inconsistencies among multiple
users.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Rufus & Azrael <rufus-azrael@numericable.fr>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 10:03:55 +02:00
Linus Torvalds
da50ccc6a0 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (23 commits)
  ACPICA: fix stray va_end() caused by mis-merge
  ACPI: Reject below-freezing temperatures as invalid critical temperatures
  ACPICA: Fix for access to deleted object <regression>
  ACPICA: Fix to make _SST method optional
  ACPICA: Fix for Load operator, load table at the namespace root
  ACPICA: Ignore ACPI table signature for Load() operator
  ACPICA: Fix to allow zero-length ASL field declarations
  ACPI: use memory_read_from_buffer()
  bay: exit if notify handler cannot be installed
  dock.c remove trailing printk whitespace
  proper prototype for acpi_processor_tstate_has_changed()
  ACPI: handle invalid ACPI SLIT table
  PNPACPI: use _CRS IRQ descriptor length for _SRS
  pnpacpi: fix shareable IRQ encode/decode
  pnpacpi: fix IRQ flag decoding
  MAINTAINERS: update ACPI homepage
  ACPI 2.6.26-rc2: Add missing newline to DSDT/SSDT warning message
  ACPI: EC: Use msleep instead of udelay while waiting for event.
  thinkpad-acpi: fix LED handling on older ThinkPads
  thinkpad-acpi: fix initialization error paths
  ...
2008-06-11 17:16:32 -07:00
Fenghua Yu
39b8931b5c ACPI: handle invalid ACPI SLIT table
This is a SLIT sanity checking patch.  It moves slit_valid() function to
generic ACPI code and does sanity checking for both x86 and ia64.  It sets up
node_distance with LOCAL_DISTANCE and REMOTE_DISTANCE when hitting invalid
SLIT table on ia64.  It also cleans up unused variable localities in
acpi_parse_slit() on x86.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-06-11 19:13:46 -04:00
Linus Torvalds
a4df1ac12d Merge branch 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
  KVM: MMU: Fix is_empty_shadow_page() check
  KVM: MMU: Fix printk() format string
  KVM: IOAPIC: only set remote_irr if interrupt was injected
  KVM: MMU: reschedule during shadow teardown
  KVM: VMX: Clear CR4.VMXE in hardware_disable
  KVM: migrate PIT timer
  KVM: ppc: Report bad GFNs
  KVM: ppc: Use a read lock around MMU operations, and release it on error
  KVM: ppc: Remove unmatched kunmap() call
  KVM: ppc: add lwzx/stwz emulation
  KVM: ppc: Remove duplicate function
  KVM: s390: Fix race condition in kvm_s390_handle_wait
  KVM: s390: Send program check on access error
  KVM: s390: fix interrupt delivery
  KVM: s390: handle machine checks when guest is running
  KVM: s390: fix locking order problem in enable_sie
  KVM: s390: use yield instead of schedule to implement diag 0x44
  KVM: x86 emulator: fix hypercall return value on AMD
  KVM: ia64: fix zero extending for mmio ld1/2/4 emulation in KVM
2008-06-11 10:35:44 -07:00
Yinghai Lu
e3f2baebf4 PCI/x86: early dump pci conf space v2
Allows us to dump PCI space before any kernel changes have been made.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-06-10 10:59:52 -07:00
Yinghai Lu
e7891c733f PCI/x86: write_pci_config_byte fix offset
also add write_pci_config_16

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-06-10 10:59:51 -07:00
Rafael J. Wysocki
1eede070a5 Introduce new top level suspend and hibernation callbacks
Introduce 'struct pm_ops' and 'struct pm_ext_ops' ('ext' meaning
'extended') representing suspend and hibernation operations for bus
types, device classes, device types and device drivers.

Modify the PM core to use 'struct pm_ops' and 'struct pm_ext_ops'
objects, if defined, instead of the ->suspend(), ->resume(),
->suspend_late(), and ->resume_early() callbacks (the old callbacks
will be considered as legacy and gradually phased out).

The main purpose of doing this is to separate suspend (aka S2RAM and
standby) callbacks from hibernation callbacks in such a way that the
new callbacks won't take arguments and the semantics of each of them
will be clearly specified.  This has been requested for multiple
times by many people, including Linus himself, and the reason is that
within the current scheme if ->resume() is called, for example, it's
difficult to say why it's been called (ie. is it a resume from RAM or
from hibernation or a suspend/hibernation failure etc.?).

The second purpose is to make the suspend/hibernation callbacks more
flexible so that device drivers can handle more than they can within
the current scheme.  For example, some drivers may need to prevent
new children of the device from being registered before their
->suspend() callbacks are executed or they may want to carry out some
operations requiring the availability of some other devices, not
directly bound via the parent-child relationship, in order to prepare
for the execution of ->suspend(), etc.

Ultimately, we'd like to stop using the freezing of tasks for suspend
and therefore the drivers' suspend/hibernation code will have to take
care of the handling of the user space during suspend/hibernation.
That, in turn, would be difficult within the current scheme, without
the new ->prepare() and ->complete() callbacks.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-06-10 10:59:50 -07:00
Gary Hade
bb71ad8802 PCI: boot parameter to avoid expansion ROM memory allocation
Contention for scarce PCI memory resources has been growing
due to an increasing number of PCI slots in large multi-node
systems.  The kernel currently attempts by default to
allocate memory for all PCI expansion ROMs so there has
also been an increasing number of PCI memory allocation
failures seen on these systems.  This occurs because the
BIOS either (1) provides insufficient PCI memory resource
for all the expansion ROMs or (2) provides adequate PCI
memory resource for expansion ROMs but provides the
space in kernel unexpected BIOS assigned P2P non-prefetch
windows.

The resulting PCI memory allocation failures may be benign
when related to memory requests for expansion ROMs themselves
but in some cases they can occur when attempting to allocate
space for more critical BARs.  This can happen when a successful
expansion ROM allocation request consumes memory resource
that was intended for a non-ROM BAR.  We have seen this
happen during PCI hotplug of an adapter that contains a
P2P bridge where successful memory allocation for an
expansion ROM BAR on device behind the bridge consumed
memory that was intended for a non-ROM BAR on the P2P bridge.
In all cases the allocation failure messages can be very
confusing for users.

This patch provides a new 'pci=norom' kernel boot parameter
that can be used to disable the default PCI expansion ROM memory
resource allocation.  This provides a way to avoid the above
described issues on systems that do not contain PCI devices
for which drivers or user-level applications depend on the
default PCI expansion ROM memory resource allocation behavior.

Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-06-10 10:59:50 -07:00
Miklos Vajna
273c11270d x86/PCI: janitor work in irq.c
Wrapped long lines, removed trailing whitespaces, fixed case indentation
inside switch and so.

Signed-off-by: Miklos Vajna <vmiklos@frugalware.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-06-10 10:59:46 -07:00
Thomas Gleixner
00dba56465 x86: move more common idle functions/variables to process.c
more unification. Should cause no change in functionality.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 15:52:29 +02:00
Thomas Gleixner
09fd4b4ef5 x86: use cpuid to check MWAIT support for C1
cpuid(0x05) provides extended information about MWAIT in EDX when bit
0 of ECX is set. Bit 4-7 of EDX determine whether MWAIT is supported
for C1. C1E enabled CPUs have these bits set to 0.

Based on an earlier patch from Andi Kleen.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 15:52:21 +02:00
Thomas Gleixner
732d7be17b x86: use cpuinfo to check for interrupt pending message msr
Simplify code: no need to do a cpuid(1) again. The cpuinfo structure
has all necessary information already.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 15:52:14 +02:00
Thomas Gleixner
aa83f3f2cf x86: cleanup C1E enabled detection
Rename the "MSR_K8_ENABLE_C1E" MSR to INT_PENDING_MSG, which is the
name in the data sheet as well. Move the C1E mask to the header file.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 15:52:07 +02:00
Thomas Gleixner
6ddd2a2794 x86: simplify idle selection
default_idle is selected in cpu_idle(), when no other idle routine is
selected. Select it in select_idle_routine() when mwait is not
selected.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 15:52:01 +02:00
Paolo Ciarrocchi
5b80fe8bd7 x86: coding style fixes to arch/x86/kernel/sys_i386_32.c
Before:
total: 16 errors, 25 warnings, 246 lines checked

After:
total: 0 errors, 7 warnings, 246 lines checked

Compile tested.

paolo@paolo-desktop:/tmp$ size sys*
   text    data     bss     dec     hex filename
   1209       0       0    1209     4b9 sys_i386_32.o.after
   1209       0       0    1209     4b9 sys_i386_32.o.before

paolo@paolo-desktop:/tmp$ md5sum sys*
6144f6d6ce7342c3e192681a1ccaa1c1  sys_i386_32.o.after
6144f6d6ce7342c3e192681a1ccaa1c1  sys_i386_32.o.before

Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 12:34:54 +02:00
Paolo Ciarrocchi
7058b06188 x86: coding style fixes to arch/x86/pci/irq.
Before:
total: 60 errors, 85 warnings, 1237 lines checked

After:
total: 1 errors, 82 warnings, 1226 lines checked

WARNING: line over 80 characters

Compile tested.

paolo@paolo-desktop:/tmp$ size irq.o.*
   text    data     bss     dec     hex filename
   6128     440      76    6644    19f4 irq.o.after
   6128     440      76    6644    19f4 irq.o.before

Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 12:33:59 +02:00
Robert Richter
9e26d84273 fix build bug in "x86: add PCI extended config space access for AMD Barcelona"
Also much less code now.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 12:32:53 +02:00
Miquel van Smoorenburg
b7f09ae583 x86, pci-dma.c: don't always add __GFP_NORETRY to gfp
Currently arch/x86/kernel/pci-dma.c always adds __GFP_NORETRY
to the allocation flags, because it wants to be reasonably
sure not to deadlock when calling alloc_pages().

But really that should only be done in two cases:
- when allocating memory in the lower 16 MB DMA zone.
  If there's no free memory there, waiting or OOM killing is of no use
- when optimistically trying an allocation in the DMA32 zone
  when dma_mask < DMA_32BIT_MASK hoping that the allocation
  happens to fall within the limits of the dma_mask

Also blindly adding __GFP_NORETRY to the the gfp variable might
not be a good idea since we then also use it when calling
dma_ops->alloc_coherent(). Clearing it might also not be a
good idea, dma_alloc_coherent()'s caller might have set it
on purpose. The gfp variable should not be clobbered.

[ mingo@elte.hu: converted to delta patch ontop of previous version. ]

Signed-off-by: Miquel van Smoorenburg <miquels@cistron.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 12:22:18 +02:00
Andreas Herrmann
668231141f x86: fix compile warning in io_apic_{32,64}.c
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 12:21:10 +02:00
Abhishek Sagar
1d74f2a0f6 ftrace: remove ftrace_ip_converted()
Remove the unneeded function ftrace_ip_converted().

Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:57:49 +02:00
Yinghai Lu
6780711e98 x86: update mptable, fix
need to call early_reserve_e820() to preallocate mptable for 32bit

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:35:04 +02:00
Yinghai Lu
d49c428840 x86: make generic arch support NUMAQ
... so it could fall back to normal numa and we'd reduce the impact of the
NUMAQ subarch.

NUMAQ depends on GENERICARCH
also decouple genericarch numa from acpi.
also make it fall back to bigsmp if apicid > 8.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:34:42 +02:00
Ingo Molnar
4e78c91abe Revert "x86, numaq: add pci_acpi_scan_root() stub"
This reverts commit f329469097.

That bug will be fixed in a better way via:

  x86: make generic arch support NUMAQ
2008-06-10 11:33:01 +02:00
Yinghai Lu
e0da336468 x86: introduce max_physical_apicid for bigsmp switching
a multi-socket test-system with 3 or 4 ioapics, when 4 dualcore cpus or
2 quadcore cpus installed, needs to switch to bigsmp or physflat.

CPU apic id is [4,11] instead of [0,7], and we need to check max apic
id instead of cpu numbers.

also add check for 32 bit when acpi is not compiled in or acpi=off.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:32:09 +02:00
Yinghai Lu
c3ff01672a x86: fix boot failure with 64GB+ system with numa 32-bit
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:31:59 +02:00
Yinghai Lu
9043f00796 x86, numa, 32-bit: use find_e820_area() to find KVA RAM on node
don't assume we can use RAM near the end of every node.
Esp systems that have few memory and they could have
kva address and kva RAM all below max_low_pfn.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:31:52 +02:00
Yinghai Lu
cc1a9d86ce mm, x86: shrink_active_range() should check all
Now we are using register_e820_active_regions() instead of
add_active_range() directly. So end_pfn could be different between the
value in early_node_map to node_end_pfn.

So we need to make shrink_active_range() smarter.

shrink_active_range() is a generic MM function in mm/page_alloc.c but
it is only used on 32-bit x86. Should we move it back to some file in
arch/x86?

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:31:44 +02:00
Yinghai Lu
db3660c190 x86: remove all active memory ranges before registering them again after trimming - 64bit
this way we keep the early_node_map all right.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:30:47 +02:00
Ingo Molnar
2377494285 Revert "x86, 32-bit: SRAT fix"
This reverts commit ea57a5a6db, a better
fix will be merged.
2008-06-09 10:57:16 +02:00
Avi Kivity
3c9155106d KVM: MMU: Fix is_empty_shadow_page() check
The check is only looking at one of two possible empty ptes.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-06 21:36:33 +03:00
Avi Kivity
ebb0e6264c KVM: MMU: Fix printk() format string
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-06 21:36:20 +03:00
Linus Torvalds
256a13dd70 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  x86/PCI: add workaround for bug in ASUS A7V600 BIOS (rev 1005)
  PCI/x86: fix up PCI stuff so that PCI_GOANY supports OLPC
2008-06-06 11:33:08 -07:00
Avi Kivity
8d2d73b9a5 KVM: MMU: reschedule during shadow teardown
Shadows for large guests can take a long time to tear down, so reschedule
occasionally to avoid softlockup warnings.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-06 21:32:20 +03:00