Commit graph

95125 commits

Author SHA1 Message Date
Hollis Blanchard
31bb117eb4 KVM: Use CONFIG_PREEMPT_NOTIFIERS around struct preempt_notifier
This allows kvm_host.h to be #included even when struct preempt_notifier is
undefined. This is needed to build ppc asm-offsets.h.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:18 +03:00
Sheng Yang
2384d2b326 KVM: VMX: Enable Virtual Processor Identification (VPID)
To allow TLB entries to be retained across VM entry and VM exit, the VMM
can now identify distinct address spaces through a new virtual-processor ID
(VPID) field of the VMCS.

[avi: drop vpid_sync_all()]
[avi: add "cc" to asm constraints]

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:17 +03:00
Avi Kivity
adb1ff4675 KVM: Limit vcpu mmap size to one page on non-x86
The second page is only needed on archs that support pio.

Noted by Carsten Otte.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:17 +03:00
Avi Kivity
d196e34336 KVM: MMU: Decouple mmio from shadow page tables
Currently an mmio guest pte is encoded in the shadow pagetable as a
not-present trapping pte, with the SHADOW_IO_MARK bit set.  However
nothing is ever done with this information, so maintaining it is a
useless complication.

This patch moves the check for mmio to before shadow ptes are instantiated,
so the shadow code is never invoked for ptes that reference mmio.  The code
is simpler, and with future work, can be made to handle mmio concurrently.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:17 +03:00
Avi Kivity
1d6ad2073e KVM: x86 emulator: group decoding for group 1 instructions
Opcodes 0x80-0x83

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:16 +03:00
Avi Kivity
09566765ef KVM: Only x86 has pio
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:15 +03:00
Jan Engelhardt
5c5027425e KVM: constify function pointer tables
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:15 +03:00
Avi Kivity
d95058a1a7 KVM: x86 emulator: add group 7 decoding
This adds group decoding for opcode 0x0f 0x01 (group 7).

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:15 +03:00
Avi Kivity
fd60754e4f KVM: x86 emulator: Group decoding for groups 4 and 5
Add group decoding support for opcode 0xfe (group 4) and 0xff (group 5).

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:15 +03:00
Avi Kivity
7d858a19ef KVM: x86 emulator: Group decoding for group 3
This adds group decoding support for opcodes 0xf6, 0xf7 (group 3).

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:14 +03:00
Avi Kivity
43bb19cd33 KVM: x86 emulator: group decoding for group 1A
This adds group decode support for opcode 0x8f.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:14 +03:00
Avi Kivity
e09d082c03 KVM: x86 emulator: add support for group decoding
Certain x86 instructions use bits 3:5 of the byte following the opcode as an
opcode extension, with the decode sometimes depending on bits 6:7 as well.
Add support for this in the main decoding table rather than an ad-hock
adaptation per opcode.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:14 +03:00
Dong, Eddie
1ae0a13def KVM: MMU: Simplify hash table indexing
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:14 +03:00
Dong, Eddie
489f1d6526 KVM: MMU: Update shadow ptes on partial guest pte writes
A guest partial guest pte write will leave shadow_trap_nonpresent_pte
in spte, which generates a vmexit at the next guest access through that pte.

This patch improves this by reading the full guest pte in advance and thus
being able to update the spte and eliminate the vmexit.

This helps pae guests which use two 32-bit writes to set a single 64-bit pte.

[truncation fix by Eric]

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:13 +03:00
David S. Miller
7cf069955f sparc64: Kill bogus RT_ALIGNEDSZ macro from signal.c
The structure has to be 8-byte aligned in size, so
this macro is just noise.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-27 00:25:30 -07:00
Robert Reif
3ade11601f sparc: sunzilog.c remove unused argument
Remove unused argument in function call.

Signed-off-by: Robert Reif <reif@earthlink.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-26 23:10:19 -07:00
Robert Reif
403ae52ac0 sparc: fix drivers/video/tcx.c warning
Fix compile warning:

CC drivers/video/tcx.o
drivers/video/tcx.c: In function ‘tcx_init_one’:
drivers/video/tcx.c:477: warning: format ‘%lx’ expects type ‘long 
unsigned int’, but argument 4 has type ‘resource_size_t’

This was the only sparc driver to use the resource directly in the
printk so I changed it to physbase like the other drivers.
Boot tested on SS4.

Signed-off-by: Robert Reif <reif@earthlink.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-26 22:29:43 -07:00
David S. Miller
5da496e4b9 sparc64: Kill unused local ISA bus layer.
No more drivers use this, and therefore it can die.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-26 21:41:23 -07:00
David S. Miller
9c1a5077fd input: Rewrite sparcspkr device probing.
Remove all dependencies on EBUS and ISA bus layers, which we'd like to
remove as they are superfluous.

While we're here, add support for proper frequency changing on BBC
beep devices.  Unlike the comments that were here, this device can
in fact use a programmable frequency.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-26 21:41:22 -07:00
David S. Miller
dc8ca2a111 sparc64: Do not ignore 'pmu' device ranges.
I must have disabled this due to other bugs which were fixed over
time.  And this is needed in order for child devices of "pmu"
to get proper resource values.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-26 21:41:20 -07:00
David S. Miller
0eb78f0b1a sparc64: Kill ISA_FLOPPY_WORKS code.
This never was enabled, I could never get it working, and if anyone
wants to try and get it's very easy to reference this code in the
history.

It's the only thing referencing the silly ISA device layer in the
sparc64 tree.  OF device layer infrastructure is what should be used
for these things.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-26 21:41:19 -07:00
David S. Miller
09337f501e sparc64: Kill CONFIG_SPARC32_COMPAT
It's completely superfluous, CONFIG_COMPAT is sufficient.

What this used to be is an umbrella for enabling code shared
by all 32-bit compat binary support types.  But with the
removal of SunOS and Solaris support, the only one left is
Linux 32-bit ELF.

Update defconfig.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-26 21:41:19 -07:00
David S. Miller
05d515ef3d sparc64: Cleanups and corrections for arch/sparc64/Kconfig
Refer to chip as "SPARC" throughout.

Say 32-bit SPARC and 64-bit SPARC rather than mentioning specific
chips such like UltraSPARC, as appropriate.

Remove non-sense help text referring to things that will never appear
on a SPARC system, such as EISA busses etc.

Use "help" instead of "--help--"

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-26 21:41:17 -07:00
David S. Miller
227c331178 sparc64: Fix wedged irq regression.
Kernel bugzilla 10273

As reported by Jos van der Ende, ever since commit
5a606b72a4 ("[SPARC64]: Do not ACK an
INO if it is disabled or inprogress.") sun4u interrupts
can get stuck.

What this changset did was add the following conditional to
the various IRQ chip ->enable() handlers on sparc64:

	if (unlikely(desc->status & (IRQ_DISABLED|IRQ_INPROGRESS)))
		return;

which is correct, however it means that special care is needed
in the ->enable() method.

Specifically we must put the interrupt into IDLE state during
an enable, or else it might never be sent out again.

Setting the INO interrupt state to IDLE resets the state machine,
the interrupt input to the INO is retested by the hardware, and
if an interrupt is being signalled by the device, the INO
moves back into TRANSMIT state, and an interrupt vector is sent
to the cpu.

The two sun4v IRQ chip handlers were already doing this properly,
only sun4u got it wrong.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-26 21:41:15 -07:00
Dmitry Torokhov
20430214cc Input: xpad - fix build failure
If both CONFIG_JOYSTICK_XPAD_FF and CONFIG_JOYSTICK_XPAD_LEDS are unset
xpad_bulk_out is not defined and build fails. Move it out of the #ifdef
block so it is always defined.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2008-04-27 00:10:11 -04:00
Peter Zijlstra
7f424a8b08 fix idle (arch, acpi and apm) and lockdep
OK, so 25-mm1 gave a lockdep error which made me look into this.

The first thing that I noticed was the horrible mess; the second thing I
saw was hacks like: 71e93d1561

The problem is that arch idle routines are somewhat inconsitent with
their IRQ state handling and instead of fixing _that_, we go paper over
the problem.

So the thing I've tried to do is set a standard for idle routines and
fix them all up to adhere to that. So the rules are:

  idle routines are entered with IRQs disabled
  idle routines will exit with IRQs enabled

Nearly all already did this in one form or another.

Merge the 32 and 64 bit bits so they no longer have different bugs.

As for the actual lockdep warning; __sti_mwait() did a plainly un-annotated
irq-enable.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-27 00:01:45 +02:00
Yinghai Lu
5f0b2976cb x86: add pci=check_enable_amd_mmconf and dmi check
so will disable that feature by default, and only enable that via
pci=check_enable_amd_mmconf or for system match with dmi table.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:04 +02:00
Yinghai Lu
e8ee6f0ae5 x86: work around io allocation overlap of HT links
normally BIOSes assign io/mmio range to different HT links without
overlapping, even same node same link should get non overlapping
entries.

but Rafael L. Wysocki's buggy BIOS creates a link with overlapping
entries for mmio and io:

  node 0 link 0: io port [1000, ffffff]
  node 0 link 0: mmio [e0000000, efffffff]
  node 0 link 0: mmio [a0000, bffff]
  node 0 link 0: mmio [80000000, ffffffff]

try to merge them and we will get:

  bus: [00, ff] on node 0 link 0
  bus: 00 index 0 io port: [0, ffff]
  bus: 00 index 1 mmio: [80000000, fcffffffff]
  bus: 00 index 2 mmio: [a0000, bffff]

so later we will reduce the chance to assign used resource to
unassigned device.

Reported-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Tested-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:04 +02:00
Yinghai Lu
cbf9bd603a acpi: get boot_cpu_id as early for k8_scan_nodes
[mingo@elte.hu: split from "x86_64: get boot_cpu_id as early for k8_scan_nodes]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:04 +02:00
Yinghai Lu
4cf1946374 x86_64: don't need set default res if only have one root bus
if there's only one root bus there's no need to split resources.

This patch fixes the issue described at:

  http://lkml.org/lkml/2008/4/10/304

Reported-and-bisected-by: Rafael J. Wysocki <rjw@sisk.pl>
Tested-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:04 +02:00
Yinghai Lu
6e184f299d x86: double check the multi root bus with fam10h mmconf
some bioses give same range to mmconf for fam10h msr, and mmio for node/link.

fam10h msr will overide mmio for node/link.
so we can not assign range to devices under node/link for unassigned resources.

this patch will take range out from the mmio for node/link

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:04 +02:00
Yinghai Lu
30a18d6c3f x86: multi pci root bus with different io resource range, on 64-bit
scan AMD opteron io/mmio routing to make sure every pci root bus get correct
resource range. Thus later pci scan could assign correct resource to device
with unassigned resource.

this can fix a system without _CRS for multi pci root bus.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:04 +02:00
Yinghai Lu
35ddd068fb x86: use bus conf in NB conf fun1 to get bus range on, on 64-bit
... so we use the same code with Quad core cpu as old opteron.

This patch is useful when acpi=off or _PXM is not there in DSDT.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:04 +02:00
Yinghai Lu
871d5f8dd0 x86: get mp_bus_to_node early
Currently, on an amd k8 system with multi ht chains, the numa_node of
pci devices under /sys/devices/pci0000:80/* is always 0, even if that
chain is on node 1 or 2 or 3.

Workaround: pcibus_to_node(bus) is used when we want to get the node that
pci_device is on.

In struct device, we already have numa_node member, and we could use
dev_to_node()/set_dev_node() to get and set numa_node in the device.
set_dev_node is called in pci_device_add() with pcibus_to_node(bus),
and pcibus_to_node uses bus->sysdata for nodeid.

The problem is when pci_add_device is called, bus->sysdata is not assigned
correct nodeid yet. The result is that numa_node will always be 0.

pcibios_scan_root and pci_scan_root could take sysdata. So we need to get
mp_bus_to_node mapping before these two are called, and thus
get_mp_bus_to_node could get correct node for sysdata in root bus.

In scanning of the root bus, all child busses will take parent bus sysdata.
So all pci_device->dev.numa_node will be assigned correctly and automatically.

Later we could use dev_to_node(&pci_dev->dev) to get numa_node, and we
could also could make other bus specific device get the correct numa_node
too.

This is an updated version of pci_sysdata and Jeff's pci_domain patch.

[ mingo@elte.hu: build fix ]

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:04 +02:00
Yinghai Lu
bb63b42199 x86 pci: remove checking type for mmconfig probe
doesn't need to check if it is type1 or type2, we can use raw_pci_ops
directly.

also make pci_direct_conf1 static again.

anyway is there system with type 2 and mmconf support?

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 23:41:04 +02:00
Yinghai Lu
d2ebdf4bae x86: remove unneeded check in mmconf reject
mmconfig is only used to access extended configuration space.

so don't need to reject MFG that only have one entry and only handle bus0.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 23:41:04 +02:00
Yinghai Lu
0d358f22f6 driver core: try parent numa_node at first before using default
in the device_add, we try to use use parent numa_node.
need to make sure pci root bus's bridge device numa_node is set.
then we could use device->numa_node direclty for all device.
and don't need to call pcibus_to_node().

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 23:41:04 +02:00
Yinghai Lu
d39398a333 x86: seperate mmconf for fam10h out from setup_64.c
Separate mmconf for fam10h out from setup_64.c

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 23:41:04 +02:00
Yinghai Lu
d4c4d09415 x86: if acpi=off, force setting the mmconf for fam10h
some BIOS only let AMD fam 10h handle bus0, and nvidia mcp55/ck804
to handle other buses. at that case MCFG will cover all over them.

but with acpi=off, we can not use MCFG. this patch will double check
the busnbits, and if it is less handling 256 bues, and acpi=off
will forcely reset the mmconf in msr, so we still use mmconf in above case.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 23:41:04 +02:00
Yinghai Lu
7fd0da4085 x86_64: check MSR to get MMCONFIG for AMD Family 10h
so even booting kernel with acpi=off or even MCFG is not there, we still can
use MMCONFIG.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Greg KH <greg@kroah.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 23:41:04 +02:00
Yinghai Lu
eee206c3bf x86_64: check and enable MMCONFIG for AMD Family 10h
So we can use MMCONF when MMCONF is not set by BIOS

using TOP_MEM2 msr to get memory top, and try to scan fam10h mmio routing to
make sure the range is not conflicted with some prefetch MMIO that is above 4G.
(current only LinuxBIOS assign 64 bit mmio above 4G for some co-processor)

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 23:41:04 +02:00
Yinghai Lu
57741a7790 x86_64: set cfg_size for AMD Family 10h in case MMCONFIG
reuse pci_cfg_space_size but skip check pci express and pci-x CAP ID.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:03 +02:00
Yinghai Lu
05c58b8ac7 x86: mmconf enable mcfg early
Patch
	"x86: validate against ACPI motherboard resources"

changed the mmconf init sequence, and init MMCONF late in acpi_init.

here change it back to old sequence:

 1. check hostbridge in early
 2. check MCFG with e820 in early
 3. if all fail, will check MCFg with acpi _CRS in acpi_init

So we can make MCONF working again when acpi=off is set if hostbridge
support that.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg KH <greg@kroah.com>
Cc: Greg KH <greg@kroah.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:03 +02:00
Yinghai Lu
0b64ad7123 x86: clear pci_mmcfg_virt when mmcfg get rejected
For x86_64, need to free pci_mmcfg_virt, and iounmap some pointers
when MMCONF is not reserved in E820 or acpi _CRS and get rejected.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg KH <greg@kroah.com>
Cc: Greg KH <greg@kroah.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:03 +02:00
Robert Hancock
7752d5cfe3 x86: validate against acpi motherboard resources
This path adds validation of the MMCONFIG table against the ACPI reserved
motherboard resources.  If the MMCONFIG table is found to be reserved in
ACPI, we don't bother checking the E820 table.  The PCI Express firmware
spec apparently tells BIOS developers that reservation in ACPI is required
and E820 reservation is optional, so checking against ACPI first makes
sense.  Many BIOSes don't reserve the MMCONFIG region in E820 even though
it is perfectly functional, the existing check needlessly disables MMCONFIG
in these cases.

In order to do this, MMCONFIG setup has been split into two phases.  If PCI
configuration type 1 is not available then MMCONFIG is enabled early as
before.  Otherwise, it is enabled later after the ACPI interpreter is
enabled, since we need to be able to execute control methods in order to
check the ACPI reserved resources.  Presently this is just triggered off
the end of ACPI interpreter initialization.

There are a few other behavioral changes here:

- Validate all MMCONFIG configurations provided, not just the first one.

- Validate the entire required length of each configuration according to
  the provided ending bus number is reserved, not just the minimum required
  allocation.

- Validate that the area is reserved even if we read it from the chipset
  directly and not from the MCFG table.  This catches the case where the
  BIOS didn't set the location properly in the chipset and has mapped it
  over other things it shouldn't have.

This also cleans up the MMCONFIG initialization functions so that they
simply do nothing if MMCONFIG is not compiled in.

Based on an original patch by Rajesh Shah from Intel.

[akpm@linux-foundation.org: many fixes and cleanups]
Signed-off-by: Robert Hancock <hancockr@shaw.ca>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Andi Kleen <ak@suse.de>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:03 +02:00
Linus Torvalds
c3bf9bc243 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootmem-v3
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootmem-v3:
  x86_64/mm: check and print vmemmap allocation continuous
  x86_64: fix setup_node_bootmem to support big mem excluding with memmap
  x86_64: make reserve_bootmem_generic() use new reserve_bootmem()
  mm: allow reserve_bootmem() cross nodes
  mm: offset align in alloc_bootmem()
  mm: fix alloc_bootmem_core to use fast searching for all nodes
  mm: make mem_map allocation continuous
2008-04-26 14:04:32 -07:00
Linus Torvalds
e3505dd50c Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild:
  kbuild: scripts/Makefile.modpost typo fix
  kbuild: soften MODULE_LICENSE check
2008-04-26 14:03:54 -07:00
Yinghai Lu
c2b91e2eec x86_64/mm: check and print vmemmap allocation continuous
On big systems with lots of memory, don't print out too much during
bootup, and make it easy to find if it is continuous.

on 256G 8 sockets system will get
 [ffffe20000000000-ffffe20002bfffff] PMD -> [ffff810001400000-ffff810003ffffff] on node 0
[ffffe2001c700000-ffffe2001c7fffff] potential offnode page_structs
 [ffffe20002c00000-ffffe2001c7fffff] PMD -> [ffff81000c000000-ffff8100255fffff] on node 0
[ffffe20038700000-ffffe200387fffff] potential offnode page_structs
 [ffffe2001c800000-ffffe200387fffff] PMD -> [ffff810820200000-ffff81083c1fffff] on node 1
 [ffffe20040000000-ffffe2007fffffff] PUD ->ffff811027a00000 on node 2
 [ffffe20038800000-ffffe2003fffffff] PMD -> [ffff811020200000-ffff8110279fffff] on node 2
[ffffe20054700000-ffffe200547fffff] potential offnode page_structs
 [ffffe20040000000-ffffe200547fffff] PMD -> [ffff811027c00000-ffff81103c3fffff] on node 2
[ffffe20070700000-ffffe200707fffff] potential offnode page_structs
 [ffffe20054800000-ffffe200707fffff] PMD -> [ffff811820200000-ffff81183c1fffff] on node 3
 [ffffe20080000000-ffffe200bfffffff] PUD ->ffff81202fa00000 on node 4
 [ffffe20070800000-ffffe2007fffffff] PMD -> [ffff812020200000-ffff81202f9fffff] on node 4
[ffffe2008c700000-ffffe2008c7fffff] potential offnode page_structs
 [ffffe20080000000-ffffe2008c7fffff] PMD -> [ffff81202fc00000-ffff81203c3fffff] on node 4
[ffffe200a8700000-ffffe200a87fffff] potential offnode page_structs
 [ffffe2008c800000-ffffe200a87fffff] PMD -> [ffff812820200000-ffff81283c1fffff] on node 5
 [ffffe200c0000000-ffffe200ffffffff] PUD ->ffff813037a00000 on node 6
 [ffffe200a8800000-ffffe200bfffffff] PMD -> [ffff813020200000-ffff8130379fffff] on node 6
[ffffe200c4700000-ffffe200c47fffff] potential offnode page_structs
 [ffffe200c0000000-ffffe200c47fffff] PMD -> [ffff813037c00000-ffff81303c3fffff] on node 6
 [ffffe200c4800000-ffffe200e07fffff] PMD -> [ffff813820200000-ffff81383c1fffff] on node 7

instead of a very long print out...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 22:51:09 +02:00
Yinghai Lu
1a27fc0a42 x86_64: fix setup_node_bootmem to support big mem excluding with memmap
typical case: four sockets system, every node has 4g ram, and we are using:

	memmap=10g$4g

to mask out memory on node1 and node2

when numa is enabled, early_node_mem is used to get node_data and node_bootmap.

if it can not get memory from the same node with find_e820_area(), it will
use alloc_bootmem to get buff from previous nodes.

so check it and print out some info about it.

need to move early_res_to_bootmem into every setup_node_bootmem.
and it takes range that node has. otherwise alloc_bootmem could return addr
that reserved early.

depends on "mm: make reserve_bootmem can crossed the nodes".

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 22:51:08 +02:00
Yinghai Lu
8b3cd09ed2 x86_64: make reserve_bootmem_generic() use new reserve_bootmem()
"mm: make reserve_bootmem can crossed the nodes" provides new
reserve_bootmem(), let reserve_bootmem_generic() use that.

reserve_bootmem_generic() is used to reserve initramdisk, so this way
we can make sure even when bootloader or kexec load ranges cross the
node memory boundaries, reserve_bootmem still works.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 22:51:08 +02:00