Commit graph

94918 commits

Author SHA1 Message Date
Dong, Eddie
489f1d6526 KVM: MMU: Update shadow ptes on partial guest pte writes
A guest partial guest pte write will leave shadow_trap_nonpresent_pte
in spte, which generates a vmexit at the next guest access through that pte.

This patch improves this by reading the full guest pte in advance and thus
being able to update the spte and eliminate the vmexit.

This helps pae guests which use two 32-bit writes to set a single 64-bit pte.

[truncation fix by Eric]

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:13 +03:00
David S. Miller
7cf069955f sparc64: Kill bogus RT_ALIGNEDSZ macro from signal.c
The structure has to be 8-byte aligned in size, so
this macro is just noise.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-27 00:25:30 -07:00
Robert Reif
3ade11601f sparc: sunzilog.c remove unused argument
Remove unused argument in function call.

Signed-off-by: Robert Reif <reif@earthlink.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-26 23:10:19 -07:00
Robert Reif
403ae52ac0 sparc: fix drivers/video/tcx.c warning
Fix compile warning:

CC drivers/video/tcx.o
drivers/video/tcx.c: In function ‘tcx_init_one’:
drivers/video/tcx.c:477: warning: format ‘%lx’ expects type ‘long 
unsigned int’, but argument 4 has type ‘resource_size_t’

This was the only sparc driver to use the resource directly in the
printk so I changed it to physbase like the other drivers.
Boot tested on SS4.

Signed-off-by: Robert Reif <reif@earthlink.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-26 22:29:43 -07:00
David S. Miller
5da496e4b9 sparc64: Kill unused local ISA bus layer.
No more drivers use this, and therefore it can die.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-26 21:41:23 -07:00
David S. Miller
9c1a5077fd input: Rewrite sparcspkr device probing.
Remove all dependencies on EBUS and ISA bus layers, which we'd like to
remove as they are superfluous.

While we're here, add support for proper frequency changing on BBC
beep devices.  Unlike the comments that were here, this device can
in fact use a programmable frequency.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-26 21:41:22 -07:00
David S. Miller
dc8ca2a111 sparc64: Do not ignore 'pmu' device ranges.
I must have disabled this due to other bugs which were fixed over
time.  And this is needed in order for child devices of "pmu"
to get proper resource values.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-26 21:41:20 -07:00
David S. Miller
0eb78f0b1a sparc64: Kill ISA_FLOPPY_WORKS code.
This never was enabled, I could never get it working, and if anyone
wants to try and get it's very easy to reference this code in the
history.

It's the only thing referencing the silly ISA device layer in the
sparc64 tree.  OF device layer infrastructure is what should be used
for these things.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-26 21:41:19 -07:00
David S. Miller
09337f501e sparc64: Kill CONFIG_SPARC32_COMPAT
It's completely superfluous, CONFIG_COMPAT is sufficient.

What this used to be is an umbrella for enabling code shared
by all 32-bit compat binary support types.  But with the
removal of SunOS and Solaris support, the only one left is
Linux 32-bit ELF.

Update defconfig.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-26 21:41:19 -07:00
David S. Miller
05d515ef3d sparc64: Cleanups and corrections for arch/sparc64/Kconfig
Refer to chip as "SPARC" throughout.

Say 32-bit SPARC and 64-bit SPARC rather than mentioning specific
chips such like UltraSPARC, as appropriate.

Remove non-sense help text referring to things that will never appear
on a SPARC system, such as EISA busses etc.

Use "help" instead of "--help--"

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-26 21:41:17 -07:00
David S. Miller
227c331178 sparc64: Fix wedged irq regression.
Kernel bugzilla 10273

As reported by Jos van der Ende, ever since commit
5a606b72a4 ("[SPARC64]: Do not ACK an
INO if it is disabled or inprogress.") sun4u interrupts
can get stuck.

What this changset did was add the following conditional to
the various IRQ chip ->enable() handlers on sparc64:

	if (unlikely(desc->status & (IRQ_DISABLED|IRQ_INPROGRESS)))
		return;

which is correct, however it means that special care is needed
in the ->enable() method.

Specifically we must put the interrupt into IDLE state during
an enable, or else it might never be sent out again.

Setting the INO interrupt state to IDLE resets the state machine,
the interrupt input to the INO is retested by the hardware, and
if an interrupt is being signalled by the device, the INO
moves back into TRANSMIT state, and an interrupt vector is sent
to the cpu.

The two sun4v IRQ chip handlers were already doing this properly,
only sun4u got it wrong.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-26 21:41:15 -07:00
Dmitry Torokhov
20430214cc Input: xpad - fix build failure
If both CONFIG_JOYSTICK_XPAD_FF and CONFIG_JOYSTICK_XPAD_LEDS are unset
xpad_bulk_out is not defined and build fails. Move it out of the #ifdef
block so it is always defined.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2008-04-27 00:10:11 -04:00
Peter Zijlstra
7f424a8b08 fix idle (arch, acpi and apm) and lockdep
OK, so 25-mm1 gave a lockdep error which made me look into this.

The first thing that I noticed was the horrible mess; the second thing I
saw was hacks like: 71e93d1561

The problem is that arch idle routines are somewhat inconsitent with
their IRQ state handling and instead of fixing _that_, we go paper over
the problem.

So the thing I've tried to do is set a standard for idle routines and
fix them all up to adhere to that. So the rules are:

  idle routines are entered with IRQs disabled
  idle routines will exit with IRQs enabled

Nearly all already did this in one form or another.

Merge the 32 and 64 bit bits so they no longer have different bugs.

As for the actual lockdep warning; __sti_mwait() did a plainly un-annotated
irq-enable.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-27 00:01:45 +02:00
Yinghai Lu
5f0b2976cb x86: add pci=check_enable_amd_mmconf and dmi check
so will disable that feature by default, and only enable that via
pci=check_enable_amd_mmconf or for system match with dmi table.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:04 +02:00
Yinghai Lu
e8ee6f0ae5 x86: work around io allocation overlap of HT links
normally BIOSes assign io/mmio range to different HT links without
overlapping, even same node same link should get non overlapping
entries.

but Rafael L. Wysocki's buggy BIOS creates a link with overlapping
entries for mmio and io:

  node 0 link 0: io port [1000, ffffff]
  node 0 link 0: mmio [e0000000, efffffff]
  node 0 link 0: mmio [a0000, bffff]
  node 0 link 0: mmio [80000000, ffffffff]

try to merge them and we will get:

  bus: [00, ff] on node 0 link 0
  bus: 00 index 0 io port: [0, ffff]
  bus: 00 index 1 mmio: [80000000, fcffffffff]
  bus: 00 index 2 mmio: [a0000, bffff]

so later we will reduce the chance to assign used resource to
unassigned device.

Reported-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Tested-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:04 +02:00
Yinghai Lu
cbf9bd603a acpi: get boot_cpu_id as early for k8_scan_nodes
[mingo@elte.hu: split from "x86_64: get boot_cpu_id as early for k8_scan_nodes]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:04 +02:00
Yinghai Lu
4cf1946374 x86_64: don't need set default res if only have one root bus
if there's only one root bus there's no need to split resources.

This patch fixes the issue described at:

  http://lkml.org/lkml/2008/4/10/304

Reported-and-bisected-by: Rafael J. Wysocki <rjw@sisk.pl>
Tested-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:04 +02:00
Yinghai Lu
6e184f299d x86: double check the multi root bus with fam10h mmconf
some bioses give same range to mmconf for fam10h msr, and mmio for node/link.

fam10h msr will overide mmio for node/link.
so we can not assign range to devices under node/link for unassigned resources.

this patch will take range out from the mmio for node/link

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:04 +02:00
Yinghai Lu
30a18d6c3f x86: multi pci root bus with different io resource range, on 64-bit
scan AMD opteron io/mmio routing to make sure every pci root bus get correct
resource range. Thus later pci scan could assign correct resource to device
with unassigned resource.

this can fix a system without _CRS for multi pci root bus.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:04 +02:00
Yinghai Lu
35ddd068fb x86: use bus conf in NB conf fun1 to get bus range on, on 64-bit
... so we use the same code with Quad core cpu as old opteron.

This patch is useful when acpi=off or _PXM is not there in DSDT.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:04 +02:00
Yinghai Lu
871d5f8dd0 x86: get mp_bus_to_node early
Currently, on an amd k8 system with multi ht chains, the numa_node of
pci devices under /sys/devices/pci0000:80/* is always 0, even if that
chain is on node 1 or 2 or 3.

Workaround: pcibus_to_node(bus) is used when we want to get the node that
pci_device is on.

In struct device, we already have numa_node member, and we could use
dev_to_node()/set_dev_node() to get and set numa_node in the device.
set_dev_node is called in pci_device_add() with pcibus_to_node(bus),
and pcibus_to_node uses bus->sysdata for nodeid.

The problem is when pci_add_device is called, bus->sysdata is not assigned
correct nodeid yet. The result is that numa_node will always be 0.

pcibios_scan_root and pci_scan_root could take sysdata. So we need to get
mp_bus_to_node mapping before these two are called, and thus
get_mp_bus_to_node could get correct node for sysdata in root bus.

In scanning of the root bus, all child busses will take parent bus sysdata.
So all pci_device->dev.numa_node will be assigned correctly and automatically.

Later we could use dev_to_node(&pci_dev->dev) to get numa_node, and we
could also could make other bus specific device get the correct numa_node
too.

This is an updated version of pci_sysdata and Jeff's pci_domain patch.

[ mingo@elte.hu: build fix ]

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:04 +02:00
Yinghai Lu
bb63b42199 x86 pci: remove checking type for mmconfig probe
doesn't need to check if it is type1 or type2, we can use raw_pci_ops
directly.

also make pci_direct_conf1 static again.

anyway is there system with type 2 and mmconf support?

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 23:41:04 +02:00
Yinghai Lu
d2ebdf4bae x86: remove unneeded check in mmconf reject
mmconfig is only used to access extended configuration space.

so don't need to reject MFG that only have one entry and only handle bus0.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 23:41:04 +02:00
Yinghai Lu
0d358f22f6 driver core: try parent numa_node at first before using default
in the device_add, we try to use use parent numa_node.
need to make sure pci root bus's bridge device numa_node is set.
then we could use device->numa_node direclty for all device.
and don't need to call pcibus_to_node().

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 23:41:04 +02:00
Yinghai Lu
d39398a333 x86: seperate mmconf for fam10h out from setup_64.c
Separate mmconf for fam10h out from setup_64.c

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 23:41:04 +02:00
Yinghai Lu
d4c4d09415 x86: if acpi=off, force setting the mmconf for fam10h
some BIOS only let AMD fam 10h handle bus0, and nvidia mcp55/ck804
to handle other buses. at that case MCFG will cover all over them.

but with acpi=off, we can not use MCFG. this patch will double check
the busnbits, and if it is less handling 256 bues, and acpi=off
will forcely reset the mmconf in msr, so we still use mmconf in above case.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 23:41:04 +02:00
Yinghai Lu
7fd0da4085 x86_64: check MSR to get MMCONFIG for AMD Family 10h
so even booting kernel with acpi=off or even MCFG is not there, we still can
use MMCONFIG.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Greg KH <greg@kroah.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 23:41:04 +02:00
Yinghai Lu
eee206c3bf x86_64: check and enable MMCONFIG for AMD Family 10h
So we can use MMCONF when MMCONF is not set by BIOS

using TOP_MEM2 msr to get memory top, and try to scan fam10h mmio routing to
make sure the range is not conflicted with some prefetch MMIO that is above 4G.
(current only LinuxBIOS assign 64 bit mmio above 4G for some co-processor)

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 23:41:04 +02:00
Yinghai Lu
57741a7790 x86_64: set cfg_size for AMD Family 10h in case MMCONFIG
reuse pci_cfg_space_size but skip check pci express and pci-x CAP ID.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:03 +02:00
Yinghai Lu
05c58b8ac7 x86: mmconf enable mcfg early
Patch
	"x86: validate against ACPI motherboard resources"

changed the mmconf init sequence, and init MMCONF late in acpi_init.

here change it back to old sequence:

 1. check hostbridge in early
 2. check MCFG with e820 in early
 3. if all fail, will check MCFg with acpi _CRS in acpi_init

So we can make MCONF working again when acpi=off is set if hostbridge
support that.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg KH <greg@kroah.com>
Cc: Greg KH <greg@kroah.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:03 +02:00
Yinghai Lu
0b64ad7123 x86: clear pci_mmcfg_virt when mmcfg get rejected
For x86_64, need to free pci_mmcfg_virt, and iounmap some pointers
when MMCONF is not reserved in E820 or acpi _CRS and get rejected.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg KH <greg@kroah.com>
Cc: Greg KH <greg@kroah.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:03 +02:00
Robert Hancock
7752d5cfe3 x86: validate against acpi motherboard resources
This path adds validation of the MMCONFIG table against the ACPI reserved
motherboard resources.  If the MMCONFIG table is found to be reserved in
ACPI, we don't bother checking the E820 table.  The PCI Express firmware
spec apparently tells BIOS developers that reservation in ACPI is required
and E820 reservation is optional, so checking against ACPI first makes
sense.  Many BIOSes don't reserve the MMCONFIG region in E820 even though
it is perfectly functional, the existing check needlessly disables MMCONFIG
in these cases.

In order to do this, MMCONFIG setup has been split into two phases.  If PCI
configuration type 1 is not available then MMCONFIG is enabled early as
before.  Otherwise, it is enabled later after the ACPI interpreter is
enabled, since we need to be able to execute control methods in order to
check the ACPI reserved resources.  Presently this is just triggered off
the end of ACPI interpreter initialization.

There are a few other behavioral changes here:

- Validate all MMCONFIG configurations provided, not just the first one.

- Validate the entire required length of each configuration according to
  the provided ending bus number is reserved, not just the minimum required
  allocation.

- Validate that the area is reserved even if we read it from the chipset
  directly and not from the MCFG table.  This catches the case where the
  BIOS didn't set the location properly in the chipset and has mapped it
  over other things it shouldn't have.

This also cleans up the MMCONFIG initialization functions so that they
simply do nothing if MMCONFIG is not compiled in.

Based on an original patch by Rajesh Shah from Intel.

[akpm@linux-foundation.org: many fixes and cleanups]
Signed-off-by: Robert Hancock <hancockr@shaw.ca>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Andi Kleen <ak@suse.de>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:03 +02:00
Linus Torvalds
c3bf9bc243 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootmem-v3
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootmem-v3:
  x86_64/mm: check and print vmemmap allocation continuous
  x86_64: fix setup_node_bootmem to support big mem excluding with memmap
  x86_64: make reserve_bootmem_generic() use new reserve_bootmem()
  mm: allow reserve_bootmem() cross nodes
  mm: offset align in alloc_bootmem()
  mm: fix alloc_bootmem_core to use fast searching for all nodes
  mm: make mem_map allocation continuous
2008-04-26 14:04:32 -07:00
Linus Torvalds
e3505dd50c Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild:
  kbuild: scripts/Makefile.modpost typo fix
  kbuild: soften MODULE_LICENSE check
2008-04-26 14:03:54 -07:00
Yinghai Lu
c2b91e2eec x86_64/mm: check and print vmemmap allocation continuous
On big systems with lots of memory, don't print out too much during
bootup, and make it easy to find if it is continuous.

on 256G 8 sockets system will get
 [ffffe20000000000-ffffe20002bfffff] PMD -> [ffff810001400000-ffff810003ffffff] on node 0
[ffffe2001c700000-ffffe2001c7fffff] potential offnode page_structs
 [ffffe20002c00000-ffffe2001c7fffff] PMD -> [ffff81000c000000-ffff8100255fffff] on node 0
[ffffe20038700000-ffffe200387fffff] potential offnode page_structs
 [ffffe2001c800000-ffffe200387fffff] PMD -> [ffff810820200000-ffff81083c1fffff] on node 1
 [ffffe20040000000-ffffe2007fffffff] PUD ->ffff811027a00000 on node 2
 [ffffe20038800000-ffffe2003fffffff] PMD -> [ffff811020200000-ffff8110279fffff] on node 2
[ffffe20054700000-ffffe200547fffff] potential offnode page_structs
 [ffffe20040000000-ffffe200547fffff] PMD -> [ffff811027c00000-ffff81103c3fffff] on node 2
[ffffe20070700000-ffffe200707fffff] potential offnode page_structs
 [ffffe20054800000-ffffe200707fffff] PMD -> [ffff811820200000-ffff81183c1fffff] on node 3
 [ffffe20080000000-ffffe200bfffffff] PUD ->ffff81202fa00000 on node 4
 [ffffe20070800000-ffffe2007fffffff] PMD -> [ffff812020200000-ffff81202f9fffff] on node 4
[ffffe2008c700000-ffffe2008c7fffff] potential offnode page_structs
 [ffffe20080000000-ffffe2008c7fffff] PMD -> [ffff81202fc00000-ffff81203c3fffff] on node 4
[ffffe200a8700000-ffffe200a87fffff] potential offnode page_structs
 [ffffe2008c800000-ffffe200a87fffff] PMD -> [ffff812820200000-ffff81283c1fffff] on node 5
 [ffffe200c0000000-ffffe200ffffffff] PUD ->ffff813037a00000 on node 6
 [ffffe200a8800000-ffffe200bfffffff] PMD -> [ffff813020200000-ffff8130379fffff] on node 6
[ffffe200c4700000-ffffe200c47fffff] potential offnode page_structs
 [ffffe200c0000000-ffffe200c47fffff] PMD -> [ffff813037c00000-ffff81303c3fffff] on node 6
 [ffffe200c4800000-ffffe200e07fffff] PMD -> [ffff813820200000-ffff81383c1fffff] on node 7

instead of a very long print out...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 22:51:09 +02:00
Yinghai Lu
1a27fc0a42 x86_64: fix setup_node_bootmem to support big mem excluding with memmap
typical case: four sockets system, every node has 4g ram, and we are using:

	memmap=10g$4g

to mask out memory on node1 and node2

when numa is enabled, early_node_mem is used to get node_data and node_bootmap.

if it can not get memory from the same node with find_e820_area(), it will
use alloc_bootmem to get buff from previous nodes.

so check it and print out some info about it.

need to move early_res_to_bootmem into every setup_node_bootmem.
and it takes range that node has. otherwise alloc_bootmem could return addr
that reserved early.

depends on "mm: make reserve_bootmem can crossed the nodes".

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 22:51:08 +02:00
Yinghai Lu
8b3cd09ed2 x86_64: make reserve_bootmem_generic() use new reserve_bootmem()
"mm: make reserve_bootmem can crossed the nodes" provides new
reserve_bootmem(), let reserve_bootmem_generic() use that.

reserve_bootmem_generic() is used to reserve initramdisk, so this way
we can make sure even when bootloader or kexec load ranges cross the
node memory boundaries, reserve_bootmem still works.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 22:51:08 +02:00
Yinghai Lu
a5645a61b3 mm: allow reserve_bootmem() cross nodes
split reserve_bootmem_core() into two functions, one which checks
conflicts, and one which sets the bits.

and make reserve_bootmem to loop bdata_list to cross the nodes.

user could be crashkernel and ramdisk..., in case the range provided
by those externalities crosses the nodes.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 22:51:08 +02:00
Yinghai Lu
9a2dc04cf0 mm: offset align in alloc_bootmem()
need offset alignment when node_boot_start's alignment is less than
the alignment required.

use local node_boot_start to match alignment - so don't add extra operation
in search loop.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 22:51:08 +02:00
Yinghai Lu
ad09315cad mm: fix alloc_bootmem_core to use fast searching for all nodes
Make the nodes other than node 0 use bdata->last_success for fast
search too.

We need to use __alloc_bootmem_core() for vmemmap allocation for other
nodes when numa and sparsemem/vmemmap are enabled.

Also, make fail_block path increase i with incr only after ALIGN
to avoid extra increase when size is larger than align.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 22:51:07 +02:00
Yinghai Lu
e123dd3f0e mm: make mem_map allocation continuous
vmemmap allocation currently has this layout:

 [ffffe20000000000-ffffe200001fffff] PMD ->ffff810001400000 on node 0
 [ffffe20000200000-ffffe200003fffff] PMD ->ffff810001800000 on node 0
 [ffffe20000400000-ffffe200005fffff] PMD ->ffff810001c00000 on node 0
 [ffffe20000600000-ffffe200007fffff] PMD ->ffff810002000000 on node 0
 [ffffe20000800000-ffffe200009fffff] PMD ->ffff810002400000 on node 0
...

note that there is a 2M hole between them - not optimal.

the root cause is that usemap (24 bytes) will be allocated after every 2M
mem_map, and it will push next vmemmap (2M) to the next (2M) alignment.

solution: try to allocate the mem_map continously.

after the patch, we get:

 [ffffe20000000000-ffffe200001fffff] PMD ->ffff810001400000 on node 0
 [ffffe20000200000-ffffe200003fffff] PMD ->ffff810001600000 on node 0
 [ffffe20000400000-ffffe200005fffff] PMD ->ffff810001800000 on node 0
 [ffffe20000600000-ffffe200007fffff] PMD ->ffff810001a00000 on node 0
 [ffffe20000800000-ffffe200009fffff] PMD ->ffff810001c00000 on node 0
...

which is the ideal layout.

and usemap will share a page because of they are allocated continuously too:

sparse_early_usemap_alloc: usemap = ffff810024e00000 size = 24
sparse_early_usemap_alloc: usemap = ffff810024e00080 size = 24
sparse_early_usemap_alloc: usemap = ffff810024e00100 size = 24
sparse_early_usemap_alloc: usemap = ffff810024e00180 size = 24
...

so we make the bootmem allocation more compact and use less memory
for usemap => mission accomplished ;-)

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 22:51:07 +02:00
Linus Torvalds
9b79ed952b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-generic-bitops-v3
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-generic-bitops-v3:
  x86, bitops: select the generic bitmap search functions
  x86: include/asm-x86/pgalloc.h/bitops.h: checkpatch cleanups - formatting only
  x86: finalize bitops unification
  x86, UML: remove x86-specific implementations of find_first_bit
  x86: optimize find_first_bit for small bitmaps
  x86: switch 64-bit to generic find_first_bit
  x86: generic versions of find_first_(zero_)bit, convert i386
  bitops: use __fls for fls64 on 64-bit archs
  generic: implement __fls on all 64-bit archs
  generic: introduce a generic __fls implementation
  x86: merge the simple bitops and move them to bitops.h
  x86, generic: optimize find_next_(zero_)bit for small constant-size bitmaps
  x86, uml: fix uml with generic find_next_bit for x86
  x86: change x86 to use generic find_next_bit
  uml: Kconfig cleanup
  uml: fix build error
2008-04-26 13:46:11 -07:00
Linus Torvalds
a52b0d25a7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (46 commits)
  ide: constify struct ide_dma_ops
  ide: add struct ide_dma_ops (take 3)
  ide: add IDE_HFLAG_SERIALIZE_DMA host flag
  sl82c105: check bridge revision in sl82c105_init_one()
  au1xxx-ide: use ->init_dma method
  palm_bk3710: use ->init_dma method
  sgiioc4: use ->init_dma method
  icside: use ->init_dma method
  ide-pmac: use ->init_dma method
  ide: do complete DMA setup in ->init_dma method (take 2)
  au1xxx-ide: fix MWDMA support
  ide: cleanup ide_setup_dma()
  ide: factor out setting PCI bus-mastering from ide_hwif_setup_dma()
  ide: export ide_allocate_dma_engine()
  ide: move ide_setup_dma() call out from ->init_dma method
  alim15x3: skip DMA initialization completely on revs < 0x20
  pdc202xx_old: remove init_dma_pdc202xx()
  ide: don't display "BIOS" settings in ide_setup_dma()
  ide: remove ->cds field from ide_hwif_t (take 2)
  ide: remove ide_dma_iobase()
  ...
2008-04-26 13:44:19 -07:00
Linus Torvalds
539a5fe226 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootparam
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootparam:
  x86, boot: Document for linked list of struct setup_data
  x86, boot: export linked list of struct setup_data via debugfs
  x86, boot: add linked list of struct setup_data
  x86, boot: add free_early to early reservation machanism
2008-04-26 13:29:41 -07:00
Bartlomiej Zolnierkiewicz
f37afdaca7 ide: constify struct ide_dma_ops
* Export ide_dma_exec_cmd() and __ide_dma_test_irq().

* Constify struct ide_dma_ops.

* Always set hwif->dma_ops to &sff_dma_ops in ide_setup_dma()
  (it is later overriden by ide_init_port() if needed) and drop
  'const struct ide_port_info *d' argument.

While at it:

* Rename __ide_dma_test_irq() to ide_dma_test_irq().

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-04-26 22:25:24 +02:00
Bartlomiej Zolnierkiewicz
5e37bdc081 ide: add struct ide_dma_ops (take 3)
Add struct ide_dma_ops and convert core code + drivers to use it.

While at it:

* Drop "ide_" prefix from ->ide_dma_end and ->ide_dma_test_irq methods.

* Drop "ide_" "infixes" from DMA methods.

* au1xxx-ide.c:
  - use auide_dma_{test_irq,end}() directly in auide_dma_timeout()

* pdc202xx_old.c:
  - drop "old_" "infixes" from DMA methods

* siimage.c:
  - add siimage_dma_test_irq() helper
  - print SATA warning in siimage_init_one()

* Remove no longer needed ->init_hwif implementations.

v2:
* Changes based on review from Sergei:
  - s/siimage_ide_dma_test_irq/siimage_dma_test_irq/
  - s/drive->hwif/hwif/ in idefloppy_pc_intr().
  - fix patch description w.r.t. au1xxx-ide changes
  - fix au1xxx-ide build
  - fix naming for cmd64*_dma_ops
  - drop "ide_" and "old_" infixes
  - s/hpt3xxx_dma_ops/hpt37x_dma_ops/
  - s/hpt370x_dma_ops/hpt370_dma_ops/
  - use correct DMA ops for HPT302/N, HPT371/N and HPT374
  - s/it821x_smart_dma_ops/it821x_pass_through_dma_ops/

v3:
* Two bugs slipped in v2 (noticed by Sergei):
  - use correct DMA ops for HPT374 (for real this time)
  - handle HPT370/HPT370A properly

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-04-26 22:25:24 +02:00
Bartlomiej Zolnierkiewicz
1fd1890594 ide: add IDE_HFLAG_SERIALIZE_DMA host flag
* Add IDE_HFLAG_SERIALIZE_DMA host flag to serialize ports
  if DMA is available and handle it in ide_init_port().

* Convert sl82c105 host driver to use this new flag.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-04-26 22:25:24 +02:00
Bartlomiej Zolnierkiewicz
6c61064162 sl82c105: check bridge revision in sl82c105_init_one()
* Make sl82c105_bridge_revision() return 'u8' instead of 'unsigned long'.

* Check bridge revision in sl82c105_init_one().

While at:

* Use proper KERN_ level for printk().

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-04-26 22:25:23 +02:00
Bartlomiej Zolnierkiewicz
8552865935 au1xxx-ide: use ->init_dma method
* Pass 'ide_hwif_t *hwif' instead of '_auide_hwif *auide' to
  auide_ddma_init().

* Add 'const struct ide_port_info *d' argument to auide_ddma_init().

* Convert the driver to use ->init_dma method.

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-04-26 22:25:23 +02:00
Bartlomiej Zolnierkiewicz
b552a2c1dd palm_bk3710: use ->init_dma method
* Move DMA setup to palm_bk3710_init_dma().

* Convert the driver to use ->init_dma method.

Cc: Anton Vorontsov <avorontsov@ru.mvista.com>
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-04-26 22:25:23 +02:00