Commit graph

118856 commits

Author SHA1 Message Date
Tom Tucker
517ac45af4 9p: rdma: Set trans prior to requesting async connection ops
The RDMA connection manager is fundamentally asynchronous.
Since the async callback context is the client pointer, the
transport in the client struct needs to be set prior to calling
the first async op.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-11-05 13:19:06 -06:00
Ingo Molnar
9fcd18c9e6 sched: re-tune balancing
Impact: improve wakeup affinity on NUMA systems, tweak SMP systems

Given the fixes+tweaks to the wakeup-buddy code, re-tweak the domain
balancing defaults on NUMA and SMP systems.

Turn on SD_WAKE_AFFINE which was off on x86 NUMA - there's no reason
why we would not want to have wakeup affinity across nodes as well.
(we already do this in the standard NUMA template.)

lat_ctx on a NUMA box is particularly happy about this change:

before:

 |   phoenix:~/l> ./lat_ctx -s 0 2
 |   "size=0k ovr=2.60
 |   2 5.70

after:

 |   phoenix:~/l> ./lat_ctx -s 0 2
 |   "size=0k ovr=2.65
 |   2 2.07

a 2.75x speedup.

pipe-test is similarly happy about it too:

 |  phoenix:~/sched-tests> ./pipe-test
 |   18.26 usecs/loop.
 |   14.70 usecs/loop.
 |   14.38 usecs/loop.
 |   10.55 usecs/loop.              # +WAKE_AFFINE on domain0+domain1
 |   8.63 usecs/loop.
 |   8.59 usecs/loop.
 |   9.03 usecs/loop.
 |   8.94 usecs/loop.
 |   8.96 usecs/loop.
 |   8.63 usecs/loop.

Also:

 - disable SD_BALANCE_NEWIDLE on NUMA and SMP domains (keep it for siblings)
 - enable SD_WAKE_BALANCE on SMP domains

Sysbench+postgresql improves all around the board, quite significantly:

           .28-rc3-11474e2c  .28-rc3-11474e2c-tune
-------------------------------------------------
    1:             571              688    +17.08%
    2:            1236             1206    -2.55%
    4:            2381             2642    +9.89%
    8:            4958             5164    +3.99%
   16:            9580             9574    -0.07%
   32:            7128             8118    +12.20%
   64:            7342             8266    +11.18%
  128:            7342             8064    +8.95%
  256:            7519             7884    +4.62%
  512:            7350             7731    +4.93%
-------------------------------------------------
  SUM:           55412            59341    +6.62%

So it's a win both for the runup portion, the peak area and the tail.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-05 18:04:38 +01:00
Eric W. Biederman
467622ef2a [MTD] [NOR] Fix cfi_send_gen_cmd handling of x16 devices in x8 mode (v4)
For "unlock" cycles to 16bit devices in 8bit compatibility mode we need
to use the byte addresses 0xaaa and 0x555. These effectively match
the word address 0x555 and 0x2aa, except the latter has its low bit set.

Most chips don't care about the value of the 'A-1' pin in x8 mode,
but some -- like the ST M29W320D -- do. So we need to be careful to
set it where appropriate.

cfi_send_gen_cmd is only ever passed addresses where the low byte
is 0x00, 0x55 or 0xaa. Of those, only addresses ending 0xaa are
affected by this patch, by masking in the extra low bit when the device
is known to be in compatibility mode.

[dwmw2: Do it only when (cmd_ofs & 0xff) == 0xaa]
v4: Fix  stupid typo in cfi_build_cmd_addr that failed to compile
    I'm writing this patch way to late at night.
v3: Bring all of the work back into cfi_build_cmd_addr
    including calling of map_bankwidth(map) and cfi_interleave(cfi)
    So every caller doesn't need to.
v2: Only modified the address if we our device_type is larger than our
    bus width.

Cc: stable@kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2008-11-05 14:40:25 +01:00
David S. Miller
518a09ef11 tcp: Fix recvmsg MSG_PEEK influence of blocking behavior.
Vito Caputo noticed that tcp_recvmsg() returns immediately from
partial reads when MSG_PEEK is used.  In particular, this means that
SO_RCVLOWAT is not respected.

Simply remove the test.  And this matches the behavior of several
other systems, including BSD.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-05 03:36:01 -08:00
Alexey Dobriyan
efb9a8c28c netfilter: netns ct: walk netns list under RTNL
netns list (just list) is under RTNL. But helper and proto unregistration
happen during rmmod when RTNL is not held, and that's how it was tested:
modprobe/rmmod vs clone(CLONE_NEWNET)/exit.

BUG: unable to handle kernel paging request at 0000000000100100	<===
IP: [<ffffffffa009890f>] nf_conntrack_l4proto_unregister+0x96/0xae [nf_conntrack]
PGD 15e300067 PUD 15e1d8067 PMD 0
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/kernel/uevent_seqnum
CPU 0
Modules linked in: nf_conntrack_proto_sctp(-) nf_conntrack_proto_dccp(-) af_packet iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 sr_mod cdrom [last unloaded: nf_conntrack_proto_sctp]
Pid: 16758, comm: rmmod Not tainted 2.6.28-rc2-netns-xfrm #3
RIP: 0010:[<ffffffffa009890f>]  [<ffffffffa009890f>] nf_conntrack_l4proto_unregister+0x96/0xae [nf_conntrack]
RSP: 0018:ffff88015dc1fec8  EFLAGS: 00010212
RAX: 0000000000000000 RBX: 00000000001000f8 RCX: 0000000000000000
RDX: ffffffffa009575c RSI: 0000000000000003 RDI: ffffffffa00956b5
RBP: ffff88015dc1fed8 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000000 R11: ffff88015dc1fe48 R12: ffffffffa0458f60
R13: 0000000000000880 R14: 00007fff4c361d30 R15: 0000000000000880
FS:  00007f624435a6f0(0000) GS:ffffffff80521580(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000100100 CR3: 0000000168969000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rmmod (pid: 16758, threadinfo ffff88015dc1e000, task ffff880179864218)
Stack:
 ffffffffa0459100 0000000000000000 ffff88015dc1fee8 ffffffffa0457934
 ffff88015dc1ff78 ffffffff80253fef 746e6e6f635f666e 6f72705f6b636172
 00707463735f6f74 ffffffff8024cb30 00000000023b8010 0000000000000000
Call Trace:
 [<ffffffffa0457934>] nf_conntrack_proto_sctp_fini+0x10/0x1e [nf_conntrack_proto_sctp]
 [<ffffffff80253fef>] sys_delete_module+0x19f/0x1fe
 [<ffffffff8024cb30>] ? trace_hardirqs_on_caller+0xf0/0x114
 [<ffffffff803ea9b2>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8020b52b>] system_call_fastpath+0x16/0x1b
Code: 13 35 e0 e8 c4 6c 1a e0 48 8b 1d 6d c6 46 e0 eb 16 48 89 df 4c 89 e2 48 c7 c6 fc 85 09 a0 e8 61 cd ff ff 48 8b 5b 08 48 83 eb 08 <48> 8b 43 08 0f 18 08 48 8d 43 08 48 3d 60 4f 50 80 75 d3 5b 41
RIP  [<ffffffffa009890f>] nf_conntrack_l4proto_unregister+0x96/0xae [nf_conntrack]
 RSP <ffff88015dc1fec8>
CR2: 0000000000100100
---[ end trace bde8ac82debf7192 ]---

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-05 03:03:18 -08:00
Takashi Iwai
959973b92d ALSA: hda - Add a quirk for MEDION MD96630
Use model=lenovo-ms7195-dig for MEDION MD96630 laptop (17c0:4085)
with ALC888 codec.
Reference: Novell bnc#412548
	https://bugzilla.novell.com/show_bug.cgi?id=412528

Signed-off-by: Takashi Iwai <tiwai@suse.de>
2008-11-05 11:30:56 +01:00
Benjamin Thery
e3ec6cfc26 ipv6: fix run pending DAD when interface becomes ready
With some net devices types, an IPv6 address configured while the
interface was down can stay 'tentative' forever, even after the interface
is set up. In some case, pending IPv6 DADs are not executed when the
device becomes ready.

I observed this while doing some tests with kvm. If I assign an IPv6 
address to my interface eth0 (kvm driver rtl8139) when it is still down
then the address is flagged tentative (IFA_F_TENTATIVE). Then, I set
eth0 up, and to my surprise, the address stays 'tentative', no DAD is
executed and the address can't be pinged.

I also observed the same behaviour, without kvm, with virtual interfaces
types macvlan and veth.

Some easy steps to reproduce the issue with macvlan:

1. ip link add link eth0 type macvlan
2. ip -6 addr add 2003::ab32/64 dev macvlan0
3. ip addr show dev macvlan0
   ... 
   inet6 2003::ab32/64 scope global tentative
   ...
4. ip link set macvlan0 up
5. ip addr show dev macvlan0
   ...
   inet6 2003::ab32/64 scope global tentative
   ...
   Address is still tentative

I think there's a bug in net/ipv6/addrconf.c, addrconf_notify():
addrconf_dad_run() is not always run when the interface is flagged IF_READY.
Currently it is only run when receiving NETDEV_CHANGE event. Looks like
some (virtual) devices doesn't send this event when becoming up.

For both NETDEV_UP and NETDEV_CHANGE events, when the interface becomes
ready, run_pending should be set to 1. Patch below.

'run_pending = 1' could be moved below the if/else block but it makes 
the code less readable.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-05 01:43:57 -08:00
Randy Dunlap
b22cecdd8f net/9p: fix printk format warnings
Fix printk format warnings in net/9p.
Built cleanly on 7 arches.

net/9p/client.c:820: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64'
net/9p/client.c:820: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64'
net/9p/client.c:867: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64'
net/9p/client.c:867: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64'
net/9p/client.c:932: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64'
net/9p/client.c:932: warning: format '%llx' expects type 'long long unsigned int', but argument 6 has type 'u64'
net/9p/client.c:982: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64'
net/9p/client.c:982: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64'
net/9p/client.c:1025: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64'
net/9p/client.c:1025: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64'
net/9p/client.c:1227: warning: format '%llx' expects type 'long long unsigned int', but argument 7 has type 'u64'
net/9p/client.c:1227: warning: format '%llx' expects type 'long long unsigned int', but argument 12 has type 'u64'
net/9p/client.c:1227: warning: format '%llx' expects type 'long long unsigned int', but argument 8 has type 'u64'
net/9p/client.c:1227: warning: format '%llx' expects type 'long long unsigned int', but argument 13 has type 'u64'
net/9p/client.c:1252: warning: format '%llx' expects type 'long long unsigned int', but argument 7 has type 'u64'
net/9p/client.c:1252: warning: format '%llx' expects type 'long long unsigned int', but argument 12 has type 'u64'
net/9p/client.c:1252: warning: format '%llx' expects type 'long long unsigned int', but argument 8 has type 'u64'
net/9p/client.c:1252: warning: format '%llx' expects type 'long long unsigned int', but argument 13 has type 'u64'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-05 01:35:55 -08:00
Peter Zijlstra
02479099c2 sched: fix buddies for group scheduling
Impact: scheduling order fix for group scheduling

For each level in the hierarchy, set the buddy to point to the right entity.
Therefore, when we do the hierarchical schedule, we have a fair chance of
ending up where we meant to.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-05 10:30:15 +01:00
Peter Zijlstra
4793241be4 sched: backward looking buddy
Impact: improve/change/fix wakeup-buddy scheduling

Currently we only have a forward looking buddy, that is, we prefer to
schedule to the task we last woke up, under the presumption that its
going to consume the data we just produced, and therefore will have
cache hot benefits.

This allows co-waking producer/consumer task pairs to run ahead of the
pack for a little while, keeping their cache warm. Without this, we
would interleave all pairs, utterly trashing the cache.

This patch introduces a backward looking buddy, that is, suppose that
in the above scenario, the consumer preempts the producer before it
can go to sleep, we will therefore miss the wakeup from consumer to
producer (its already running, after all), breaking the cycle and
reverting to the cache-trashing interleaved schedule pattern.

The backward buddy will try to schedule back to the task that woke us
up in case the forward buddy is not available, under the assumption
that the last task will be the one with the most cache hot task around
barring current.

This will basically allow a task to continue after it got preempted.

In order to avoid starvation, we allow either buddy to get wakeup_gran
ahead of the pack.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-05 10:30:14 +01:00
Peter Zijlstra
d95f98d069 sched: fix fair preempt check
Impact: fix cross-class preemption

Inter-class wakeup preemptions should go on class order.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-05 10:30:13 +01:00
Peter Zijlstra
f4b6755fb3 sched: cleanup fair task selection
Impact: cleanup

Clean up task selection

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-05 10:30:13 +01:00
Eric Anholt
072ba49838 ftrace: fix breakage in bin_fmt results
In 777e208d40 we changed from outputting
field->cpu (a char) to iter->cpu (unsigned int), increasing the resulting
structure size by 3 bytes.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-05 10:22:42 +01:00
Stephen Rothwell
454666eb78 powerpc: Fix "unused variable" warning in pci_dlpar.c
This gets rid of this build warning:

arch/powerpc/platforms/pseries/pci_dlpar.c: In function 'init_phb_dynamic':
arch/powerpc/platforms/pseries/pci_dlpar.c:192: warning: unused variable 'b'

This is one of the very few warnings left in a ppc64_defconfig build and
getting rid of it will make it easier to see future introduced ones (in
fact this was introduced very recently).

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-11-05 19:59:08 +11:00
Alexey Dobriyan
9c8b4aff18 powerpc/cell: Fix compile error in ras.c
This fixes this error on Cell when CONFIG_KEXEC = n:

arch/powerpc/platforms/cell/ras.c:299: error: implicit declaration of function 'crash_shutdown_register'

We have to include <asm/kexec.h> because it contains the dummy
definition of crash_shutdown_register that is used when
CONFIG_KEXEC=n, but <linux/kexec.h> doesn't include <asm/kexec.h> in
that case.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-11-05 19:59:08 +11:00
Alexey Dobriyan
fce4d58353 powerpc/ps3: Fix compile error in ps3-lpm.c
Compiling with CONFIG_SMP = n and CONFIG_PS3_LPM != n gives this error:

drivers/ps3/ps3-lpm.c:838: error: implicit declaration of function 'get_hard_smp_processor_id'

This fixes it.  We have to include <asm/smp.h> rather than
<linux/smp.h> because the UP definition of get_hard_smp_processor_id()
is in <asm/smp.h>, and <linux/smp.h> only includes <asm/smp.h> if
CONFIG_SMP = y.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-11-05 19:59:08 +11:00
Patrick McHardy
9b22ea5609 net: fix packet socket delivery in rx irq handler
The changes to deliver hardware accelerated VLAN packets to packet
sockets (commit bc1d0411) caused a warning for non-NAPI drivers.
The __vlan_hwaccel_rx() function is called directly from the drivers
RX function, for non-NAPI drivers that means its still in RX IRQ
context:

[   27.779463] ------------[ cut here ]------------
[   27.779509] WARNING: at kernel/softirq.c:136 local_bh_enable+0x37/0x81()
...
[   27.782520]  [<c0264755>] netif_nit_deliver+0x5b/0x75
[   27.782590]  [<c02bba83>] __vlan_hwaccel_rx+0x79/0x162
[   27.782664]  [<f8851c1d>] atl1_intr+0x9a9/0xa7c [atl1]
[   27.782738]  [<c0155b17>] handle_IRQ_event+0x23/0x51
[   27.782808]  [<c015692e>] handle_edge_irq+0xc2/0x102
[   27.782878]  [<c0105fd5>] do_IRQ+0x4d/0x64

Split hardware accelerated VLAN reception into two parts to fix this:

- __vlan_hwaccel_rx just stores the VLAN TCI and performs the VLAN
  device lookup, then calls netif_receive_skb()/netif_rx()

- vlan_hwaccel_do_receive(), which is invoked by netif_receive_skb()
  in softirq context, performs the real reception and delivery to
  packet sockets.

Reported-and-tested-by: Ramon Casellas <ramon.casellas@cttc.es>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-04 14:49:57 -08:00
Andreas Steffen
79654a7698 xfrm: Have af-specific init_tempsel() initialize family field of temporary selector
While adding MIGRATE support to strongSwan, Andreas Steffen noticed that
the selectors provided in XFRM_MSG_ACQUIRE have their family field
uninitialized (those in MIGRATE do have their family set).

Looking at the code, this is because the af-specific init_tempsel()
(called via afinfo->init_tempsel() in xfrm_init_tempsel()) do not set
the value.

Reported-by: Andreas Steffen <andreas.steffen@strongswan.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
2008-11-04 14:49:19 -08:00
Tony Lindgren
5c32f62b97 ARM: OMAP: Fix define for twl4030 irqs
Otherwise twl4030 gpios won't work.

Signed-off-by: Tony Lindgren <tony@atomide.com>
2008-11-04 13:35:08 -08:00
Tony Lindgren
52414739ca ARM: OMAP: Fix get_irqnr_and_base to clear spurious interrupt bits
On omap24xx, INTCPS_SIR_IRQ_OFFSET bits [6:0] contains the current
active interrupt number.

However, on 34xx INTCPS_SIR_IRQ_OFFSET bits [31:7] also contains the
SPURIOUSIRQFLAG, which gets set if the interrupt sorting information
is invalid.

If the SPURIOUSIRQFLAG bits are not ignored, the interrupt code will
occasionally produce a bunch of confusing errors:

irq -33, desc: c02ddcc8, depth: 0, count: 0, unhandled: 0
->handle_irq():  c006f23c, handle_bad_irq+0x0/0x22c
->chip(): 00000000, 0x0
->action(): 00000000

Fix this by masking out only the ACTIVEIRQ bits. Also fix a
confusing comment.

Signed-off-by: Tony Lindgren <tony@atomide.com>
2008-11-04 13:35:07 -08:00
Zhaolei
e621f266d4 ARM: OMAP: Fix debugfs_create_*'s error checking method for arm/plat-omap
debugfs_create_*() returns NULL if an error occurs, returns -ENODEV
when debugfs is not enabled in the kernel.

Comparing to PATCH v1, because clk_debugfs_init is included in
"#if defined CONFIG_DEBUG_FS", we only need to check NULL return.
Thanks Li Zefan <lizf@cn.fujitsu.com>

debugfs_create_u8() and other function's return value's checking method are
also fixed in this patch.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
2008-11-04 13:35:07 -08:00
Sanjeev Premi
85d7a07026 ARM: OMAP: Fix compiler warnings in gpmc.c
Fix these compiler warnings:

gpmc.c: In function 'gpmc_init':
gpmc.c:432: warning: 'return' with a value, in function returning void
gpmc.c:439: warning: 'return' with a value, in function returning void

Signed-off-by: Sanjeev Premi <premi@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
2008-11-04 13:35:06 -08:00
Tony Luck
f2b3fdc887 [IA64] Build VT-D iommu support into generic kernel
Now that all the ia64 mmu pieces are in the tree we can build
support into the generic kernel.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-11-04 11:32:15 -08:00
FUJITA Tomonori
d8d54b0252 [IA64] remove dead BIO_VMERGE_BOUNDARY definition
The block layer dropped the virtual merge feature
(b8b3e16cfe). BIO_VMERGE_BOUNDARY
definition is meaningless now (For IA64, BIO_VMERGE_BOUNDARY has been
meaningless for a long time since IA64 disables the virtual merge
feature).

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-11-04 11:31:58 -08:00
Huang Weiyi
6a2d26fd3f [IA64] remove duplicated #include from pci-dma.c
Removed duplicated #include <asm/machvec.h> and <linux/string.h> in
arch/ia64/kernel/pci-dma.c.

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-11-04 11:31:42 -08:00
Joerg Roedel
9979aa7778 [IA64] use common header for software IO/TLB
Remove the swiotlb prototypes from the architecture code and use the
common header file instead.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-11-04 11:31:29 -08:00
Ken'ichi Ohmichi
aca14f3310 [IA64] fix the difference between node_mem_map and node_start_pfn
makedumpfile[1] cannot run on ia64 discontigmem kernel, because the member
node_mem_map of struct pgdat_list has invalid value.  This patch fixes it.

node_start_pfn shows the start pfn of each node, and node_mem_map should
point 'struct page' of each node's node_start_pfn.  On my machine, node0's
node_start_pfn shows 0x400 and its node_mem_map points 0xa0007fffbf000000.
 This address is the same as vmem_map, so the node_mem_map points 'struct
page' of pfn 0, even if its node_start_pfn shows 0x400.

The cause is due to the round down of min_pfn in count_node_pages() and
node0's node_mem_map points 'struct page' of inactive pfn (0x0).  This
patch fixes it.

makedumpfile[1]: dump filtering command
https://sourceforge.net/projects/makedumpfile/

Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Cc: Bernhard Walle <bwalle@suse.de>
Cc: Jay Lan <jlan@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-11-04 11:31:12 -08:00
Russ Anderson
d6e15199d1 [IA64] Add error_recovery_info field to SAL section header
Add the error_recovery_info field to the SAL section header,
as defined in the SAL Spec.

Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-11-04 11:30:43 -08:00
Russ Anderson
7576f68449 [IA64] Add UV watchlist support.
This is used by SGI xp drivers (drivers/misc/sgi-xp).

Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-11-04 11:30:23 -08:00
Russ Anderson
9ac8d3fb22 [IA64] Simplify SGI uv vs. sn2 driver issues
Add partition id, coherence id, and region size to UV to
make life simpler for drivers shared between sn2 & uv.

Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-11-04 11:29:39 -08:00
Linus Torvalds
75fa67706c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  xfrm: Fix xfrm_policy_gc_lock handling.
  niu: Use pci_ioremap_bar().
  bnx2x: Version Update
  bnx2x: Calling netif_carrier_off at the end of the probe
  bnx2x: PCI configuration bug on big-endian
  bnx2x: Removing the PMF indication when unloading
  mv643xx_eth: fix SMI bus access timeouts
  net: kconfig cleanup
  fs_enet: fix polling
  XFRM: copy_to_user_kmaddress() reports local address twice
  SMC91x: Fix compilation on some platforms.
  udp: Fix the SNMP counter of UDP_MIB_INERRORS
  udp: Fix the SNMP counter of UDP_MIB_INDATAGRAMS
  drivers/net/smc911x.c: Fix lockdep warning on xmit.
2008-11-04 08:30:12 -08:00
Linus Torvalds
4edfd20faf Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  libata: mask off DET when restoring SControl for detach
  libata: implement ATA_HORKAGE_ATAPI_MOD16_DMA and apply it
  libata: Fix a potential race condition in ata_scsi_park_show()
  sata_nv: fix generic, nf2/3 detection regression
  sata_via: restore vt*_prepare_host error handling
  sata_promise: add ATA engine reset to reset ops
2008-11-04 08:19:01 -08:00
Jianjun Kong
54074d5932 drivers: remove duplicated #include
Signed-off-by: Jianjun Kong <jianjun@zeuux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-04 08:18:19 -08:00
Frederic Weisbecker
79a9d461fd tracing/ftrace: fix a bug when switch current tracer to sched tracer
Impact: fix boot tracer + sched tracer coupling bug

Fix a bug that made the sched_switch tracer unable to run
if set as the current_tracer after the boot tracer.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-04 17:14:08 +01:00
Frederic Weisbecker
efade6e782 tracing/ftrace: types and naming corrections for sched tracer
Impact: cleanup

This patch applies some corrections suggested by Steven Rostedt.

Change the type of shed_ref into int since it is used
into a Mutex, we don't need it anymore as an atomic
variable in the sched_switch tracer.
Also change the name of the register mutex.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-04 17:14:07 +01:00
Frederic Weisbecker
d7ad44b697 tracing/fastboot: use sched switch tracer from boot tracer
Impact: enhance boot trace output with scheduling events

Use the sched_switch tracer from the boot tracer.

We also can trace schedule events inside the initcalls.
Sched tracing is disabled after the initcall has finished and
then reenabled before the next one is started.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-04 17:14:06 +01:00
Frederic Weisbecker
e55f605c14 tracing/ftrace: remove unused code in sched_switch tracer
Impact: cleanup

When init_sched_switch_trace() is called, it has no reason to start
the sched tracer if the sched_ref is not zero.

_ If this is non-zero, the tracer is already used, but we can register it
to the tracing engine. There is already a security which avoid the tracer
probes not to be resgistered twice.

_ If this is zero, this block will not be used.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-04 17:14:05 +01:00
Frederic Weisbecker
07695fa04e tracing/ftrace: fix a race condition in sched_switch tracer
Impact: fix race condition in sched_switch tracer

This patch fixes a race condition in the sched_switch tracer. If
several tasks (IE: concurrent initcalls) are playing with
tracing_start_cmdline_record() and tracing_stop_cmdline_record(), the
following situation could happen:

_ Task A and B are using the same tracepoint probe. Task A holds it.
  Task B is sleeping and doesn't hold it.

_ Task A frees the sched tracer, then sched_ref is decremented to 0.

_ Task A is preempted and hadn't yet unregistered its tracepoint
  probe, then B runs.

_ B increments sched_ref, sees it's 1 and then guess it has to
  register its probe. But it has not been freed by task A.

_ A lot of bad things can happen after that...

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-04 17:14:04 +01:00
Frederic Weisbecker
71566a0d16 tracing/fastboot: Enable boot tracing only during initcalls
Impact: modify boot tracer

We used to disable the initcall tracing at a specified time (IE: end
of builtin initcalls). But we don't need it anymore. It will be
stopped when initcalls are finished.

However we want two things:

_Start this tracing only after pre-smp initcalls are finished.

_Since we are planning to trace sched_switches at the same time, we
want to enable them only during the initcall execution.

For this purpose, this patch introduce two functions to enable/disable
the sched_switch tracing during boot.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-04 17:14:02 +01:00
Russell King
d2ed5cb80a [ARM] fix VFP+softfloat binaries
2.6.28-rc tightened up the ELF architecture checks on ARM.  For
non-EABI it only allows VFP if the hardware supports it.  However,
the kernel fails to also inspect the soft-float flag, so it
incorrectly rejects binaries using soft-VFP.

The fix is simple: also check that EF_ARM_SOFT_FLOAT isn't set
before rejecting VFP binaries on non-VFP hardware.

Acked-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-11-04 15:02:09 +00:00
Mark McLoughlin
e4ab1b3cbb x86/docs: remove noirqbalance param docs
Impact: documentation fix

irqbalance was removed by:

  commit 8b8e8c1bf7
  Author: Yinghai Lu <yhlu.kernel@gmail.com>
  Date:   Tue Aug 19 20:50:23 2008 -0700

Remove the associated documentation for noirqbalance.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-04 14:50:34 +01:00
Takashi Iwai
c4dc507185 ALSA: hda - Limit the number of GPIOs show in proc
Limit the number of GPIOs shown in proc.  Otherwise it gets too long
unnecessarily, and hard to analyze.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
2008-11-04 13:30:57 +01:00
Peter Zijlstra
3299b4dd11 ftrace: sysctl typo
Impact: fix sysctl name typo

Steve must have needed more coffee ;-)

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-04 13:04:40 +01:00
Peter Zijlstra
69f698adcf ftrace: sysrq-z to dump the buffers
Impact: add SysRq-z support to dump trace buffers

Allows one to force an ftrace dump from sysrq

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-04 11:01:35 +01:00
Steven Rostedt
42ec632e7b ftrace: ftrace.txt version update
Impact: Documentation update only

Update the version that the ftrace document was written for.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-04 10:12:24 +01:00
Steven Rostedt
9b803c0fc3 ftrace: update txt document
Impact: Documentation update only

A lot of changes have gone into ftrace. This patch updates
the ftrace.txt document.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-04 10:12:23 +01:00
Steven Rostedt
b2a866f934 ftrace: function tracer with irqs disabled
Impact: disable interrupts during trace entry creation (as opposed to preempt)

To help with performance, I set the ftracer to not disable interrupts,
and only to disable preemption. If an interrupt occurred, it would not
be traced, because the function tracer protects itself from recursion.
This may be faster, but the trace output might miss some traces.

This patch makes the fuction trace disable interrupts, but it also
adds a runtime feature to disable preemption instead. It does this by
having two different tracer functions. When the function tracer is
enabled, it will check to see which version is requested (irqs disabled
or preemption disabled). Then it will use the corresponding function
as the tracer.

Irq disabling is the default behaviour, but if the user wants better
performance, with the chance of missing traces, then they can choose
the preempt disabled version.

Running hackbench 3 times with the irqs disabled and 3 times with
the preempt disabled function tracer yielded:

tracing type       times            entries recorded
------------      --------          ----------------
irq disabled      43.393            166433066
                  43.282            166172618
                  43.298            166256704

preempt disabled  38.969            159871710
                  38.943            159972935
                  39.325            161056510

Average:

   irqs disabled:  43.324           166287462
preempt disabled:  39.079           160300385

 preempt is 10.8 percent faster than irqs disabled.

I wrote a patch to count function trace recursion and reran hackbench.

With irq disabled: 1,150 times the function tracer did not trace due to
  recursion.
with preempt disabled: 5,117,718 times.

The thousand times with irq disabled could be due to NMIs, or simply a case
where it called a function that was not protected by notrace.

But we also see that a large amount of the trace is lost with the
preempt version.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-04 10:09:50 +01:00
Steven Rostedt
182e9f5f70 ftrace: insert in the ftrace_preempt_disable()/enable() functions
Impact: use new, consolidated APIs in ftrace plugins

This patch replaces the schedule safe preempt disable code with the
ftrace_preempt_disable() and ftrace_preempt_enable() safe functions.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-04 10:09:49 +01:00
Steven Rostedt
8f0a056fcb ftrace: introduce ftrace_preempt_disable()/enable()
Impact: add new ftrace-plugin internal APIs

Parts of the tracer needs to be careful about schedule recursion.
If the NEED_RESCHED flag is set, a preempt_enable will call schedule.
Inside the schedule function, the NEED_RESCHED flag is cleared.

The problem arises when a trace happens in the schedule function but before
NEED_RESCHED is cleared. The race is as follows:

schedule()
  >> tracer called

    trace_function()
       preempt_disable()
       [ record trace ]
       preempt_enable()  <<- here's the issue.

         [check NEED_RESCHED]
          schedule()
          [ Repeat the above, over and over again ]

The naive approach is simply to use preempt_enable_no_schedule instead.
The problem with that approach is that, although we solve the schedule
recursion issue, we now might lose a preemption check when not in the
schedule function.

  trace_function()
    preempt_disable()
    [ record trace ]
    [Interrupt comes in and sets NEED_RESCHED]
    preempt_enable_no_resched()
    [continue without scheduling]

The way ftrace handles this problem is with the following approach:

	int resched;

	resched = need_resched();
	preempt_disable_notrace();
	[record trace]
	if (resched)
		preempt_enable_no_sched_notrace();
	else
		preempt_enable_notrace();

This may seem like the opposite of what we want. If resched is set
then we call the "no_sched" version??  The reason we do this is because
if NEED_RESCHED is set before we disable preemption, there's two reasons
for that:

  1) we are in an atomic code path
  2) we are already on our way to the schedule function, and maybe even
     in the schedule function, but have yet to clear the flag.

Both the above cases we do not want to schedule.

This solution has already been implemented within the ftrace infrastructure.
But the problem is that it has been implemented several times. This patch
encapsulates this code to two nice functions.

  resched = ftrace_preempt_disable();
  [ record trace]
  ftrace_preempt_enable(resched);

This way the tracers do not need to worry about getting it right.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-04 10:09:48 +01:00
Alok Kataria
70de9a9704 x86: don't use tsc_khz to calculate lpj if notsc is passed
Impact: fix udelay when "notsc" boot parameter is passed

With notsc passed on commandline, tsc may not be used for
udelays, make sure that we do not use tsc_khz to calculate
the lpj value in such cases.

Reported-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-04 09:55:26 +01:00