Commit graph

133225 commits

Author SHA1 Message Date
Eric Biederman
8f19d47293 seq_file: properly cope with pread
Currently seq_read assumes that the offset passed to it is always the
offset it passed to user space.  In the case pread this assumption is
broken and we do the wrong thing when presented with pread.

To solve this I introduce an offset cache inside of struct seq_file so we
know where our logical file position is.  Then in seq_read if we try to
read from another offset we reset our data structures and attempt to go to
the offset user space wanted.

[akpm@linux-foundation.org: restore FMODE_PWRITE]
[pjt@google.com: seq_open needs its fmode opened up to take advantage of this]
Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Paul Turner <pjt@google.com>
Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-18 15:37:53 -08:00
Paul Turner
55ec82176e vfs: separate FMODE_PREAD/FMODE_PWRITE into separate flags
Separate FMODE_PREAD and FMODE_PWRITE into separate flags to reflect the
reality that the read and write paths may have independent restrictions.

A git grep verifies that these flags are always cleared together so this
new behavior will only apply to interfaces that change to clear flags
individually.

This is required for "seq_file: properly cope with pread", a post-2.6.25
regression fix.

[akpm@linux-foundation.org: add comment]
Signed-off-by: Paul Turner <pjt@google.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc:  Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-18 15:37:53 -08:00
Li Zefan
b851ee7921 cgroups: update documentation about css_set hash table
The css_set hash table was introduced in 2.6.26, so update the
documentation accordingly.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-18 15:37:53 -08:00
Ed Cashin
b6d6c51758 aoe: ignore vendor extension AoE responses
The Welland ME-747K-SI AoE target generates unsolicited AoE responses that
are marked as vendor extensions.  Instead of ignoring these packets, the
aoe driver was generating kernel messages for each unrecognized response
received.  This patch corrects the behavior.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Reported-by: <karaluh@karaluh.pl>
Tested-by: <karaluh@karaluh.pl>
Cc: <stable@kernel.org>
Cc: Alex Buell <alex.buell@munted.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-18 15:37:53 -08:00
Benjamin Herrenschmidt
c296861291 vmalloc: add __get_vm_area_caller()
We have get_vm_area_caller() and __get_vm_area() but not
__get_vm_area_caller()

On powerpc, I use __get_vm_area() to separate the ranges of addresses
given to vmalloc vs.  ioremap (various good reasons for that) so in order
to be able to implement the new caller tracking in /proc/vmallocinfo, I
need a "_caller" variant of it.

(akpm: needed for ongoing powerpc development, so merge it early)

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-18 15:37:53 -08:00
Jean Pihet
3ebf74b1de omap_hsmmc: Change while(); loops with finite version
Replace the infinite 'while() ;' loops
with a finite loop version.

Signed-off-by: Jean Pihet <jpihet@mvista.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2009-02-18 22:14:21 +01:00
Jean Pihet
c232f457e4 omap_hsmmc: recover from transfer failures
Timeouts during a command that has a data phase can result in the next
command issued after the command that failed not being processed, i.e.  no
interrupt ever occurs to indicate the command has completed.  This failure
can result in a deadlock.

This patch resets the data state machine to clear the error in case of a
command timeout.

Tested on OMAP3430 chip and intensive MMC/SD device removal while
transferring data.

Signed-off-by: Andy Lowe <alowe@mvista.com>
Signed-off-by: Jean Pihet <jpihet@mvista.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2009-02-18 22:10:49 +01:00
David Brownell
eb25082657 omap_hsmmc: only MMC1 allows HCTL.SDVS != 1.8V
Based on a patch from Tony Lindgren ... after initialization,
never change HCTL.SDVS except for MMC1.  The other controller
instances only support 1.8V in that field, although they can
suport other card/SDIO/eMMC/... voltages with level shifting
solutions such as external transceivers.

MMC2 behavior sanity tested on Overo/WLAN, OMAP3430 SDP, and
custom hardware.  MMC1 also sanity tested on those platforms
plus Beagle.  This also fixes a bug preventing MMC2 (and also
presumably MMC3) from powering down when requested.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2009-02-18 22:09:56 +01:00
David Brownell
249d0fa9d5 omap_hsmmc: card detect irq bugfix
Work around lockdep issue when card detect IRQ handlers run in
thread context ... it forces IRQF_DISABLED, which prevents all
access to twl4030 card detect signals.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2009-02-18 21:27:30 +01:00
Helmut Schaa
5dbace0c9b sdhci: fix led naming
Fix the led device naming for the sdhci driver.

The led class documentation defines the led name to have the
form "devicename:colour:function" while not applicable sections
should be left blank.

To comply with the documentation the led device name is changed
from "mmc*" to "mmc*::".

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2009-02-18 21:02:38 +01:00
Rabin Vincent
58a5dd3e0e mmc_test: fix basic read test
Due to a typo in the Basic Read test, it's currently identical to the
Basic Write test.  Fix this.

Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2009-02-18 21:01:14 +01:00
Yauhen Kharuzhy
9942448837 s3cmci: Fix hangup in do_pio_write()
This commit fixes the regression what was added by commit
088a78af97 "s3cmci: Support transfers
which are not multiple of 32 bits."

fifo_free() now returns amount of available space in FIFO buffer in
bytes.  But do_pio_write() writes to FIFO 32-bit words.  Condition for
return from cycle is (fifo_free() == 0), but when fifo has 1..3 bytes
of free space then this condition will never be true and system hangs.

This patch changes condition in the while() to (fifo_free() > 3).

Signed-off-by: Yauhen Kharuzhy <jekhor@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2009-02-18 20:56:04 +01:00
Yinghai Lu
e0ae4f5503 PCI quirk: enable MSI on 8132
David reported that LSI SAS doesn't work with MSI.  It turns out that
his BIOS doesn't enable it, but the HT MSI 8132 does support HT MSI.
Add quirk to enable it

Cc: stable@kernel.org
Reported-by: David Lang <david@lang.hm>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-02-18 10:59:46 -08:00
Steven Rostedt
712406a6bf tracing/function-graph-tracer: make arch generic push pop functions
There is nothing really arch specific of the push and pop functions
used by the function graph tracer. This patch moves them to generic
code.

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-18 13:43:04 -05:00
Takashi Iwai
2678f60d2b ALSA: jack - Use card->shortname for input name
Currently the jack layer refers to card->longname as a part of
its input device name string.  However, longname is often really long
and way too ugly as an identifier, such as,
"HDA Intel at 0xf8400000 irq 21".

This patch changes the code to use card->shortname instead.
The shortname string contains usually the h/w vendor and product
names but without messy I/O port or IRQ numbers.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
2009-02-18 16:46:27 +01:00
Jan Engelhardt
eb132205ca netfilter: make proc/net/ip* print names from foreign NFPROTO
When extensions were moved to the NFPROTO_UNSPEC wildcard in
ab4f21e6fb, they disappeared from the
procfs files.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-18 16:42:19 +01:00
Patrick McHardy
5962fc6d5f netfilter: nf_conntrack: don't try to deliver events for untracked connections
The untracked conntrack actually does usually have events marked for
delivery as its not special-cased in that part of the code. Skip the
actual delivery since it impacts performance noticeably.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-18 15:30:34 +01:00
Eric Leblond
2c6764b743 netfilter: nfnetlink_log: fix timeout handling
NFLOG timeout was computed in timer by doing:

    flushtimeout*HZ/100

Default value of flushtimeout was HZ (for 1 second delay). This was
wrong for non 100HZ computer. This patch modify the default delay by
using 100 instead of HZ.

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-18 15:29:49 +01:00
Eric Leblond
5ca431f9ae netfilter: nfnetlink_log: fix per-rule qthreshold override
In NFLOG the per-rule qthreshold should overrides per-instance only
it is set. With current code, the per-rule qthreshold is 1 if not set
and it overrides the per-instance qthreshold.

This patch modifies the default xt_NFLOG threshold from 1 to
0. Thus a value of 0 means there is no per-rule setting and the instance
parameter has to apply.

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-18 15:29:23 +01:00
Eric Leblond
4aa3b2ee19 netfilter: nf_conntrack_ipv6: fix nf_log_packet message in icmpv6 conntrack
This patch fixes a trivial typo that was adding a new line at end of
the nf_log_packet() prefix. It also make the logging conditionnal by
adding a LOG_INVALID test.

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-18 15:28:46 +01:00
Patrick McHardy
4667ba1511 Merge branch 'master' of /repos/git/net-2.6 2009-02-18 15:16:18 +01:00
Hannes Reinecke
be987fdb55 block: fix deadlock in blk_abort_queue() for drivers that readd to timeout list
blk_abort_queue() iterates the timeout list and aborts each request on the
list, but if the driver error handling readds a request to the timeout list
during this processing, we could be looping forever. Fix this by splicing
current entries to a local list and run over that list instead.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-02-18 10:34:16 +01:00
Neil Brown
41b8c853a4 block: fix booting from partitioned md array
Hi Tejun,

 it looks like your commit:

   block: don't depend on consecutive minor space
   f331c0296f

 broke a particular case for booting from partitioned md/raid devices.
 That is the second time this has been broken recently.  The previous
 time was fixed by

   block: do_mounts - accept root=<non-existant partition>
   30f2f0eb4b

 Because the data isn't available when an md device is first created
 (we add disks and set it up after creation), the initial partition
 scan finds nothing.  It is not until the device is opened that
 another partition scan happens and finds something.

 So at the point where the kernel parameter "root=/dev/md_d0p1" is
 being parsed, md_d0 exists, but md_d0p1 does not.
 However if we let blk_lookup_devt return the correct device number
 even though the device doesn't exist, then the attempt to mount it
 will successfully find the partition.

 I have tried in the past to find a way to get the partition table to
 be read as soon as the array is assembled but that proved impossible
 (at the time).  I don't remember the details, and could possibly
 revisit it.  However it would be really nice if blk_lookup_devt
 could be adjusted to again accept non existant partitions.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-02-18 10:33:59 +01:00
Jens Axboe
78f707bfc7 block: revert part of 18ce3751cc
The above commit added WRITE_SYNC and switched various places to using
that for committing writes that will be waited upon immediately after
submission. However, this causes a performance regression with AS and CFQ
for ext3 at least, since sync_dirty_buffer() will submit some writes with
WRITE_SYNC while ext3 has sumitted others dependent writes without the sync
flag set. This causes excessive anticipation/idling in the IO scheduler
because sync and async writes get interleaved, causing a big performance
regression for the below test case (which is meant to simulate sqlite
like behaviour).

---- test case ----

int main(int argc, char **argv)
{

	int fdes, i;
	FILE *fp;
	struct timeval start;
	struct timeval end;
	struct timeval res;

	gettimeofday(&start, NULL);
	for (i=0; i<ROWS; i++) {
		fp = fopen("test_file", "a");
		fprintf(fp, "Some Text Data\n");
		fdes = fileno(fp);
		fsync(fdes);
		fclose(fp);
	}
	gettimeofday(&end, NULL);

	timersub(&end, &start, &res);
	fprintf(stdout, "time to write %d lines is %ld(msec)\n", ROWS,
			(res.tv_sec*1000000 + res.tv_usec)/1000);

	return 0;
}

-------------------

Thanks to Sean.White@APCC.com for tracking down this performance
regression and providing a test case.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-02-18 10:32:01 +01:00
Chip Coldwell
82eb03cfd8 cciss: PCI power management reset for kexec
The kexec kernel resets the CCISS hardware in three steps:

1. Use PCI power management states to reset the controller in the
   kexec kernel.

2. Clear the MSI/MSI-X bits in PCI configuration space so that MSI
   initialization in the kexec kernel doesn't fail.

3. Use the CCISS "No-op" message to determine when the controller
   firmware has recovered from the PCI PM reset.

[akpm@linux-foundation.org: cleanups]
Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-02-18 10:32:01 +01:00
Roel Kluin
c8cbec6bdf paride/pg.c: xs(): &&/|| confusion
&&/|| confusion

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-02-18 10:32:01 +01:00
Subhash Peddamallu
a60e78e57a fs/bio: bio_alloc_bioset: pass right object ptr to mempool_free
When freeing from bio pool use right ptr to account for bs->front_pad,
instead of bio ptr,

Signed-off-by: Subhash Peddamallu <subhash.peddamallu@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-02-18 10:32:01 +01:00
Jens Axboe
93dbb39350 block: fix bad definition of BIO_RW_SYNC
We can't OR shift values, so get rid of BIO_RW_SYNC and use BIO_RW_SYNCIO
and BIO_RW_UNPLUG explicitly. This brings back the behaviour from before
213d9417fe.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-02-18 10:32:00 +01:00
Boaz Harrosh
c1c201200a bsg: Fix sense buffer bug in SG_IO
When submitting requests via SG_IO, which does a sync io, a
bsg_command is not allocated. So an in-Kernel sense_buffer was not
set. However when calling blk_execute_rq() with no sense buffer
one is provided from the stack. Now bsg at blk_complete_sgv4_hdr_rq()
would check if rq->sense_len and a sense was requested by sg_io_v4
the rq->sense was copy_user() back, but by now it is already mangled
stack memory.

I have fixed that by forcing a sense_buffer when calling bsg_map_hdr().
The bsg_command->sense is provided in the write/read path like before,
and on-the-stack buffer is provided when doing SG_IO.

I have also fixed a dprintk message to print rq->errors in hex because
of the scsi bit-field use of this member. For other block devices it
does not matter anyway.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-02-18 10:32:00 +01:00
Pierre Ossman
86a6a8749d Revert "sdhci: force high speed capability on some controllers"
This reverts commit a4b7619377.

It turned out that the controller had problem running at the
higher speed, so go back to trusting the hardware capability
bits.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2009-02-18 06:36:22 +01:00
Yi Li
444122fd58 MMC: fix bug - SDHC card capacity not correct
Signed-off-by: Yi Li <yi.li@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2009-02-18 06:27:33 +01:00
David S. Miller
92a0acce18 net: Kill skb_truesize_check(), it only catches false-positives.
A long time ago we had bugs, primarily in TCP, where we would modify
skb->truesize (for TSO queue collapsing) in ways which would corrupt
the socket memory accounting.

skb_truesize_check() was added in order to try and catch this error
more systematically.

However this debugging check has morphed into a Frankenstein of sorts
and these days it does nothing other than catch false-positives.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-17 21:24:05 -08:00
Zlatko Calusic
5955c7a2cf Add support for VT6415 PCIE PATA IDE Host Controller
Signed-off-by: Zlatko Calusic <zlatko.calusic@iskon.hr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-17 16:56:31 -08:00
Rafael J. Wysocki
3494252d56 USB/PCI: Fix resume breakage of controllers behind cardbus bridges
If a USB PCI controller is behind a cardbus bridge, we are trying to
restore its configuration registers too early, before the cardbus
bridge is operational.  To fix this, call pci_restore_state() from
usb_hcd_pci_resume() and remove usb_hcd_pci_resume_early() which is
no longer necessary (the configuration spaces of USB controllers that
are not behind cardbus bridges will be restored by the PCI PM core
with interrupts disabled anyway).

This patch fixes the regression from 2.6.28 tracked as
http://bugzilla.kernel.org/show_bug.cgi?id=12659

[ Side note: the proper long-term fix is probably to just force the
  unplug event at suspend time instead of doing a plug/unplug at resume
  time, but this patch is fine regardless  - Linus ]

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-by: Miles Lane <miles.lane@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-17 16:56:31 -08:00
Frederic Weisbecker
fa7c7f6e11 tracing/core: remove unused parameter in tracing_fill_pipe_page()
Impact: cleanup

The struct page *pages parameter is unused.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-18 01:40:20 +01:00
Frederic Weisbecker
6eaaa5d57e tracing/core: use appropriate waiting on trace_pipe
Impact: api and pipe waiting change

Currently, the waiting used in tracing_read_pipe() is done through a
100 msecs schedule_timeout() loop which periodically check if there
are traces on the buffer.

This can cause small latencies for programs which are reading the incoming
events.

This patch makes the reader waiting for the trace_wait waitqueue except
for few tracers such as the sched and functions tracers which might be
already hold the runqueue lock while waking up the reader.

This is performed through a new callback wait_pipe() on struct tracer.
If none is implemented on a specific tracer, the default waiting for
trace_wait queue is attached.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-18 01:40:20 +01:00
Ingo Molnar
ac07bcaa82 Merge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace 2009-02-18 01:09:07 +01:00
Ingo Molnar
37bd824a35 Merge branches 'tracing/ftrace' and 'tracing/urgent' into tracing/core 2009-02-18 01:08:13 +01:00
Huang Ying
ef41df4344 x86, mce: fix a race condition in mce_read()
Impact: bugfix

Considering the situation as follow:

before: mcelog.next == 1, mcelog.entry[0].finished = 1

+--------------------------------------------------------------------------
R                   W1                  W2                  W3

read mcelog.next (1)
                    mcelog.next++ (2)
                    (working on entry 1,
                    finished == 0)

mcelog.next = 0
                                        mcelog.next++ (1)
                                        (working on entry 0)
                                                           mcelog.next++ (2)
                                                           (working on entry 1)
                        <----------------- race ---------------->
                    (done on entry 1,
                    finished = 1)
                                                           (done on entry 1,
                                                           finished = 1)

To fix the race condition, a cmpxchg loop is added to mce_read() to
ensure no new MCE record can be added between mcelog.next reading and
mcelog.next = 0.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-17 15:33:05 -08:00
Andi Kleen
d6b75584a3 x86, mce: disable machine checks on offlined CPUs
Impact: Lower priority bug fix

Offlined CPUs could still get machine checks, but the machine check handler
cannot handle them properly, leading to an unconditional crash. Disable
machine checks on CPUs that are going down.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-17 15:32:56 -08:00
Andi Kleen
5b4408fdaa x86, mce: don't set up mce sysdev devices with mce=off
Impact: bug fix, in this case the resume handler shouldn't run which
	avoids incorrectly reenabling machine checks on resume

When MCEs are completely disabled on the command line don't set
up the sysdev devices for them either.

Includes a comment fix from Thomas Gleixner.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-17 15:32:50 -08:00
Andi Kleen
52d168e28b x86, mce: switch machine check polling to per CPU timer
Impact: Higher priority bug fix

The machine check poller runs a single timer and then broadcasted an
IPI to all CPUs to check them. This leads to unnecessary
synchronization between CPUs. The original CPU running the timer has
to wait potentially a long time for all other CPUs answering. This is
also real time unfriendly and in general inefficient.

This was especially a problem on systems with a lot of events where
the poller run with a higher frequency after processing some events.
There could be more and more CPU time wasted with this, to
the point of significantly slowing down machines.

The machine check polling is actually fully independent per CPU, so
there's no reason to not just do this all with per CPU timers.  This
patch implements that.

Also switch the poller also to use standard timers instead of work
queues. It was using work queues to be able to execute a user program
on a event, but mce_notify_user() handles this case now with a
separate callback. So instead always run the poll code in in a
standard per CPU timer, which means that in the common case of not
having to execute a trigger there will be less overhead.

This allows to clean up the initialization significantly, because
standard timers are already up when machine checks get init'ed.  No
multiple initialization functions.

Thanks to Thomas Gleixner for some help.

Cc: thockin@google.com
v2: Use del_timer_sync() on cpu shutdown and don't try to handle
migrated timers.
v3: Add WARN_ON for timer running on unexpected CPU

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-17 15:32:44 -08:00
Andi Kleen
9bd9840580 x86, mce: always use separate work queue to run trigger
Impact: Needed for bug fix in next patch

This relaxes the requirement that mce_notify_user has to run in process
context. Useful for future changes, but also leads to cleaner
behaviour now. Now instead mce_notify_user can be called directly
from interrupt (but not NMI) context.

The work queue only uses a single global work struct, which can be done safely
because it is always free to reuse before the trigger function is executed.
This way no events can be lost.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-17 15:32:41 -08:00
Andi Kleen
123aa76ec0 x86, mce: don't disable machine checks during code patching
Impact: low priority bug fix

This removes part of a a patch I added myself some time ago. After some
consideration the patch was a bad idea. In particular it stopped machine check
exceptions during code patching.

To quote the comment:

        * MCEs only happen when something got corrupted and in this
        * case we must do something about the corruption.
        * Ignoring it is worse than a unlikely patching race.
        * Also machine checks tend to be broadcast and if one CPU
        * goes into machine check the others follow quickly, so we don't
        * expect a machine check to cause undue problems during to code
        * patching.

So undo the machine check related parts of
8f4e956b31 NMIs are still disabled.

This only removes code, the only additions are a new comment.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-17 15:32:38 -08:00
Andi Kleen
973a2dd1d5 x86, mce: disable machine checks on suspend
Impact: Bug fix

During suspend it is not reliable to process machine check
exceptions, because CPUs disappear but can still get machine check
broadcasts.  Also the system is slightly more likely to
machine check them, but the handler is typically not a position
to handle them in a meaningfull way.

So disable them during suspend and enable them during resume.

Also make sure they are always disabled on hot-unplugged CPUs.

This new code assumes that suspend always hotunplugs all
non BP CPUs.

v2: Remove the WARN_ONs Thomas objected to.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-17 15:32:14 -08:00
Andi Kleen
07db1c140e x86, mce: fix ifdef for 64bit thermal apic vector clear on shutdown
Impact: Bugfix

The ifdef for the apic clear on shutdown for the 64bit intel thermal
vector was incorrect and never triggered. Fix that.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-17 15:24:34 -08:00
Andi Kleen
380851bc6b x86, mce: use force_sig_info to kill process in machine check
Impact: bug fix (with tolerant == 3)

do_exit cannot be called directly from the exception handler because
it can sleep and the exception handler runs on the exception stack.
Use force_sig() instead.

Based on a earlier patch by Ying Huang who debugged the problem.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-17 15:24:31 -08:00
Andi Kleen
6ec68bff3c x86, mce: reinitialize per cpu features on resume
Impact: Bug fix

This fixes a long standing bug in the machine check code. On resume the
boot CPU wouldn't get its vendor specific state like thermal handling
reinitialized. This means the boot cpu wouldn't ever get any thermal
events reported again.

Call the respective initialization functions on resume

v2: Remove ancient init because they don't have a resume device anyways.
    Pointed out by Thomas Gleixner.
v3: Now fix the Subject too to reflect v2 change

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-17 15:24:28 -08:00
Nicolas Pitre
fd4b9b3650 [ARM] 5401/1: Orion: fix edge triggered GPIO interrupt support
The GPIO interrupts can be configured as either level triggered or edge
triggered, with a default of level triggered.  When an edge triggered
interrupt is requested, the gpio_irq_set_type method is called which
currently switches the given IRQ descriptor between two struct irq_chip
instances: orion_gpio_irq_level_chip and orion_gpio_irq_edge_chip. This
happens via __setup_irq() which also calls irq_chip_set_defaults() to
assign default methods to uninitialized ones.  The problem is that
irq_chip_set_defaults() is called before the irq_chip reference is
switched, leaving the new irq_chip (orion_gpio_irq_edge_chip in this
case) with uninitialized methods such as chip->startup() causing a kernel
oops.

Many solutions are possible, such as making irq_chip_set_defaults() global
and calling it from gpio_irq_set_type(), or calling __irq_set_trigger()
before irq_chip_set_defaults() in __setup_irq().  But those require
modifications to the generic IRQ code which might have adverse effect on
other architectures, and that would still be a fragile arrangement.
Manually copying the missing methods from within gpio_irq_set_type()
would be really ugly and it would break again the day new methods with
automatic defaults are added.

A better solution is to have a single irq_chip instance which can deal
with both edge and level triggered interrupts.  It is also a good idea
to switch the IRQ handler instead, as the edge IRQ handler allows for
one edge IRQ event to be queued as the IRQ is actually masked only when
that second IRQ is received, at which point the hardware can queue an
additional IRQ event, making edge triggered interrupts a bit more
reliable.

Tested-by: Martin Michlmayr <tbm@cyrius.com>

Signed-off-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-02-17 22:37:09 +00:00
Linus Torvalds
e78ac4b9de Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: cpu hotplug fix
2009-02-17 14:30:06 -08:00