Commit graph

413245 commits

Author SHA1 Message Date
Geert Uytterhoeven
6bab2c613d genirq: Correct fuzzy and fragile IRQ_RETVAL() definition
commit bedd30d986 ("genirq: make irqreturn_t an enum") blindly replaced
"0" by "IRQ_NONE" in the "IRQ_RETVAL(x)" macro definition.

However, as "x" is a condition, "0" meant "boolean false", not an
irqreturn_t value.

All of this worked, and kept working after the addition of IRQ_WAKE_THREAD,
as
  - both "boolean false" and "IRQ_NONE" are "0" (for the comparison),
  - "boolean true" and "boolean false" nicely map to the correct values of
    "IRQ_HANDLED" and "IRQ_NONE" (for the return value).

Correct the macro definition for clarity and future-proofness.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-11-19 19:06:41 +01:00
Jens Axboe
94eddfbeaa blk-mq: ensure that we set REQ_IO_STAT so diskstats work
If disk stats are enabled on the queue, a request needs to
be marked with REQ_IO_STAT for accounting to be active on
that request. This fixes an issue with virtio-blk not
showing up in /proc/diskstats after the conversion to
blk-mq.

Add QUEUE_FLAG_MQ_DEFAULT, setting stats and same cpu-group
completion on by default.

Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-11-19 09:25:07 -07:00
Shigeru Yoshida
0515973ffb sched: Fix a trivial typo in comments
Fix a trivial typo in rq_attach_root().

Signed-off-by: Shigeru Yoshida <shigeru.yoshida@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131117.121236.1990617639803941055.shigeru.yoshida@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-19 17:01:18 +01:00
Alex Shi
b972fc308c sched: Remove unused variable in 'struct sched_domain'
The 'u64 last_update' variable isn't used now, remove it to save a bit of space.

Signed-off-by: Alex Shi <alex.shi@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Morten.Rasmussen@arm.com
Cc: linaro-kernel@lists.linaro.org
Link: http://lkml.kernel.org/r/1384852912-24791-1-git-send-email-alex.shi@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-19 17:01:17 +01:00
Peter Zijlstra
42eb088ed2 sched: Avoid NULL dereference on sd_busy
Commit 37dc6b50ce ("sched: Remove unnecessary iteration over sched
domains to update nr_busy_cpus") forgot to clear 'sd_busy' under some
conditions leading to a possible NULL deref in set_cpu_sd_state_idle().

Reported-by: Anton Blanchard <anton@samba.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131118113701.GF3866@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-19 17:01:16 +01:00
Srikar Dronamraju
9abf24d465 sched: Check sched_domain before computing group power
After commit 863bffc808 ("sched/fair: Fix group power_orig
computation"), we can dereference rq->sd before it is set.

Fix this by falling back to power_of() in this case and add a comment
explaining things.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
[ Added comment and tweaked patch. ]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: mikey@neuling.org
Link: http://lkml.kernel.org/r/20131113151718.GN21461@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-19 17:01:15 +01:00
Daniel Vetter
1021442098 drm/i915: Replicate BIOS eDP bpp clamping hack for hsw
Haswell's DDI encoders have their own ->get_config callback and in

commit c6cd2ee2d5
Author: Jani Nikula <jani.nikula@intel.com>
Date:   Mon Oct 21 10:52:07 2013 +0300

    drm/i915/dp: workaround BIOS eDP bpp clamping issue

we've forgotten to replicate this hack. So let's do it that.

Note for backporters: The above commit and all it's depencies need to
be backported first.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71049
Cc: stable@vger.kernel.org
Tested-by: Gökçen Eraslan <gokcen.eraslan@gmail.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-19 16:59:39 +01:00
Vince Weaver
0022cedd4a perf/trace: Properly use u64 to hold event_id
The 64-bit attr.config value for perf trace events was being copied into
an "int" before doing a comparison, meaning the top 32 bits were
being truncated.

As far as I can tell this didn't cause any errors, but it did mean
it was possible to create valid aliases for all the tracepoint ids
which I don't think was intended.  (For example, 0xffffffff00000018
and 0x18 both enable the same tracepoint).

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1311151236100.11932@vincent-weaver-1.um.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-19 16:57:44 +01:00
Peter Zijlstra
06db0b2171 perf: Remove fragile swevent hlist optimization
Currently we only allocate a single cpu hashtable for per-cpu
swevents; do away with this optimization for it is fragile in the face
of things like perf_pmu_migrate_context().

The easiest thing is to make sure all CPUs are consistent wrt state.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130913111447.GN31370@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-19 16:57:42 +01:00
Peter Zijlstra
d5b5f391d4 ftrace, perf: Avoid infinite event generation loop
Vince's perf-trinity fuzzer found yet another 'interesting' problem.

When we sample the irq_work_exit tracepoint with period==1 (or
PERF_SAMPLE_PERIOD) and we add an fasync SIGNAL handler we create an
infinite event generation loop:

  ,-> <IPI>
  |     irq_work_exit() ->
  |       trace_irq_work_exit() ->
  |         ...
  |           __perf_event_overflow() -> (due to fasync)
  |             irq_work_queue() -> (irq_work_list must be empty)
  '---------      arch_irq_work_raise()

Similar things can happen due to regular poll() wakeups if we exceed
the ring-buffer wakeup watermark, or have an event_limit.

To avoid this, dis-allow sampling this particular tracepoint.

In order to achieve this, create a special perf_perm function pointer
for each event and call this (when set) on trying to create a
tracepoint perf event.

[ roasted: use expr... to allow for ',' in your expression ]

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20131114152304.GC5364@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-19 16:57:40 +01:00
Luís Fernando Cornachioni Estrozi
acab78b996 netfilter: ebt_ip6: fix source and destination matching
This bug was introduced on commit 0898f99a2. This just recovers two
checks that existed before as suggested by Bart De Schuymer.

Signed-off-by: Luís Fernando Cornachioni Estrozi <lestrozi@uolinc.com>
Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-11-19 15:33:29 +01:00
Andrew Morton
050ded1bba tick: Document tick_do_timer_cpu
Taken straight from a tglx email ;)

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-11-19 14:59:50 +01:00
Joe Perches
da554eba2e timer: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node(...)
Use the helper function instead of __GFP_ZERO.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-11-19 14:59:50 +01:00
Thomas Gleixner
d689fe222a NOHZ: Check for nohz active instead of nohz enabled
RCU and the fine grained idle time accounting functions check
tick_nohz_enabled. But that variable is merily telling that NOHZ has
been enabled in the config and not been disabled on the command line.

But it does not tell anything about nohz being active. That's what all
this should check for.

Matthew reported, that the idle accounting on his old P1 machine
showed bogus values, when he enabled NOHZ in the config and did not
disable it on the kernel command line. The reason is that his machine
uses (refined) jiffies as a clocksource which explains why the "fine"
grained accounting went into lala land, because it depends on when the
system goes and leaves idle relative to the jiffies increment.

Provide a tick_nohz_active indicator and let RCU and the accounting
code use this instead of tick_nohz_enable.

Reported-and-tested-by: Matthew Whitehead <tedheadster@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: john.stultz@linaro.org
Cc: mwhitehe@redhat.com
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1311132052240.30673@ionos.tec.linutronix.de
2013-11-19 14:59:50 +01:00
Steven Rostedt
eff2c92f86 tools lib traceevent: Fix use of multiple options in processing field
Jiri Olsa reported that the scsi_dispatch_cmd_done event failed to parse
with:

  Error: expected type 5 but read 4
  Error: expected type 5 but read 4

The problem is with this part of the print_fmt:

  __print_symbolic(((REC->result) >> 24) & 0xff, ...

The __print_symbolic() helper function's first parameter is the field to
use to determine what symbol to print based on the value of the result.
The parser can handle one operation, but it can not handle multiple
operations ('>>' and '&').

Add code to process all operations for the field argument for
__print_symbolic() as well as __print_flags().

Reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20131118142314.27ca334b@gandalf.local.home
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-11-19 10:34:39 -03:00
Namhyung Kim
50a2740b83 perf header: Fix possible memory leaks in process_group_desc()
After processing all group descriptors or encountering an error, it
frees all descriptors.  However, current logic can leak memory since it
might not traverse all descriptors.

Note that the 'i' can have different value than nr_groups when an error
occurred and it's safe to call free(desc[i].name) for every desc since
we already make it NULL when it's reused for group names.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1384741244-7271-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-11-19 10:34:05 -03:00
Namhyung Kim
210e812f03 perf header: Fix bogus group name
When processing event group descriptor in perf file header, we reuse an
allocated group name but forgot to prevent it from freeing.

Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1384741244-7271-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-11-19 10:33:57 -03:00
Frederic Weisbecker
a5285ad9e3 perf tools: Tag thread comm as overriden
The problem is that when a thread overrides its default ":%pid" comm, we
forget to tag the thread comm as overriden. Hence, this overriden comm
is not inherited on future forks. Fix it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20131116010207.GA18855@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-11-19 10:33:29 -03:00
Chris Wilson
7c6c2652ba drm/i915: Do not enable package C8 on unsupported hardware
If the hardware does not support package C8, then do not even schedule
work to enable it. Thereby we can eliminate a bunch of dangerous work.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-19 13:05:09 +01:00
Chris Wilson
be1c1fe21b drm/i915: Hold pc8 lock around toggling pc8.gpu_idle
We need to hold the pc8 lock around toggling the value of gpu_idle.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-19 13:03:38 +01:00
David Henningsson
521290003a ALSA: hda - Also enable mute/micmute LED control for "Lenovo dock" fixup
The docking station is a Thinkpad thing, so it makes sense to check
for mute/micmute LEDs for that quirk type too.

Signed-off-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2013-11-19 11:28:41 +01:00
Jiri Kosina
9316e58076 Revert "HID: wiimote: add LEGO-wiimote VID"
This reverts commit 86b84167d4 as it introduced a
VID/PID conflict with its original owner: hid-wiimote got
hid:b0005g*v0000054Cp00000306 added but hid-sony already has this id for the
PS3 Remote (and the ID is oficically assigned to Sony).

Revert the commit to avoid hid-sony regression. David is working on a
bluez patch to force proper ID on the wiimote.

Reported-by: David Herrmann <dh.herrmann@gmail.com>
Reported-by: Michel Kraus <mksolpa@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-11-19 11:28:03 +01:00
Gleb Natapov
2ecd1aba59 Fix percpu vmalloc allocations
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQEcBAABAgAGBQJSiDFHAAoJEEtpOizt6ddyCWQH/jmTxPGBK3Bx1qTr25izgW8Z
 69SYTfkRoCV/3O3VWh9LiOSz9zrnCm0ODopQmMUDbEYF8KnzMiLASZ5mf3xmvoo1
 3KB/upM6DXEd2u2ZEQKnwp0cQZmcTzXM0PfGZ86xlmLFH6zjMt0iPa/FwflQMWsM
 QJyRJOMrB/WyLbY77FdsyK8RD4fJy7mlLS6LSXnnDb5gMeiPu8d9PWXdlIwCsEe3
 Zk0AAyJBXYn0yjzrZ/5HbUlmTBMLbEIomzCqCVAglpiXUb7E0w7YoyTc3dagyMyK
 zZ+6OqJAZpIi5tliGuhadLsioinM/L9xfnOFtd8jtvPqZbYcetkphTbBg9Bj4mw=
 =VmDY
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-fixes-3.13-1' of git://git.linaro.org/people/cdall/linux-kvm-arm into next

Fix percpu vmalloc allocations
2013-11-19 10:43:05 +02:00
Sven Eckelmann
9f323b6811 HID: sony: Send FF commands in non-atomic context
The ff_memless has a timer running which gets run in an atomic context and
calls the play_effect callback. The callback function for sony uses the
hid_output_raw_report (overwritten by sixaxis_usb_output_raw_report) function
to handle differences in the control message format. It is not safe for an
atomic context because it may sleep later in usb_start_wait_urb.

This "scheduling while atomic" can cause the system to lock up. A workaround is
to make the force feedback state update using work_queues and use the
play_effect function only to enqueue the work item.

Reported-by: Simon Wood <simon@mungewell.org>
Reported-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wood <simon@mungewell.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-11-19 09:37:45 +01:00
Takashi Sakamoto
777fb574a5 ALSA: firewire-lib: include sound/asound.h to refer to snd_pcm_format_t
'snd_pcm_format_t' is used by amdtp_out_stream_set_pcm_format().

Currently, when just including amdtp.h, compiler cannot find this type because
this type is defined in uapi/sound/asound.h and this header is not included by
amdtp.h.

Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2013-11-19 08:07:20 +01:00
majianpeng
60aaf93385 md/raid5: Use conf->device_lock protect changing of multi-thread resources.
When we change group_thread_cnt from sysfs entry, it can OOPS.

The kernel messages are:
[  135.299021] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  135.299073] IP: [<ffffffff815188ab>] handle_active_stripes+0x32b/0x440
[  135.299107] PGD 0
[  135.299122] Oops: 0000 [#1] SMP
[  135.299144] Modules linked in: netconsole e1000e ptp pps_core
[  135.299188] CPU: 3 PID: 2225 Comm: md0_raid5 Not tainted 3.12.0+ #24
[  135.299214] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015  11/09/2011
[  135.299255] task: ffff8800b9638f80 ti: ffff8800b77a4000 task.ti: ffff8800b77a4000
[  135.299283] RIP: 0010:[<ffffffff815188ab>]  [<ffffffff815188ab>] handle_active_stripes+0x32b/0x440
[  135.299323] RSP: 0018:ffff8800b77a5c48  EFLAGS: 00010002
[  135.299344] RAX: ffff880037bb5c70 RBX: 0000000000000000 RCX: 0000000000000008
[  135.299371] RDX: ffff880037bb5cb8 RSI: 0000000000000001 RDI: ffff880037bb5c00
[  135.299398] RBP: ffff8800b77a5d08 R08: 0000000000000001 R09: 0000000000000000
[  135.299425] R10: ffff8800b77a5c98 R11: 00000000ffffffff R12: ffff880037bb5c00
[  135.299452] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880037bb5c70
[  135.299479] FS:  0000000000000000(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000
[  135.299510] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  135.299532] CR2: 0000000000000000 CR3: 0000000001c0b000 CR4: 00000000000407e0
[  135.299559] Stack:
[  135.299570]  ffff8800b77a5c88 ffffffff8107383e ffff8800b77a5c88 ffff880037a64300
[  135.299611]  000000000000ec08 ffff880037bb5cb8 ffff8800b77a5c98 ffffffffffffffd8
[  135.299654]  000000000000ec08 ffff880037bb5c60 ffff8800b77a5c98 ffff8800b77a5c98
[  135.299696] Call Trace:
[  135.299711]  [<ffffffff8107383e>] ? __wake_up+0x4e/0x70
[  135.299733]  [<ffffffff81518f88>] raid5d+0x4c8/0x680
[  135.299756]  [<ffffffff817174ed>] ? schedule_timeout+0x15d/0x1f0
[  135.299781]  [<ffffffff81524c9f>] md_thread+0x11f/0x170
[  135.299804]  [<ffffffff81069cd0>] ? wake_up_bit+0x40/0x40
[  135.299826]  [<ffffffff81524b80>] ? md_rdev_init+0x110/0x110
[  135.299850]  [<ffffffff81069656>] kthread+0xc6/0xd0
[  135.299871]  [<ffffffff81069590>] ? kthread_freezable_should_stop+0x70/0x70
[  135.299899]  [<ffffffff81722ffc>] ret_from_fork+0x7c/0xb0
[  135.299923]  [<ffffffff81069590>] ? kthread_freezable_should_stop+0x70/0x70
[  135.299951] Code: ff ff ff 0f 84 d7 fe ff ff e9 5c fe ff ff 66 90 41 8b b4 24 d8 01 00 00 45 31 ed 85 f6 0f 8e 7b fd ff ff 49 8b 9c 24 d0 01 00 00 <48> 3b 1b 49 89 dd 0f 85 67 fd ff ff 48 8d 43 28 31 d2 eb 17 90
[  135.300005] RIP  [<ffffffff815188ab>] handle_active_stripes+0x32b/0x440
[  135.300005]  RSP <ffff8800b77a5c48>
[  135.300005] CR2: 0000000000000000
[  135.300005] ---[ end trace 504854e5bb7562ed ]---
[  135.300005] Kernel panic - not syncing: Fatal exception

This is because raid5d() can be running when the multi-thread
resources are changed via system. We see need to provide locking.

mddev->device_lock is suitable, but we cannot simple call
alloc_thread_groups under this lock as we cannot allocate memory
while holding a spinlock.
So change alloc_thread_groups() to allocate and return the data
structures, then raid5_store_group_thread_cnt() can take the lock
while updating the pointers to the data structures.

This fixes a bug introduced in 3.12 and so is suitable for the 3.12.x
stable series.

Fixes: b721420e87
Cc: stable@vger.kernel.org (3.12)
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Shaohua Li <shli@kernel.org>
2013-11-19 15:19:18 +11:00
majianpeng
d206dcfa98 md/raid5: Before freeing old multi-thread worker, it should flush them.
When changing group_thread_cnt from sysfs entry, the kernel can oops.

The kernel messages are:
[  740.961389] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[  740.961444] IP: [<ffffffff81062570>] process_one_work+0x30/0x500
[  740.961476] PGD b9013067 PUD b651e067 PMD 0
[  740.961503] Oops: 0000 [#1] SMP
[  740.961525] Modules linked in: netconsole e1000e ptp pps_core
[  740.961577] CPU: 0 PID: 3683 Comm: kworker/u8:5 Not tainted 3.12.0+ #23
[  740.961602] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015  11/09/2011
[  740.961646] task: ffff88013abe0000 ti: ffff88013a246000 task.ti: ffff88013a246000
[  740.961673] RIP: 0010:[<ffffffff81062570>]  [<ffffffff81062570>] process_one_work+0x30/0x500
[  740.961708] RSP: 0018:ffff88013a247e08  EFLAGS: 00010086
[  740.961730] RAX: ffff8800b912b400 RBX: ffff88013a61e680 RCX: ffff8800b912b400
[  740.961757] RDX: ffff8800b912b600 RSI: ffff8800b912b600 RDI: ffff88013a61e680
[  740.961782] RBP: ffff88013a247e48 R08: ffff88013a246000 R09: 000000000002c09d
[  740.961808] R10: 000000000000010f R11: 0000000000000000 R12: ffff88013b00cc00
[  740.961833] R13: 0000000000000000 R14: ffff88013b00cf80 R15: ffff88013a61e6b0
[  740.961861] FS:  0000000000000000(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
[  740.961893] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  740.962001] CR2: 00000000000000b8 CR3: 00000000b24fe000 CR4: 00000000000407f0
[  740.962001] Stack:
[  740.962001]  0000000000000008 ffff8800b912b600 ffff88013b00cc00 ffff88013a61e680
[  740.962001]  ffff88013b00cc00 ffff88013b00cc18 ffff88013b00cf80 ffff88013a61e6b0
[  740.962001]  ffff88013a247eb8 ffffffff810639c6 0000000000012a80 ffff88013a247fd8
[  740.962001] Call Trace:
[  740.962001]  [<ffffffff810639c6>] worker_thread+0x206/0x3f0
[  740.962001]  [<ffffffff810637c0>] ? manage_workers+0x2c0/0x2c0
[  740.962001]  [<ffffffff81069656>] kthread+0xc6/0xd0
[  740.962001]  [<ffffffff81069590>] ? kthread_freezable_should_stop+0x70/0x70
[  740.962001]  [<ffffffff81722ffc>] ret_from_fork+0x7c/0xb0
[  740.962001]  [<ffffffff81069590>] ? kthread_freezable_should_stop+0x70/0x70
[  740.962001] Code: 89 e5 41 57 41 56 41 55 45 31 ed 41 54 53 48 89 fb 48 83 ec 18 48 8b 06 4c 8b 67 48 48 89 c1 30 c9 a8 04 4c 0f 45 e9 80 7f 58 00 <49> 8b 45 08 44 8b b0 00 01 00 00 78 0c 41 f6 44 24 10 04 0f 84
[  740.962001] RIP  [<ffffffff81062570>] process_one_work+0x30/0x500
[  740.962001]  RSP <ffff88013a247e08>
[  740.962001] CR2: 0000000000000008
[  740.962001] ---[ end trace 39181460000748de ]---
[  740.962001] Kernel panic - not syncing: Fatal exception

This can happen if there are some stripes left, fewer than MAX_STRIPE_BATCH.
A worker is queued to handle them.
But before calling raid5_do_work, raid5d handles those
stripes making conf->active_stripe = 0.
So mddev_suspend() can return.
We might then free old worker resources before the queued
raid5_do_work() handled them.  When it runs, it crashes.

	raid5d()		raid5_store_group_thread_cnt()
	queue_work		mddev_suspend()
				handle_strips
				active_stripe=0
				free(old worker resources)
	process_one_work
	raid5_do_work

To avoid this, we should only flush the worker resources before freeing them.

This fixes a bug introduced in 3.12 so is suitable for the 3.12.x
stable series.

Cc: stable@vger.kernel.org (3.12)
Fixes: b721420e87
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Shaohua Li <shli@kernel.org>
2013-11-19 15:19:18 +11:00
majianpeng
e59aa23f4c md/raid5: For stripe with R5_ReadNoMerge, we replace REQ_FLUSH with REQ_NOMERGE.
For R5_ReadNoMerge,it mean this bio can't merge with other bios or
request.It used REQ_FLUSH to achieve this. But REQ_NOMERGE can do the
same work.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-11-19 15:19:18 +11:00
Aurelien Jarno
c0f8bd146a UAPI: include <asm/byteorder.h> in linux/raid/md_p.h
linux/raid/md_p.h is using conditionals depending on endianess and fails
with an error if neither of __BIG_ENDIAN, __LITTLE_ENDIAN or
__BYTE_ORDER are defined, but it doesn't include any header which can
define these constants. This make this header unusable alone.

This patch adds a #include <asm/byteorder.h> at the beginning of this
header to make it usable alone. This is needed to compile klibc on MIPS.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-11-19 15:19:18 +11:00
majianpeng
79ef3a8aa1 raid1: Rewrite the implementation of iobarrier.
There is an iobarrier in raid1 because of contention between normal IO and
resync IO.  It suspends all normal IO when resync/recovery happens.

However if normal IO is out side the resync window, there is no contention.
So this patch changes the barrier mechanism to only block IO that
could contend with the resync that is currently happening.

We partition the whole space into five parts.
|---------|-----------|------------|----------------|-------|
        start   next_resync   start_next_window    end_window

start + RESYNC_WINDOW = next_resync
next_resync + NEXT_NORMALIO_DISTANCE = start_next_window
start_next_window + NEXT_NORMALIO_DISTANCE = end_window

Firstly we introduce some concepts:

1 - RESYNC_WINDOW: For resync, there are 32 resync requests at most at the
      same time. A sync request is RESYNC_BLOCK_SIZE(64*1024).
      So the RESYNC_WINDOW is 32 * RESYNC_BLOCK_SIZE, that is 2MB.
2 - NEXT_NORMALIO_DISTANCE: the distance between next_resync
      and start_next_window.  It also indicates the distance between
      start_next_window and end_window.
      It is currently 3 * RESYNC_WINDOW_SIZE but could be tuned if
      this turned out not to be optimal.
3 - next_resync: the next sector at which we will do sync IO.
4 - start: a position which is at most RESYNC_WINDOW before
      next_resync.
5 - start_next_window:  a position which is NEXT_NORMALIO_DISTANCE
      beyond next_resync.  Normal-io after this position doesn't need to
      wait for resync-io to complete.
6 - end_window:  a position which is 2 * NEXT_NORMALIO_DISTANCE beyond
      next_resync.  This also doesn't need to wait, but is counted
      differently.
7 - current_window_requests:  the count of normalIO between
      start_next_window and end_window.
8 - next_window_requests: the count of normalIO after end_window.

NormalIO will be partitioned into four types:

NormIO1:  the end sector of bio is smaller or equal the start
NormIO2:  the start sector of bio larger or equal to end_window
NormIO3:  the start sector of bio larger or equal to
          start_next_window.
NormIO4:  the location between start_next_window and end_window

|--------|-----------|--------------------|----------------|-------------|
    | start   |   next_resync   |  start_next_window   |  end_window |
 NormIO1   NormIO4            NormIO4                NormIO3      NormIO2

For NormIO1, we don't need any io barrier.
For NormIO4, we used a similar approach to the original iobarrier
    mechanism.  The normalIO and resyncIO must be kept separate.
For NormIO2/3, we add two fields to struct r1conf: "current_window_requests"
    and "next_window_requests". They indicate the count of active
    requests in the two window.
    For these, we don't wait for resync io to complete.

For resync action, if there are NormIO4s, we must wait for it.
If not, we can proceed.
But if resync action reaches start_next_window and
current_window_requests > 0 (that is there are NormIO3s), we must
wait until the current_window_requests becomes zero.
When current_window_requests becomes zero,  start_next_window also
moves forward. Then current_window_requests will replaced by
next_window_requests.

There is a problem which when and how to change from NormIO2 to
NormIO3.  Only then can sync action progress.

We add a field in struct r1conf "start_next_window".

A: if start_next_window == MaxSector, it means there are no NormIO2/3.
   So start_next_window = next_resync + NEXT_NORMALIO_DISTANCE
B: if current_window_requests == 0 && next_window_requests != 0, it
   means start_next_window move to end_window

There is another problem which how to differentiate between
old NormIO2(now it is NormIO3) and NormIO2.
For example, there are many bios which are NormIO2 and a bio which is
NormIO3. NormIO3 firstly completed, so the bios of NormIO2 became NormIO3.

We add a field in struct r1bio "start_next_window".
This is used to record the position conf->start_next_window when the call
to wait_barrier() is made in make_request().

In allow_barrier(), we check the conf->start_next_window.
If r1bio->stat_next_window == conf->start_next_window, it means
there is no transition between NormIO2 and NormIO3.
If r1bio->start_next_window != conf->start_next_window, it mean
there was a transition between NormIO2 and NormIO3.  There can only
have been one transition.  So it only means the bio is old NormIO2.

For one bio, there may be many r1bio's. So we make sure
all the r1bio->start_next_window are the same value.
If we met blocked_dev in make_request(), it must call allow_barrier
and wait_barrier. So the former and the later value of
conf->start_next_window will be change.
If there are many r1bio's with differnet start_next_window,
for the relevant bio, it depend on the last value of r1bio.
It will cause error. To avoid this, we must wait for previous r1bios
to complete.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-11-19 15:19:18 +11:00
majianpeng
8e005f7c02 raid1: Add some macros to make code clearly.
In a subsequent patch, we'll use some const parameters.
Using macros will make the code clearly.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-11-19 15:19:18 +11:00
majianpeng
07169fd478 raid1: Replace raise_barrier/lower_barrier with freeze_array/unfreeze_array when reconfiguring the array.
We used to use raise_barrier to suspend normal IO while we reconfigure
the array.  However raise_barrier will soon only suspend some normal
IO, not all.  So we need something else.
Change it to use freeze_array.
But freeze_array not only suspends normal io, it also suspends
resync io.
For the place where call raise_barrier for reconfigure, it isn't a
problem.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-11-19 15:19:18 +11:00
majianpeng
b364e3d048 raid1: Add a field array_frozen to indicate whether raid in freeze state.
Because the following patch will rewrite the content between normal IO
and resync IO. So we used a parameter to indicate whether raid is in freeze
array.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-11-19 15:19:18 +11:00
Joe Perches
82592c38a8 md: Convert use of typedef ctl_table to struct ctl_table
This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-11-19 15:19:18 +11:00
NeilBrown
30b8feb730 md/raid5: avoid deadlock when raid5 array has unack badblocks during md_stop_writes.
When raid5 recovery hits a fresh badblock, this badblock will flagged as unack
badblock until md_update_sb() is called.
But md_stop will take reconfig lock which means raid5d can't call
md_update_sb() in md_check_recovery(), the badblock will always
be unack, so raid5d thread enters an infinite loop and md_stop_write()
can never stop sync_thread. This causes deadlock.

To solve this, when STOP_ARRAY ioctl is issued and sync_thread is
running, we need set md->recovery FROZEN and INTR flags and wait for
sync_thread to stop before we (re)take reconfig lock.

This requires that raid5 reshape_request notices MD_RECOVERY_INTR
(which it probably should have noticed anyway) and stops waiting for a
metadata update in that case.

Reported-by: Jianpeng Ma <majianpeng@gmail.com>
Reported-by: Bian Yu <bianyu@kedacom.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-11-19 15:19:17 +11:00
NeilBrown
c91abf5a35 md: use MD_RECOVERY_INTR instead of kthread_should_stop in resync thread.
We currently use kthread_should_stop() in various places in the
sync/reshape code to abort early.
However some places set MD_RECOVERY_INTR but don't immediately call
md_reap_sync_thread() (and we will shortly get another one).
When this happens we are relying on md_check_recovery() to reap the
thread and that only happen when it finishes normally.
So MD_RECOVERY_INTR must lead to a normal finish without the
kthread_should_stop() test.

So replace all relevant tests, and be more careful when the thread is
interrupted not to acknowledge that latest step in a reshape as it may
not be fully committed yet.

Also add a test on MD_RECOVERY_INTR in the 'is_mddev_idle' loop
so we don't wait have to wait for the speed to drop before we can abort.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-11-19 15:19:17 +11:00
NeilBrown
29f097c4d9 md: fix some places where mddev_lock return value is not checked.
Sometimes we need to lock and mddev and cannot cope with
failure due to interrupt.
In these cases we should use mutex_lock, not mutex_lock_interruptible.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-11-19 15:19:17 +11:00
Bian Yu
edfa1f651e raid5: Retry R5_ReadNoMerge flag when hit a read error.
Because of block layer merge, one bio fails will cause other bios
which belongs to the same request fails, so raid5_end_read_request
will record all these bios as badblocks.
If retry request with R5_ReadNoMerge flag to avoid bios merge,
badblocks can only record sector which is bad exactly.

test:
hdparm --yes-i-know-what-i-am-doing --make-bad-sector 300000 /dev/sdb
mdadm -C /dev/md0 -l5 -n3 /dev/sd[bcd] --assume-clean
mdadm /dev/md0 -f /dev/sdd
mdadm /dev/md0 -r /dev/sdd
mdadm --zero-superblock /dev/sdd
mdadm /dev/md0 -a /dev/sdd

1. Without this patch:
cat /sys/block/md0/md/rd*/bad_blocks
299776 256
299776 256

2. With this patch:
cat /sys/block/md0/md/rd*/bad_blocks
300000 8
300000 8

Signed-off-by: Bian Yu <bianyu@kedacom.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-11-19 15:18:24 +11:00
Shaohua Li
4bda556aea raid5: relieve lock contention in get_active_stripe()
track empty inactive list count, so md_raid5_congested() can use it to make
decision.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-11-19 15:18:22 +11:00
Al Viro
801a76050b seq_file: always clear m->count when we free m->buf
Once we'd freed m->buf, m->count should become zero - we have no valid
contents reachable via m->buf.

Reported-by: Charley (Hao Chuan) Chu <charley.chu@broadcom.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-18 19:07:53 -08:00
Olof Johansson
5761704a41 ARM: 7892/1: Fix warning for V7M builds
Fixes a harmless warning when building for V7M (!MMU):
 arch/arm/kernel/traps.c:859:123: warning: 'kuser_init' defined but not used [-Wunused-function]

By making the stub static inline instead of just static.

Fixes: f6f91b0d9f ('ARM: allow kuser helpers to be removed from the vector page')

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-11-19 00:41:03 +00:00
Tony Lindgren
b2ff479061 ARM: OMAP2+: Remove legacy omap4_twl6030_hsmmc_init
This is no longer used, omap4 is device tree based now.

Signed-off-by: Tony Lindgren <tony@atomide.com>
2013-11-18 16:24:58 -08:00
Tony Lindgren
e30b06f4d5 ARM: OMAP2+: Remove legacy mux code for display.c
This is all omap4 specific, which is device tree based
nowadays and should use pinctrl-single instead.

Signed-off-by: Tony Lindgren <tony@atomide.com>
2013-11-18 16:24:57 -08:00
Rafael J. Wysocki
b38f67c4ae Merge branch 'pm-sleep'
* pm-sleep:
  PM / Hibernate: Do not crash kernel in free_basic_memory_bitmaps()
2013-11-19 01:07:08 +01:00
Rafael J. Wysocki
9bad584548 Merge branch 'pm-cpufreq'
* pm-cpufreq:
  cpufreq: governor: Remove fossil comment in the cpufreq_governor_dbs()
  cpufreq: OMAP: Fix compilation error 'r & ret undeclared'
  cpufreq: conservative: set requested_freq to policy max when it is over policy max
2013-11-19 01:07:00 +01:00
Rafael J. Wysocki
77aa26514a Merge branch 'pm-runtime'
* pm-runtime:
  PM / Runtime: Fix error path for prepare
  PM / Runtime: Update documentation around probe|remove|suspend
2013-11-19 01:06:49 +01:00
Rafael J. Wysocki
6431b43097 Merge branch 'pm-tools'
* pm-tools:
  tools / power turbostat: Support Silvermont
2013-11-19 01:06:38 +01:00
Rafael J. Wysocki
077a671d7b Merge branch 'pm-cpuidle'
* pm-cpuidle:
  intel_idle: Support Intel Atom Processor C2000 Product Family
2013-11-19 01:06:28 +01:00
Rafael J. Wysocki
9a04d08b11 Merge branch 'acpi-video'
* acpi-video:
  ACPI / video: clean up DMI table for initial black screen problem
2013-11-19 01:06:17 +01:00
Rafael J. Wysocki
dcaea2c18e Merge branch 'acpi-ec'
* acpi-ec:
  ACPI / EC: Ensure lock is acquired before accessing ec struct members
2013-11-19 01:06:06 +01:00