Commit graph

22,402 commits

Author SHA1 Message Date
Roman Pen
346c09f804 workqueue: fix ghost PENDING flag while doing MQ IO
The bug in a workqueue leads to a stalled IO request in MQ ctx->rq_list
with the following backtrace:

[  601.347452] INFO: task kworker/u129:5:1636 blocked for more than 120 seconds.
[  601.347574]       Tainted: G           O    4.4.5-1-storage+ #6
[  601.347651] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  601.348142] kworker/u129:5  D ffff880803077988     0  1636      2 0x00000000
[  601.348519] Workqueue: ibnbd_server_fileio_wq ibnbd_dev_file_submit_io_worker [ibnbd_server]
[  601.348999]  ffff880803077988 ffff88080466b900 ffff8808033f9c80 ffff880803078000
[  601.349662]  ffff880807c95000 7fffffffffffffff ffffffff815b0920 ffff880803077ad0
[  601.350333]  ffff8808030779a0 ffffffff815b01d5 0000000000000000 ffff880803077a38
[  601.350965] Call Trace:
[  601.351203]  [<ffffffff815b0920>] ? bit_wait+0x60/0x60
[  601.351444]  [<ffffffff815b01d5>] schedule+0x35/0x80
[  601.351709]  [<ffffffff815b2dd2>] schedule_timeout+0x192/0x230
[  601.351958]  [<ffffffff812d43f7>] ? blk_flush_plug_list+0xc7/0x220
[  601.352208]  [<ffffffff810bd737>] ? ktime_get+0x37/0xa0
[  601.352446]  [<ffffffff815b0920>] ? bit_wait+0x60/0x60
[  601.352688]  [<ffffffff815af784>] io_schedule_timeout+0xa4/0x110
[  601.352951]  [<ffffffff815b3a4e>] ? _raw_spin_unlock_irqrestore+0xe/0x10
[  601.353196]  [<ffffffff815b093b>] bit_wait_io+0x1b/0x70
[  601.353440]  [<ffffffff815b056d>] __wait_on_bit+0x5d/0x90
[  601.353689]  [<ffffffff81127bd0>] wait_on_page_bit+0xc0/0xd0
[  601.353958]  [<ffffffff81096db0>] ? autoremove_wake_function+0x40/0x40
[  601.354200]  [<ffffffff81127cc4>] __filemap_fdatawait_range+0xe4/0x140
[  601.354441]  [<ffffffff81127d34>] filemap_fdatawait_range+0x14/0x30
[  601.354688]  [<ffffffff81129a9f>] filemap_write_and_wait_range+0x3f/0x70
[  601.354932]  [<ffffffff811ced3b>] blkdev_fsync+0x1b/0x50
[  601.355193]  [<ffffffff811c82d9>] vfs_fsync_range+0x49/0xa0
[  601.355432]  [<ffffffff811cf45a>] blkdev_write_iter+0xca/0x100
[  601.355679]  [<ffffffff81197b1a>] __vfs_write+0xaa/0xe0
[  601.355925]  [<ffffffff81198379>] vfs_write+0xa9/0x1a0
[  601.356164]  [<ffffffff811c59d8>] kernel_write+0x38/0x50

The underlying device is a null_blk, with default parameters:

  queue_mode    = MQ
  submit_queues = 1

Verification that nullb0 has something inflight:

root@pserver8:~# cat /sys/block/nullb0/inflight
       0        1
root@pserver8:~# find /sys/block/nullb0/mq/0/cpu* -name rq_list -print -exec cat {} \;
...
/sys/block/nullb0/mq/0/cpu2/rq_list
CTX pending:
        ffff8838038e2400
...

During debug it became clear that stalled request is always inserted in
the rq_list from the following path:

   save_stack_trace_tsk + 34
   blk_mq_insert_requests + 231
   blk_mq_flush_plug_list + 281
   blk_flush_plug_list + 199
   wait_on_page_bit + 192
   __filemap_fdatawait_range + 228
   filemap_fdatawait_range + 20
   filemap_write_and_wait_range + 63
   blkdev_fsync + 27
   vfs_fsync_range + 73
   blkdev_write_iter + 202
   __vfs_write + 170
   vfs_write + 169
   kernel_write + 56

So blk_flush_plug_list() was called with from_schedule == true.

If from_schedule is true, that means that finally blk_mq_insert_requests()
offloads execution of __blk_mq_run_hw_queue() and uses kblockd workqueue,
i.e. it calls kblockd_schedule_delayed_work_on().

That means, that we race with another CPU, which is about to execute
__blk_mq_run_hw_queue() work.

Further debugging shows the following traces from different CPUs:

  CPU#0                                  CPU#1
  ----------------------------------     -------------------------------
  reqeust A inserted
  STORE hctx->ctx_map[0] bit marked
  kblockd_schedule...() returns 1
  <schedule to kblockd workqueue>
                                         request B inserted
                                         STORE hctx->ctx_map[1] bit marked
                                         kblockd_schedule...() returns 0
  *** WORK PENDING bit is cleared ***
  flush_busy_ctxs() is executed, but
  bit 1, set by CPU#1, is not observed

As a result request B pended forever.

This behaviour can be explained by speculative LOAD of hctx->ctx_map on
CPU#0, which is reordered with clear of PENDING bit and executed _before_
actual STORE of bit 1 on CPU#1.

The proper fix is an explicit full barrier <mfence>, which guarantees
that clear of PENDING bit is to be executed before all possible
speculative LOADS or STORES inside actual work function.

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Gioh Kim <gi-oh.kim@profitbricks.com>
Cc: Michael Wang <yun.wang@profitbricks.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
2016-04-26 11:23:22 -04:00
Tejun Heo
5cf1cacb49 cgroup, cpuset: replace cpuset_post_attach_flush() with cgroup_subsys->post_attach callback
Since e93ad19d05 ("cpuset: make mm migration asynchronous"), cpuset
kicks off asynchronous NUMA node migration if necessary during task
migration and flushes it from cpuset_post_attach_flush() which is
called at the end of __cgroup_procs_write().  This is to avoid
performing migration with cgroup_threadgroup_rwsem write-locked which
can lead to deadlock through dependency on kworker creation.

memcg has a similar issue with charge moving, so let's convert it to
an official callback rather than the current one-off cpuset specific
function.  This patch adds cgroup_subsys->post_attach callback and
makes cpuset register cpuset_post_attach_flush() as its ->post_attach.

The conversion is mostly one-to-one except that the new callback is
called under cgroup_mutex.  This is to guarantee that no other
migration operations are started before ->post_attach callbacks are
finished.  cgroup_mutex is one of the outermost mutex in the system
and has never been and shouldn't be a problem.  We can add specialized
synchronization around __cgroup_procs_write() but I don't think
there's any noticeable benefit.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org> # 4.4+ prerequisite for the next patch
2016-04-25 15:45:14 -04:00
Linus Torvalds
82b23cb94b Merge branches 'perf-urgent-for-linus', 'smp-urgent-for-linus' and 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf, cpu hotplug and timer fixes from Ingo Molnar:
 "perf:
   - A single tooling fix for a user-triggerable segfault.

  CPU hotplug:
   - Fix a CPU hotplug corner case regression, introduced by the recent
     hotplug rework

  timers:
   - Fix a boot hang in the ARM based Tango SoC clocksource driver"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf intel-pt: Fix segfault tracing transactions

* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cpu/hotplug: Fix rollback during error-out in __cpu_disable()

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource/drivers/tango-xtal: Fix boot hang due to incorrect test
2016-04-23 11:45:52 -07:00
Linus Torvalds
0e11d25651 Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
 "Misc fixes:

  pvqspinlocks:
   - an instrumentation fix

  futexes:
   - preempt-count vs pagefault_disable decouple corner case fix
   - futex requeue plist race window fix
   - futex UNLOCK_PI transaction fix for a corner case"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  asm-generic/futex: Re-enable preemption in futex_atomic_cmpxchg_inatomic()
  futex: Acknowledge a new waiter in counter before plist
  futex: Handle unlock_pi race gracefully
  locking/pvqspinlock: Fix division by zero in qstat_read()
2016-04-23 11:39:48 -07:00
Linus Torvalds
16ecb41410 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
 "A core irq affinity masks related fix and a MIPS irqchip driver fix"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/mips-gic: Don't overrun pcpu_masks array
  genirq: Dont allow affinity mask to be updated on IPIs
2016-04-23 11:34:39 -07:00
Xunlei Pang
fec148c000 sched/deadline: Fix a bug in dl_overflow()
I got a minus(very big) dl_b->total_bw during my deadline tests.

    # grep dl /proc/sched_debug
    dl_rq[0]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : -222297900

Something unusual must have happened.

After some digging, I finally noticed that when changing a deadline
task to normal(cfs), and changing it back to deadline immediately,
after it died, we will got the wrong dl_bw->total_bw.

The root cause is in dl_overflow(), it has:
    if (new_bw == p->dl.dl_bw)
	return 0;

1) When a deadline task is changed to !deadline task, it will start
   dl timer in switched_from_dl(), and retain previous deadline parameter
   till the timer expires.

2) If we change it back to deadline with the same bandwidth parameter
   before the timer expires, as it keeps the old bandwidth although it
   is not a deadline task. dl_overflow() simply returns success without
   updating the right data, and got the wrong dl_bw->total_bw.

The solution is simple, if @p is not deadline, don't return.

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@arm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1460636368-1993-1-git-send-email-xlpang@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23 14:20:43 +02:00
Frederic Weisbecker
9fd81dd5ce sched/fair: Optimize !CONFIG_NO_HZ_COMMON CPU load updates
Some code in CPU load update only concern NO_HZ configs but it is
built on all configurations. When NO_HZ isn't built, that code is harmless
but just happens to take some useless ressources in CPU and memory:

1) one useless field in struct rq
2) jiffies record on every tick that is never used (cpu_load_update_periodic)
3) decay_load_missed is called two times on every tick to eventually
   return immediately with no action taken. And that function is dead
   code.

For pure optimization purposes, lets conditionally build the NO_HZ
related code.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1461080211-16271-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23 14:20:42 +02:00
Frederic Weisbecker
1f41906a6f sched/fair: Correctly handle nohz ticks CPU load accounting
Ticks can happen while the CPU is in dynticks-idle or dynticks-singletask
mode. In fact "nohz" or "dynticks" only mean that we exit the periodic
mode and we try to minimize the ticks as much as possible. The nohz
subsystem uses a confusing terminology with the internal state
"ts->tick_stopped" which is also available through its public interface
with tick_nohz_tick_stopped(). This is a misnomer as the tick is instead
reduced with the best effort rather than stopped. In the best case the
tick can indeed be actually stopped but there is no guarantee about that.
If a timer needs to fire one second later, a tick will fire while the
CPU is in nohz mode and this is a very common scenario.

Now this confusion happens to be a problem with CPU load updates:
cpu_load_update_active() doesn't handle nohz ticks correctly because it
assumes that ticks are completely stopped in nohz mode and that
cpu_load_update_active() can't be called in dynticks mode. When that
happens, the whole previous tickless load is ignored and the function
just records the load for the current tick, ignoring potentially long
idle periods behind.

In order to solve this, we could account the current load for the
previous nohz time but there is a risk that we account the load of a
task that got freshly enqueued for the whole nohz period.

So instead, lets record the dynticks load on nohz frame entry so we know
what to record in case of nohz ticks, then use this record to account
the tickless load on nohz ticks and nohz frame end.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1460555812-25375-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23 14:20:42 +02:00
Frederic Weisbecker
cee1afce30 sched/fair: Gather CPU load functions under a more conventional namespace
The CPU load update related functions have a weak naming convention
currently, starting with update_cpu_load_*() which isn't ideal as
"update" is a very generic concept.

Since two of these functions are public already (and a third is to come)
that's enough to introduce a more conventional naming scheme. So let's
do the following rename instead:

	update_cpu_load_*() -> cpu_load_update_*()

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1460555812-25375-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23 14:20:41 +02:00
Steve Muckle
a2c6c91f98 sched/fair: Call cpufreq hook in additional paths
The cpufreq hook should be called any time the root CFS rq utilization
changes. This can occur when a task is switched to or from the fair
class, or a task moves between groups or CPUs, but these paths
currently do not call the cpufreq hook.

Fix this by adding the hook to attach_entity_load_avg() and
detach_entity_load_avg().

Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
[ Added the .update_freq argument to update_cfs_rq_load_avg() to avoid a double cpufreq call. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <Juri.Lelli@arm.com>
Cc: Michael Turquette <mturquette@baylibre.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1458858367-2831-1-git-send-email-smuckle@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23 14:20:40 +02:00
Steve Muckle
41e0d37f7a sched/fair: Do not call cpufreq hook unless util changed
There's no reason to call the cpufreq hook if the root cfs_rq
utilization has not been modified.

Signed-off-by: Steve Muckle <smuckle@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <Juri.Lelli@arm.com>
Cc: Michael Turquette <mturquette@baylibre.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Link: http://lkml.kernel.org/r/1458606068-7476-2-git-send-email-smuckle@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23 14:20:36 +02:00
Steve Muckle
21e96f8877 sched/fair: Move cpufreq hook to update_cfs_rq_load_avg()
The cpufreq hook should be called whenever the root cfs_rq
utilization changes so update_cfs_rq_load_avg() is a better
place for it. The current location is not invoked in the
enqueue_entity() or update_blocked_averages() paths.

Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <Juri.Lelli@arm.com>
Cc: Michael Turquette <mturquette@baylibre.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1458606068-7476-1-git-send-email-smuckle@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23 14:20:35 +02:00
Srikar Dronamraju
1f621e028b sched/fair: Fix asym packing to select correct CPU
When asymmetric packing is set in the sched_domain and target CPU is
busy, update_sd_pick_busiest() may not select the busiest runqueue.
When target CPU is busy, find_busiest_group() will ignore checks for
asym packing and may continue to load balance using the currently
selected not-the-busiest runqueue as source runqueue.
Selecting the busiest runqueue as source when the target CPU is busy,
should result in achieving much better load balance.

Also when target CPU is not busy and asymmetric packing is set in sd,
select higher CPU as source CPU for load balancing.

While doing this change, move the check to see if target CPU is busy
into check_asym_packing().

The extent of performance benefit from this change decreases with the
increasing load. However there is benefit in undercommit as well as
overcommit conditions.

1. Record per second ebizzy (32 threads) on a 64 CPU power 7 box. (5 iterations)
4.6.0-rc2
	Testcase:         Min         Max         Avg      StdDev
	  ebizzy:  5223767.00 10368236.00  7946971.00  1753094.76

4.6.0-rc2+asym-changes
	Testcase:         Min         Max         Avg      StdDev     %Change
	  ebizzy:  8617191.00 13872356.00 11383980.00  1783400.89     +24.78%

2. Record per second ebizzy (64 threads) on a 64 CPU power 7 box. (5 iterations)
4.6.0-rc2
	Testcase:         Min         Max         Avg      StdDev
	  ebizzy:  6497666.00 18399783.00 10818093.20  4051452.08

4.6.0-rc2+asym-changes
	Testcase:         Min         Max         Avg      StdDev     %Change
	  ebizzy:  7567365.00 19456937.00 11674063.60  4295407.48      +4.40%

3. Record per second ebizzy (128 threads) on a 64 CPU power 7 box. (5 iterations)
4.6.0-rc2
	Testcase:         Min         Max         Avg      StdDev
	  ebizzy: 37073983.00 40341911.00 38776241.80  1259766.82

4.6.0-rc2+asym-changes
	Testcase:         Min         Max         Avg      StdDev     %Change
	  ebizzy: 38030399.00 41333378.00 39827404.40  1255001.86      +2.54%

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1459948660-16073-1-git-send-email-srikar@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23 14:20:35 +02:00
Ingo Molnar
84eaae155a Linux 4.6-rc4
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXFELfAAoJEHm+PkMAQRiGRYIH+wWsUva7TR9arN1ZrURvI17b
 KqyQH8Ov9zJBsIaq/rFXOr5KfNgx7BU9BL9h7QkBy693HXTWf+GTZ1czHM4N12C3
 0ZdHGrLwTHo2zdisiQaFORZSfhSVTUNGXGHXw13bUMgEqatPgkozXEnsvXXNdt1Z
 HtlcuJn3pcj+QIY7qDXZgTLTwgn248hi1AgNag+ntFcWiz21IYaMIi7/mCY9QUIi
 AY+Y3hqFQM7/8cVyThGS5wZPTg1YzdhsLJpoCk0TbS8FvMEnA+ylcTgc15C78bwu
 AxOwM3OCmH4gMsd7Dd/O+i9lE3K6PFrgzdDisYL3P7eHap+EdiLDvVzPDPPx0xg=
 =Q7r3
 -----END PGP SIGNATURE-----

Merge tag 'v4.6-rc4' into sched/core, to refresh the tree

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23 14:16:36 +02:00
Wang Nan
9ecda41acb perf/core: Add ::write_backward attribute to perf event
This patch introduces 'write_backward' bit to perf_event_attr, which
controls the direction of a ring buffer. After set, the corresponding
ring buffer is written from end to beginning. This feature is design to
support reading from overwritable ring buffer.

Ring buffer can be created by mapping a perf event fd. Kernel puts event
records into ring buffer, user tooling like perf fetch them from
address returned by mmap(). To prevent racing between kernel and tooling,
they communicate to each other through 'head' and 'tail' pointers.
Kernel maintains 'head' pointer, points it to the next free area (tail
of the last record). Tooling maintains 'tail' pointer, points it to the
tail of last consumed record (record has already been fetched). Kernel
determines the available space in a ring buffer using these two
pointers to avoid overwrite unfetched records.

By mapping without 'PROT_WRITE', an overwritable ring buffer is created.
Different from normal ring buffer, tooling is unable to maintain 'tail'
pointer because writing is forbidden. Therefore, for this type of ring
buffers, kernel overwrite old records unconditionally, works like flight
recorder. This feature would be useful if reading from overwritable ring
buffer were as easy as reading from normal ring buffer. However,
there's an obscure problem.

The following figure demonstrates a full overwritable ring buffer. In
this figure, the 'head' pointer points to the end of last record, and a
long record 'E' is pending. For a normal ring buffer, a 'tail' pointer
would have pointed to position (X), so kernel knows there's no more
space in the ring buffer. However, for an overwritable ring buffer,
kernel ignore the 'tail' pointer.

   (X)                              head
    .                                |
    .                                V
    +------+-------+----------+------+---+
    |A....A|B.....B|C........C|D....D|   |
    +------+-------+----------+------+---+

Record 'A' is overwritten by event 'E':

      head
       |
       V
    +--+---+-------+----------+------+---+
    |.E|..A|B.....B|C........C|D....D|E..|
    +--+---+-------+----------+------+---+

Now tooling decides to read from this ring buffer. However, none of these
two natural positions, 'head' and the start of this ring buffer, are
pointing to the head of a record. Even the full ring buffer can be
accessed by tooling, it is unable to find a position to start decoding.

The first attempt tries to solve this problem AFAIK can be found from
[1]. It makes kernel to maintain 'tail' pointer: updates it when ring
buffer is half full. However, this approach introduces overhead to
fast path. Test result shows a 1% overhead [2]. In addition, this method
utilizes no more tham 50% records.

Another attempt can be found from [3], which allows putting the size of
an event at the end of each record. This approach allows tooling to find
records in a backward manner from 'head' pointer by reading size of a
record from its tail. However, because of alignment requirement, it
needs 8 bytes to record the size of a record, which is a huge waste. Its
performance is also not good, because more data need to be written.
This approach also introduces some extra branch instructions to fast
path.

'write_backward' is a better solution to this problem.

Following figure demonstrates the state of the overwritable ring buffer
when 'write_backward' is set before overwriting:

       head
        |
        V
    +---+------+----------+-------+------+
    |   |D....D|C........C|B.....B|A....A|
    +---+------+----------+-------+------+

and after overwriting:
                                     head
                                      |
                                      V
    +---+------+----------+-------+---+--+
    |..E|D....D|C........C|B.....B|A..|E.|
    +---+------+----------+-------+---+--+

In each situation, 'head' points to the beginning of the newest record.
From this record, tooling can iterate over the full ring buffer and fetch
records one by one.

The only limitation that needs to be considered is back-to-back reading.
Due to the non-deterministic of user programs, it is impossible to ensure
the ring buffer keeps stable during reading. Consider an extreme situation:
tooling is scheduled out after reading record 'D', then a burst of events
come, eat up the whole ring buffer (one or multiple rounds). When the
tooling process comes back, reading after 'D' is incorrect now.

To prevent this problem, we need to find a way to ensure the ring buffer
is stable during reading. ioctl(PERF_EVENT_IOC_PAUSE_OUTPUT) is
suggested because its overhead is lower than
ioctl(PERF_EVENT_IOC_ENABLE).

By carefully verifying 'header' pointer, reader can avoid pausing the
ring-buffer. For example:

    /* A union of all possible events */
    union perf_event event;

    p = head = perf_mmap__read_head();
    while (true) {
        /* copy header of next event */
        fetch(&event.header, p, sizeof(event.header));

        /* read 'head' pointer */
        head = perf_mmap__read_head();

        /* check overwritten: is the header good? */
        if (!verify(sizeof(event.header), p, head))
            break;

        /* copy the whole event */
        fetch(&event, p, event.header.size);

        /* read 'head' pointer again */
        head = perf_mmap__read_head();

        /* is the whole event good? */
        if (!verify(event.header.size, p, head))
            break;
        p += event.header.size;
    }

However, the overhead is high because:

 a) In-place decoding is not safe.
    Copying-verifying-decoding is required.
 b) Fetching 'head' pointer requires additional synchronization.

(From Alexei Starovoitov:

Even when this trick works, pause is needed for more than stability of
reading. When we collect the events into overwrite buffer we're waiting
for some other trigger (like all cpu utilization spike or just one cpu
running and all others are idle) and when it happens the buffer has
valuable info from the past. At this point new events are no longer
interesting and buffer should be paused, events read and unpaused until
next trigger comes.)

This patch utilizes event's default overflow_handler introduced
previously. perf_event_output_backward() is created as the default
overflow handler for backward ring buffers. To avoid extra overhead to
fast path, original perf_event_output() becomes __perf_event_output()
and marked '__always_inline'. In theory, there's no extra overhead
introduced to fast path.

Performance testing:

Calling 3000000 times of 'close(-1)', use gettimeofday() to check
duration.  Use 'perf record -o /dev/null -e raw_syscalls:*' to capture
system calls. In ns.

Testing environment:

  CPU    : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
  Kernel : v4.5.0
                    MEAN         STDVAR
 BASE            800214.950    2853.083
 PRE1           2253846.700    9997.014
 PRE2           2257495.540    8516.293
 POST           2250896.100    8933.921

Where 'BASE' is pure performance without capturing. 'PRE1' is test
result of pure 'v4.5.0' kernel. 'PRE2' is test result before this
patch. 'POST' is test result after this patch. See [4] for the detailed
experimental setup.

Considering the stdvar, this patch doesn't introduce performance
overhead to the fast path.

 [1] http://lkml.iu.edu/hypermail/linux/kernel/1304.1/04584.html
 [2] http://lkml.iu.edu/hypermail/linux/kernel/1307.1/00535.html
 [3] http://lkml.iu.edu/hypermail/linux/kernel/1512.0/01265.html
 [4] http://lkml.kernel.org/g/56F89DCD.1040202@huawei.com

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: <acme@kernel.org>
Cc: <pi3orama@163.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lkml.kernel.org/r/1459865478-53413-1-git-send-email-wangnan0@huawei.com
[ Fixed the changelog some more. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23 14:12:39 +02:00
Ingo Molnar
65cbbd037b Merge branch 'perf/urgent' into perf/core, to resolve conflict
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23 14:12:10 +02:00
Peter Zijlstra
75dd602a51 lockdep: Fix lock_chain::base size
lock_chain::base is used to store an index into the chain_hlocks[]
array, however that array contains more elements than can be indexed
using the u16.

Change the lock_chain structure to use a bitfield to encode the data
it needs and add BUILD_BUG_ON() assertions to check the fields are
wide enough.

Also, for DEBUG_LOCKDEP, assert that we don't run out of elements of
that array; as that would wreck the collision detectoring.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alfredo Alvarez Fernandez <alfredoalvarezfernandez@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160330093659.GS3408@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23 13:53:03 +02:00
Boqun Feng
c246975662 locking/lockdep: Fix ->irq_context calculation
task_irq_context() returns the encoded irq_context of the task, the
return value is encoded in the same as ->irq_context of held_lock.

Always return 0 if !(CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING)

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: sasha.levin@oracle.com
Link: http://lkml.kernel.org/r/1455602265-16490-2-git-send-email-boqun.feng@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23 13:53:03 +02:00
Peter Zijlstra
b303e7c15d perf/core: Make sysctl_perf_cpu_time_max_percent conform to documentation
Markus reported that 0 should also disable the throttling we per
Documentation/sysctl/kernel.txt.

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 91a612eea9 ("perf/core: Fix dynamic interrupt throttle")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23 13:47:50 +02:00
Sebastian Andrzej Siewior
3b9d6da67e cpu/hotplug: Fix rollback during error-out in __cpu_disable()
The recent introduction of the hotplug thread which invokes the callbacks on
the plugged cpu, cased the following regression:

If takedown_cpu() fails, then we run into several issues:

 1) The rollback of the target cpu states is not invoked. That leaves the smp
    threads and the hotplug thread in disabled state.

 2) notify_online() is executed due to a missing skip_onerr flag. That causes
    that both CPU_DOWN_FAILED and CPU_ONLINE notifications are invoked which
    confuses quite some notifiers.

 3) The CPU_DOWN_FAILED notification is not invoked on the target CPU. That's
    not an issue per se, but it is inconsistent and in consequence blocks the
    patches which rely on these states being invoked on the target CPU and not
    on the controlling cpu. It also does not preserve the strict call order on
    rollback which is problematic for the ongoing state machine conversion as
    well.

To fix this we add a rollback flag to the remote callback machinery and invoke
the rollback including the CPU_DOWN_FAILED notification on the remote
cpu. Further mark the notify online state with 'skip_onerr' so we don't get a
double invokation.

This workaround will go away once we moved the unplug invocation to the target
cpu itself.

[ tglx: Massaged changelog and moved the CPU_DOWN_FAILED notifiaction to the
  	target cpu ]

Fixes: 4cb28ced23 ("cpu/hotplug: Create hotplug threads")
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-s390@vger.kernel.org
Cc: rt@linutronix.de
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Link: http://lkml.kernel.org/r/20160408124015.GA21960@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-04-22 09:49:49 +02:00
Michal Hocko
916633a403 locking/rwsem: Provide down_write_killable()
Now that all the architectures implement the necessary glue code
we can introduce down_write_killable(). The only difference wrt. regular
down_write() is that the slow path waits in TASK_KILLABLE state and the
interruption by the fatal signal is reported as -EINTR to the caller.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Signed-off-by: Jason Low <jason.low2@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1460041951-22347-12-git-send-email-mhocko@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-22 08:58:33 +02:00
Paul E. McKenney
dcd36d01fb Merge branches 'doc.2016.04.19a', 'exp.2016.03.31d', 'fixes.2016.03.31d' and 'torture.2016.04.21a' into HEAD
doc.2016.04.19a: Documentation updates
exp.2016.03.31d: Expedited grace-period updates
fixes.2016.03.31d: Miscellaneous fixes
torture.2016.004.21a Torture-test updates
2016-04-21 13:48:20 -07:00
Paul E. McKenney
0aa67e75b3 rcutorture: Add irqs-disabled test for call_rcu()
Mutation testing carried out by Iftekhar Ahmed of Oregon State
University showed that rcutorture is failing to test invocations
of call_rcu() having interrupts disabled.  This commit therefore
adds interrupt disabling around one of the existing invocations
of call_rcu() (and friends).

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-04-21 13:47:04 -07:00
Paul E. McKenney
e9fb365a88 rcutorture: Dump trace buffer upon shutdown
When running from the scripts, rcutorture is completely headless,
so there is no way to to manually dump the trace buffer.  This commit
therefore unconditionally dumps the trace buffer upon timed shutdown.
However, if you are using rmmod to end the test, it is still up to you
to manually dump the trace buffer.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-04-21 13:47:04 -07:00
Linus Torvalds
c5edde3a81 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix memory leak in iwlwifi, from Matti Gottlieb.

 2) Add missing registration of netfilter arp_tables into initial
    namespace, from Florian Westphal.

 3) Fix potential NULL deref in DecNET routing code.

 4) Restrict NETLINK_URELEASE to truly bound sockets only, from Dmitry
    Ivanov.

 5) Fix dst ref counting in VRF, from David Ahern.

 6) Fix TSO segmenting limits in i40e driver, from Alexander Duyck.

 7) Fix heap leak in PACKET_DIAG_MCLIST, from Mathias Krause.

 8) Ravalidate IPV6 datagram socket cached routes properly, particularly
    with UDP, from Martin KaFai Lau.

 9) Fix endian bug in RDS dp_ack_seq handling, from Qing Huang.

10) Fix stats typing in bcmgenet driver, from Eric Dumazet.

11) Openvswitch needs to orphan SKBs before ipv6 fragmentation handing,
    from Joe Stringer.

12) SPI device reference leak in spi_ks8895 PHY driver, from Mark Brown.

13) atl2 doesn't actually support scatter-gather, so don't advertise the
    feature.  From Ben Hucthings.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (72 commits)
  openvswitch: use flow protocol when recalculating ipv6 checksums
  Driver: Vmxnet3: set CHECKSUM_UNNECESSARY for IPv6 packets
  atl2: Disable unimplemented scatter/gather feature
  net/mlx4_en: Split SW RX dropped counter per RX ring
  net/mlx4_core: Don't allow to VF change global pause settings
  net/mlx4_core: Avoid repeated calls to pci enable/disable
  net/mlx4_core: Implement pci_resume callback
  net: phy: spi_ks8895: Don't leak references to SPI devices
  net: ethernet: davinci_emac: Fix platform_data overwrite
  net: ethernet: davinci_emac: Fix Unbalanced pm_runtime_enable
  qede: Fix single MTU sized packet from firmware GRO flow
  qede: Fix setting Skb network header
  qede: Fix various memory allocation error flows for fastpath
  tcp: Merge tx_flags and tskey in tcp_shifted_skb
  tcp: Merge tx_flags and tskey in tcp_collapse_retrans
  drivers: net: cpsw: fix wrong regs access in cpsw_ndo_open
  tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks
  openvswitch: Orphan skbs before IPv6 defrag
  Revert "Prevent NUll pointer dereference with two PHYs on cpsw"
  VSOCK: Only check error on skb_recv_datagram when skb is NULL
  ...
2016-04-21 12:57:34 -07:00
Matt Redfearn
4589f450fb genirq: Dont allow affinity mask to be updated on IPIs
The IPI domain re-purposes the IRQ affinity to signify the mask of CPUs
that this IPI will deliver to. This must not be modified before the IPI
is destroyed again, so set the IRQ_NO_BALANCING flag to prevent the
affinity being overwritten by setup_affinity().

Without this, if an IPI is reserved for a single target CPU, then
allocated using __setup_irq(), the affinity is overwritten with
cpu_online_mask. When ipi_destroy() is subsequently called on a
multi-cpu system, it will attempt to free cpumask_weight() IRQs
that were never allocated, and crash.

Fixes: d17bf24e69 ("genirq: Add a new generic IPI reservation code to irq core")
Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: jason@lakedaemon.net
Cc: marc.zyngier@arm.com
Cc: ralf@linux-mips.org
Cc: Qais Yousef <qsyousef@gmail.com>
Cc: lisa.parratt@imgtec.com
Link: http://lkml.kernel.org/r/1461229712-13057-1-git-send-email-matt.redfearn@imgtec.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-04-21 12:05:15 +02:00
Davidlohr Bueso
fe1bce9e21 futex: Acknowledge a new waiter in counter before plist
Otherwise an incoming waker on the dest hash bucket can miss
the waiter adding itself to the plist during the lockless
check optimization (small window but still the correct way
of doing this); similarly to the decrement counterpart.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: bigeasy@linutronix.de
Cc: dvhart@infradead.org
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1461208164-29150-1-git-send-email-dave@stgolabs.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-04-21 11:06:09 +02:00
Sebastian Andrzej Siewior
89e9e66ba1 futex: Handle unlock_pi race gracefully
If userspace calls UNLOCK_PI unconditionally without trying the TID -> 0
transition in user space first then the user space value might not have the
waiters bit set. This opens the following race:

CPU0	    	      	    CPU1
uval = get_user(futex)
			    lock(hb)
lock(hb)
			    futex |= FUTEX_WAITERS
			    ....
			    unlock(hb)

cmpxchg(futex, uval, newval)

So the cmpxchg fails and returns -EINVAL to user space, which is wrong because
the futex value is valid.

To handle this (yes, yet another) corner case gracefully, check for a flag
change and retry.

[ tglx: Massaged changelog and slightly reworked implementation ]

Fixes: ccf9e6a80d ("futex: Make unlock_pi more robust")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: stable@vger.kernel.org
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1460723739-5195-1-git-send-email-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-04-20 12:33:13 +02:00
Davidlohr Bueso
6687659568 locking/pvqspinlock: Fix division by zero in qstat_read()
While playing with the qstat statistics (in <debugfs>/qlockstat/) I ran into
the following splat on a VM when opening pv_hash_hops:

  divide error: 0000 [#1] SMP
  ...
  RIP: 0010:[<ffffffff810b61fe>]  [<ffffffff810b61fe>] qstat_read+0x12e/0x1e0
  ...
  Call Trace:
    [<ffffffff811cad7c>] ? mem_cgroup_commit_charge+0x6c/0xd0
    [<ffffffff8119750c>] ? page_add_new_anon_rmap+0x8c/0xd0
    [<ffffffff8118d3b9>] ? handle_mm_fault+0x1439/0x1b40
    [<ffffffff811937a9>] ? do_mmap+0x449/0x550
    [<ffffffff811d3de3>] ? __vfs_read+0x23/0xd0
    [<ffffffff811d4ab2>] ? rw_verify_area+0x52/0xd0
    [<ffffffff811d4bb1>] ? vfs_read+0x81/0x120
    [<ffffffff811d5f12>] ? SyS_read+0x42/0xa0
    [<ffffffff815720f6>] ? entry_SYSCALL_64_fastpath+0x1e/0xa8

Fix this by verifying that qstat_pv_kick_unlock is in fact non-zero,
similarly to what the qstat_pv_latency_wake case does, as if nothing
else, this can come from resetting the statistics, thus having 0 kicks
should be quite valid in this context.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Waiman Long <Waiman.Long@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@stgolabs.net
Cc: waiman.long@hpe.com
Link: http://lkml.kernel.org/r/1460961103-24953-1-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-19 10:49:19 +02:00
Linus Torvalds
ac82a57aff Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixlet from Ingo Molnar:
 "Fixes a build warning on certain Kconfig combinations"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/lockdep: Fix print_collision() unused warning
2016-04-16 15:43:19 -07:00
Linus Torvalds
51d7b12041 /proc/iomem: only expose physical resource addresses to privileged users
In commit c4004b02f8 ("x86: remove the kernel code/data/bss resources
from /proc/iomem") I was hoping to remove the phyiscal kernel address
data from /proc/iomem entirely, but that had to be reverted because some
system programs actually use it.

This limits all the detailed resource information to properly
credentialed users instead.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-14 12:56:09 -07:00
Alexei Starovoitov
d82bccc690 bpf/verifier: reject invalid LD_ABS | BPF_DW instruction
verifier must check for reserved size bits in instruction opcode and
reject BPF_LD | BPF_ABS | BPF_DW and BPF_LD | BPF_IND | BPF_DW instructions,
otherwise interpreter will WARN_RATELIMIT on them during execution.

Fixes: ddd872bc30 ("bpf: verifier: add checks for BPF_ABS | BPF_IND instructions")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14 01:31:50 -04:00
Anton Blanchard
bd92883051 sched/cpuacct: Check for NULL when using task_pt_regs()
task_pt_regs() can return NULL for kernel threads, so add a check.
This fixes an oops at boot on ppc64.

Reported-and-Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: efault@gmx.de
Cc: htejun@gmail.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: tj@kernel.org
Cc: yangds.fnst@cn.fujitsu.com
Link: http://lkml.kernel.org/r/20160406215950.04bc3f0b@kryten
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 13:22:37 +02:00
Daniel Lezcano
2c923e94cd sched/clock: Make local_clock()/cpu_clock() inline
The local_clock/cpu_clock functions were changed to prevent a double
identical test with sched_clock_cpu() when HAVE_UNSTABLE_SCHED_CLOCK
is set. That resulted in one line functions.

As these functions are in all the cases one line functions and in the
hot path, it is useful to specify them as static inline in order to
give a strong hint to the compiler.

After verification, it appears the compiler does not inline them
without this hint. Change those functions to static inline.

sched_clock_cpu() is called via the inlined local_clock()/cpu_clock()
functions from sched.h. So any module code including sched.h will
reference sched_clock_cpu(). Thus it must be exported with the
EXPORT_SYMBOL_GPL macro.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1460385514-14700-2-git-send-email-daniel.lezcano@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 12:25:22 +02:00
Daniel Lezcano
c78b17e28c sched/clock: Remove pointless test in cpu_clock/local_clock
In case the HAVE_UNSTABLE_SCHED_CLOCK config is set, the cpu_clock() version
checks if sched_clock_stable() is not set and calls sched_clock_cpu(),
otherwise it calls sched_clock().

sched_clock_cpu() checks also if sched_clock_stable() is set and, if true,
calls sched_clock().

sched_clock() will be called in sched_clock_cpu() if sched_clock_stable() is
true.

Remove the duplicate test by directly calling sched_clock_cpu() and let the
static key act in this function instead.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1460385514-14700-1-git-send-email-daniel.lezcano@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 12:25:22 +02:00
Rabin Vincent
fb90a6e93c sched/debug: Don't dump sched debug info in SysRq-W
sysrq_sched_debug_show() can dump a lot of information.  Don't print out
all that if we're just trying to get a list of blocked tasks (SysRq-W).
The information is still accessible with SysRq-T.

Signed-off-by: Rabin Vincent <rabinv@axis.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459777322-30902-1-git-send-email-rabin.vincent@axis.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:23:21 +02:00
Michal Hocko
d47996082f locking/rwsem: Introduce basis for down_write_killable()
Introduce a generic implementation necessary for down_write_killable().

This is a trivial extension of the already existing down_write() call
which can be interrupted by SIGKILL.  This patch doesn't provide
down_write_killable() yet because arches have to provide the necessary
pieces before.

rwsem_down_write_failed() which is a generic slow path for the
write lock is extended to take a task state and renamed to
__rwsem_down_write_failed_common(). The return value is either a valid
semaphore pointer or ERR_PTR(-EINTR).

rwsem_down_write_failed_killable() is exported as a new way to wait for
the lock and be killable.

For rwsem-spinlock implementation the current __down_write() it updated
in a similar way as __rwsem_down_write_failed_common() except it doesn't
need new exports just visible __down_write_killable().

Architectures which are not using the generic rwsem implementation are
supposed to provide their __down_write_killable() implementation and
use rwsem_down_write_failed_killable() for the slow path.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Signed-off-by: Jason Low <jason.low2@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1460041951-22347-7-git-send-email-mhocko@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 10:42:20 +02:00
Michal Hocko
f8e04d8545 locking/rwsem: Get rid of __down_write_nested()
This is no longer used anywhere and all callers (__down_write()) use
0 as a subclass. Ditch __down_write_nested() to make the code easier
to follow.

This shouldn't introduce any functional change.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Signed-off-by: Jason Low <jason.low2@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1460041951-22347-2-git-send-email-mhocko@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 10:42:16 +02:00
Denys Vlasenko
c003ed9289 locking/lockdep: Deinline register_lock_class(), save 2328 bytes
This function compiles to 1328 bytes of machine code. Three callsites.

Registering a new lock class is definitely not *that* time-critical to inline it.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1460141926-13069-5-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 10:06:13 +02:00
Ingo Molnar
889fac6d67 Linux 4.6-rc3
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXCva8AAoJEHm+PkMAQRiGXBoIAIkrjxdbuT2nS9A3tHwkiFXa
 6/Th1UjbNaoLuZ+MckQHayAD9NcWY9lVjOUmFsSiSWMCQK/rTWDl8x5ITputrY2V
 VuhrJCwI7huEtu6GpRaJaUgwtdOjhIHz1Ue2MCdNIbKX3l+LjVyyJ9Vo8rruvZcR
 fC7kiivH04fYX58oQ+SHymCg54ny3qJEPT8i4+g26686m11hvZLI3UAs2PAn6ut+
 atCjxdQ4yLN3DWsbjuA7wYGWhTgFloxL4TIoisuOUc3FXnSi/ivIbXZvu4lUfisz
 LA2JBhfII3AEMBWG9xfGbXPijJTT4q7yNlTD0oYcnMtAt/Roh2F04asqB1LetEY=
 =bri6
 -----END PGP SIGNATURE-----

Merge tag 'v4.6-rc3' into perf/core, to refresh the tree

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 08:57:03 +02:00
Davidlohr Bueso
c1c33b92db locking/locktorture: Fix NULL pointer dereference for cleanup paths
It has been found that paths that invoke cleanups through
lock_torture_cleanup() can trigger NULL pointer dereferencing
bugs during the statistics printing phase. This is mainly
because we should not be calling into statistics before we are
sure things have been set up correctly.

Specifically, early checks (and the need for handling this in
the cleanup call) only include parameter checks and basic
statistics allocation. Once we start write/read kthreads
we then consider the test as started. As such, update the function
in question to check for cxt.lwsa writer stats, if not set,
we either have a bogus parameter or -ENOMEM situation and
therefore only need to deal with general torture calls.

Reported-and-tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bobby.prani@gmail.com
Cc: dhowells@redhat.com
Cc: dipankar@in.ibm.com
Cc: dvhart@linux.intel.com
Cc: edumazet@google.com
Cc: fweisbec@gmail.com
Cc: jiangshanlai@gmail.com
Cc: josh@joshtriplett.org
Cc: mathieu.desnoyers@efficios.com
Cc: oleg@redhat.com
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/1460476038-27060-2-git-send-email-paulmck@linux.vnet.ibm.com
[ Improved the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 08:52:23 +02:00
Davidlohr Bueso
1f19093189 locking/locktorture: Fix deboosting NULL pointer dereference
For the case of rtmutex torturing we will randomly call into the
boost() handler, including upon module exiting when the tasks are
deboosted before stopping. In such cases the task may or may not have
already been boosted, and therefore the NULL being explicitly passed
can occur anywhere. Currently we only assume that the task will is
at a higher prio, and in consequence, dereference a NULL pointer.

This patch fixes the case of a rmmod locktorture exploding while
pounding on the rtmutex lock (partial trace):

 task: ffff88081026cf80 ti: ffff880816120000 task.ti: ffff880816120000
 RSP: 0018:ffff880816123eb0  EFLAGS: 00010206
 RAX: ffff88081026cf80 RBX: ffff880816bfa630 RCX: 0000000000160d1b
 RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000000
 RBP: ffff88081026cf80 R08: 000000000000001f R09: ffff88017c20ca80
 R10: 0000000000000000 R11: 000000000048c316 R12: ffffffffa05d1840
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff88203f880000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000008 CR3: 0000000001c0a000 CR4: 00000000000406e0
 Stack:
  ffffffffa05d141d ffff880816bfa630 ffffffffa05d1922 ffff88081e70c2c0
  ffff880816bfa630 ffffffff81095fed 0000000000000000 ffffffff8107bf60
  ffff880816bfa630 ffffffff00000000 ffff880800000000 ffff880816123f08
 Call Trace:
  [<ffffffff81095fed>] kthread+0xbd/0xe0
  [<ffffffff815cf40f>] ret_from_fork+0x3f/0x70

This patch ensures that if the random state pointer is not NULL and current
is not boosted, then do nothing.

 RIP: 0010:[<ffffffffa05c6185>]  [<ffffffffa05c6185>] torture_random+0x5/0x60 [torture]
  [<ffffffffa05d141d>] torture_rtmutex_boost+0x1d/0x90 [locktorture]
  [<ffffffffa05d1922>] lock_torture_writer+0xe2/0x170 [locktorture]

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bobby.prani@gmail.com
Cc: dhowells@redhat.com
Cc: dipankar@in.ibm.com
Cc: dvhart@linux.intel.com
Cc: edumazet@google.com
Cc: fweisbec@gmail.com
Cc: jiangshanlai@gmail.com
Cc: josh@joshtriplett.org
Cc: mathieu.desnoyers@efficios.com
Cc: oleg@redhat.com
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/1460476038-27060-1-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 08:52:23 +02:00
Linus Torvalds
4a2d057e4f Merge branch 'PAGE_CACHE_SIZE-removal'
Merge PAGE_CACHE_SIZE removal patches from Kirill Shutemov:
 "PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
  ago with promise that one day it will be possible to implement page
  cache with bigger chunks than PAGE_SIZE.

  This promise never materialized.  And unlikely will.

  Let's stop pretending that pages in page cache are special.  They are
  not.

  The first patch with most changes has been done with coccinelle.  The
  second is manual fixups on top.

  The third patch removes macros definition"

[ I was planning to apply this just before rc2, but then I spaced out,
  so here it is right _after_ rc2 instead.

  As Kirill suggested as a possibility, I could have decided to only
  merge the first two patches, and leave the old interfaces for
  compatibility, but I'd rather get it all done and any out-of-tree
  modules and patches can trivially do the converstion while still also
  working with older kernels, so there is little reason to try to
  maintain the redundant legacy model.    - Linus ]

* PAGE_CACHE_SIZE-removal:
  mm: drop PAGE_CACHE_* and page_cache_{get,release} definition
  mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage
  mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
2016-04-04 10:50:24 -07:00
Kirill A. Shutemov
09cbfeaf1a mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04 10:41:08 -07:00
Borislav Petkov
5c8a010c24 locking/lockdep: Fix print_collision() unused warning
Fix this:

  kernel/locking/lockdep.c:2051:13: warning: ‘print_collision’ defined but not used [-Wunused-function]
  static void print_collision(struct task_struct *curr,
              ^

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459759327-2880-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-04 11:41:34 +02:00
Linus Torvalds
4c3b73c6a2 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc kernel side fixes:

   - fix event leak
   - fix AMD PMU driver bug
   - fix core event handling bug
   - fix build bug on certain randconfigs

  Plus misc tooling fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/amd/ibs: Fix pmu::stop() nesting
  perf/core: Don't leak event in the syscall error path
  perf/core: Fix time tracking bug with multiplexing
  perf jit: genelf makes assumptions about endian
  perf hists: Fix determination of a callchain node's childlessness
  perf tools: Add missing initialization of perf_sample.cpumode in synthesized samples
  perf tools: Fix build break on powerpc
  perf/x86: Move events_sysfs_show() outside CPU_SUP_INTEL
  perf bench: Fix detached tarball building due to missing 'perf bench memcpy' headers
  perf tests: Fix tarpkg build test error output redirection
2016-04-03 07:22:12 -05:00
Linus Torvalds
7b367f5dba Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core kernel fixes from Ingo Molnar:
 "This contains the nohz/atomic cleanup/fix for the fetch_or() ugliness
  you noted during the original nohz pull request, plus there's also
  misc fixes:

   - fix liblockdep build bug
   - fix uapi header build bug
   - print more lockdep hash collision info to help debug recent reports
     of hash collisions
   - update MAINTAINERS email address"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  MAINTAINERS: Update my email address
  locking/lockdep: Print chain_key collision information
  uapi/linux/stddef.h: Provide __always_inline to userspace headers
  tools/lib/lockdep: Fix unsupported 'basename -s' in run_tests.sh
  locking/atomic, sched: Unexport fetch_or()
  timers/nohz: Convert tick dependency mask to atomic_t
  locking/atomic: Introduce atomic_fetch_or()
2016-04-03 07:06:53 -05:00
Linus Torvalds
05cf8077e5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Missing device reference in IPSEC input path results in crashes
    during device unregistration.  From Subash Abhinov Kasiviswanathan.

 2) Per-queue ISR register writes not being done properly in macb
    driver, from Cyrille Pitchen.

 3) Stats accounting bugs in bcmgenet, from Patri Gynther.

 4) Lightweight tunnel's TTL and TOS were swapped in netlink dumps, from
    Quentin Armitage.

 5) SXGBE driver has off-by-one in probe error paths, from Rasmus
    Villemoes.

 6) Fix race in save/swap/delete options in netfilter ipset, from
    Vishwanath Pai.

 7) Ageing time of bridge not set properly when not operating over a
    switchdev device.  Fix from Haishuang Yan.

 8) Fix GRO regression wrt nested FOU/GUE based tunnels, from Alexander
    Duyck.

 9) IPV6 UDP code bumps wrong stats, from Eric Dumazet.

10) FEC driver should only access registers that actually exist on the
    given chipset, fix from Fabio Estevam.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (73 commits)
  net: mvneta: fix changing MTU when using per-cpu processing
  stmmac: fix MDIO settings
  Revert "stmmac: Fix 'eth0: No PHY found' regression"
  stmmac: fix TX normal DESC
  net: mvneta: use cache_line_size() to get cacheline size
  net: mvpp2: use cache_line_size() to get cacheline size
  net: mvpp2: fix maybe-uninitialized warning
  tun, bpf: fix suspicious RCU usage in tun_{attach, detach}_filter
  net: usb: cdc_ncm: adding Telit LE910 V2 mobile broadband card
  rtnl: fix msg size calculation in if_nlmsg_size()
  fec: Do not access unexisting register in Coldfire
  net: mvneta: replace MVNETA_CPU_D_CACHE_LINE_SIZE with L1_CACHE_BYTES
  net: mvpp2: replace MVPP2_CPU_D_CACHE_LINE_SIZE with L1_CACHE_BYTES
  net: dsa: mv88e6xxx: Clear the PDOWN bit on setup
  net: dsa: mv88e6xxx: Introduce _mv88e6xxx_phy_page_{read, write}
  bpf: make padding in bpf_tunnel_key explicit
  ipv6: udp: fix UDP_MIB_IGNOREDMULTI updates
  bnxt_en: Fix ethtool -a reporting.
  bnxt_en: Fix typo in bnxt_hwrm_set_pause_common().
  bnxt_en: Implement proper firmware message padding.
  ...
2016-04-01 20:03:33 -05:00
Paul E. McKenney
9eb5188a07 torture: Clarify refusal to run more than one torture test
This commit clarifies error messages -- you only get to run one torture
test at a time!

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:39:52 -07:00
Anna-Maria Gleixner
de26ca19a5 rcutorture: Consider FROZEN hotplug notifier transitions
The hotplug notifier rcutorture_cpu_notify() doesn't consider the
corresponding CPU_XXX_FROZEN transitions. They occur on
suspend/resume and are usually handled the same way as the
corresponding non frozen transitions.

Mask the switch case action argument with '~CPU_TASKS_FROZEN' to map
CPU_XXX_FROZEN hotplug transitions on corresponding non-frozen
transitions.

Cc: Josh Triplett <josh@joshtriplett.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:39:52 -07:00