Commit graph

576140 commits

Author SHA1 Message Date
Tirumalesh Chalamarla
d243bed32f ahci: Workaround for ThunderX Errata#22536
Due to Errata in ThunderX, HOST_IRQ_STAT should be
cleared before leaving the interrupt handler.
The patch attempts to satisfy the need.

Changes from V2:
	- removed newfile
	- code is now under CONFIG_ARM64

Changes from V1:
	- Rebased on top of libata/for-4.6
        - Moved ThunderX intr handler to new file

tj: Minor adjustments to comments.

Signed-off-by: Tirumalesh Chalamarla <tchalamarla@caviumnetworks.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2016-02-29 16:16:26 -05:00
Vittorio Alfieri
3c4c615d70 USB: cp210x: Add ID for Parrot NMEA GPS Flight Recorder
The Parrot NMEA GPS Flight Recorder is a USB composite device
consisting of hub, flash storage, and cp210x usb to serial chip.
It is an accessory to the mass-produced Parrot AR Drone 2.
The device emits standard NMEA messages which make the it compatible
with NMEA compatible software. It was tested using gpsd version 3.11-3
as an NMEA interpreter and using the official Parrot Flight Recorder.

Signed-off-by: Vittorio Alfieri <vittorio88@gmail.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
2016-02-29 19:01:10 +01:00
Patrik Halfar
013dd239d6 USB: qcserial: add Dell Wireless 5809e Gobi 4G HSPA+ (rev3)
New revision of Dell Wireless 5809e Gobi 4G HSPA+ Mobile Broadband Card
has new idProduct.

Bus 002 Device 006: ID 413c:81b3 Dell Computer Corp.
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0        64
  idVendor           0x413c Dell Computer Corp.
  idProduct          0x81b3
  bcdDevice            0.06
  iManufacturer           1 Sierra Wireless, Incorporated
  iProduct                2 Dell Wireless 5809e Gobi™ 4G HSPA+ Mobile Broadband Card
  iSerial                 3
  bNumConfigurations      2

Signed-off-by: Patrik Halfar <patrik_halfar@halfarit.cz>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
2016-02-29 18:59:22 +01:00
Al Viro
a528aca7f3 use ->d_seq to get coherency between ->d_inode and ->d_flags
Games with ordering and barriers are way too brittle.  Just
bump ->d_seq before and after updating ->d_inode and ->d_flags
type bits, so that verifying ->d_seq would guarantee they are
coherent.

Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-29 12:16:43 -05:00
Takashi Iwai
eab3c4db19 ALSA: hdsp: Fix wrong boolean ctl value accesses
snd-hdsp driver accesses enum item values (int) instead of boolean
values (long) wrongly for some ctl elements.  This patch fixes them.

Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2016-02-29 18:13:34 +01:00
Takashi Iwai
c1099c3294 ALSA: hdspm: Fix zero-division
HDSPM driver contains a code issuing zero-division potentially in
system sample rate ctl code.  This patch fixes it by not processing
a zero or invalid rate value as a divisor, as well as excluding the
invalid value to be passed via the given ctl element.

Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2016-02-29 18:13:34 +01:00
Takashi Iwai
537e481362 ALSA: hdspm: Fix wrong boolean ctl value accesses
snd-hdspm driver accesses enum item values (int) instead of boolean
values (long) wrongly for some ctl elements.  This patch fixes them.

Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2016-02-29 18:13:34 +01:00
Joerg Roedel
b6809ee573 iommu/amd: Detach device from domain before removal
Detach the device that is about to be removed from its
domain (if it has one) to clear any related state like DTE
entry and device's ATS state.

Reported-by: Kelly Zytaruk <Kelly.Zytaruk@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2016-02-29 17:25:25 +01:00
Michael S. Tsirkin
887349f69f MIPS: kvm: Fix ioctl error handling.
Calling return copy_to_user(...) or return copy_from_user in an ioctl
will not do the right thing if there's a pagefault:
copy_to_user/copy_from_user return the number of bytes not copied in
this case.

Fix up kvm on mips to do
	return copy_to_user(...)) ?  -EFAULT : 0;
and
	return copy_from_user(...)) ?  -EFAULT : 0;

everywhere.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org
Cc: kvm@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/12709/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-02-29 15:52:20 +01:00
Govindraj Raja
56fa81fc9a MIPS: scache: Fix scache init with invalid line size.
In current scache init cache line_size is determined from
cpu config register, however if there there no scache
then mips_sc_probe_cm3 function populates a invalid line_size of 2.

The invalid line_size can cause a NULL pointer deference
during r4k_dma_cache_inv as r4k_blast_scache is populated
based on line_size. Scache line_size of 2 is invalid option in
r4k_blast_scache_setup.

This issue was faced during a MIPS I6400 based virtual platform bring up
where scache was not available in virtual platform model.

Signed-off-by: Govindraj Raja <Govindraj.Raja@imgtec.com>
Fixes: 7d53e9c4cd21("MIPS: CM3: Add support for CM3 L2 cache.")
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hartley <James.Hartley@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org # v4.2+
Patchwork: https://patchwork.linux-mips.org/patch/12710/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-02-29 15:44:23 +01:00
Steven Rostedt
a674533078 tools lib traceevent: Split pevent_print_event() into specific functionality functions
Currently there's a single function that is used to display a record's
data in human readable format. That's pevent_print_event().
Unfortunately, this gives little room for adding other output within the
line without updating that function call.

I've decided to split that function into 3 parts.

 pevent_print_event_task() which prints the task comm, pid and the CPU
 pevent_print_event_time() which outputs the record's timestamp
 pevent_print_event_data() which outputs the rest of the event data.

pevent_print_event() now simply calls these three functions.

To save time from doing the search for event from the record's type, I
created a new helper function called pevent_find_event_by_record(),
which returns the record's event, and this event has to be passed to the
above functions.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20160229090128.43a56704@gandalf.local.home
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-02-29 11:35:21 -03:00
Taeung Song
026842d148 tracing/syscalls: Rename "/format" tracepoint field name "nr" to "__syscall_nr:
Some tracepoint have multiple fields with the same name, "nr", the first
one is a unique syscall ID, the other is a syscall argument:

  # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_io_getevents/format
  name: sys_enter_io_getevents
  ID: 747
  format:
 	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
 	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
 	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
 	field:int common_pid;	offset:4;	size:4;	signed:1;

 	field:int nr;	offset:8;	size:4;	signed:1;
 	field:aio_context_t ctx_id;	offset:16;	size:8;	signed:0;
 	field:long min_nr;	offset:24;	size:8;	signed:0;
 	field:long nr;	offset:32;	size:8;	signed:0;
 	field:struct io_event * events;	offset:40;	size:8;	signed:0;
 	field:struct timespec * timeout;	offset:48;	size:8;	signed:0;

  print fmt: "ctx_id: 0x%08lx, min_nr: 0x%08lx, nr: 0x%08lx, events: 0x%08lx, timeout: 0x%08lx", ((unsigned long)(REC->ctx_id)), ((unsigned long)(REC->min_nr)), ((unsigned long)(REC->nr)), ((unsigned long)(REC->events)), ((unsigned long)(REC->timeout))
  #

Fix it by renaming the "/format" common tracepoint field "nr" to "__syscall_nr".

Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
[ Do not rename the struct member, just the '/format' field name ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160226132301.3ae065a4@gandalf.local.home
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-02-29 11:34:53 -03:00
Taeung Song
c42de706da perf trace: Check and discard not only 'nr' but also '__syscall_nr'
Format fields of a syscall have the first variable '__syscall_nr' or
'nr' that mean the syscall number.  But it isn't relevant here so drop
it.

'nr' among fields of syscall was renamed '__syscall_nr'.  So add
exception handling to drop '__syscall_nr' and modify the comment for
this excpetion handling.

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1456492465-5946-1-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-02-29 11:34:28 -03:00
Jiri Olsa
67d5268908 perf tools: Fix python extension build
The util/python-ext-sources file contains source files required to build
the python extension relative to $(srctree)/tools/perf,

Such a file path $(FILE).c is handed over to the python extension build
system, which builds the final object in the
$(PYTHON_EXTBUILD)/tmp/$(FILE).o path.

After the build is done all files from $(PYTHON_EXTBUILD)lib/ are
carried as the result binaries.

Above system fails when we add source file relative to ../lib, which we
do for:

  ../lib/bitmap.c
  ../lib/find_bit.c
  ../lib/hweight.c
  ../lib/rbtree.c

All above objects will be built like:

  $(PYTHON_EXTBUILD)/tmp/../lib/bitmap.c
  $(PYTHON_EXTBUILD)/tmp/../lib/find_bit.c
  $(PYTHON_EXTBUILD)/tmp/../lib/hweight.c
  $(PYTHON_EXTBUILD)/tmp/../lib/rbtree.c

which accidentally happens to be final library path:

  $(PYTHON_EXTBUILD)/lib/

Changing setup.py to pass full paths of source files to Extension build
class and thus keep all built objects under $(PYTHON_EXTBUILD)tmp
directory.

Reported-by: Jeff Bastian <jbastian@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Josh Boyer <jwboyer@fedoraproject.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@vger.kernel.org # v4.2+
Link: http://lkml.kernel.org/r/20160227201350.GB28494@krava.redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-02-29 11:18:25 -03:00
Frederic Barrat
923adb1646 cxl: Fix PSL timebase synchronization detection
The PSL timebase synchronization is seemingly failing for
configuration not including VIRT_CPU_ACCOUNTING_NATIVE. The driver
shows the following trace in dmesg:
PSL: Timebase sync: giving up!

The PSL timebase register is actually syncing correctly, but the cxl
driver is not detecting it. Fix is to use the proper timebase-to-time
conversion.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> # 4.3+
Acked-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-29 21:36:00 +11:00
Daniel Sanders
51ff5d7767 MIPS: Avoid variant of .type unsupported by LLVM Assembler
The target independent parts of the LLVM Lexer considers 'fault@function'
to be a single token representing the 'fault' symbol with a 'function'
modifier. However, this is not the case in the .type directive where
'function' refers to STT_FUNC from the ELF standard.

Although GAS accepts it, '.type symbol@function' is an undocumented form of
this directive. The documentation specifies a comma between the symbol and
'@function'.

Signed-off-by: Scott Egerton <Scott.Egerton@imgtec.com>
Signed-off-by: Daniel Sanders <daniel.sanders@imgtec.com>
Reviewed-by: Maciej W. Rozycki <macro@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/12587/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-02-29 11:23:49 +01:00
Ralf Baechle
71e60073ca MIPS: jz4740: Fix surviving instance of irq_to_gpio()
This is fallout from commit 832f5dacfa ("MIPS: Remove all the uses of
custom gpio.h").

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Suggested-by: Lars-Peter Clausen <lars@metafoo.de>
2016-02-29 11:23:49 +01:00
Michael S. Tsirkin
4cad67fca3 arm/arm64: KVM: Fix ioctl error handling
Calling return copy_to_user(...) in an ioctl will not
do the right thing if there's a pagefault:
copy_to_user returns the number of bytes not copied
in this case.

Fix up kvm to do
	return copy_to_user(...)) ?  -EFAULT : 0;

everywhere.

Cc: stable@vger.kernel.org
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-02-29 09:56:40 +00:00
Ingo Molnar
9e4e7554e7 locking/lockdep: Detect chain_key collisions
Add detection for chain_key collision under CONFIG_DEBUG_LOCKDEP.
When a collision is detected the problem is reported and all lock
debugging is turned off.

Tested using liblockdep and the added tests before and after
applying the fix, confirming both that the code added for the
detection correctly reports the problem and that the fix actually
fixes it.

Tested tweaking lockdep to generate false collisions and
verified that the problem is reported and that lock debugging is
turned off.

Also tested with lockdep's test suite after applying the patch:

    [    0.000000] Good, all 253 testcases passed! |

Signed-off-by: Alfredo Alvarez Fernandez <alfredoalvarezernandez@gmail.com>
Cc: Alfredo Alvarez Fernandez <alfredoalvarezfernandez@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: sasha.levin@oracle.com
Link: http://lkml.kernel.org/r/1455864533-7536-4-git-send-email-alfredoalvarezernandez@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 10:32:29 +01:00
Alfredo Alvarez Fernandez
5f18ab5c6b locking/lockdep: Prevent chain_key collisions
The chain_key hashing macro iterate_chain_key(key1, key2) does not
generate a new different value if both key1 and key2 are 0. In that
case the generated value is again 0. This can lead to collisions which
can result in lockdep not detecting deadlocks or circular
dependencies.

Avoid the problem by using class_idx (1-based) instead of class id
(0-based) as an input for the hashing macro 'key2' in
iterate_chain_key(key1, key2).

The use of class id created collisions in cases like the following:

1.- Consider an initial state in which no class has been acquired yet.
Under these circumstances an AA deadlock will not be detected by
lockdep:

  lock  [key1,key2]->new key  (key1=old chain_key, key2=id)
  --------------------------
  A     [0,0]->0
  A     [0,0]->0 (collision)

  The newly generated chain_key collides with the one used before and as
  a result the check for a deadlock is skipped

  A simple test using liblockdep and a pthread mutex confirms the
  problem: (omitting stack traces)

    new class 0xe15038: 0x7ffc64950f20
    acquire class [0xe15038] 0x7ffc64950f20
    acquire class [0xe15038] 0x7ffc64950f20
    hash chain already cached, key: 0000000000000000 tail class:
    [0xe15038] 0x7ffc64950f20

2.- Consider an ABBA in 2 different tasks and no class yet acquired.

  T1 [key1,key2]->new key     T2[key1,key2]->new key
  --                          --
  A [0,0]->0

                              B [0,1]->1

  B [0,1]->1  (collision)

                              A

In this case the collision prevents lockdep from creating the new
dependency A->B. This in turn results in lockdep not detecting the
circular dependency when T2 acquires A.

Signed-off-by: Alfredo Alvarez Fernandez <alfredoalvarezernandez@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: sasha.levin@oracle.com
Link: http://lkml.kernel.org/r/1455147212-2389-4-git-send-email-alfredoalvarezernandez@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 10:32:29 +01:00
Ingo Molnar
013e379a30 tools/lib/lockdep: Fix link creation warning
This warning triggers if the .so library has already been linked:

 triton:~/tip/tools/lib/lockdep> make
  CC       common.o
  CC       lockdep.o
  CC       rbtree.o
  LD       liblockdep-in.o
  LD       liblockdep.a
  ln: failed to create symbolic link ‘liblockdep.so’: File exists
  LD       liblockdep.so.4.5.0-rc6

Overwrite the link.

Cc: Alfredo Alvarez Fernandez <alfredoalvarezfernandez@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 10:32:28 +01:00
Alfredo Alvarez Fernandez
11a1ac206d tools/lib/lockdep: Add tests for AA and ABBA locking
Add test for AA and 2 threaded ABBA locking.

Rename AA.c to ABA.c since it was implementing an ABA instead of a pure
AA. Now both cases are covered.

The expected output for AA.c is that the process blocks and lockdep
reports a deadlock.

ABBA_2threads.c differs from ABBA.c in that lockdep keeps separate chains
of held locks per task. This can lead to different behaviour regarding
lock detection. The expected output for this test is that the process
blocks and lockdep reports a circular locking dependency.

These tests found a lockdep bug - fixed by the next commit.

Signed-off-by: Alfredo Alvarez Fernandez <alfredoalvarezfernandez@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1455864533-7536-3-git-send-email-alfredoalvarezernandez@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 10:29:33 +01:00
Alfredo Alvarez Fernandez
9d5a23ac8e tools/lib/lockdep: Add userspace version of READ_ONCE()
This was added to the kernel code in <1658d35ead> ("list: Use
READ_ONCE() when testing for empty lists").

There's nothing special we need to do about it in userspace.

Signed-off-by: Alfredo Alvarez Fernandez <alfredoalvarezfernandez@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1455864533-7536-2-git-send-email-alfredoalvarezernandez@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 10:29:27 +01:00
Ingo Molnar
b2ed0998f6 tools/lib/lockdep: Fix the build on recent kernels
The following upstream commit:

  4a389810bc ("kernel/locking/lockdep.c: convert hash tables to hlists")

broke the tools/lib/lockdep build. Add trivial RCU wrappers to fix it.

These wrappers should probably be moved into their own header file.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Krinkin <krinkin.m.u@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 10:29:26 +01:00
Dan Streetman
b82e530290 locking/qspinlock: Move __ARCH_SPIN_LOCK_UNLOCKED to qspinlock_types.h
Move the __ARCH_SPIN_LOCK_UNLOCKED definition from qspinlock.h into
qspinlock_types.h.

The definition of __ARCH_SPIN_LOCK_UNLOCKED comes from the build arch's
include files; but on x86 when CONFIG_QUEUED_SPINLOCKS=y, it just
it's defined in asm-generic/qspinlock.h.  In most cases, this doesn't
matter because linux/spinlock.h includes asm/spinlock.h, which for x86
includes asm-generic/qspinlock.h.  However, any code that only includes
linux/mutex.h will break, because it only includes asm/spinlock_types.h.

For example, this breaks systemtap, which only includes mutex.h.

Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <Waiman.Long@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1455907767-17821-1-git-send-email-dan.streetman@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 10:02:43 +01:00
Davidlohr Bueso
1329ce6fbb locking/mutex: Allow next waiter lockless wakeup
Make use of wake-queues and enable the wakeup to occur after releasing the
wait_lock. This is similar to what we do with rtmutex top waiter,
slightly shortening the critical region and allow other waiters to
acquire the wait_lock sooner. In low contention cases it can also help
the recently woken waiter to find the wait_lock available (fastpath)
when it continues execution.

Reviewed-by: Waiman Long <Waiman.Long@hpe.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ding Tianhong <dingtianhong@huawei.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Waiman Long <waiman.long@hpe.com>
Cc: Will Deacon <Will.Deacon@arm.com>
Link: http://lkml.kernel.org/r/20160125022343.GA3322@linux-uzut.site
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 10:02:42 +01:00
Waiman Long
32d62510f9 locking/pvqspinlock: Enable slowpath locking count tracking
This patch enables the tracking of the number of slowpath locking
operations performed. This can be used to compare against the number
of lock stealing operations to see what percentage of locks are stolen
versus acquired via the regular slowpath.

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1449778666-13593-2-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 10:02:42 +01:00
Waiman Long
cb037fdad6 locking/qspinlock: Use smp_cond_acquire() in pending code
The newly introduced smp_cond_acquire() was used to replace the
slowpath lock acquisition loop. Similarly, the new function can also
be applied to the pending bit locking loop. This patch uses the new
function in that loop.

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1449778666-13593-1-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 10:02:42 +01:00
Waiman Long
eaff0e7003 locking/pvqspinlock: Move lock stealing count tracking code into pv_queued_spin_steal_lock()
This patch moves the lock stealing count tracking code into
pv_queued_spin_steal_lock() instead of via a jacket function simplifying
the code.

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1449778666-13593-3-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 10:02:41 +01:00
Peter Zijlstra
920c720aa5 locking/mcs: Fix mcs_spin_lock() ordering
Similar to commit b4b29f9485 ("locking/osq: Fix ordering of node
initialisation in osq_lock") the use of xchg_acquire() is
fundamentally broken with MCS like constructs.

Furthermore, it turns out we rely on the global transitivity of this
operation because the unlock path observes the pointer with a
READ_ONCE(), not an smp_load_acquire().

This is non-critical because the MCS code isn't actually used and
mostly serves as documentation, a stepping stone to the more complex
things we've build on top of the idea.

Reported-by: Andrea Parri <parri.andrea@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: 3552a07a9c ("locking/mcs: Use acquire/release semantics")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 10:02:41 +01:00
Ingo Molnar
39a1142dbb Linux 4.5-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJW0yM6AAoJEHm+PkMAQRiGeUwIAJRTHFPJTFpJcJjeZEV4/EL1
 7Pl0WSHs/CWBkXIevAg2HgkECSQ9NI9FAUFvoGxCldDpFAnL1U2QV8+Ur2qhiXMG
 5v0jILJuiw57qT/NfhEudZolerlRoHILmB3JRTb+DUV4GHZuWpTkJfUSI9j5aTEl
 w83XUgtK4bKeIyFbHdWQk6xqfzfFBSuEITuSXreOMwkFfMmeScE0WXOPLBZWyhPa
 v0rARJLYgM+vmRAnJjnG8unH+SgnqiNcn2oOFpevKwmpVcOjcEmeuxh/HdeZf7HM
 /R8F86OwdmXsO+z8dQxfcucLg+I9YmKfFr8b6hopu1sRztss2+Uk6H1j2J7IFIg=
 =tvkh
 -----END PGP SIGNATURE-----

Merge tag 'v4.5-rc6' into locking/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:55:22 +01:00
Peter Zijlstra
801ccdbf01 sched/deadline: Remove superfluous call to switched_to_dl()
if (A || B) {

	} else if (A && !B) {

	}

If A we'll take the first branch, if !A we will not satisfy the second.
Therefore the second branch will never be taken.

Reported-by: luca abeni <luca.abeni@unitn.it>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160225140149.GK6357@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:53:11 +01:00
Sebastian Andrzej Siewior
f904f58263 sched/debug: Fix preempt_disable_ip recording for preempt_disable()
The preempt_disable() invokes preempt_count_add() which saves the caller
in ->preempt_disable_ip. It uses CALLER_ADDR1 which does not look for
its caller but for the parent of the caller. Which means we get the correct
caller for something like spin_lock() unless the architectures inlines
those invocations. It is always wrong for preempt_disable() or
local_bh_disable().

This patch makes the function get_lock_parent_ip() which tries
CALLER_ADDR0,1,2 if the former is a locking function.
This seems to record the preempt_disable() caller properly for
preempt_disable() itself as well as for get_cpu_var() or
local_bh_disable().

Steven asked for the get_parent_ip() -> get_lock_parent_ip() rename.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160226135456.GB18244@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:53:10 +01:00
Rik van Riel
ff9a9b4c43 sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity
When profiling syscall overhead on nohz-full kernels,
after removing __acct_update_integrals() from the profile,
native_sched_clock() remains as the top CPU user. This can be
reduced by moving VIRT_CPU_ACCOUNTING_GEN to jiffy granularity.

This will reduce timing accuracy on nohz_full CPUs to jiffy
based sampling, just like on normal CPUs. It results in
totally removing native_sched_clock from the profile, and
significantly speeding up the syscall entry and exit path,
as well as irq entry and exit, and KVM guest entry & exit.

Additionally, only call the more expensive functions (and
advance the seqlock) when jiffies actually changed.

This code relies on another CPU advancing jiffies when the
system is busy. On a nohz_full system, this is done by a
housekeeping CPU.

A microbenchmark calling an invalid syscall number 10 million
times in a row speeds up an additional 30% over the numbers
with just the previous patches, for a total speedup of about
40% over 4.4 and 4.5-rc1.

Run times for the microbenchmark:

 4.4				3.8 seconds
 4.5-rc1			3.7 seconds
 4.5-rc1 + first patch		3.3 seconds
 4.5-rc1 + first 3 patches	3.1 seconds
 4.5-rc1 + all patches		2.3 seconds

A non-NOHZ_FULL cpu (not the housekeeping CPU):

 all kernels			1.86 seconds

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: clark@redhat.com
Cc: eric.dumazet@gmail.com
Cc: fweisbec@gmail.com
Cc: luto@amacapital.net
Link: http://lkml.kernel.org/r/1455152907-18495-5-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:53:10 +01:00
Rik van Riel
9344c92c2e time, acct: Drop irq save & restore from __acct_update_integrals()
It looks like all the call paths that lead to __acct_update_integrals()
already have irqs disabled, and __acct_update_integrals() does not need
to disable irqs itself.

This is very convenient since about half the CPU time left in this
function was spent in local_irq_save alone.

Performance of a microbenchmark that calls an invalid syscall
ten million times in a row on a nohz_full CPU improves 21% vs.
4.5-rc1 with both the removal of divisions from __acct_update_integrals()
and this patch, with runtime dropping from 3.7 to 2.9 seconds.

With these patches applied, the highest remaining cpu user in
the trace is native_sched_clock, which is addressed in the next
patch.

For testing purposes I stuck a WARN_ON(!irqs_disabled()) test
in __acct_update_integrals(). It did not trigger.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: clark@redhat.com
Cc: eric.dumazet@gmail.com
Cc: fweisbec@gmail.com
Cc: luto@amacapital.net
Link: http://lkml.kernel.org/r/1455152907-18495-4-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:53:09 +01:00
Rik van Riel
b2add86edd acct, time: Change indentation in __acct_update_integrals()
Change the indentation in __acct_update_integrals() to make the function
a little easier to read.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: clark@redhat.com
Cc: eric.dumazet@gmail.com
Cc: fweisbec@gmail.com
Cc: luto@amacapital.net
Link: http://lkml.kernel.org/r/1455152907-18495-3-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:53:09 +01:00
Rik van Riel
382c2fe994 sched, time: Remove non-power-of-two divides from __acct_update_integrals()
When running a microbenchmark calling an invalid syscall number
in a loop, on a nohz_full CPU, we spend a full 9% of our CPU
time in __acct_update_integrals().

This function converts cputime_t to jiffies, to a timeval, only to
convert the timeval back to microseconds before discarding it.

This patch leaves __acct_update_integrals() functionally equivalent,
but speeds things up by about 12%, with 10 million calls to an
invalid syscall number dropping from 3.7 to 3.25 seconds.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: clark@redhat.com
Cc: eric.dumazet@gmail.com
Cc: fweisbec@gmail.com
Cc: luto@amacapital.net
Link: http://lkml.kernel.org/r/1455152907-18495-2-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:53:08 +01:00
Steven Rostedt
c3a990dc9f sched/rt: Kick RT bandwidth timer immediately on start up
I've been debugging why deadline tasks can cause the RT scheduler to
throttle, even when the deadline tasks are only taking up 50% of the
CPU and RT tasks are not even using 1% of the CPU. Here's what I found.

In order to keep a CPU from being hogged by RT tasks, the deadline
scheduler adds its run time (delta_exec) to the rt_time of the RT
bandwidth. That way, if the two use more than 95% of the CPU within one
second (default settings), the RT tasks are throttled to allow non RT
tasks to run.

Although the deadline tasks add their run time to the RT bandwidth, it
lets the RT tasks do the accounting. This is where the problem lies. If
a deadline task runs for a bit, and no RT tasks are running, then it
will continually add to the RT rt_time that is used to calculate how
much CPU the RT tasks use. But no RT period is in play, and this
accumulation of the runtime never gets reset.

When an RT task finally gets to run, and the watchdog goes off, it can
see that the RT task has used more than it should of, because the
deadline task added all this runtime to its rt_time. Then the RT task
that just woke up gets throttled for no good reason.

I also noticed that when an RT task is queued, it starts the timer to
account for overload and such. But that timer goes off one period
later, which may be too late and the extra rt_time will trigger a
throttle.

This is a quick work around to the problem. When a new RT task is
queued, the bandwidth timer is set to go off immediately. Then the
timer can clear out the extra time added to the rt_time while there was
no RT task running. This stops my tests from triggering the throttle,
and it will still throttle if an RT task runs too much, even while a
deadline task is running.

A better solution may be to subtract the bandwidth that the deadline
task uses from the rt_runtime, and add it back when its finished. Then
there wont be a need for runtime tracking of the time used by deadline
tasks.

I may play with that solution tomorrow.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <juri.lelli@gmail.com>
Cc: <williams@redhat.com>
Cc: Clark Williams
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Juri Lelli
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160216183746.349ec98b@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:53:07 +01:00
Steven Rostedt (Red Hat)
ef477183d0 sched/debug: Add deadline scheduler bandwidth ratio to /proc/sched_debug
Playing with SCHED_DEADLINE and cpusets, I found that I was unable to create
new SCHED_DEADLINE tasks, with the error of EBUSY as if the bandwidth was
already used up. I then realized there wa no way to see what bandwidth is
used by the runqueues to debug the issue.

By adding the dl_bw->bw and dl_bw->total_bw to the output of the deadline
info in /proc/sched_debug, this allows us to see what bandwidth has been
reserved and where a problem may exist.

For example, before the issue we see the ratio of the bandwidth:

 # cat /proc/sys/kernel/sched_rt_runtime_us
 950000
 # cat /proc/sys/kernel/sched_rt_period_us
 1000000

  # grep dl /proc/sched_debug
  dl_rq[0]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : 0
  dl_rq[1]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : 0
  dl_rq[2]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : 0
  dl_rq[3]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : 0
  dl_rq[4]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : 0
  dl_rq[5]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : 0
  dl_rq[6]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : 0
  dl_rq[7]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : 0

Note: (950000 / 1000000) << 20 == 996147

After I played with cpusets and hit the issue, the result is now:

  # grep dl /proc/sched_debug
  dl_rq[0]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : -104857
  dl_rq[1]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : 104857
  dl_rq[2]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : 104857
  dl_rq[3]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : 104857
  dl_rq[4]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : -104857
  dl_rq[5]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : -104857
  dl_rq[6]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : -104857
  dl_rq[7]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : -104857

This shows that there is definitely a problem as we should never have a
negative total bandwidth.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Clark Williams <williams@redhat.com>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160222212825.756849091@goodmis.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:53:07 +01:00
Steven Rostedt (Red Hat)
3866e845ed sched/debug: Move sched_domain_sysctl to debug.c
The sched_domain_sysctl setup is only enabled when SCHED_DEBUG is
configured. As debug.c is only compiled when SCHED_DEBUG is configured as
well, move the setup of sched_domain_sysctl into that file.

Note, the (un)register_sched_domain_sysctl() functions had to be changed
from static to allow access to them from core.c.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Clark Williams <williams@redhat.com>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160222212825.599278093@goodmis.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:53:06 +01:00
Steven Rostedt (Red Hat)
d6ca41d792 sched/debug: Move the /sys/kernel/debug/sched_features file setup into debug.c
As /sys/kernel/debug/sched_features is only created when SCHED_DEBUG is enabled, and the file
debug.c is only compiled when SCHED_DEBUG is enabled, it makes sense to move
sched_feature setup into that file and get rid of the #ifdef.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Clark Williams <williams@redhat.com>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160222212825.464193063@goodmis.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:53:06 +01:00
Peter Zijlstra
ff77e46853 sched/rt: Fix PI handling vs. sched_setscheduler()
Andrea Parri reported:

> I found that the following scenario (with CONFIG_RT_GROUP_SCHED=y) is not
> handled correctly:
>
>     T1 (prio = 20)
>        lock(rtmutex);
>
>     T2 (prio = 20)
>        blocks on rtmutex  (rt_nr_boosted = 0 on T1's rq)
>
>     T1 (prio = 20)
>        sys_set_scheduler(prio = 0)
>           [new_effective_prio == oldprio]
>           T1 prio = 20    (rt_nr_boosted = 0 on T1's rq)
>
> The last step is incorrect as T1 is now boosted (c.f., rt_se_boosted());
> in particular, if we continue with
>
>    T1 (prio = 20)
>       unlock(rtmutex)
>          wakeup(T2)
>          adjust_prio(T1)
>             [prio != rt_mutex_getprio(T1)]
>	    dequeue(T1)
>	       rt_nr_boosted = (unsigned long)(-1)
>	       ...
>             T1 prio = 0
>
> then we end up leaving rt_nr_boosted in an "inconsistent" state.
>
> The simple program attached could reproduce the previous scenario; note
> that, as a consequence of the presence of this state, the "assertion"
>
>     WARN_ON(!rt_nr_running && rt_nr_boosted)
>
> from dec_rt_group() may trigger.

So normally we dequeue/enqueue tasks in sched_setscheduler(), which
would ensure the accounting stays correct. However in the early PI path
we fail to do so.

So this was introduced at around v3.14, by:

  c365c292d0 ("sched: Consider pi boosting in setscheduler()")

which fixed another problem exactly because that dequeue/enqueue, joy.

Fix this by teaching rt about DEQUEUE_SAVE/ENQUEUE_RESTORE and have it
preserve runqueue location with that option. This requires decoupling
the on_rt_rq() state from being on the list.

In order to allow for explicit movement during the SAVE/RESTORE,
introduce {DE,EN}QUEUE_MOVE. We still must use SAVE/RESTORE in these
cases to preserve other invariants.

Respecting the SAVE/RESTORE flags also has the (nice) side-effect that
things like sys_nice()/sys_sched_setaffinity() also do not reorder
FIFO tasks (whereas they used to before this patch).

Reported-by: Andrea Parri <parri.andrea@gmail.com>
Tested-by: Andrea Parri <parri.andrea@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:53:05 +01:00
Dongsheng Yang
41d9339733 sched/core: Remove duplicated sched_group_set_shares() prototype
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <lizefan@huawei.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1452674558-31897-1-git-send-email-yangds.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:53:05 +01:00
Frederic Weisbecker
be68a682c0 sched/fair: Consolidate nohz CPU load update code
Lets factorize a bit of code there. We'll even have a third user soon.
While at it, standardize the idle update function name against the
others.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Byungchul Park <byungchul.park@lge.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1452700891-21807-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:53:04 +01:00
Byungchul Park
7400d3bbaa sched/fair: Avoid using decay_load_missed() with a negative value
decay_load_missed() cannot handle nagative values, so we need to prevent
using the function with a negative value.

Reported-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: perterz@infradead.org
Fixes: 5954327548 ("sched/fair: Prepare __update_cpu_load() to handle active tickless")
Link: http://lkml.kernel.org/r/20160115070749.GA1914@X58A-UD3R
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:53:03 +01:00
Ingo Molnar
6aa447bcbb Merge branch 'sched/urgent' into sched/core, to pick up fixes before applying new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:42:07 +01:00
Peter Zijlstra
48be3a67da sched/deadline: Always calculate end of period on sched_yield()
Steven noticed that occasionally a sched_yield() call would not result
in a wait for the next period edge as expected.

It turns out that when we call update_curr_dl() and end up with
delta_exec <= 0, we will bail early and fail to throttle.

Further inspection of the yield code revealed that yield_task_dl()
clearing dl.runtime is wrong too, it will not account the last bit of
runtime which could result in dl.runtime < 0, which in turn means that
replenish would gift us with too much runtime.

Fix both issues by not relying on the dl.runtime value for yield.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Clark Williams <williams@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160223122822.GP6357@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:41:51 +01:00
Peter Zijlstra
6fe1f348b3 sched/cgroup: Fix cgroup entity load tracking tear-down
When a cgroup's CPU runqueue is destroyed, it should remove its
remaining load accounting from its parent cgroup.

The current site for doing so it unsuited because its far too late and
unordered against other cgroup removal (->css_free() will be, but we're also
in an RCU callback).

Put it in the ->css_offline() callback, which is the start of cgroup
destruction, right after the group has been made unavailable to
userspace. The ->css_offline() callbacks are called in hierarchical order
after the following v4.4 commit:

  aa226ff4a1 ("cgroup: make sure a parent css isn't offlined before its children")

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160121212416.GL6357@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:41:50 +01:00
Thomas Gleixner
675965b00d perf: Export perf_event_sysfs_show()
Required to use it in modular perf drivers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221012.930735780@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:27 +01:00
Thomas Gleixner
9de8d68695 perf/x86/intel/rapl: Convert it to a per package facility
RAPL is a per package facility and we already have a mechanism for a dedicated
per package reader. So there is no point to have multiple CPUs doing the
same. The current implementation actually starts two timers on two CPUs if one
does:

	perf stat -C1,2 -e -e power/energy-pkg ....

which makes the whole concept of 1 reader per package moot.

What's worse is that the above returns the double of the actual energy
consumption, but that's a different problem to address and cannot be solved by
removing the pointless per cpuness of that mechanism.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221012.845369524@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:26 +01:00