Commit graph

568243 commits

Author SHA1 Message Date
Hemant Kumar
270bde1e76 perf probe: Search both .eh_frame and .debug_frame sections for probe location
'perf probe' through debuginfo__find_probes() in util/probe-finder.c
checks for the functions' frame descriptions in either .eh_frame section
of an ELF or the .debug_frame.

The check is based on whether either one of these sections is present.
Depending on distro, toolchain defaults, architetcutre, build flags,
etc., CFI might be found in either .eh_frame and/or .debug_frame.
Sometimes, it may happen that, .eh_frame, even if present, may not be
complete and may miss some descriptions.

Therefore, to be sure, to find the CFI covering an address we will
always have to investigate both if available.

For e.g., in powerpc, this may happen:
  $ gcc -g bin.c -o bin

  $ objdump --dwarf ./bin
  <1><145>: Abbrev Number: 7 (DW_TAG_subprogram)
     <146> DW_AT_external   : 1
     <146> DW_AT_name       : (indirect string, offset: 0x9e): main
     <14a> DW_AT_decl_file  : 1
     <14b> DW_AT_decl_line  : 39
     <14c> DW_AT_prototyped : 1
     <14c> DW_AT_type       : <0x57>
     <150> DW_AT_low_pc     : 0x100007b8

If the .eh_frame and .debug_frame are checked for the same binary, we
will find that, .eh_frame (although present) doesn't contain a
description for "main" function.

But, .debug_frame has a description:

  000000d8 00000024 00000000 FDE cie=00000000 pc=100007b8..10000838
    DW_CFA_advance_loc: 16 to 100007c8
    DW_CFA_def_cfa_offset: 144
    DW_CFA_offset_extended_sf: r65 at cfa+16
  ...

Due to this (since, perf checks whether .eh_frame is present and goes on
searching for that address inside that frame), perf is unable to process
the probes:

  # perf probe -x ./bin main
    Failed to get call frame on 0x100007b8
    Error: Failed to add events.

To avoid this issue, we need to check both the sections (.eh_frame and
.debug_frame), which is done in this patch.

Note that, we can always force everything into both .eh_frame and
.debug_frame by:

  $ gcc bin.c -fasynchronous-unwind-tables  -fno-dwarf2-cfi-asm -g -o bin

Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Mark Wielaard <mjw@redhat.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1454426806-13974-1-git-send-email-hemant@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-02-02 13:30:16 -03:00
Adrian Hunter
3a4acda1ec perf tools: Fix thread lifetime related segfaut in intel_pt
intel_pt_process_auxtrace_info() creates a pt->unknown_thread thread
that eventually needs to be freed by the last thread__put() on it, when
its refcount hits zero, which may happen in
intel_pt_process_auxtrace_info() error handling path and triggers the
following segfault, which would happen as well at intel_pt_free, when
tools using this intel_pt codebase frees up resources:

  # perf record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls
  0  a  anaconda-ks.cfg  bin   perf.data	perf.data.old  perf-f23-bringup.todo
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.217 MB perf.data ]
  #
  # perf script -F event,comm,pid,tid,time,addr,ip,sym,dso,iregs
  Samples for 'instructions:u' event do not have IREGS attribute set. Cannot print 'iregs' field.
  intel_pt_synth_events: failed to synthesize 'instructions' event type
  Segmentation fault (core dumped)
  #

The problem is: there's a union in 'struct thread' combines a list_head
and a rb_node. The standard life cycle of a thread is: init rb_node in
the constructor, insert it into machine->threads rbtree using rb_node,
move it to machine->dead_threads using list_head, clean in the last
thread__put: list_del_init(&thread->node).

In the above command, it clean a thread before adding it into list,
causes the above segfault.

Since pt->unknown_thread will never live in an rbtree, initialize its
list node so that when list_del_init() is done on it we don't segfault.

After this patch:

  # perf script -F event,comm,pid,tid,time,addr,ip,sym,dso,iregs
  Samples for 'instructions:u' event do not have IREGS attribute set. Cannot print 'iregs' field.
  intel_pt_synth_events: failed to synthesize 'instructions' event type
  0x248 [0x88]: failed to process type: 70
  #

Reported-by: Tong Zhang <ztong@vt.edu>
Reported-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: http://lkml.kernel.org/r/1454296865-19749-1-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-02-02 12:51:11 -03:00
Adrian Hunter
ec183d22cc perf tools: tracepoint_error() can receive e=NULL, robustify it
Fixes segmentation fault using, for instance:

  (gdb) run record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls
  Starting program: /home/acme/bin/perf record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls
  Missing separate debuginfos, use: dnf debuginfo-install glibc-2.22-7.fc23.x86_64
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib64/libthread_db.so.1".

 Program received signal SIGSEGV, Segmentation fault.
  0 x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410
  (gdb) bt
  #0  0x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410
  #1  0x00000000004b9fc5 in add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
      at util/parse-events.c:433
  #2  0x00000000004ba334 in add_tracepoint_event (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
      at util/parse-events.c:498
  #3  0x00000000004bb699 in parse_events_add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys=0x19b1370 "sched", event=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
      at util/parse-events.c:936
  #4  0x00000000004f6eda in parse_events_parse (_data=0x7fffffffb8b0, scanner=0x19a49d0) at util/parse-events.y:391
  #5  0x00000000004bc8e5 in parse_events__scanner (str=0x663ff2 "sched:sched_switch", data=0x7fffffffb8b0, start_token=258) at util/parse-events.c:1361
  #6  0x00000000004bca57 in parse_events (evlist=0x19a5220, str=0x663ff2 "sched:sched_switch", err=0x0) at util/parse-events.c:1401
  #7  0x0000000000518d5f in perf_evlist__can_select_event (evlist=0x19a3b90, str=0x663ff2 "sched:sched_switch") at util/record.c:253
  #8  0x0000000000553c42 in intel_pt_track_switches (evlist=0x19a3b90) at arch/x86/util/intel-pt.c:364
  #9  0x00000000005549d1 in intel_pt_recording_options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at arch/x86/util/intel-pt.c:664
  #10 0x000000000051e076 in auxtrace_record__options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at util/auxtrace.c:539
  #11 0x0000000000433368 in cmd_record (argc=1, argv=0x7fffffffde60, prefix=0x0) at builtin-record.c:1264
  #12 0x000000000049bec2 in run_builtin (p=0x8fa2a8 <commands+168>, argc=5, argv=0x7fffffffde60) at perf.c:390
  #13 0x000000000049c12a in handle_internal_command (argc=5, argv=0x7fffffffde60) at perf.c:451
  #14 0x000000000049c278 in run_argv (argcp=0x7fffffffdcbc, argv=0x7fffffffdcb0) at perf.c:495
  #15 0x000000000049c60a in main (argc=5, argv=0x7fffffffde60) at perf.c:618
(gdb)

Intel PT attempts to find the sched:sched_switch tracepoint but that seg
faults if tracefs is not readable, because the error reporting structure
is null, as errors are not reported when automatically adding
tracepoints.  Fix by checking before using.

Committer note:

This doesn't take place in a kernel that supports
perf_event_attr.context_switch, that is the default way that will be
used for tracking context switches, only in older kernels, like 4.2, in
a machine with Intel PT (e.g. Broadwell) for non-priviledged users.

Further info from a similar patch by Wang:

The error is in tracepoint_error: it assumes the 'e' parameter is valid.

However, there are many situation a parse_event() can be called without
parse_events_error. See result of

  $ grep 'parse_events(.*NULL)' ./tools/perf/ -r'

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Tong Zhang <ztong@vt.edu>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: stable@vger.kernel.org # v4.4+
Fixes: 196581717d ("perf tools: Enhance parsing events tracepoint error output")
Link: http://lkml.kernel.org/r/1453809921-24596-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-02-01 11:51:15 -03:00
Ingo Molnar
28fb8a5b6e perf/urgent fixes:
User visible:
 
 - Fix 'perf stat' stddev reporting due to mistakenly cleaning event
   private stats (Jiri Olsa)
 
 - Fix 'perf test CQM' endless loop detected by 'gcc6 -Wmisleading-indentation'
   (Markus Trippelsdorf)
 
 - Fix behaviour of Shift-Tab when nothing is focussed in the annotate TUI browser,
   detected with gcc6 -Wmisleading-indentation (Markus Trippelsdorf)
 
 - Fix mem data cacheline hists browser width setting for unresolved
   addresses (Jiri Olsa)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWp4FSAAoJENZQFvNTUqpAPJwP/2uinr/m6fGhCkG5BtcIWjPy
 zuDrB6kLG+37UI/7oqBzntcas/pAk5rY0WNB9UWUKs+wZaXSPlErlfIyvnQrbcb6
 JENqiFdn+TIpl+BsT/jIWA/zNQMc0KdxwaPMmdtYb5wNfd46sX/5zEozhxqCNaOu
 sxHpssesV1TVuoJ9/UzML5lUcPHZC1AFdwIRqxFos6bE7hMLvypNz7093GYCgJhT
 NpiNLf/X7soRmIj+Q7rWLi1NEGyqMjSNR73SBFdc4nkgcEN+oX1AsGhxe9MFZQ6D
 d61MgSTCk9bGNt8AopLYOgQRniwEVYagELoxAiY0d3xPQo7RLwNotGxWPntCOygl
 4rs4YI7A810ZK4hO16De+VThqlv5vIUU2vQAzxZB07CsP/fy6BEq+cc8UqJytc2R
 8XnTb6hTk1Yf42M9IUGTtSY+sCGXp8T9FzIkE5hfhKaNN8cVC2KDXwSjTPfQno2B
 OYfEqB5WCiOaNUnpOTjgGZwB+VC9vhdEa5Wh9DtShFn/n7DUmG+HJlvB1qPsL0UW
 n2ATJUYcAjhVogVMVnxN1B8ZINNp1rtjg9KcM7/RlGi8pjYawxmS9T2KqpwcsoQd
 5jsI32pWVXIpYZMsyMSQwa4USAbIcCAtvNjINkpIYeABZ6gblLjjYlXwCV3DrzIR
 WW7prmPnPTlNF1KGoD6Y
 =zy8+
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

 - Fix 'perf stat' stddev reporting due to mistakenly cleaning event
   private stats (Jiri Olsa)

 - Fix 'perf test CQM' endless loop detected by 'gcc6 -Wmisleading-indentation'
   (Markus Trippelsdorf)

 - Fix behaviour of Shift-Tab when nothing is focussed in the annotate TUI browser,
   detected with gcc6 -Wmisleading-indentation (Markus Trippelsdorf)

 - Fix mem data cacheline hists browser width setting for unresolved
   addresses (Jiri Olsa)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-30 09:15:49 +01:00
Peter Zijlstra
5fa7c8ec57 perf: Remove/simplify lockdep annotation
Now that the perf_event_ctx_lock_nested() call has moved from
put_event() into perf_event_release_kernel() the first reason is no
longer valid as that can no longer happen.

The second reason seems to have been invalidated when Al Viro made fput()
unconditionally async in the following commit:

  4a9d4b024a ("switch fput to task_work_add")

such that munmap()->fput()->release()->perf_release() would no longer happen.

Therefore, remove the annotation. This should increase the efficiency
of lockdep coverage of perf locking.

Suggested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-29 08:35:36 +01:00
Peter Zijlstra
c6e5b73242 perf: Synchronously clean up child events
The orphan cleanup workqueue doesn't always catch orphans, for example,
if they never schedule after they are orphaned. IOW, the event leak is
still very real. It also wouldn't work for kernel counters.

Doing it synchonously is a little hairy due to lock inversion issues,
but is made to work.

Patch based on work by Alexander Shishkin.

Suggested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-29 08:35:35 +01:00
Peter Zijlstra
60beda8493 perf: Untangle 'owner' confusion
There are two concepts of owner wrt an event and they are conflated:

 - event::owner / event::owner_list,
   used by prctl(.option = PR_TASK_PERF_EVENTS_{EN,DIS}ABLE).

 - the 'owner' of the event object, typically the file descriptor.

Currently these two concepts are conflated, which gives trouble with
scm_rights passing of file descriptors. Passing the event and then
closing the creating task would render the event 'orphan' and would
have it cleared out. Unlikely what is expectd.

This patch untangles these two concepts by using PERF_EVENT_STATE_EXIT
to denote the second type.

Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-29 08:35:34 +01:00
Peter Zijlstra
45a0e07abf perf: Add flags argument to perf_remove_from_context()
In preparation to adding more options, convert the boolean argument
into a flags word.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-29 08:35:33 +01:00
Peter Zijlstra
8ba289b8d4 perf: Clean up sync_child_event()
sync_child_event() has outgrown its purpose, it does far too much.
Bring it back to its named purpose.

Rename __perf_event_exit_task() to perf_event_exit_event() to better
reflect what it does and move the event->state assignment under the
ctx->lock, like state changes ought to be.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-29 08:35:32 +01:00
Peter Zijlstra
f47c02c0c8 perf: Robustify event->owner usage and SMP ordering
Use smp_store_release() to clear event->owner and
lockless_dereference() to observe it. Further use READ_ONCE() for all
lockless reads.

This changes perf_remove_from_owner() to leave event->owner cleared.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-29 08:35:31 +01:00
Peter Zijlstra
6e801e0169 perf: Fix STATE_EXIT usage
We should never attempt to enable a STATE_EXIT event.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-29 08:35:30 +01:00
Peter Zijlstra
07c4a77613 perf: Update locking order
Update the locking order to note that ctx::lock nests inside of
child_mutex, as per:

  perf_ioctl():                ctx::mutex
  -> perf_event_for_each():    event::child_mutex
    -> _perf_event_enable():   ctx::lock

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-29 08:35:29 +01:00
Peter Zijlstra
a0733e695b perf: Remove __free_event()
There is but a single caller, remove the function - we already have
_free_event(), the extra indirection is nonsensical..

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-29 08:35:25 +01:00
Alexei Starovoitov
e03e7ee34f perf/bpf: Convert perf_event_array to use struct file
Robustify refcounting.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160126045947.GA40151@ast-mbp.thefacebook.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-29 08:35:25 +01:00
Peter Zijlstra
828b6f0e26 perf: Fix NULL deref
Dan reported:

  1229                  if (ctx->task == TASK_TOMBSTONE ||
  1230                      !atomic_inc_not_zero(&ctx->refcount)) {
  1231                          raw_spin_unlock(&ctx->lock);
  1232                          ctx = NULL;
                                ^^^^^^^^^^
ctx is NULL.

  1233                  }
  1234
  1235                  WARN_ON_ONCE(ctx->task != task);
                                     ^^^^^^^^^^^^^^^^^
The patch adds a NULL dereference.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 63b6da39bb ("perf: Fix perf_event_exit_task() race")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-29 08:35:24 +01:00
Peter Zijlstra
8f04b8536f perf/x86: De-obfuscate code
Get rid of the 'onln' obfuscation.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-29 08:35:24 +01:00
Peter Zijlstra
e01d8718de perf/x86: Fix uninitialized value usage
When calling intel_alt_er() with .idx != EXTRA_REG_RSP_* we will not
initialize alt_idx and then use this uninitialized value to index an
array.

When that is not fatal, it can result in an infinite loop in its
caller __intel_shared_reg_get_constraints(), with IRQs disabled.

Alternative error modes are random memory corruption due to the
cpuc->shared_regs->regs[] array overrun, which manifest in either
get_constraints or put_constraints doing weird stuff.

Only took 6 hours of painful debugging to find this. Neither GCC nor
Smatch warnings flagged this bug.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: ae3f011fc2 ("perf/x86/intel: Fix SLM MSR_OFFCORE_RSP1 valid_mask")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-29 08:35:23 +01:00
Peter Zijlstra
6a3351b612 perf: Fix race in perf_event_exit_task_context()
There is a race between perf_event_exit_task_context() and
orphans_remove_work() which results in a use-after-free.

We mark ctx->task with TASK_TOMBSTONE to indicate a context is
'dead', under ctx->lock. After which point event_function_call()
on any event of that context will NOP

A concurrent orphans_remove_work() will only hold ctx->mutex for
the list iteration and not serialize against this. Therefore its
possible that orphans_remove_work()'s perf_remove_from_context()
call will fail, but we'll continue to free the event, with the
result of free'd memory still being on lists and everything.

Once perf_event_exit_task_context() gets around to acquiring
ctx->mutex it too will iterate the event list, encounter the
already free'd event and proceed to free it _again_. This fails
with the WARN in free_event().

Plug the race by having perf_event_exit_task_context() hold
ctx::mutex over the whole tear-down, thereby 'naturally'
serializing against all other sites, including the orphan work.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: alexander.shishkin@linux.intel.com
Cc: dsahern@gmail.com
Cc: namhyung@kernel.org
Link: http://lkml.kernel.org/r/20160125130954.GY6357@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-28 20:06:36 +01:00
Peter Zijlstra
78cd2c748f perf: Fix orphan hole
We should set event->owner before we install the event,
otherwise there is a hole where the target task can fork() and
we'll not inherit the event because it thinks the event is
orphaned.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-28 20:06:35 +01:00
Jiri Olsa
3f416f22d1 perf stat: Do not clean event's private stats
Mel reported stddev reporting was broken due to following commit:

	106a94a0f8 ("perf stat: Introduce read_counters function")

This commit merged interval and overall counters reading into single
read_counters function.

The old interval code cleaned the stddev data for some reason (it's
never displayed in interval mode) and the mentioned commit kept on
cleaning the stddev data in merged function, which resulted in the
stddev not being displayed.

Removing the wrong stddev data cleanup init_stats call.

Reported-and-Tested-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@vger.kernel.org # v4.2+
Fixes: 106a94a0f8 ("perf stat: Introduce read_counters function")
Link: http://lkml.kernel.org/r/1453290995-18485-4-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-26 11:15:11 -03:00
Jiri Olsa
0805909f59 perf hists: Fix HISTC_MEM_DCACHELINE width setting
Set correct width for unresolved mem_dcacheline addr.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Fixes: 9b32ba71ba ("perf tools: Add dcacheline sort")
Link: http://lkml.kernel.org/r/1453290995-18485-3-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-26 11:14:55 -03:00
Markus Trippelsdorf
d4913cbd05 perf annotate browser: Fix behaviour of Shift-Tab with nothing focussed
The issue was pointed out by gcc-6's -Wmisleading-indentation.

Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: c97cf42219 ("perf top: Live TUI Annotation")
Link: http://lkml.kernel.org/r/20151214154403.GB1409@x4
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-26 11:14:25 -03:00
Markus Trippelsdorf
cf89813a5b perf tests: Remove wrong semicolon in while loop in CQM test
The while loop was spinning. Fix by removing a semicolon.

The issue was pointed out by gcc-6's -Wmisleading-indentation.

Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 035827e9f2 ("perf tests: Add Intel CQM test")
Link: http://lkml.kernel.org/r/20151214154335.GA1409@x4
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-26 11:14:06 -03:00
Alexander Shishkin
45c815f06b perf: Synchronously free aux pages in case of allocation failure
We are currently using asynchronous deallocation in the error path in
AUX mmap code, which is unnecessary and also presents a problem for users
that wish to probe for the biggest possible buffer size they can get:
they'll get -EINVAL on all subsequent attemts to allocate a smaller
buffer before the asynchronous deallocation callback frees up the pages
from the previous unsuccessful attempt.

Currently, gdb does that for allocating AUX buffers for Intel PT traces.
More specifically, overwrite mode of AUX pmus that don't support hardware
sg (some implementations of Intel PT, for instance) is limited to only
one contiguous high order allocation for its buffer and there is no way
of knowing its size without trying.

This patch changes error path freeing to be synchronous as there won't
be any contenders for the AUX pages at that point.

Reported-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/1453216469-9509-1-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-21 18:54:27 +01:00
Stephane Eranian
0e1eb0a1f5 perf/x86: add Intel SkyLake uncore IMC PMU support
This patch enables the uncore_imc PMU for Intel
SkyLake Desktop processors (Core i7-6700, model 94).

It is possible to compute memory read/write bandwidth
using:

  $ perf stat -a -e uncore_imc/data_reads/,uncore_imc/data_writes/ ....

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: kan.liang@intel.com
Link: http://lkml.kernel.org/r/1452151546-8853-1-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-21 18:54:26 +01:00
Peter Zijlstra
63b6da39bb perf: Fix perf_event_exit_task() race
There is a race against perf_event_exit_task() vs
event_function_call(),find_get_context(),perf_install_in_context()
(iow, everyone).

Since there is no permanent marker on a context that its dead, it is
quite possible that we access (and even modify) a context after its
passed through perf_event_exit_task().

For instance, find_get_context() might find the context still
installed, but by the time we get to perf_install_in_context() it
might already have passed through perf_event_exit_task() and be
considered dead, we will however still add the event to it.

Solve this by marking a ctx dead by setting its ctx->task value to -1,
it must be !0 so we still know its a (former) task context.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-21 18:54:25 +01:00
Peter Zijlstra
c97f473643 perf: Add more assertions
Try to trigger warnings before races do damage.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-21 18:54:25 +01:00
Peter Zijlstra
fae3fde651 perf: Collapse and fix event_function_call() users
There is one common bug left in all the event_function_call() users,
between loading ctx->task and getting to the remote_function(),
ctx->task can already have been changed.

Therefore we need to double check and retry if ctx->task != current.

Insert another trampoline specific to event_function_call() that
checks for this and further validates state. This also allows getting
rid of the active/inactive functions.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-21 18:54:24 +01:00
Peter Zijlstra
32132a3d0d perf: Specialize perf_event_exit_task()
The perf_remove_from_context() usage in __perf_event_exit_task() is
different from the other usages in that this site has already
detached and scheduled out the task context.

This will stand in the way of stronger assertions checking the (task)
context scheduling invariants.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-21 18:54:24 +01:00
Peter Zijlstra
39a4364076 perf: Fix task context scheduling
There is a very nasty problem wrt disabling the perf task scheduling
hooks.

Currently we {set,clear} ctx->is_active on every
__perf_event_task_sched_{in,out}, _however_ this means that if we
disable these calls we'll have task contexts with ->is_active set that
are not active and 'active' task contexts without ->is_active set.

This can result in event_function_call() looping on the ctx->is_active
condition basically indefinitely.

Resolve this by changing things such that contexts without events do
not set ->is_active like we used to. From this invariant it trivially
follows that if there are no (task) events, every task ctx is inactive
and disabling the context switch hooks is harmless.

This leaves two places that need attention (and already had
accumulated weird and wonderful hacks to work around, without
recognising this actual problem).

Namely:

 - perf_install_in_context() will need to deal with installing events
   in an inactive context, meaning it cannot rely on ctx-is_active for
   its IPIs.

 - perf_remove_from_context() will have to mark a context as inactive
   when it removes the last event.

For specific detail, see the patch/comments.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-21 18:54:23 +01:00
Peter Zijlstra
63e30d3e52 perf: Make ctx->is_active and cpuctx->task_ctx consistent
For no apparent reason and to great confusion the rules for
ctx->is_active and cpuctx->task_ctx are different. This means that its
not always possible to find all active (task) contexts.

Fix this such that if ctx->is_active gets set, we also set (or verify)
cpuctx->task_ctx.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-21 18:54:23 +01:00
Peter Zijlstra
25432ae96a perf: Optimize perf_sched_events() usage
It doesn't make sense to take up-to _4_ references on
perf_sched_events() per event, avoid doing this.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-21 18:54:22 +01:00
Peter Zijlstra
aee7dbc45f perf: Simplify/fix perf_event_enable() event scheduling
Like perf_enable_on_exec(), perf_event_enable() event scheduling has problems
respecting the context hierarchy when trying to schedule events (for
example, it will try and add a pinned event without first removing
existing flexible events).

So simplify it by using the new ctx_resched() call which will DTRT.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-21 18:54:22 +01:00
Peter Zijlstra
8833d0e286 perf: Use task_ctx_sched_out()
We have a function that does exactly what we want here, use it. This
reduces the amount of cpuctx->task_ctx muckery.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-21 18:54:21 +01:00
Peter Zijlstra
3e349507d1 perf: Fix perf_enable_on_exec() event scheduling
There are two problems with the current perf_enable_on_exec() event
scheduling:

  - the newly enabled events will be immediately scheduled
    irrespective of their ctx event list order.

  - there's a hole in the ctx->lock between scheduling the events
    out and putting them back on.

Esp. the latter issue is a real problem because a hole in event
scheduling leaves the thing in an observable inconsistent state,
confusing things.

Fix both issues by first doing the enable iteration and at the end,
when there are newly enabled events, reschedule the ctx in one go.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-21 18:54:20 +01:00
Peter Zijlstra
5947f6576e perf: Remove stale comment
The comment here is horribly out of date, remove it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-21 18:54:20 +01:00
Peter Zijlstra
70a0165752 perf: Fix cgroup scheduling in perf_enable_on_exec()
There is a comment that states that perf_event_context_sched_in() will
also switch in the cgroup events, I cannot find it does so. Therefore
all the resulting logic goes out the window too.

Clean that up.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-21 18:54:19 +01:00
Peter Zijlstra
7e41d17753 perf: Fix cgroup event scheduling
There appears to be a problem in __perf_event_task_sched_in() wrt
cgroup event scheduling.

The normal event scheduling order is:

	CPU pinned
	Task pinned
	CPU flexible
	Task flexible

And since perf_cgroup_sched*() only schedules the cpu context, we must
call this _before_ adding the task events.

Note: double check what happens on the ctx switch optimization where
the task ctx isn't scheduled.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-21 18:54:19 +01:00
Peter Zijlstra
c994d61367 perf: Add lockdep assertions
Make various bugs easier to see.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-21 18:54:18 +01:00
Ingo Molnar
f5540ecb89 perf tools improvements and fixes:
User visible bug fixes:
 
 - Fix reading of build-id from vDSO (Ben Hutchings)
 
 - Fix processing samples for guests, noticed with 'perf kvm',
   but noticeable as well via other tools, such as 'perf top'
   (Ravi Bangoria)
 
 Build infrastructure:
 
 - Add feature-dump target and FEATURES_DUMP make variable, to
   allow reusing the feature detection results among multiple
   tools/ living codebases, such as perf and lib/bpf (Jiri Olsa)
 
 - 'make -C tools/perf build-test' improvements, making it more
   paralelizable and allowing building it outside of the source
   tree, using O= (Wang Nan)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWmV0JAAoJENZQFvNTUqpAjuMQALEDS6HtPwN5y/XpMLJqqWdI
 EigVIOIZQUIsrNTJXT+qYWJ2T4SOeQzX9GjsuenbLy6qOwej4mj//se10v1Ev3zK
 THlwz6pvRlMDtzOoV1zzkPaUdgbiWfuyuC5OTDo+YNtHdWeXNt4gND0CxqWTOs6N
 DouCpHkrRa1DgYtV93hyg0XXJg4gox6P1Efq1gk4tlOpeiWJt188k8RjyBfhAueL
 vjbO2ms3RmJhHa8qvuQdl+xW8rCIW3R//IPHJucsOYFl9dczYPxq/t9PbyOiI/fh
 nBBn/4UAhgR0Rn9mDOJXoMGqXshHZA214T0+2XL8SFVdj/4YiUc3I+kwrJxBGGHk
 BfJwnY/FWTAON0TSkHzG9AuRd29swefobx02EhxbuTyllCcsNCfwXCYPeHLlvtyQ
 jO5WRFjQ0zRA0pz01YkNP3Ib6M0FW2l42o2m96sBPjRE1Crxy6dkHSFFCelL5sON
 aTM2q3cODRXOKC7hoayJvFwsc1a7pQWw2MWKSbrIt/1QPuETJrDvHYi30C7JHPLf
 0IxbI9Zt0WvlKVC+RhiZJSbkJOMap8Jl6tVCS3n2UM09Bj7wjaRGXbuLt0GwAtlI
 lJXUdf3xOsfDg+6XUMir7VkPFUWKa8cKzOvpyWaXgAMUjWO5aNxxkx8QodsgNkWH
 LdyRyeOoFz574tTf/h1W
 =XjpY
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf tools improvements and fixes from Arnaldo Carvalho de Melo:

User visible bug fixes:

  - Fix reading of build-id from vDSO (Ben Hutchings)

  - Fix processing samples for guests, noticed with 'perf kvm',
    but noticeable as well via other tools, such as 'perf top'
    (Ravi Bangoria)

Build infrastructure:

  - Add feature-dump target and FEATURES_DUMP make variable, to
    allow reusing the feature detection results among multiple
    tools/ living codebases, such as perf and lib/bpf (Jiri Olsa)

  - 'make -C tools/perf build-test' improvements, making it more
    paralelizable and allowing building it outside of the source
    tree, using O= (Wang Nan)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-19 08:30:48 +01:00
Jiri Olsa
96b9e70b8e perf build: Introduce FEATURES_DUMP make variable
Introducing FEATURES_DUMP make variable to provide features
detection dump file and bypass the feature detection.

The intention is to use this during build tests to skip
repeated features detection, like:

Get feature dump static build into /tmp/fd file:
  $ make feature-dump FEATURE_DUMP_COPY=/tmp/fd LDFLAGS=-static
    BUILD:   Doing 'make -j4' parallel build

  Auto-detecting system features:
  ...                         dwarf: [ OFF ]

  SNIP

  FEATURE-DUMP file copied into /tmp/fd

Use /tmp/fd to build perf:
  $ make FEATURES_DUMP=/tmp/fd LDFLAGS=-static

  $ file perf
  perf: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, for ...

Suggested-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1452830421-77757-7-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-15 16:32:00 -03:00
Jiri Olsa
b8e52be00c perf build: Add feature-dump target
To provide FEATURE-DUMP into $(FEATURE_DUMP_COPY) if defined, with no
further action.

Get feature dump of the current build:
  $ make feature-dump
    BUILD:   Doing 'make -j4' parallel build

  Auto-detecting system features:
  ...                         dwarf: [ on  ]

  FEATURE-DUMP file available in FEATURE-DUMP

Get feature dump static build into /tmp/fd file:

  $ make feature-dump FEATURE_DUMP_COPY=/tmp/fd LDFLAGS=-static
    BUILD:   Doing 'make -j4' parallel build

  Auto-detecting system features:
  ...                         dwarf: [ OFF ]

  SNIP

  FEATURE-DUMP file copied into /tmp/fd

Suggested-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1452830421-77757-6-git-send-email-wangnan0@huawei.com
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-15 16:32:00 -03:00
Wang Nan
c15e758c4b perf build: Pass O option to kernel makefile in build-test
Kernel makefile only follows an 'O' option passed from command line
explicitely. In build-test with 'O' option set, kernel makefile
contaminate kernel source directory. Build test also fail if we don't
create output directory manually.

K_O_OPT is added and passed to kernel makefile if 'O' is passed
to build-test.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1452830421-77757-5-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-15 16:32:00 -03:00
Wang Nan
68824de19a perf build: Test correct path of perf in build-test
If an 'O' is passed to 'make build-test', many 'test -x' and 'test -f'
will fail because perf resides in a different directory. Fix this by
computing PERF_OUT according to 'O' and test correct output files.
For make_kernelsrc and make_kernelsrc_tools, set KBUILD_OUTPUT_DIR
instead because the path is different from others ($(O)/perf vs
 $(O)/tools/perf).

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1452830421-77757-4-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-15 16:31:59 -03:00
Wang Nan
eb807730c0 perf build: Pass O option to Makefile.perf in build-test
Unlike tools/perf/Makefile, tools/perf/Makefile.perf obey 'O' option
when it is passed through cmdline only, due to code in
tools/scripts/Makefile.include:

 ifneq ($(O),)
 ifeq ($(origin O), command line)
 	...
 	ABSOLUTE_O := $(shell cd $(O) ; pwd)
 	OUTPUT := $(ABSOLUTE_O)/$(if $(subdir),$(subdir)/)
 endif
 endif

This patch passes 'O' to Makefile.perf through cmdline explicitly
to make it follow O variable during build-test.

'make clean' should have identical 'O' option with 'make'. If not,
config-clean may error.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1452830421-77757-3-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-15 16:31:59 -03:00
Wang Nan
7be43dfb1e perf build: Set parallel making options build-test
'make build-test' is painful because of time consuming. In a full test,
all test cases are built twice with tools/perf/Makefile and
tools/perf/Makefile.perf. 'Makefile' automatically computes parallel
options for make, but 'Makefile.perf' not, so all test cases is built
with one job. It is very slow.

This patch adds '-j' options to Makefile.perf testing. It computes
parallel building options like what tools/perf/Makefile does, and pass
'-j' option to Makefile.perf test.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1452687442-6186-2-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-15 16:31:59 -03:00
Ben Hutchings
40c4a0f92a perf symbols: Fix reading of build-id from vDSO
We need to use the long name (the filename) when reading the build-id
from a DSO.  Using the short name doesn't work for (at least) vDSOs.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20160113172301.GT28542@decadent.org.uk
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-15 16:31:58 -03:00
Ravi Bangoria
3caeaa5627 perf kvm record/report: 'unprocessable sample' error while recording/reporting guest data
While recording guest samples in host using perf kvm record, it will
populate unprocessable sample error, though samples will be recorded
properly. While generating report using perf kvm report, no samples will
be processed and same error will populate. We have seen this behaviour
with upstream perf(4.4-rc3) on x86 and ppc64 hardware.

Reason behind this failure is, when it tries to fetch machine from
rb_tree of machines, it fails. As a part of tracing a bug, we figured
out that this code was incorrectly refactored in commit 54245fdc35
("perf session: Remove wrappers to machines__find").

This patch will change the functionality such that if it can't fetch
machine in first trial, it will create one node of machine and add that to
rb_tree. So next time when it tries to fetch same machine from rb_tree,
it won't fail. Actually it was the case before refactoring of code in
aforementioned commit.

This patch is generated from acme perf/core branch.

Below I've mention an example that demonstrate the behaviour before and
after applying patch.

Before applying patch:
[Note: One needs to run guest before recording data in host]

  ravi@ravi-bangoria:~$ ./perf kvm record -a
  Warning:
  5903 unprocessable samples recorded.
  Do you have a KVM guest running and not using 'perf kvm'?
  [ perf record: Captured and wrote 1.409 MB perf.data.guest (285 samples) ]

  ravi@ravi-bangoria:~$ ./perf kvm report --stdio
  Warning:
  5903 unprocessable samples recorded.
  Do you have a KVM guest running and not using 'perf kvm'?
  # To display the perf.data header info, please use --header/--header-only options.
  #
  # Total Lost Samples: 0
  #
  # Samples: 285  of event 'cycles'
  # Event count (approx.): 88715406
  #
  # Overhead  Command  Shared Object  Symbol
  # ........  .......  .............  ......
  #

  # (For a higher level overview, try: perf report --sort comm,dso)
  #

After applying patch:

  ravi@ravi-bangoria:~$ ./perf kvm record -a
  [ perf record: Captured and wrote 1.188 MB perf.data.guest (17 samples) ]

  ravi@ravi-bangoria:~$ ./perf kvm report --stdio
  # To display the perf.data header info, please use --header/--header-only options.
  #
  # Total Lost Samples: 0
  #
  # Samples: 17  of event 'cycles'
  # Event count (approx.): 700746
  #
  # Overhead  Command  Shared Object     Symbol
  # ........  .......  ................  ......................
  #
      34.19%  :5758    [unknown]         [g] 0xffffffff818682ab
      22.79%  :5758    [unknown]         [g] 0xffffffff812dc7f8
      22.79%  :5758    [unknown]         [g] 0xffffffff818650d0
      14.83%  :5758    [unknown]         [g] 0xffffffff8161a1b6
       2.49%  :5758    [unknown]         [g] 0xffffffff818692bf
       0.48%  :5758    [unknown]         [g] 0xffffffff81869253
       0.05%  :5758    [unknown]         [g] 0xffffffff81869250

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # v3.19+
Fixes: 54245fdc35 ("perf session: Remove wrappers to machines__find")
Link: http://lkml.kernel.org/r/1449471302-11283-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-15 16:31:58 -03:00
Linus Torvalds
10a0c0f059 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc changes:
   - fix lguest bug
   - fix /proc/meminfo output on certain configs
   - fix pvclock bug
   - fix reboot on certain iMacs by adding new reboot quirk
   - fix bootup crash
   - fix FPU boot line option parsing
   - add more x86 self-tests
   - small cleanups, documentation improvements, etc"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu/amd: Remove an unneeded condition in srat_detect_node()
  x86/vdso/pvclock: Protect STABLE check with the seqcount
  x86/mm: Improve switch_mm() barrier comments
  selftests/x86: Test __kernel_sigreturn and __kernel_rt_sigreturn
  x86/reboot/quirks: Add iMac10,1 to pci_reboot_dmi_table[]
  lguest: Map switcher text R/O
  x86/boot: Hide local labels in verify_cpu()
  x86/fpu: Disable AVX when eagerfpu is off
  x86/fpu: Disable MPX when eagerfpu is off
  x86/fpu: Disable XGETBV1 when no XSAVE
  x86/fpu: Fix early FPU command-line parsing
  x86/mm: Use PAGE_ALIGNED instead of IS_ALIGNED
  selftests/x86: Disable the ldt_gdt_64 test for now
  x86/mm/pat: Make split_page_count() check for empty levels to fix /proc/meminfo output
  x86/boot: Double BOOT_HEAP_SIZE to 64KB
  x86/mm: Add barriers and document switch_mm()-vs-flush synchronization
2016-01-14 11:57:22 -08:00
Linus Torvalds
dcd1bfd50a Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
 "Three clocksource driver fixes"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource/drivers/vt8500: Increase the minimum delta
  clocksource/drivers/fsl_ftm_timer: Fix CLKSRC_MMIO dependency
  clocksource/drivers: Fix dependencies for !HAS_IOMEM archs
2016-01-14 11:45:56 -08:00