Some Linux symbols (for example __vt_event_wait) are interpreted by the demangler as C++ mangled names, which of course they aren't. Disable kernel symbol demangling by default to avoid this, and allow enabling it with a new option --demangle-kernel for those who wish it.

Reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1410581705-26968-1-git-send-email-avi@cloudius-systems.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf-report(1)
==============

NAME
----
perf-report - Read perf.data (created by perf record) and display the profile

SYNOPSIS
--------
[verse]
'perf report' [-i <file> | --input=file]

DESCRIPTION
-----------
This command displays the performance counter profile information recorded
via perf record.

OPTIONS
-------
-i::
--input=::
	Input file name. (default: perf.data unless stdin is a fifo)

-v::
--verbose::
	Be more verbose. (show symbol address, etc)

-n::
--show-nr-samples::
	Show the number of samples for each symbol.

--showcpuutilization::
	Show sample percentage for different cpu modes.

-T::
--threads::
	Show per-thread event counters.

-c::
--comms=::
	Only consider symbols in these comms. CSV that understands
	file://filename entries.  This option will affect the percentage of
	the overhead column.  See --percentage for more info.

-d::
--dsos=::
	Only consider symbols in these dsos. CSV that understands
	file://filename entries.  This option will affect the percentage of
	the overhead column.  See --percentage for more info.

-S::
--symbols=::
	Only consider these symbols. CSV that understands
	file://filename entries.  This option will affect the percentage of
	the overhead column.  See --percentage for more info.

--symbol-filter=::
	Only show symbols that match (partially) with this filter.

-U::
--hide-unresolved::
	Only display entries resolved to a symbol.

-s::
--sort=::
	Sort histogram entries by given key(s) - multiple keys can be specified
	in CSV format.  The following sort keys are available:
	pid, comm, dso, symbol, parent, cpu, srcline, weight, local_weight.

	Each key has the following meaning:

	- comm: command (name) of the task which can be read via /proc/<pid>/comm
	- pid: command and tid of the task
	- dso: name of library or module executed at the time of sample
	- symbol: name of function executed at the time of sample
	- parent: name of function matched to the parent regex filter. Unmatched
	entries are displayed as "[other]".
	- cpu: cpu number the task ran on at the time of sample
	- srcline: filename and line number executed at the time of sample.  The
	DWARF debugging info must be provided.
	- weight: Event specific weight, e.g. memory latency or transaction
	abort cost. This is the global weight.
	- local_weight: Local weight version of the weight above.
	- transaction: Transaction abort flags.
	- overhead: Overhead percentage of sample
	- overhead_sys: Overhead percentage of sample running in system mode
	- overhead_us: Overhead percentage of sample running in user mode
	- overhead_guest_sys: Overhead percentage of sample running in system mode
	on guest machine
	- overhead_guest_us: Overhead percentage of sample running in user mode on
	guest machine
	- sample: Number of samples
	- period: Raw number of event count of sample

	By default, the comm, dso and symbol keys are used.
	(i.e. --sort comm,dso,symbol)

	If the --branch-stack option is used, the following sort keys are also
	available:
	dso_from, dso_to, symbol_from, symbol_to, mispredict.

	- dso_from: name of library or module branched from
	- dso_to: name of library or module branched to
	- symbol_from: name of function branched from
	- symbol_to: name of function branched to
	- mispredict: "N" for predicted branch, "Y" for mispredicted branch
	- in_tx: branch in TSX transaction
	- abort: TSX transaction abort.

	In that case the default sort keys are changed to comm, dso_from,
	symbol_from, dso_to and symbol_to, see '--branch-stack'.

-F::
--fields=::
	Specify output field - multiple keys can be specified in CSV format.
	The following fields are available:
	overhead, overhead_sys, overhead_us, overhead_children, sample and period.
	It can also contain any sort key(s).

	By default, every sort key not specified in -F will be appended
	automatically.

	If the --mem-mode option is used, the following sort keys are also
	available (incompatible with --branch-stack):
	symbol_daddr, dso_daddr, locked, tlb, mem, snoop, dcacheline.

	- symbol_daddr: name of data symbol being executed on at the time of sample
	- dso_daddr: name of library or module containing the data being executed
	on at the time of sample
	- locked: whether the bus was locked at the time of sample
	- tlb: type of tlb access for the data at the time of sample
	- mem: type of memory access for the data at the time of sample
	- snoop: type of snoop (if any) for the data at the time of sample
	- dcacheline: the cacheline the data address is on at the time of sample

	In that case the default sort keys are changed to local_weight, mem, sym,
	dso, symbol_daddr, dso_daddr, snoop, tlb, locked, see '--mem-mode'.

-p::
--parent=<regex>::
	A regex filter to identify parent. The parent is a caller of this
	function and is searched for through the callchain, thus it requires
	callchain information to be recorded. The pattern is in the extended
	regex format and defaults to "\^sys_|^do_page_fault", see '--sort parent'.
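
As an illustration of how the default pattern classifies callchains, the following Python sketch applies it to a few made-up function names. It is not perf's implementation: perf uses POSIX extended regular expressions, whereas this sketch uses Python's `re` module, which handles this particular pattern the same way. The helper `parent_of`, the callchain representation and the sample data are all hypothetical.

```python
import re

# Default parent pattern quoted in the --parent description.
DEFAULT_PARENT = re.compile(r"^sys_|^do_page_fault")


def parent_of(callchain, pattern=DEFAULT_PARENT):
    """Return the first function in the callchain that matches the pattern.

    `callchain` is a list of function names ordered from the sampled
    function outwards to its callers (an assumed representation).
    Chains without a match are reported as "[other]", mirroring the
    description of the 'parent' sort key.
    """
    for name in callchain:
        if pattern.search(name):
            return name
    return "[other]"


# Made-up callchains, innermost function first.
chains = [
    ["copy_user", "vfs_read", "sys_read"],
    ["clear_page", "handle_mm_fault", "do_page_fault"],
    ["main_loop", "main"],
]
parents = [parent_of(chain) for chain in chains]
print(parents)
```

The third chain has no caller matching the pattern, so it falls into the "[other]" bucket that -x (--exclude-other) hides.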

-x::
--exclude-other::
	Only display entries with parent-match.

-w::
--column-widths=<width[,width...]>::
	Force each column width to the provided list, for large terminal
	readability.  0 means no limit (default behavior).

-t::
--field-separator=::
	Use a special separator character and don't pad with spaces, replacing
	all occurrences of this separator in symbol names (and other output)
	with a '.' character, which is therefore the only invalid separator.
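
The replacement rule can be sketched in a few lines of Python. This is only an illustration of the behavior described for --field-separator, not code taken from perf; the function `format_row` and the sample fields are hypothetical.

```python
def format_row(fields, sep):
    """Join fields with `sep`, without padding.

    Any occurrence of `sep` inside a field is replaced by '.', so the
    separator stays unambiguous. For that reason '.' itself is rejected.
    """
    if sep == ".":
        raise ValueError("'.' cannot be used as the field separator")
    return sep.join(str(field).replace(sep, ".") for field in fields)


# A C++ symbol name containing a comma, written out with ',' as separator.
row = format_row(["12.50%", "myapp", "libfoo.so", "ns::func(int, char)"], ",")
print(row)
```

The comma inside the symbol name becomes a '.', so a consumer can split each line on the separator and always get the same number of columns.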

-D::
--dump-raw-trace::
	Dump raw trace in ASCII.

-g [type,min[,limit],order[,key]]::
--call-graph::
	Display call chains using type, min percent threshold, optional print
	limit and order.
	type can be either:
	- flat: single column, linear exposure of call chains.
	- graph: use a graph tree, displaying absolute overhead rates.
	- fractal: like graph, but displays relative rates. Each branch of
		 the tree is considered as a new profiled object. +

	order can be either:
	- callee: callee based call graph.
	- caller: inverted caller based call graph.

	key can be:
	- function: compare on functions
	- address: compare on individual code addresses

	Default: fractal,0.5,callee,function.

--children::
	Accumulate callchain of children to parent entry so that they can
	show up in the output.  The output will have a new "Children" column
	and will be sorted on the data.  It requires that callchains be recorded.
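
The idea of accumulating children into their parents can be sketched as follows. This is a simplified model under stated assumptions, not perf's algorithm: samples are given as (callchain, period) pairs with the sampled function first, and a function that appears several times in one chain is counted once for that sample. The function `accumulate` and the sample data are hypothetical.

```python
from collections import defaultdict


def accumulate(samples):
    """Return {function: (self_period, children_period)}.

    Each sample is (callchain, period); the callchain is ordered from
    the sampled function (callee) to its outermost caller.
    """
    self_period = defaultdict(int)
    children_period = defaultdict(int)
    for chain, period in samples:
        # Only the sampled function gets "self" overhead.
        self_period[chain[0]] += period
        # Every function on the chain gets the period as "children"
        # overhead; set() counts a recursive function once per sample.
        for func in set(chain):
            children_period[func] += period
    return {f: (self_period[f], children_period[f]) for f in children_period}


samples = [
    (["foo", "bar", "main"], 60),
    (["bar", "main"], 30),
    (["main"], 10),
]
totals = accumulate(samples)
total_period = sum(period for _, period in samples)

# Sort on the accumulated ("Children") value, largest first.
for func, (self_p, child_p) in sorted(totals.items(), key=lambda kv: -kv[1][1]):
    print(f"{100 * child_p / total_period:7.2f}% {100 * self_p / total_period:7.2f}%  {func}")
```

In this model `main` has only 10% self overhead but 100% accumulated overhead, which is why it sorts first once the data is ordered by the "Children" column.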

--max-stack::
	Set the stack depth limit when parsing the callchain, anything
	beyond the specified depth will be ignored. This is a trade-off
	between information loss and faster processing especially for
	workloads that can have a very long callchain stack.

	Default: 127

-G::
--inverted::
	Alias for inverted caller based call graph.

--ignore-callees=<regex>::
	Ignore callees of the function(s) matching the given regex.
	This has the effect of collecting the callers of each such
	function into one place in the call-graph tree.

--pretty=<key>::
	Pretty printing style.  key: normal, raw

--stdio:: Use the stdio interface.

--tui:: Use the TUI interface, which is integrated with annotate and allows
	zooming into DSOs or threads, among other features. Use of --tui
	requires a tty; if one is not present, as when piping to other
	commands, the stdio interface is used.

--gtk:: Use the GTK2 interface.

-k::
--vmlinux=<file>::
	vmlinux pathname

--kallsyms=<file>::
	kallsyms pathname

-m::
--modules::
	Load module symbols. WARNING: This should only be used with -k and
	a LIVE kernel.

-f::
--force::
	Don't complain, do it.

--symfs=<directory>::
	Look for files with symbols relative to this directory.

-C::
--cpu:: Only report samples for the list of CPUs provided. Multiple CPUs can
	be provided as a comma-separated list with no space: 0,1. Ranges of
	CPUs are specified with -: 0-2. Default is to report samples on all
	CPUs.
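
The CPU list syntax can be illustrated with a small Python parser. This is a sketch of the syntax described for --cpu, not perf's own parser; the function name `parse_cpu_list` is hypothetical, and mixing single CPUs and ranges in one list (as in the last call) is an assumption of this example.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as "0,1" or "0-2" into sorted CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            # A range "lo-hi" includes both ends.
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


print(parse_cpu_list("0,1"))
print(parse_cpu_list("0-2"))
print(parse_cpu_list("0-2,5"))
```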

-M::
--disassembler-style=:: Set disassembler style for objdump.

--source::
	Interleave source code with assembly code. Enabled by default,
	disable with --no-source.

--asm-raw::
	Show raw instruction encoding of assembly instructions.

--show-total-period:: Show a column with the sum of periods.

-I::
--show-info::
	Display extended information about the perf.data file. This adds
	information which may be very large and thus may clutter the display.
	It currently includes: cpu and numa topology of the host system.

-b::
--branch-stack::
	Use the addresses of sampled taken branches instead of the instruction
	address to build the histograms. To generate meaningful output, the
	perf.data file must have been obtained using perf record -b or
	perf record --branch-filter xxx where xxx is a branch filter option.
	perf report is able to auto-detect whether a perf.data file contains
	branch stacks and it will automatically switch to the branch view mode,
	unless --no-branch-stack is used.

--objdump=<path>::
	Path to objdump binary.

--group::
	Show event group information together.

--demangle::
	Demangle symbol names to human readable form. It's enabled by default,
	disable with --no-demangle.

--demangle-kernel::
	Demangle kernel symbol names to human readable form (for C++ kernels).

--mem-mode::
	Use the data addresses of samples in addition to instruction addresses
	to build the histograms.  To generate meaningful output, the perf.data
	file must have been obtained using perf record -d -W and using a
	special event -e cpu/mem-loads/ or -e cpu/mem-stores/. See
	'perf mem' for simpler access.

--percent-limit::
	Do not show entries which have an overhead under that percent.
	(Default: 0).

--percentage::
	Determine how to display the overhead percentage of filtered entries.
	Filters can be applied by the --comms, --dsos and/or --symbols options
	and by zoom operations on the TUI (thread, dso, etc).

	"relative" means it's relative to filtered entries only so that the
	sum of shown entries will always be 100%.  "absolute" means it retains
	the original value before and after the filter is applied.
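
The difference between the two modes is plain arithmetic, sketched below in Python. The function `overhead`, the entry names and the periods are hypothetical; only the "relative" and "absolute" semantics come from the description above.

```python
def overhead(entries, keep, mode="relative"):
    """Compute overhead percentages for the entries that pass a filter.

    entries: {name: period} for all entries before filtering.
    keep:    predicate on the entry name, standing in for a filter
             such as --dsos.
    mode:    "relative" divides by the total of the shown entries,
             "absolute" divides by the total of all entries.
    """
    total = sum(entries.values())
    shown = {name: period for name, period in entries.items() if keep(name)}
    base = sum(shown.values()) if mode == "relative" else total
    if base == 0:
        return {}
    return {name: 100.0 * period / base for name, period in shown.items()}


entries = {"libc.so": 50, "myapp": 30, "[kernel]": 20}


def keep(name):
    # Filter out the kernel entry.
    return name != "[kernel]"


relative = overhead(entries, keep, "relative")
absolute = overhead(entries, keep, "absolute")
print(relative)
print(absolute)
```

With the kernel entry filtered out, the relative percentages are 62.5% and 37.5% and sum to 100%, while the absolute percentages stay at 50% and 30%, the values the entries had before filtering.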

--header::
	Show header information in the perf.data file.  This includes
	various information like hostname, OS and perf version, cpu/mem
	info, perf command line, event list and so on.  Currently only
	--stdio output supports this feature.

--header-only::
	Show only perf.data header (forces --stdio).

SEE ALSO
--------
linkperf:perf-stat[1], linkperf:perf-annotate[1]