MEM_CGROUP_STAT_NSTATS is just a delimiter for cgroup1 statistics, not
an actual array entry. Reuse it for the first cgroup2 stat entry, like
in the event array.
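The intended layout can be sketched roughly as below. This is a simplified illustration, not the real memcontrol definition: only MEM_CGROUP_STAT_NSTATS is taken from the text above, the cgroup1 entries are abbreviated to two, and the other names (including the array) are assumptions for the sketch.

```c
/* Simplified sketch; only MEM_CGROUP_STAT_NSTATS comes from the text above. */
enum mem_cgroup_stat_index {
	/* cgroup1 statistics (abbreviated, names illustrative) */
	MEM_CGROUP_STAT_CACHE,
	MEM_CGROUP_STAT_RSS,
	/* Delimiter: counts the cgroup1 entries, it is not an array slot */
	MEM_CGROUP_STAT_NSTATS,
	/* First cgroup2 entry reuses the delimiter's value, leaving no hole */
	MEMCG_SOCK = MEM_CGROUP_STAT_NSTATS,
	/* Total number of slots (assumed name) */
	MEMCG_NR_STAT,
};

/* One counter per real entry; no slot is wasted on the delimiter. */
static long stat_counters[MEMCG_NR_STAT];
```

With two cgroup1 entries, the delimiter and the first cgroup2 entry both evaluate to 2, so the array needs three slots rather than four.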
Fixes: b2807f07f4 ("mm: memcontrol: add "sock" to cgroup2 memory.stat")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reduced testcase:
	#include <fcntl.h>
	#include <unistd.h>
	#include <sys/mman.h>
	#include <numaif.h>
	#define SIZE 0x2000
	int main(void)
	{
		int fd;
		void *p;
		/* Map the sg device privately; MAP_LOCKED faults the pages in */
		fd = open("/dev/sg0", O_RDWR);
		p = mmap(NULL, SIZE, PROT_EXEC, MAP_PRIVATE | MAP_LOCKED, fd, 0);
		/* Ask mm core to migrate the pages of the sg VMA */
		mbind(p, SIZE, 0, NULL, 0, MPOL_MF_MOVE);
		return 0;
	}
We shouldn't try to migrate pages in sg VMA as we don't have a way to
update Sg_scatter_hold::pages accordingly from mm core.
Let's mark the VMA as VM_IO to indicate to mm core that the VMA is not
migratable.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Shiraz Hashim <shashim@codeaurora.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit b76437579d ("procfs: mark thread stack correctly in
proc/<pid>/maps") added [stack:TID] annotation to /proc/<pid>/maps.
Finding the task of a stack VMA requires walking the entire thread list,
turning this into quadratic behavior: a thousand threads means a
thousand stacks, so the rendering of /proc/<pid>/maps needs to look at a
million combinations.
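The growth can be illustrated with a toy model. This is only a sketch under stated assumptions: struct toy_thread and both helpers are hypothetical stand-ins, not procfs code, and each thread is reduced to a single stack address.

```c
#include <stddef.h>

/* Hypothetical stand-in: one stack address per thread. */
struct toy_thread {
	unsigned long stack_addr;
};

/* Linear search: which thread owns the stack at 'addr'? Counts the probes. */
static long find_stack_owner(const struct toy_thread *threads, size_t n,
			     unsigned long addr, unsigned long *probes)
{
	for (size_t i = 0; i < n; i++) {
		(*probes)++;
		if (threads[i].stack_addr == addr)
			return (long)i;
	}
	return -1;
}

/* Rendering every stack VMA repeats the search once per VMA. */
static unsigned long count_probes_for_all_stacks(const struct toy_thread *threads,
						 size_t n)
{
	unsigned long probes = 0;

	for (size_t i = 0; i < n; i++)
		find_stack_owner(threads, n, threads[i].stack_addr, &probes);
	return probes;
}
```

Even with the early exit on a match, n threads cost n(n+1)/2 probes in this model, which still grows quadratically with the thread count.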
The cost is not in proportion to the usefulness as described in the
patch.
Drop the [stack:TID] annotation to make /proc/<pid>/maps (and
/proc/<pid>/numa_maps) usable again for higher thread counts.
The [stack] annotation inside /proc/<pid>/task/<tid>/maps is retained, as
identifying the stack VMA there is an O(1) operation.
Siddhesh said:
"The end users needed a way to identify thread stacks programmatically and
there wasn't a way to do that. I'm afraid I no longer remember (or have
access to the resources that would aid my memory since I changed
employers) the details of their requirement. However, I did do this on my
own time because I thought it was an interesting project for me and nobody
really gave any feedback then as to its utility, so as far as I am
concerned you could roll back the main thread maps information since the
information is available in the thread-specific files"
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com>
Cc: Shaohua Li <shli@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When working with hugetlbfs ptes (which are actually pmds), it is not valid to
directly use pte functions like pte_present() because the hardware bit
layout of pmds and ptes can be different. This is the case on s390.
Therefore we have to convert the hugetlbfs ptes first into a valid pte
encoding with huge_ptep_get().
Currently the /proc/<pid>/numa_maps code uses hugetlbfs ptes without
huge_ptep_get(). On s390 this leads to the following two problems:
1) The pte_present() function returns false (instead of true) for
PROT_NONE hugetlb ptes. Therefore PROT_NONE vmas are missing
completely in the "numa_maps" output.
2) The pte_dirty() function always returns false for all hugetlb ptes.
Therefore these pages are reported as "mapped=xxx" instead of
"dirty=xxx".
Therefore use huge_ptep_get() to correctly convert the hugetlb ptes.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: <stable@vger.kernel.org> [4.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update/unify my contact info. The old email address will no longer work
soon.
Signed-off-by: Seth Jennings <sjenning@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
o2hb_region_release currently doesn't free o2hb_debug_buf
hr_db_elapsed_time and hr_db_pinned malloced in o2hb_debug_create. Also
we should call debugfs_remove before freeing its data, to avoid the risk of
a debugfs file being accessed right after its data has been freed.
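The ordering argument can be sketched in plain C. This is a toy model, not the ocfs2 code: struct toy_entry and its helpers are hypothetical, with the 'registered' flag standing in for the debugfs file being reachable.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a debugfs entry that reads from private data. */
struct toy_entry {
	int *data;      /* buffer the entry's read path dereferences */
	int registered; /* nonzero while readers can still reach the entry */
};

/* Read path: only valid while the entry is registered. */
static int toy_entry_read(const struct toy_entry *e, int *out)
{
	if (!e->registered)
		return -1;
	*out = *e->data;
	return 0;
}

/* Teardown in the safe order: unregister first, then free the data. */
static void toy_entry_release(struct toy_entry *e)
{
	e->registered = 0; /* analogous to debugfs_remove() */
	free(e->data);     /* analogous to freeing the debug buffer */
	e->data = NULL;
}
```

If the two steps in toy_entry_release() were swapped, a reader arriving between them would dereference freed memory.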
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jiufei Xue <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Recently added commit 564b026fbd ("string_helpers: fix precision loss
for some inputs") fixed precision issues for string_get_size() and broke
tests.
Fix and improve them: test both STRING_UNITS_2 and STRING_UNITS_10 at the
same time, improve failure reporting, and test small and huge values.
Fixes: 564b026fbd ("string_helpers: fix precision loss for some inputs")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: James Bottomley <JBottomley@Odin.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If a lot of pages are queued to be split, deferred_split_scan() can
spend an unreasonable amount of time under the spinlock with interrupts
disabled.
Let's cap the number of pages to split per scan at sc->nr_to_scan.
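A bounded scan of this kind might look roughly like the sketch below. It is an illustration only: struct toy_page and take_batch() are hypothetical, a singly linked list replaces the kernel list, and the locking is left out.

```c
#include <stddef.h>

/* Hypothetical queue node standing in for a page queued for splitting. */
struct toy_page {
	struct toy_page *next;
};

/*
 * Detach at most nr_to_scan pages from the head of *queue and return them
 * as a separate list. In the kernel this step would run under the spinlock;
 * bounding the loop bounds the time spent with the lock held.
 */
static struct toy_page *take_batch(struct toy_page **queue,
				   unsigned long nr_to_scan,
				   unsigned long *taken)
{
	struct toy_page *batch = NULL, **tail = &batch;

	*taken = 0;
	while (*queue && *taken < nr_to_scan) {
		struct toy_page *page = *queue;

		*queue = page->next; /* unlink from the pending queue */
		page->next = NULL;
		*tail = page;        /* append to the batch */
		tail = &page->next;
		(*taken)++;
	}
	return batch;
}
```

Pages beyond the cap simply stay on the queue and are picked up by a later scan.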
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I got the meaning of shrinker::count_objects() wrong: it should return
the number of potentially freeable objects, which does not necessarily
correlate with freeable memory.
Returning 256 per THP in the queue is not reasonable:
shrinker::scan_objects() is never called with nr_to_scan > 128 in my setup.
Let's return 1 per THP and correct scan_objects() accordingly.
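The arithmetic behind this can be spelled out in a small sketch. All names here are hypothetical; the 256 and 128 figures are the ones quoted above.

```c
#define TOY_OBJS_PER_THP 256UL /* value the old count reported per THP */

/* Old behavior: report 256 objects for each queued huge page. */
static unsigned long count_old(unsigned long queued_thps)
{
	return queued_thps * TOY_OBJS_PER_THP;
}

/* New behavior: one potentially freeable object per queued huge page. */
static unsigned long count_new(unsigned long queued_thps)
{
	return queued_thps;
}

/* How many whole huge pages a scan request of nr_to_scan objects covers. */
static unsigned long thps_covered(unsigned long nr_to_scan,
				  unsigned long objs_per_thp)
{
	return nr_to_scan / objs_per_thp;
}
```

With 256 objects per THP, a request of 128 objects does not cover even one huge page; with 1 object per THP the same request covers 128 of them.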
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add an 'L' key action to change the percent limit applied to both hist
entries and callchains.
Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1454508683-5735-4-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The --percent-limit option was recently changed to apply to callchains as
well as to hist entries, but the documentation was not updated.
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1454508683-5735-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The description of the memory sort key (used by --mem-mode) was
misplaced. Move it under the --sort option so that it can be referenced
properly.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1454508683-5735-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
With the hists object having the perf_hpp_list we can now iterate sort
format entries based on the hists object. Adding the
hists__for_each_sort_list macro to do that.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-27-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
With the hists object having the perf_hpp_list we can now iterate output
format entries based on the hists object. Adding the hists__for_each_format
macro to do that.
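The shape of such per-hists iteration macros can be sketched as follows. This is a simplified model and not the perf source: all toy_ names are hypothetical, and plain singly linked pointers replace struct list_head.

```c
#include <stddef.h>

/* Simplified stand-ins; the real perf structures use struct list_head. */
struct toy_fmt {
	const char *name;
	struct toy_fmt *next;      /* link in the output (fields) list */
	struct toy_fmt *sort_next; /* link in the sort list */
};

struct toy_hpp_list {
	struct toy_fmt *fields;
	struct toy_fmt *sorts;
};

struct toy_hists {
	struct toy_hpp_list *hpp_list;
};

/* Iterate the output format entries reachable from a hists object. */
#define toy_hists__for_each_format(hists, fmt) \
	for ((fmt) = (hists)->hpp_list->fields; (fmt); (fmt) = (fmt)->next)

/* Iterate the sort format entries reachable from a hists object. */
#define toy_hists__for_each_sort_list(hists, fmt) \
	for ((fmt) = (hists)->hpp_list->sorts; (fmt); (fmt) = (fmt)->sort_next)

/* Example user: count the output columns of one hists object. */
static size_t toy_hists__nr_columns(struct toy_hists *hists)
{
	struct toy_fmt *fmt;
	size_t n = 0;

	toy_hists__for_each_format(hists, fmt)
		n++;
	return n;
}
```

Callers name only the hists object, so each hists can later point at its own list without any change at the call sites.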
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-26-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adding hpp_list into struct hists object.
Initializing struct hists_evsel hists object to carry global
perf_hpp_list list.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-25-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adding struct perf_hpp_list argument to following helper functions:
void perf_hpp__setup_output_field(struct perf_hpp_list *list);
void perf_hpp__reset_output_field(struct perf_hpp_list *list);
void perf_hpp__append_sort_keys(struct perf_hpp_list *list);
so they could be used on hists's hpp_list.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-24-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Passing perf_hpp_list all the way through setup_output_list so the
output entry could be added on the arbitrary list.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-19-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adding 2 perf_hpp_list register helpers:
perf_hpp_list__column_register()
perf_hpp_list__register_sort_field()
to be called within existing helpers:
perf_hpp__column_register()
perf_hpp__register_sort_field()
to register format entries within global perf_hpp_list object.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-17-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Introducing perf_hpp_list__init function to have an easy way to
initialize perf_hpp_list struct.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-16-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Gather output and sort lists under struct perf_hpp_list, so we could
have multiple instances of sort/output format entries.
Replacing current perf_hpp__list and perf_hpp__sort_list lists with
single perf_hpp_list instance.
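As a rough sketch of the idea, with a minimal list head written out so the example is self-contained. The toy_ names are hypothetical; the field names follow the .{fields,sorts} naming mentioned in the note below.

```c
/* Minimal circular doubly linked list head, modeled on the kernel's. */
struct toy_list_head {
	struct toy_list_head *next, *prev;
};

static void toy_list_init(struct toy_list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int toy_list_empty(const struct toy_list_head *h)
{
	return h->next == h;
}

static void toy_list_add_tail(struct toy_list_head *node,
			      struct toy_list_head *h)
{
	node->prev = h->prev;
	node->next = h;
	h->prev->next = node;
	h->prev = node;
}

/* Output and sort lists gathered in one object. */
struct toy_hpp_list {
	struct toy_list_head fields;
	struct toy_list_head sorts;
};

static void toy_hpp_list__init(struct toy_hpp_list *list)
{
	toy_list_init(&list->fields);
	toy_list_init(&list->sorts);
}
```

Because both heads live in one struct, a second instance is fully independent of the first, which is what allows more than one set of format entries.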
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-15-git-send-email-jolsa@kernel.org
[ Renamed fields to .{fields,sorts} as suggested by Namhyung and acked by Jiri ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Separating output fields parsing into setup_output_list function, so
it's separated from field_order string setup and could be reused later
in following patches.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-14-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Separating sort fields parsing into setup_sort_list function, so it's
separated from sort_order string setup and could be reused later in
following patches.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-13-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
With multiple lists holding format entries, we need support for properly
releasing format output/sort fields.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-12-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Those functions are no longer needed. They operate over perf_hpp__format
array which is now used only as template for dynamic entries.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-11-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently we use static output fields, because we have single global
list of all sort/output fields.
We will add hists specific sort and output lists in following patches,
so we need all format entries to be dynamically allocated. Adding
support to allocate output sort field.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-10-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The ui initialization changes hpp format callbacks, based on the used
browser. Thus we need this init being processed before setup_sorting.
Replica of a patch by Jiri for 'perf report'.
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-9-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The ui initialization changes hpp format callbacks, based on the used
browser. Thus we need this init being processed before setup_sorting.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-9-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Now that we have the 'equal' method implemented for hpp format entries
we can ease up the logic in the following functions and make them
generic wrt comparing format entries:
perf_hpp__setup_output_field
perf_hpp__append_sort_keys
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-8-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adding 'hpp__equal' callback function to compare hpp output format
entries.
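A callback of this kind could be wired up roughly as below. This is a sketch under assumptions: struct toy_fmt, its idx field and both helpers are hypothetical simplifications, not the perf definitions.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_fmt {
	int idx; /* identifies which template the entry was created from */
	/* Optional comparison callback; NULL means "cannot be compared". */
	bool (*equal)(const struct toy_fmt *a, const struct toy_fmt *b);
};

/* Callback for entries that are identified by their index. */
static bool toy_idx_equal(const struct toy_fmt *a, const struct toy_fmt *b)
{
	return a->idx == b->idx;
}

/* Generic comparison usable for any kind of format entry. */
static bool toy_fmt_equal(const struct toy_fmt *a, const struct toy_fmt *b)
{
	return a->equal && a->equal(a, b);
}
```

Callers go through toy_fmt_equal() only, so they need no knowledge of how a particular kind of entry decides equality.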
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-7-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To easily compare format entries and make it available for all kinds of
format entries.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-6-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We are going to add dynamic hpp format fields, so we need to make the
'len' change for the format itself, not in the perf_hpp__format
template.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-5-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently there's no way of comparing hpp format entries, which is
needed in following patches.
Adding _idx fields into struct perf_hpp_fmt to recognize and be able to
compare hpp format entries.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-4-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adding evsel specific function to sort hists_evsel based hists. The
hists__output_resort can be now used to sort common hists object.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-3-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently hists__output_resort() depends on hists based on hists_evsel
struct, but we need to be able to sort common hists as well.
Cutting out the base sorting code into the output_resort function, so it
can be reused in the following patch.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-2-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
User visible:
- Make --percent-limit apply to callchains also and fix some bugs
related to --percent-limit (Namhyung Kim)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Merge tag 'perf-core-for-mingo-3' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core callchain fixes and improvements from Arnaldo Carvalho de Melo <acme@redhat.com>:
User visible changes:
- Make --percent-limit apply to callchains also and fix some bugs
related to --percent-limit (Namhyung Kim)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- Port 'perf kvm stat' to PowerPC (Hemant Kumar)
Infrastructure:
- Use the 'feature-dump' target to do the feature checks just once and then
add code to reuse that in the tests/make makefile, speeding up the
'make -C tools/perf build-test' target (Wang Nan)
- Reduce the number of tests the 'build-test' target does to those that don't
pollute the source tree (Arnaldo Carvalho de Melo)
- Improve the output of the build tests a bit by aligning the name of the
tests, more can be done to filter out uninteresting info in the output
(Arnaldo Carvalho de Melo)
- Add perf_evlist pointer to *info_priv_size(), more prep work for
supporting the coresight architecture (Mathieu Poirier)
- Improve the 'perf test bp_signal' test (Wang Nan)
- Check environment before starting the BPF 'perf test', so that we can just
'Skip' older kernels instead of 'FAIL'ing them (Wang Nan)
- Fix cpumode of synthesized buildid event (Wang Nan)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Merge tag 'perf-core-for-mingo-2' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf tooling changes from Arnaldo Carvalho de Melo:
New features:
- Port 'perf kvm stat' to PowerPC (Hemant Kumar)
Infrastructure changes:
- Use the 'feature-dump' target to do the feature checks just once and then
add code to reuse that in the tests/make makefile, speeding up the
'make -C tools/perf build-test' target (Wang Nan)
- Reduce the number of tests the 'build-test' target does to those that don't
pollute the source tree (Arnaldo Carvalho de Melo)
- Improve the output of the build tests a bit by aligning the name of the
tests, more can be done to filter out uninteresting info in the output
(Arnaldo Carvalho de Melo)
- Add perf_evlist pointer to *info_priv_size(), more prep work for
supporting the coresight architecture (Mathieu Poirier)
- Improve the 'perf test bp_signal' test (Wang Nan)
- Check environment before starting the BPF 'perf test', so that we can just
'Skip' older kernels instead of 'FAIL'ing them (Wang Nan)
- Fix cpumode of synthesized buildid event (Wang Nan)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
User visible:
- Rename the "colors.code" ~/.perfconfig variable to "colors.jump_arrows",
as it controls just that UI element in the annotate browser (Taeung Song)
- Avoid trying to read ELF symtabs from device files, noticed while doing
memory profiling work (Jiri Olsa)
- Improve context detection when offering options in the hists browser,
i.e. some options don't make sense when the browser is not working with
a perf.data file ('perf top' mode), only in 'perf report' mode, like
scripting (Namhyung Kim)
Infrastructure:
- Eliminate duplication in the hists browser filter functions, getting the
common part into a function that receives callbacks for filtering by
DSO, thread, etc (Namhyung Kim)
- Fix misleadingly indented assignment, found using
gcc6 -Wmisleading-indentation (Markus Trippelsdorf)
- Handle LLVM relocation oddities in libbpf, introducing a 'perf test' that
detects such problems and then fixing the problem, so that the test now
passes (Wang Nan)
- More improvements to the build infrastructure to allow reusing the
feature detection facilities (Wang Nan)
- Auto initialize the globals needed by cpu__max_{cpu,node}() routines
(Arnaldo Carvalho de Melo)
Documentation:
- Document the perf sysctls in Documentation/sysctl/kernel.txt (Ben Hutchings)
- Document a bunch more ~/.perfconfig knobs (Taeung Song)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- Rename the "colors.code" ~/.perfconfig variable to "colors.jump_arrows",
as it controls just that UI element in the annotate browser (Taeung Song)
- Avoid trying to read ELF symtabs from device files, noticed while doing
memory profiling work (Jiri Olsa)
- Improve context detection when offering options in the hists browser,
i.e. some options don't make sense when the browser is not working with
a perf.data file ('perf top' mode), only in 'perf report' mode, like
scripting (Namhyung Kim)
Infrastructure changes:
- Eliminate duplication in the hists browser filter functions, getting the
common part into a function that receives callbacks for filtering by
DSO, thread, etc. (Namhyung Kim)
- Fix misleadingly indented assignment, found using
gcc6 -Wmisleading-indentation (Markus Trippelsdorf)
- Handle LLVM relocation oddities in libbpf, introducing a 'perf test' that
detects such problems and then fixing the problem, so that the test now
passes (Wang Nan)
- More improvements to the build infrastructure to allow reusing the
feature detection facilities (Wang Nan)
- Auto initialize the globals needed by cpu__max_{cpu,node}() routines
(Arnaldo Carvalho de Melo)
Documentation changes:
- Document the perf sysctls in Documentation/sysctl/kernel.txt (Ben Hutchings)
- Document a bunch more ~/.perfconfig knobs (Taeung Song)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- Use the more current logging style pr_<level>(...) instead of the old
printk(KERN_<LEVEL> ...).
- Convert pr_warning() to pr_warn().
Signed-off-by: Chen Yucong <slaoub@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1454384702-21707-1-git-send-email-slaoub@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
'perf probe', through debuginfo__find_probes() in util/probe-finder.c,
looks for the functions' frame descriptions in either the .eh_frame
section of an ELF file or in .debug_frame.
The check is based on whether either one of these sections is present.
Depending on distro, toolchain defaults, architecture, build flags,
etc., CFI might be found in .eh_frame and/or .debug_frame.
Sometimes .eh_frame, even if present, is not complete and misses some
descriptions.
Therefore, to be sure to find the CFI covering an address, we always
have to investigate both sections if available.
For example, on powerpc, this may happen:
$ gcc -g bin.c -o bin
$ objdump --dwarf ./bin
<1><145>: Abbrev Number: 7 (DW_TAG_subprogram)
<146> DW_AT_external : 1
<146> DW_AT_name : (indirect string, offset: 0x9e): main
<14a> DW_AT_decl_file : 1
<14b> DW_AT_decl_line : 39
<14c> DW_AT_prototyped : 1
<14c> DW_AT_type : <0x57>
<150> DW_AT_low_pc : 0x100007b8
If the .eh_frame and .debug_frame are checked for the same binary, we
will find that, .eh_frame (although present) doesn't contain a
description for "main" function.
But, .debug_frame has a description:
000000d8 00000024 00000000 FDE cie=00000000 pc=100007b8..10000838
DW_CFA_advance_loc: 16 to 100007c8
DW_CFA_def_cfa_offset: 144
DW_CFA_offset_extended_sf: r65 at cfa+16
...
Due to this (since, perf checks whether .eh_frame is present and goes on
searching for that address inside that frame), perf is unable to process
the probes:
# perf probe -x ./bin main
Failed to get call frame on 0x100007b8
Error: Failed to add events.
To avoid this issue, we need to check both the sections (.eh_frame and
.debug_frame), which is done in this patch.
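The lookup order can be sketched as below. This is a toy model, not the probe-finder code, which works on real DWARF CFI: the structs and helpers are hypothetical, and each frame description is reduced to the PC range it covers.

```c
#include <stddef.h>

/* Hypothetical stand-in for one frame description: a covered PC range. */
struct toy_fde {
	unsigned long start, end; /* covers [start, end) */
};

struct toy_cfi_section {
	const struct toy_fde *fdes; /* NULL when the section is absent */
	size_t nr;
};

/* Search one section for a description covering pc. */
static const struct toy_fde *toy_find_fde(const struct toy_cfi_section *sec,
					  unsigned long pc)
{
	for (size_t i = 0; sec->fdes && i < sec->nr; i++)
		if (pc >= sec->fdes[i].start && pc < sec->fdes[i].end)
			return &sec->fdes[i];
	return NULL;
}

/* Try .eh_frame first, then fall back to .debug_frame. */
static const struct toy_fde *toy_find_cfi(const struct toy_cfi_section *eh_frame,
					  const struct toy_cfi_section *debug_frame,
					  unsigned long pc)
{
	const struct toy_fde *fde = toy_find_fde(eh_frame, pc);

	return fde ? fde : toy_find_fde(debug_frame, pc);
}
```

The presence of .eh_frame no longer ends the search: a miss there falls through to .debug_frame, which covers the "main" case shown above.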
Note that, we can always force everything into both .eh_frame and
.debug_frame by:
$ gcc bin.c -fasynchronous-unwind-tables -fno-dwarf2-cfi-asm -g -o bin
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Mark Wielaard <mjw@redhat.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1454426806-13974-1-git-send-email-hemant@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
intel_pt_process_auxtrace_info() creates a pt->unknown_thread thread
that eventually needs to be freed by the last thread__put() on it, when
its refcount hits zero. This may happen in the
intel_pt_process_auxtrace_info() error handling path, where it triggers
the following segfault. The same segfault would happen in intel_pt_free(),
when tools using this intel_pt codebase free up resources:
# perf record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls
0 a anaconda-ks.cfg bin perf.data perf.data.old perf-f23-bringup.todo
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.217 MB perf.data ]
#
# perf script -F event,comm,pid,tid,time,addr,ip,sym,dso,iregs
Samples for 'instructions:u' event do not have IREGS attribute set. Cannot print 'iregs' field.
intel_pt_synth_events: failed to synthesize 'instructions' event type
Segmentation fault (core dumped)
#
The problem is that there's a union in 'struct thread' that combines a
list_head and an rb_node. The standard life cycle of a thread is: init
the rb_node in the constructor, insert it into the machine->threads
rbtree using the rb_node, move it to machine->dead_threads using the
list_head, and clean up in the last thread__put() with
list_del_init(&thread->node).
The above command cleans up a thread before it was ever added to a
list, which causes the segfault.
Since pt->unknown_thread will never live in an rbtree, initialize its
list node so that when list_del_init() is done on it we don't segfault.
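A minimal sketch of why the initialization matters, assuming a
simplified re-creation of the list helpers. The names struct
rb_node_sketch, struct thread_sketch, thread_sketch__init_unlinked() and
thread_sketch__put() are hypothetical and are not the perf definitions:

```c
#include <stddef.h>

/* Minimal re-creation of a kernel-style circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

static void list_del_init(struct list_head *entry)
{
	/* Dereferences entry->next and entry->prev: both must be valid. */
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	INIT_LIST_HEAD(entry);
}

/* Hypothetical stand-in for the rbtree node sharing the union. */
struct rb_node_sketch {
	unsigned long parent_color;
	struct rb_node_sketch *right, *left;
};

/* Hypothetical, reduced model of a thread with the shared union. */
struct thread_sketch {
	union {
		struct rb_node_sketch rb_node;	/* while in the rbtree */
		struct list_head node;		/* while on the dead list */
	};
	int refcnt;
};

/*
 * Set up a thread that will never be inserted into an rbtree: make the
 * list node point at itself so a later list_del_init() is harmless.
 */
static void thread_sketch__init_unlinked(struct thread_sketch *t)
{
	INIT_LIST_HEAD(&t->node);
	t->refcnt = 1;
}

/* Drop one reference; returns 1 if this was the last one. */
static int thread_sketch__put(struct thread_sketch *t)
{
	if (--t->refcnt > 0)
		return 0;
	/* Safe only because the node is on a list or self-linked. */
	list_del_init(&t->node);
	return 1;
}
```

Without the INIT_LIST_HEAD() call, list_del_init() would follow whatever
bytes the rb_node view left in the union, which is the kind of invalid
dereference the segfault above points to.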
After this patch:
# perf script -F event,comm,pid,tid,time,addr,ip,sym,dso,iregs
Samples for 'instructions:u' event do not have IREGS attribute set. Cannot print 'iregs' field.
intel_pt_synth_events: failed to synthesize 'instructions' event type
0x248 [0x88]: failed to process type: 70
#
Reported-by: Tong Zhang <ztong@vt.edu>
Reported-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: http://lkml.kernel.org/r/1454296865-19749-1-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull networking fixes from David Miller:
"This looks like a lot but it's a mixture of regression fixes as well
as fixes for longer standing issues.
1) Fix on-channel cancellation in mac80211, from Johannes Berg.
2) Handle CHECKSUM_COMPLETE properly in xt_TCPMSS netfilter xtables
module, from Eric Dumazet.
3) Avoid infinite loop in UDP SO_REUSEPORT logic, also from Eric
Dumazet.
4) Avoid a NULL deref if we try to set SO_REUSEPORT after a socket is
bound, from Craig Gallek.
5) GRO key comparisons don't take lightweight tunnels into account,
from Jesse Gross.
6) Fix struct pid leak via SCM credentials in AF_UNIX, from Eric
Dumazet.
7) We need to set the rtnl_link_ops of ipv6 SIT tunnels before we
register them, otherwise the NEWLINK netlink message is missing
the proper attributes. From Thadeu Lima de Souza Cascardo.
8) Several Spectrum chip bug fixes for mlxsw switch driver, from Ido
Schimmel
9) Handle fragments properly in ipv4 early socket demux, from Eric
Dumazet.
10) Don't ignore the ifindex key specifier on ipv6 output route
lookups, from Paolo Abeni"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (128 commits)
tcp: avoid cwnd undo after receiving ECN
irda: fix a potential use-after-free in ircomm_param_request
net: tg3: avoid uninitialized variable warning
net: nb8800: avoid uninitialized variable warning
net: vxge: avoid unused function warnings
net: bgmac: clarify CONFIG_BCMA dependency
net: hp100: remove unnecessary #ifdefs
net: davinci_cpdma: use dma_addr_t for DMA address
ipv6/udp: use sticky pktinfo egress ifindex on connect()
ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()
netlink: not trim skb for mmaped socket when dump
vxlan: fix a out of bounds access in __vxlan_find_mac
net: dsa: mv88e6xxx: fix port VLAN maps
fib_trie: Fix shift by 32 in fib_table_lookup
net: moxart: use correct accessors for DMA memory
ipv4: ipconfig: avoid unused ic_proto_used symbol
bnxt_en: Fix crash in bnxt_free_tx_skbs() during tx timeout.
bnxt_en: Exclude rx_drop_pkts hw counter from the stack's rx_dropped counter.
bnxt_en: Ring free response from close path should use completion ring
net_sched: drr: check for NULL pointer in drr_dequeue
...
Pull libnvdimm fixes from Dan Williams:
"1/ Fixes to the libnvdimm 'pfn' device that establishes a reserved
area for storing a struct page array.
2/ Fixes for dax operations on a raw block device to prevent pagecache
collisions with dax mappings.
3/ A fix for pfn_t usage in vm_insert_mixed that led to a null
pointer de-reference.
These have received build success notification from the kbuild robot
across 153 configs and pass the latest ndctl tests"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
phys_to_pfn_t: use phys_addr_t
mm: fix pfn_t to page conversion in vm_insert_mixed
block: use DAX for partition table reads
block: revert runtime dax control of the raw block device
fs, block: force direct-I/O for dax-enabled block devices
devm_memremap_pages: fix vmem_altmap lifetime + alignment handling
libnvdimm, pfn: fix restoring memmap location
libnvdimm: fix mode determination for e820 devices
When all callchains of a hist entry are percent-limited, do not add a
blank line at the end. It makes the entry look like it doesn't have
callchains.
Reported-and-Tested-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20160128122454.GA27446@danjae.kornet
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When there's only a single callchain, perf doesn't print its percentage
in front of the symbols. This is because it assumes that the percentage
is the same as the parent's. But if a percent limit is applied, it's
possible that there are actually a couple of child nodes but only one of
them is shown. In this case it should display the percent, so that users
do not mistakenly assume it is the same as the parent's.
For example, let's see the following callchain.
$ perf report --no-children --percent-limit 0.01 --tui
...
- 0.06% sleep [kernel.vmlinux] [k] kmem_cache_alloc_trace
kmem_cache_alloc_trace
- perf_event_mmap
- 0.04% mmap_region
do_mmap_pgoff
- vm_mmap_pgoff
+ 0.02% sys_mmap_pgoff
+ 0.02% vm_mmap
+ 0.02% mprotect_fixup
The current code omits the percent when 'mmap_region' becomes the only
visible node, as happens when the percent limit is set to 0.03%. Its
percent is 0.04%, not 0.06%, but users will incorrectly assume the
latter.
Before:
$ perf report --no-children --percent-limit 0.03 --tui
...
0.06% sleep [kernel.vmlinux] [k] kmem_cache_alloc_trace
kmem_cache_alloc_trace
- perf_event_mmap
- mmap_region
do_mmap_pgoff
vm_mmap_pgoff
After:
$ perf report --no-children --percent-limit 0.03 --tui
...
0.06% sleep [kernel.vmlinux] [k] kmem_cache_alloc_trace
kmem_cache_alloc_trace
- perf_event_mmap
- 0.04% mmap_region
do_mmap_pgoff
vm_mmap_pgoff
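The rule described above can be sketched as a small decision function.
This is an illustration only, not the perf implementation: struct
child_sketch, count_visible() and need_percent_display() are
hypothetical names, and the sketch assumes a child is hidden when its
percent is strictly below the limit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced model of one child node of a callchain. */
struct child_sketch {
	const char *name;
	double percent;	/* share of the samples, in percent */
};

/* Count the children that survive the percent limit. */
static size_t count_visible(const struct child_sketch *children, size_t nr,
			    double limit)
{
	size_t i, visible = 0;

	for (i = 0; i < nr; i++) {
		if (children[i].percent >= limit)
			visible++;
	}
	return visible;
}

/*
 * Decide whether the percentage must be printed in front of the visible
 * children. Several visible children always need it. A single visible
 * child needs it only when siblings were hidden by the limit, because
 * then its percentage differs from the parent's.
 */
static bool need_percent_display(const struct child_sketch *children,
				 size_t nr, double limit)
{
	size_t visible = count_visible(children, nr, limit);

	if (visible > 1)
		return true;
	return visible == 1 && nr > 1;
}
```

For the children of perf_event_mmap in the example (mmap_region at
0.04% and mprotect_fixup at 0.02%), a 0.03% limit leaves one visible
child out of two, so the sketch asks for the percentage to be shown.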
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1453909257-26015-10-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>