- Make tracepoint_error() robust against e=NULL. This fixes a problem noticed
with a very specific combination: a machine with Intel PT (e.g. Broadwell), a
kernel with no perf_event_attr.context_switch feature (e.g. 4.2) and an unreadable
tracefs (for instance, for !root users). In that case the fallback from
perf_event_attr.context_switch to the sched:sched_switch tracepoint fails
to read its info from tracefs. (Adrian Hunter)
- Fix segfault in Intel PT by making it follow the 'struct thread' lifetime
(reference counting) expectations; noticed, for instance, when processing perf.data
files with Intel PT data using 'perf script' and when exiting 'perf report' (Adrian Hunter)
- Fix CFI usage from .eh_frame and .debug_frame, which sometimes requires that we
fall back from .eh_frame to .debug_frame on architectures such as PowerPC (Hemant Kumar)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
- Make tracepoint_error() robust against e=NULL. This fixes a problem noticed
with a very specific combination: a machine with Intel PT (e.g. Broadwell), a
kernel with no perf_event_attr.context_switch feature (e.g. 4.2) and an unreadable
tracefs (for instance, for !root users). In that case the fallback from
perf_event_attr.context_switch to the sched:sched_switch tracepoint fails
to read its info from tracefs. (Adrian Hunter)
- Fix segfault in Intel PT by making it follow the 'struct thread' lifetime
(reference counting) expectations; noticed, for instance, when processing perf.data
files with Intel PT data using 'perf script' and when exiting 'perf report' (Adrian Hunter)
- Fix CFI usage from .eh_frame and .debug_frame, which sometimes requires that we
fall back from .eh_frame to .debug_frame on architectures such as PowerPC (Hemant Kumar)
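The e=NULL case in the first item can be illustrated with a small userspace
sketch. It is not the perf source: struct demo_parse_error and
demo_tracepoint_error() are hypothetical names, and the message text is made
up. The only point is the early return when the caller passes no error record.

```c
#include <stdio.h>

/* Hypothetical stand-in for an error record filled in by the parser. */
struct demo_parse_error {
	char str[128];
};

/*
 * Describe a tracepoint that could not be read. Callers that do not want
 * a report pass e == NULL, so return early instead of dereferencing it.
 * Returns 1 if a message was stored, 0 otherwise.
 */
int demo_tracepoint_error(struct demo_parse_error *e, int err,
			  const char *sys, const char *name)
{
	if (e == NULL)
		return 0;

	snprintf(e->str, sizeof(e->str),
		 "can't access trace event %s:%s (error %d)", sys, name, err);
	return 1;
}
```

A fallback path that has no error record to fill in can then call
demo_tracepoint_error(NULL, ...) safely.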
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We broke interval data displays with commit:
3f416f22d1 ("perf stat: Do not clean event's private stats")
This commit removed stats cleaning, which is important for the '-r' option
to carry counter data over the whole run. But the stats still have to be
cleaned in interval mode, otherwise the displayed value is the average of
all previous values.
Before:
  $ perf stat -e cycles -a -I 1000 record
  #           time             counts unit events
       1.000240796         75,216,287      cycles
       2.000512791        107,823,524      cycles
  $ perf stat report
  #           time             counts unit events
       1.000240796         75,216,287      cycles
       2.000512791         91,519,906      cycles
Now:
  $ perf stat report
  #           time             counts unit events
       1.000240796         75,216,287      cycles
       2.000512791        107,823,524      cycles
Notice the second value being bigger (91,.. < 107,..).
This could be easily verified by using perf script which displays raw
stat data:
  $ perf script
  CPU  THREAD       VAL         ENA         RUN        TIME  EVENT
    0      -1  23855779  1000209530  1000209530  1000240796  cycles
    1      -1  33340397  1000224964  1000224964  1000240796  cycles
    2      -1  15835415  1000226695  1000226695  1000240796  cycles
    3      -1   2184696  1000228245  1000228245  1000240796  cycles
    0      -1  97014312  2000514533  2000514533  2000512791  cycles
    1      -1  46121497  2000543795  2000543795  2000512791  cycles
    2      -1  32269530  2000543566  2000543566  2000512791  cycles
    3      -1   7634472  2000544108  2000544108  2000512791  cycles
The sum of the first 4 values is the first interval aggregated value:
23855779 + 33340397 + 15835415 + 2184696 = 75,216,287
The sum of the second 4 values minus first value is the second interval
aggregated value:
97014312 + 46121497 + 32269530 + 7634472 - 75216287 = 107,823,524
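The arithmetic above can be checked with a short sketch. It assumes, as the
raw data suggests, that the per-CPU values are cumulative, so an interval value
is the current total minus the previous total. The demo_* names are
illustrative and not part of perf.

```c
#include <stddef.h>
#include <stdint.h>

/* Raw per-CPU values from the 'perf script' output above. */
const uint64_t demo_t1[4] = { 23855779, 33340397, 15835415, 2184696 };
const uint64_t demo_t2[4] = { 97014312, 46121497, 32269530, 7634472 };

/* Sum one snapshot of cumulative per-CPU counter values. */
uint64_t demo_sum(const uint64_t *vals, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += vals[i];
	return total;
}

/* Interval value: current cumulative total minus the previous total. */
uint64_t demo_interval(const uint64_t *cur, const uint64_t *prev, size_t n)
{
	return demo_sum(cur, n) - (prev ? demo_sum(prev, n) : 0);
}
```

demo_interval(demo_t1, NULL, 4) gives 75216287 and
demo_interval(demo_t2, demo_t1, 4) gives 107823524. Averaging those two
interval values gives 91519905.5, which matches the 91,519,906 shown before
the fix.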
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1454485436-20639-1-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Merge fixes from Andrew Morton:
"18 fixes"
[ The 18 fixes turned into 17 commits, because one of the fixes was a
fix for another patch in the series that I just folded in by editing
the patch manually - hopefully correctly - Linus ]
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: fix memory leak in copy_huge_pmd()
drivers/hwspinlock: fix race between radix tree insertion and lookup
radix-tree: fix race in gang lookup
mm/vmpressure.c: fix subtree pressure detection
mm: polish virtual memory accounting
mm: warn about VmData over RLIMIT_DATA
Documentation: cgroup-v2: add memory.stat::sock description
mm: memcontrol: drop superfluous entry in the per-memcg stats array
drivers/scsi/sg.c: mark VMA as VM_IO to prevent migration
proc: revert /proc/<pid>/maps [stack:TID] annotation
numa: fix /proc/<pid>/numa_maps for hugetlbfs on s390
MAINTAINERS: update Seth email
ocfs2/cluster: fix memory leak in o2hb_region_release
lib/test-string_helpers.c: fix and improve string_get_size() tests
thp: limit number of object to scan on deferred_split_scan()
thp: change deferred_split_count() to return number of THP in queue
thp: make split_queue per-node
Merge tag 'for-linus-4.5-2' of git://git.code.sf.net/p/openipmi/linux-ipmi
Pull IPMI fix from Corey Minyard:
"Fix a compile error on IPMI when ACPI is disabled"
* tag 'for-linus-4.5-2' of git://git.code.sf.net/p/openipmi/linux-ipmi:
ipmi: put acpi.h with the other headers
- Fix build error with *_OF_DECLARE() when used in modules
- Add missing platform maintainers for dts files in MAINTAINERS
Merge tag 'devicetree-fixes-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull DeviceTree fixes from Rob Herring:
- Fix build error with *_OF_DECLARE() when used in modules
- Add missing platform maintainers for dts files in MAINTAINERS
* tag 'devicetree-fixes-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of: drop symbols declared by _OF_DECLARE() from modules
MAINTAINERS: Add missing platform maintainers for dts files
Here's a simple fix to correct that issue.
Merge tag 'trace-v4.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"A cleanup to the stack tracer broke stack tracing on s390. Here's a
simple fix to correct that issue"
* tag 'trace-v4.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing/stacktrace: Show entire trace if passed in function not found
Trinity is now hitting the WARN_ON_ONCE we added in v3.15 commit
cda540ace6 ("mm: get_user_pages(write,force) refuse to COW in shared
areas"). The warning has served its purpose, nobody was harmed by that
change, so just remove the warning to generate less noise from Trinity.
Which reminds me of the comment I wrongly left behind with that commit
(but was spotted at the time by Kirill), which has since moved into a
separate function, and become even more obscure: delete it.
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Enclosing '#include <linux/acpi.h>' within '#ifdef CONFIG_ACPI' is
unnecessary, since it has its own conditional compile for CONFIG_ACPI.
Commit 0fbcf4af7c ("ipmi: Convert the IPMI SI ACPI handling to a
platform device") exposed this as a problem for platforms that do not
support ACPI when it introduced a call to ACPI_PTR() macro outside of
the CONFIG_ACPI conditional compile. This would have been perfectly
acceptable if acpi.h were not conditionally excluded for the non-acpi
platform, because the conditional compile within acpi.h defines
ACPI_PTR() to return NULL when compiled for non acpi platforms.
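A minimal sketch of the pattern being described, using made-up names
(DEMO_CONFIG_ACPI, DEMO_ACPI_PTR, demo_driver) rather than the real kernel
header: the header handles the configuration switch itself, so users include
it unconditionally and the macro collapses to NULL when the feature is off.

```c
#include <stddef.h>

/* Body of a hypothetical header that is always included. */
#ifdef DEMO_CONFIG_ACPI
#define DEMO_ACPI_PTR(ptr) (ptr)
#else
#define DEMO_ACPI_PTR(ptr) (NULL)
#endif

/* Hypothetical driver structure with an optional match table. */
struct demo_driver {
	const int *acpi_match_table;
};

const int demo_ids[] = { 1, 2, 0 };

/* Compiles either way; the pointer is NULL when the feature is off. */
const struct demo_driver demo_drv = {
	.acpi_match_table = DEMO_ACPI_PTR(demo_ids),
};
```

Wrapping the include in a second #ifdef at the use site would hide the macro
exactly in the configuration where the NULL definition is needed.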
Signed-off-by: Tony Camuso <tcamuso@redhat.com>
Fixed commit reference in header to conform to standard.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
We allocate a pgtable but do not attach it to anything if the PMD is in
a DAX VMA, causing it to leak.
We certainly try to not free pgtables associated with the huge zero page
if the zero page is in a DAX VMA, so I think this is the right solution.
This needs to be properly audited.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
of_hwspin_lock_get_id() is protected by the RCU lock, which means that
insertions can occur simultaneously with the lookup. If the radix tree
transitions from a height of 0, we can see a slot with the indirect_ptr
bit set, which will cause us to at least read random memory, and could
cause other havoc.
Fix this by using the newly introduced radix_tree_iter_retry().
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ohad Ben-Cohen <ohad@wizery.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the indirect_ptr bit is set on a slot, that indicates we need to redo
the lookup. Introduce a new function radix_tree_iter_retry() which
forces the loop to retry the lookup by setting 'slot' to NULL and
turning the iterator back to point at the problematic entry.
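The retry step can be sketched in userspace as follows. struct demo_iter and
demo_iter_retry() are simplified, hypothetical stand-ins for the kernel's
iterator and helper; only the two effects named above are modelled: the
returned slot is NULL and the iterator is rewound to the entry it just visited.

```c
#include <stddef.h>

/* Hypothetical, simplified iterator state. */
struct demo_iter {
	unsigned long index;       /* index of the entry just visited */
	unsigned long next_index;  /* where the next lookup will start */
};

/*
 * Rewind the iterator to the entry it just visited and return NULL, so
 * that a loop which looks up again from next_index whenever the slot is
 * NULL repeats the lookup of the problematic entry.
 */
void **demo_iter_retry(struct demo_iter *iter)
{
	iter->next_index = iter->index;
	return NULL;
}
```

A caller would write something like slot = demo_iter_retry(&iter); continue;
inside its iteration loop when it sees an entry with the indirect bit set.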
This is a pretty rare problem to hit at the moment; the lookup has to
race with a grow of the radix tree from a height of 0. The consequences
of hitting this race are that gang lookup could return a pointer to a
radix_tree_node instead of a pointer to whatever the user had inserted
in the tree.
Fixes: cebbd29e1c ("radix-tree: rewrite gang lookup using iterator")
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ohad Ben-Cohen <ohad@wizery.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When vmpressure is called for the entire subtree under pressure we
mistakenly use vmpressure->scanned instead of vmpressure->tree_scanned
when checking if vmpressure work is to be scheduled. This results in
suppressing all vmpressure events in the legacy cgroup hierarchy. Fix it.
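A simplified model of the check, assuming a hypothetical window size and
structure (DEMO_WINDOW, struct demo_vmpressure); it is not the kernel code,
only an illustration of why the subtree path has to test the counter it just
updated.

```c
#include <stdbool.h>

#define DEMO_WINDOW 512UL  /* hypothetical number of scanned pages per event */

struct demo_vmpressure {
	unsigned long scanned;       /* per-level accounting */
	unsigned long tree_scanned;  /* accounting for the whole subtree */
};

/* Account a subtree-wide scan; return true if work should be scheduled. */
bool demo_tree_pressure(struct demo_vmpressure *v, unsigned long scanned)
{
	v->tree_scanned += scanned;
	/* Testing v->scanned here would never fire: this path never updates it. */
	return v->tree_scanned >= DEMO_WINDOW;
}
```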
Fixes: 8e8ae64524 ("mm: memcontrol: hook up vmpressure to socket pressure")
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* add VM_STACK as alias for VM_GROWSUP/DOWN depending on architecture
* always account VMAs with flag VM_STACK as stack (as it was before)
* cleanup classifying helpers
* update comments and documentation
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Tested-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch provides a way of working around a slight regression
introduced by commit 8463833590 ("mm: rework virtual memory
accounting").
Before that commit, RLIMIT_DATA controlled only the size of the brk
region. The change caused problems with all existing versions of
valgrind, because valgrind sets RLIMIT_DATA to zero.
This patch fixes the rlimit check (the limit is actually in bytes, not pages) and
by default turns it into a warning that is printed on the first VmData misuse:
"mmap: top (795): VmData 516096 exceed data ulimit 512000. Will be forbidden soon."
The behavior is controlled by the boot param ignore_rlimit_data=y/n and by sysfs
/sys/module/kernel/parameters/ignore_rlimit_data. For now it is set to "y".
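The bytes-versus-pages point can be sketched like this. The function name and
the 4 KiB page size are assumptions made for the example; the kernel's own
check may be structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/*
 * data_pages and new_pages are counted in pages, rlimit_bytes is in
 * bytes, so the limit is converted to pages before the comparison.
 */
bool demo_data_limit_exceeded(uint64_t data_pages, uint64_t new_pages,
			      uint64_t rlimit_bytes)
{
	return data_pages + new_pages > (rlimit_bytes >> DEMO_PAGE_SHIFT);
}
```

With the numbers from the warning above and 4 KiB pages, 516096 bytes is 126
pages and 512000 bytes is 125 pages, so the limit is exceeded by one page.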
[akpm@linux-foundation.org: tweak kernel-parameters.txt text]
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Link: http://lkml.kernel.org/r/20151228211015.GL2194@uranus
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Kees Cook <keescook@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
MEM_CGROUP_STAT_NSTATS is just a delimiter for cgroup1 statistics, not
an actual array entry. Reuse it for the first cgroup2 stat entry, like
in the event array.
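The delimiter reuse can be shown with a small enum. The DEMO_* names are
hypothetical; only the pattern of starting the second group on the delimiter
value is taken from the description above.

```c
/* Hypothetical stat indices illustrating the delimiter-reuse pattern. */
enum demo_stat_index {
	DEMO_STAT_CACHE,
	DEMO_STAT_RSS,
	DEMO_STAT_NSTATS,                   /* end of first group, not an entry */
	DEMO_STAT_SOCK = DEMO_STAT_NSTATS,  /* second group starts on the delimiter */
	DEMO_STAT_NR                        /* total number of array entries */
};

/* Sized by the final enumerator, so no slot is wasted on the delimiter. */
long demo_stats[DEMO_STAT_NR];
```

Without the explicit assignment, DEMO_STAT_SOCK would be 3 and the array would
carry one unused element at index 2.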
Fixes: b2807f07f4 ("mm: memcontrol: add "sock" to cgroup2 memory.stat")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reduced testcase:
  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/mman.h>
  #include <numaif.h>

  #define SIZE 0x2000

  int main()
  {
          int fd;
          void *p;

          fd = open("/dev/sg0", O_RDWR);
          p = mmap(NULL, SIZE, PROT_EXEC, MAP_PRIVATE | MAP_LOCKED, fd, 0);
          mbind(p, SIZE, 0, NULL, 0, MPOL_MF_MOVE);
          return 0;
  }
We shouldn't try to migrate pages in sg VMA as we don't have a way to
update Sg_scatter_hold::pages accordingly from mm core.
Let's mark the VMA as VM_IO to indicate to mm core that the VMA is not
migratable.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Shiraz Hashim <shashim@codeaurora.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit b76437579d ("procfs: mark thread stack correctly in
proc/<pid>/maps") added [stack:TID] annotation to /proc/<pid>/maps.
Finding the task of a stack VMA requires walking the entire thread list,
turning this into quadratic behavior: a thousand threads means a
thousand stacks, so the rendering of /proc/<pid>/maps needs to look at a
million combinations.
The cost is not in proportion to the usefulness as described in the
patch.
Drop the [stack:TID] annotation to make /proc/<pid>/maps (and
/proc/<pid>/numa_maps) usable again for higher thread counts.
The [stack] annotation inside /proc/<pid>/task/<tid>/maps is retained, as
identifying the stack VMA there is an O(1) operation.
Siddhesh said:
"The end users needed a way to identify thread stacks programmatically and
there wasn't a way to do that. I'm afraid I no longer remember (or have
access to the resources that would aid my memory since I changed
employers) the details of their requirement. However, I did do this on my
own time because I thought it was an interesting project for me and nobody
really gave any feedback then as to its utility, so as far as I am
concerned you could roll back the main thread maps information since the
information is available in the thread-specific files"
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com>
Cc: Shaohua Li <shli@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When working with hugetlbfs ptes (which are actually pmds) it is not valid to
directly use pte functions like pte_present() because the hardware bit
layout of pmds and ptes can be different. This is the case on s390.
Therefore we have to convert the hugetlbfs ptes first into a valid pte
encoding with huge_ptep_get().
Currently the /proc/<pid>/numa_maps code uses hugetlbfs ptes without
huge_ptep_get(). On s390 this leads to the following two problems:
1) The pte_present() function returns false (instead of true) for
PROT_NONE hugetlb ptes. Therefore PROT_NONE vmas are missing
completely in the "numa_maps" output.
2) The pte_dirty() function always returns false for all hugetlb ptes.
Therefore these pages are reported as "mapped=xxx" instead of
"dirty=xxx".
Therefore use huge_ptep_get() to correctly convert the hugetlb ptes.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: <stable@vger.kernel.org> [4.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update/unify my contact info. The old email address will no longer work
soon.
Signed-off-by: Seth Jennings <sjenning@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
o2hb_region_release currently doesn't free o2hb_debug_buf
hr_db_elapsed_time and hr_db_pinned allocated in o2hb_debug_create. Also,
we should call debugfs_remove before freeing its data, to avoid the risk of
debugfs being accessed right after its data has been freed.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jiufei Xue <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Recently added commit 564b026fbd ("string_helpers: fix precision loss
for some inputs") fixed precision issues for string_get_size() and broke
tests.
Fix and improve them: test both STRING_UNITS_2 and STRING_UNITS_10 at a
time, improve failure reporting, and test small and huge values.
Fixes: 564b026fbd ("string_helpers: fix precision loss for some inputs")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: James Bottomley <JBottomley@Odin.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If we have a lot of pages in the queue to be split, deferred_split_scan()
can spend an unreasonable amount of time under a spinlock with
interrupts disabled.
Let's cap the number of pages to split per scan at sc->nr_to_scan.
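A userspace sketch of the capping idea, with a hypothetical queue type and
function name; the real function works on a list of huge pages under a lock,
which is not modelled here.

```c
#include <stddef.h>

/* Hypothetical queue of pending split candidates. */
struct demo_queue {
	size_t len;  /* entries waiting to be processed */
};

/*
 * Process at most nr_to_scan entries per call, so the time spent in one
 * invocation is bounded no matter how long the queue is. Returns the
 * number of entries processed.
 */
size_t demo_scan(struct demo_queue *q, size_t nr_to_scan)
{
	size_t done = 0;

	while (q->len > 0 && done < nr_to_scan) {
		q->len--;  /* stand-in for taking one page off the queue */
		done++;
	}
	return done;
}
```

Entries left in the queue are picked up by a later call, so the total work is
unchanged; only the length of each critical section is limited.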
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I got the meaning of shrinker::count_objects() wrong: it should return
the number of potentially freeable objects, which does not necessarily
correlate with freeable memory.
Returning 256 per THP in queue is not reasonable:
shrinker::scan_objects() is never called with nr_to_scan > 128 in my setup.
Let's return 1 per THP and correct scan_objects() accordingly.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
User visible:
- Make --percent-limit apply to callchains also and fix some bugs
related to --percent-limit (Namhyung Kim)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Merge tag 'perf-core-for-mingo-3' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core callchain fixes and improvements from Arnaldo Carvalho de Melo <acme@redhat.com>:
User visible changes:
- Make --percent-limit apply to callchains also and fix some bugs
related to --percent-limit (Namhyung Kim)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- Port 'perf kvm stat' to PowerPC (Hemant Kumar)
Infrastructure:
- Use the 'feature-dump' target to do the feature checks just once and then
add code to reuse that in the tests/make makefile, speeding up the
'make -C tools/perf build-test' target (Wang Nan)
- Reduce the number of tests the 'build-test' target does to those that don't
pollute the source tree (Arnaldo Carvalho de Melo)
- Improve the output of the build tests a bit by aligning the name of the
tests, more can be done to filter out uninteresting info in the output
(Arnaldo Carvalho de Melo)
- Add perf_evlist pointer to *info_priv_size(), more prep work for
supporting the coresight architecture (Mathieu Poirier)
- Improve the 'perf test bp_signal' test (Wang Nan)
- Check environment before starting the BPF 'perf test', so that we can just
'Skip' older kernels instead of 'FAIL'ing them (Wang Nan)
- Fix cpumode of synthesized buildid event (Wang Nan)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Merge tag 'perf-core-for-mingo-2' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf tooling changes from Arnaldo Carvalho de Melo:
New features:
- Port 'perf kvm stat' to PowerPC (Hemant Kumar)
Infrastructure changes:
- Use the 'feature-dump' target to do the feature checks just once and then
add code to reuse that in the tests/make makefile, speeding up the
'make -C tools/perf build-test' target (Wang Nan)
- Reduce the number of tests the 'build-test' target does to those that don't
pollute the source tree (Arnaldo Carvalho de Melo)
- Improve the output of the build tests a bit by aligning the name of the
tests, more can be done to filter out uninteresting info in the output
(Arnaldo Carvalho de Melo)
- Add perf_evlist pointer to *info_priv_size(), more prep work for
supporting the coresight architecture (Mathieu Poirier)
- Improve the 'perf test bp_signal' test (Wang Nan)
- Check environment before starting the BPF 'perf test', so that we can just
'Skip' older kernels instead of 'FAIL'ing them (Wang Nan)
- Fix cpumode of synthesized buildid event (Wang Nan)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
User visible:
- Rename the "colors.code" ~/.perfconfig variable to "colors.jump_arrows",
as it controls just that UI element in the annotate browser (Taeung Song)
- Avoid trying to read ELF symtabs from device files, noticed while doing
memory profiling work (Jiri Olsa)
- Improve context detection when offering options in the hists browser,
i.e. some options don't make sense when the browser is not working with
a perf.data file ('perf top' mode), only in 'perf report' mode, like
scripting (Namhyung Kim)
Infrastructure:
- Eliminate duplication in the hists browser filter functions, getting the
common part into a function that receives callbacks for filtering by
DSO, thread, etc (Namhyung Kim)
- Fix misleadingly indented assignment, found using
gcc6 -Wmisleading-indentation (Markus Trippelsdorf)
- Handle LLVM relocation oddities in libbpf, introducing a 'perf test' that
detects such problems and then fixing the problem, so that the test now
passes (Wang Nan)
- More improvements to the build infrastructure to allow reusing the
feature detection facilities (Wang Nan)
- Auto initialize the globals needed by cpu__max_{cpu,node}() routines
(Arnaldo Carvalho de Melo)
Documentation:
- Document the perf sysctls in Documentation/sysctl/kernel.txt (Ben Hutchings)
- Document a bunch more ~/.perfconfig knobs (Taeung Song)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- Rename the "colors.code" ~/.perfconfig variable to "colors.jump_arrows",
as it controls just that UI element in the annotate browser (Taeung Song)
- Avoid trying to read ELF symtabs from device files, noticed while doing
memory profiling work (Jiri Olsa)
- Improve context detection when offering options in the hists browser,
i.e. some options don't make sense when the browser is not working with
a perf.data file ('perf top' mode), only in 'perf report' mode, like
scripting (Namhyung Kim)
Infrastructure changes:
- Eliminate duplication in the hists browser filter functions, getting the
common part into a function that receives callbacks for filtering by
DSO, thread, etc. (Namhyung Kim)
- Fix misleadingly indented assignment, found using
gcc6 -Wmisleading-indentation (Markus Trippelsdorf)
- Handle LLVM relocation oddities in libbpf, introducing a 'perf test' that
detects such problems and then fixing the problem, so that the test now
passes (Wang Nan)
- More improvements to the build infrastructure to allow reusing the
feature detection facilities (Wang Nan)
- Auto initialize the globals needed by cpu__max_{cpu,node}() routines
(Arnaldo Carvalho de Melo)
Documentation changes:
- Document the perf sysctls in Documentation/sysctl/kernel.txt (Ben Hutchings)
- Document a bunch more ~/.perfconfig knobs (Taeung Song)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
'perf probe' through debuginfo__find_probes() in util/probe-finder.c
checks for the functions' frame descriptions in either .eh_frame section
of an ELF or the .debug_frame.
The check is based on whether either one of these sections is present.
Depending on distro, toolchain defaults, architecture, build flags,
etc., CFI might be found in either .eh_frame and/or .debug_frame.
Sometimes .eh_frame, even if present, may not be
complete and may miss some descriptions.
Therefore, to be sure of finding the CFI covering an address, we
always have to investigate both if available.
For example, on powerpc, this may happen:
$ gcc -g bin.c -o bin
$ objdump --dwarf ./bin
<1><145>: Abbrev Number: 7 (DW_TAG_subprogram)
<146> DW_AT_external : 1
<146> DW_AT_name : (indirect string, offset: 0x9e): main
<14a> DW_AT_decl_file : 1
<14b> DW_AT_decl_line : 39
<14c> DW_AT_prototyped : 1
<14c> DW_AT_type : <0x57>
<150> DW_AT_low_pc : 0x100007b8
If .eh_frame and .debug_frame are checked for the same binary, we find
that .eh_frame (although present) doesn't contain a description for the
"main" function.
But, .debug_frame has a description:
000000d8 00000024 00000000 FDE cie=00000000 pc=100007b8..10000838
DW_CFA_advance_loc: 16 to 100007c8
DW_CFA_def_cfa_offset: 144
DW_CFA_offset_extended_sf: r65 at cfa+16
...
Because of this (perf checks whether .eh_frame is present and then
searches for the address only inside that section), perf is unable to
process the probe:
# perf probe -x ./bin main
Failed to get call frame on 0x100007b8
Error: Failed to add events.
To avoid this issue, we need to check both the sections (.eh_frame and
.debug_frame), which is done in this patch.
Note that, we can always force everything into both .eh_frame and
.debug_frame by:
$ gcc bin.c -fasynchronous-unwind-tables -fno-dwarf2-cfi-asm -g -o bin
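The fallback described above can be sketched in a few lines of C. This is only an illustration, not the code from util/probe-finder.c: the types fde_range and cfi_section and the function names are hypothetical, and a real implementation would go through libdw rather than scan arrays. It shows the one point that matters, which is that a miss in .eh_frame must lead to a lookup in .debug_frame instead of an immediate failure.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of one FDE: the PC range it covers. */
struct fde_range {
	uint64_t start;	/* first covered address */
	uint64_t end;	/* one past the last covered address */
};

/* Hypothetical model of a CFI section (.eh_frame or .debug_frame). */
struct cfi_section {
	const struct fde_range *fdes;	/* NULL when the section is absent */
	size_t nr_fdes;
};

/* Return the FDE covering 'pc' in one section, or NULL if there is none. */
const struct fde_range *cfi_section__find(const struct cfi_section *sec,
					  uint64_t pc)
{
	size_t i;

	if (sec == NULL || sec->fdes == NULL)
		return NULL;

	for (i = 0; i < sec->nr_fdes; i++) {
		if (pc >= sec->fdes[i].start && pc < sec->fdes[i].end)
			return &sec->fdes[i];
	}
	return NULL;
}

/*
 * Try .eh_frame first. If it is absent, or present but without an FDE
 * for 'pc', fall back to .debug_frame instead of giving up.
 */
const struct fde_range *find_call_frame(const struct cfi_section *eh_frame,
					const struct cfi_section *debug_frame,
					uint64_t pc)
{
	const struct fde_range *fde = cfi_section__find(eh_frame, pc);

	if (fde == NULL)
		fde = cfi_section__find(debug_frame, pc);
	return fde;
}
```

With an .eh_frame model that lacks "main" and a .debug_frame model holding the range 0x100007b8..0x10000838 from the dump above, find_call_frame() returns the .debug_frame entry for 0x100007b8.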
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Mark Wielaard <mjw@redhat.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1454426806-13974-1-git-send-email-hemant@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
intel_pt_process_auxtrace_info() creates a pt->unknown_thread thread
that eventually needs to be freed by the last thread__put() on it, when
its refcount hits zero. This may happen in the
intel_pt_process_auxtrace_info() error handling path, where it triggers
the following segfault. The same segfault would also happen in
intel_pt_free(), when tools using this intel_pt codebase free up
resources:
# perf record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls
0 a anaconda-ks.cfg bin perf.data perf.data.old perf-f23-bringup.todo
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.217 MB perf.data ]
#
# perf script -F event,comm,pid,tid,time,addr,ip,sym,dso,iregs
Samples for 'instructions:u' event do not have IREGS attribute set. Cannot print 'iregs' field.
intel_pt_synth_events: failed to synthesize 'instructions' event type
Segmentation fault (core dumped)
#
The problem is that there's a union in 'struct thread' that combines a
list_head and an rb_node. The standard life cycle of a thread is: init
rb_node in the constructor, insert it into the machine->threads rbtree
using rb_node, move it to machine->dead_threads using list_head, and
clean up in the last thread__put(): list_del_init(&thread->node).
In the above command, a thread is cleaned up before it was ever added to
a list, which causes the above segfault.
Since pt->unknown_thread will never live in an rbtree, initialize its
list node so that when list_del_init() is done on it we don't segfault.
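A minimal sketch of why initializing the list node helps, using cut-down stand-ins for the kernel-style types. thread_sketch, rb_node_sketch and the thread_sketch__*() helpers are hypothetical names for illustration; INIT_LIST_HEAD() and list_del_init() are simplified re-implementations, not the real headers. list_del_init() dereferences both entry->next and entry->prev, so it is only safe on a node that points somewhere valid, for instance at itself.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel-style doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

/* Simplified stand-in for an rbtree node. */
struct rb_node_sketch {
	struct rb_node_sketch *left, *right, *parent;
};

/* Hypothetical cut-down thread: the node lives in an rbtree OR a list. */
struct thread_sketch {
	union {
		struct rb_node_sketch rb_node;
		struct list_head node;
	};
	int refcnt;
};

/* An empty list node points at itself in both directions. */
void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

/* Unlink 'entry' and reinitialize it; reads entry->next and entry->prev. */
void list_del_init(struct list_head *entry)
{
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	INIT_LIST_HEAD(entry);
}

/* Set up a thread that will never be inserted into the rbtree. */
void thread_sketch__init_unlisted(struct thread_sketch *t)
{
	INIT_LIST_HEAD(&t->node);
	t->refcnt = 1;
}

/* Drop a reference; returns 1 when the last one went away. */
int thread_sketch__put(struct thread_sketch *t)
{
	if (--t->refcnt > 0)
		return 0;
	list_del_init(&t->node);	/* safe: node was initialized above */
	return 1;
}
```

Without the INIT_LIST_HEAD() call, the union would still hold whatever the rb_node initialization left there, and list_del_init() would follow those values as if they were list pointers.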
After this patch:
# perf script -F event,comm,pid,tid,time,addr,ip,sym,dso,iregs
Samples for 'instructions:u' event do not have IREGS attribute set. Cannot print 'iregs' field.
intel_pt_synth_events: failed to synthesize 'instructions' event type
0x248 [0x88]: failed to process type: 70
#
Reported-by: Tong Zhang <ztong@vt.edu>
Reported-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: http://lkml.kernel.org/r/1454296865-19749-1-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull networking fixes from David Miller:
"This looks like a lot but it's a mixture of regression fixes as well
as fixes for longer standing issues.
1) Fix on-channel cancellation in mac80211, from Johannes Berg.
2) Handle CHECKSUM_COMPLETE properly in xt_TCPMSS netfilter xtables
module, from Eric Dumazet.
3) Avoid infinite loop in UDP SO_REUSEPORT logic, also from Eric
Dumazet.
4) Avoid a NULL deref if we try to set SO_REUSEPORT after a socket is
bound, from Craig Gallek.
5) GRO key comparisons don't take lightweight tunnels into account,
from Jesse Gross.
6) Fix struct pid leak via SCM credentials in AF_UNIX, from Eric
Dumazet.
7) We need to set the rtnl_link_ops of ipv6 SIT tunnels before we
register them, otherwise the NEWLINK netlink message is missing
the proper attributes. From Thadeu Lima de Souza Cascardo.
8) Several Spectrum chip bug fixes for mlxsw switch driver, from Ido
Schimmel
9) Handle fragments properly in ipv4 early socket demux, from Eric
Dumazet.
10) Don't ignore the ifindex key specifier on ipv6 output route
lookups, from Paolo Abeni"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (128 commits)
tcp: avoid cwnd undo after receiving ECN
irda: fix a potential use-after-free in ircomm_param_request
net: tg3: avoid uninitialized variable warning
net: nb8800: avoid uninitialized variable warning
net: vxge: avoid unused function warnings
net: bgmac: clarify CONFIG_BCMA dependency
net: hp100: remove unnecessary #ifdefs
net: davinci_cpdma: use dma_addr_t for DMA address
ipv6/udp: use sticky pktinfo egress ifindex on connect()
ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()
netlink: not trim skb for mmaped socket when dump
vxlan: fix a out of bounds access in __vxlan_find_mac
net: dsa: mv88e6xxx: fix port VLAN maps
fib_trie: Fix shift by 32 in fib_table_lookup
net: moxart: use correct accessors for DMA memory
ipv4: ipconfig: avoid unused ic_proto_used symbol
bnxt_en: Fix crash in bnxt_free_tx_skbs() during tx timeout.
bnxt_en: Exclude rx_drop_pkts hw counter from the stack's rx_dropped counter.
bnxt_en: Ring free response from close path should use completion ring
net_sched: drr: check for NULL pointer in drr_dequeue
...
Pull libnvdimm fixes from Dan Williams:
"1/ Fixes to the libnvdimm 'pfn' device that establishes a reserved
area for storing a struct page array.
2/ Fixes for dax operations on a raw block device to prevent pagecache
collisions with dax mappings.
3/ A fix for pfn_t usage in vm_insert_mixed that led to a null
pointer de-reference.
These have received build success notification from the kbuild robot
across 153 configs and pass the latest ndctl tests"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
phys_to_pfn_t: use phys_addr_t
mm: fix pfn_t to page conversion in vm_insert_mixed
block: use DAX for partition table reads
block: revert runtime dax control of the raw block device
fs, block: force direct-I/O for dax-enabled block devices
devm_memremap_pages: fix vmem_altmap lifetime + alignment handling
libnvdimm, pfn: fix restoring memmap location
libnvdimm: fix mode determination for e820 devices
When all callchains of a hist entry are percent-limited, do not add a
blank line at the end. It makes the entry look like it doesn't have
callchains.
Reported-and-Tested-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20160128122454.GA27446@danjae.kornet
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When there's only a single callchain, perf doesn't print its percentage
in front of the symbols. This is because it assumes that the percentage
is the same as the parent's. But if a percent limit is applied, it's
possible that there are actually a couple of child nodes but only one of
them is shown. In this case it should display the percentage, to prevent
the misunderstanding that it is the same as the parent's.
For example, let's see the following callchain.
$ perf report --no-children --percent-limit 0.01 --tui
...
- 0.06% sleep [kernel.vmlinux] [k] kmem_cache_alloc_trace
kmem_cache_alloc_trace
- perf_event_mmap
- 0.04% mmap_region
do_mmap_pgoff
- vm_mmap_pgoff
+ 0.02% sys_mmap_pgoff
+ 0.02% vm_mmap
+ 0.02% mprotect_fixup
Current code omits the percent if 'mmap_region' becomes the only node,
which happens when the percent limit is set to 0.03%. Its percent is
not 0.06%, but users will incorrectly assume that it is.
Before:
$ perf report --no-children --percent-limit 0.03 --tui
...
0.06% sleep [kernel.vmlinux] [k] kmem_cache_alloc_trace
kmem_cache_alloc_trace
- perf_event_mmap
- mmap_region
do_mmap_pgoff
vm_mmap_pgoff
After:
$ perf report --no-children --percent-limit 0.03 --tui
...
0.06% sleep [kernel.vmlinux] [k] kmem_cache_alloc_trace
kmem_cache_alloc_trace
- perf_event_mmap
- 0.04% mmap_region
do_mmap_pgoff
vm_mmap_pgoff
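The decision described above can be sketched as a small predicate. This is an illustration under assumptions, not the code from the patch: the function name and its parameters are hypothetical, and the caller is assumed to already know how many sibling nodes survived the percent limit. The percentage can only be omitted when the node is the sole visible child and its cumulative period really equals the parent's total.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a callchain node needs its own percentage printed.
 *
 * nr_visible   - sibling nodes (including this one) left after the
 *                percent limit was applied
 * node_period  - cumulative period of this node
 * parent_total - total period of the parent node or hist entry
 */
bool callchain_node__need_percent(unsigned int nr_visible,
				  uint64_t node_period,
				  uint64_t parent_total)
{
	/* More than one visible branch: each one needs its own number. */
	if (nr_visible > 1)
		return true;

	/*
	 * A single visible branch may still be only part of the parent,
	 * because the other branches were hidden by the percent limit.
	 */
	return node_period != parent_total;
}
```

Taking the example above in arbitrary units, an entry with period 6 whose only visible child 'mmap_region' has period 4 gets the percentage printed, while a child with period 6 would not.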
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1453909257-26015-10-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pass parent node's total period to callchain print functions. This info
is needed by a later patch to correctly determine whether it can omit
the percent or not.
No functional change intended.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1453909257-26015-9-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit 8c430a3486 ("perf hists browser: Support folded callchains")
failed to update hist_browser__dump(), so it always shows graph-style
callchains regardless of the current setting.
To fix that, factor out callchain printing code and rename the existing
function which prints graph-style callchain.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: 8c430a3486 ("perf hists browser: Support folded callchains")
Link: http://lkml.kernel.org/r/1453909257-26015-8-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When there's only a single callchain, perf doesn't print its percentage
in front of the symbols. This is because it assumes that the percentage
is the same as the parent's. But if a percent limit is applied, it's
possible that there are actually a couple of child nodes but only one of
them is shown. In this case it should display the percentage, to prevent
the misunderstanding that it is the same as the parent's.
For example, let's see the following callchain.
$ perf report -s comm --percent-limit 0.01 --stdio
...
9.95% swapper
|
|--7.57%--intel_idle
| cpuidle_enter_state
| cpuidle_enter
| call_cpuidle
| cpu_startup_entry
| |
| |--4.89%--start_secondary
| |
| --2.68%--rest_init
| start_kernel
| x86_64_start_reservations
| x86_64_start_kernel
|
|--0.15%--__schedule
| |
| |--0.13%--schedule
| | schedule_preempt_disable
| | cpu_startup_entry
| | |
| | |--0.09%--start_secondary
| | |
| | --0.04%--rest_init
| | start_kernel
| | x86_64_start_reservations
| | x86_64_start_kernel
| |
| --0.01%--schedule_preempt_disabled
| cpu_startup_entry
...
Current code omits the percent if 'intel_idle' becomes the only node,
which happens when the percent limit is set to 0.5%. Its percent is
not 9.95%, but users will incorrectly assume that it is.
Before:
$ perf report --percent-limit 0.5 --stdio
...
9.95% swapper
|
---intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
|
|--4.89%--start_secondary
|
--2.68%--rest_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel
After:
$ perf report --percent-limit 0.5 --stdio
...
9.95% swapper
|
--7.57%--intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
|
|--4.89%--start_secondary
|
--2.68%--rest_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1453909257-26015-7-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pass hist entry's period to graph callchain print function. This info
is needed by a later patch to determine whether it can omit the
percentage of the top-level node or not.
No functional change intended.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1453909257-26015-6-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It's just a wrapper function to align the start position of callchains to
'comm' of each thread if it's the first sort key. But it doesn't work
with tracepoint events, nor with the upcoming hierarchy view.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1453909257-26015-5-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently the --percent-limit option only works for hist entries.
However, it'd be better to have the same effect on callchains as well.
Requested-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1453909257-26015-4-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently the hist entry addition path doesn't update total_period of
hists and it's calculated during 'resort' path. But the resort path
needs to know the total period before doing its job because it's used
for calculating percent limit of callchains in hist entries.
So this patch updates the total period during the addition path. It
makes the percent limit of callchains work (again).
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1453909257-26015-3-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The total period should be obtained using hists__total_period() since it
takes filtered entries into account. In addition, if callchain mode is
'fractal', the total period should be the entry's period.
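A rough sketch of the selection rule in C. The structures, the use_filtered flag and the function names below are hypothetical stand-ins, since the message does not show how hists__total_period() is implemented; the sketch only assumes that it returns the filtered total when filters are in effect. In 'fractal' mode the percentages are relative to the entry itself, so the entry's period is used as the total.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cut-down view of the per-hists totals. */
struct hists_sketch {
	uint64_t total_period;			/* all entries */
	uint64_t total_non_filtered_period;	/* entries passing the filters */
	bool use_filtered;			/* assumed: filters are in effect */
};

/* Hypothetical cut-down hist entry. */
struct hist_entry_sketch {
	uint64_t period;
};

/* Stand-in for hists__total_period(): honours filtered entries. */
uint64_t hists_sketch__total_period(const struct hists_sketch *hists)
{
	return hists->use_filtered ? hists->total_non_filtered_period
				   : hists->total_period;
}

/* Pick the total that callchain percentages are computed against. */
uint64_t callchain_total_period(const struct hists_sketch *hists,
				const struct hist_entry_sketch *he,
				bool fractal)
{
	/* Fractal mode: percentages are relative to the entry itself. */
	if (fractal)
		return he->period;

	return hists_sketch__total_period(hists);
}
```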
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1453909257-26015-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fixes segmentation fault using, for instance:
(gdb) run record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls
Starting program: /home/acme/bin/perf record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.22-7.fc23.x86_64
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
0x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410
(gdb) bt
#0 0x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410
#1 0x00000000004b9fc5 in add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
at util/parse-events.c:433
#2 0x00000000004ba334 in add_tracepoint_event (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
at util/parse-events.c:498
#3 0x00000000004bb699 in parse_events_add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys=0x19b1370 "sched", event=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
at util/parse-events.c:936
#4 0x00000000004f6eda in parse_events_parse (_data=0x7fffffffb8b0, scanner=0x19a49d0) at util/parse-events.y:391
#5 0x00000000004bc8e5 in parse_events__scanner (str=0x663ff2 "sched:sched_switch", data=0x7fffffffb8b0, start_token=258) at util/parse-events.c:1361
#6 0x00000000004bca57 in parse_events (evlist=0x19a5220, str=0x663ff2 "sched:sched_switch", err=0x0) at util/parse-events.c:1401
#7 0x0000000000518d5f in perf_evlist__can_select_event (evlist=0x19a3b90, str=0x663ff2 "sched:sched_switch") at util/record.c:253
#8 0x0000000000553c42 in intel_pt_track_switches (evlist=0x19a3b90) at arch/x86/util/intel-pt.c:364
#9 0x00000000005549d1 in intel_pt_recording_options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at arch/x86/util/intel-pt.c:664
#10 0x000000000051e076 in auxtrace_record__options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at util/auxtrace.c:539
#11 0x0000000000433368 in cmd_record (argc=1, argv=0x7fffffffde60, prefix=0x0) at builtin-record.c:1264
#12 0x000000000049bec2 in run_builtin (p=0x8fa2a8 <commands+168>, argc=5, argv=0x7fffffffde60) at perf.c:390
#13 0x000000000049c12a in handle_internal_command (argc=5, argv=0x7fffffffde60) at perf.c:451
#14 0x000000000049c278 in run_argv (argcp=0x7fffffffdcbc, argv=0x7fffffffdcb0) at perf.c:495
#15 0x000000000049c60a in main (argc=5, argv=0x7fffffffde60) at perf.c:618
(gdb)
Intel PT attempts to find the sched:sched_switch tracepoint but that
segfaults if tracefs is not readable, because the error reporting
structure is null, as errors are not reported when automatically adding
tracepoints. Fix by checking before using.
Committer note:
This doesn't take place in a kernel that supports
perf_event_attr.context_switch, which is the default way of tracking
context switches. It only happens in older kernels, like 4.2, on a
machine with Intel PT (e.g. Broadwell), for non-privileged users.
Further info from a similar patch by Wang:
The error is in tracepoint_error: it assumes the 'e' parameter is valid.
However, there are many situations in which parse_events() can be called
without a parse_events_error. See the result of
$ grep 'parse_events(.*NULL)' ./tools/perf/ -r
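The "check before using" fix can be illustrated with a self-contained sketch. The structure parse_error_sketch, the function tracepoint_error_sketch() and the message strings are hypothetical and do not reproduce the code in util/parse-events.c; the point is only that the function returns early when the caller passed no error structure, which is what happens when tracepoints are added automatically.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical cut-down error reporting structure. */
struct parse_error_sketch {
	char str[128];
};

/*
 * Record a human-readable reason for a tracepoint failure.
 * Returns 1 if a message was stored, 0 if there was nowhere to store it.
 */
int tracepoint_error_sketch(struct parse_error_sketch *e, int err,
			    const char *sys, const char *name)
{
	/* Callers that do not report errors pass e == NULL. */
	if (e == NULL)
		return 0;

	/* Accept both positive and negative errno values. */
	if (err < 0)
		err = -err;

	switch (err) {
	case EACCES:
		snprintf(e->str, sizeof(e->str),
			 "can't access trace events for %s:%s", sys, name);
		break;
	case ENOENT:
		snprintf(e->str, sizeof(e->str),
			 "unknown tracepoint %s:%s", sys, name);
		break;
	default:
		snprintf(e->str, sizeof(e->str),
			 "failed to add tracepoint %s:%s", sys, name);
		break;
	}
	return 1;
}
```

In the backtrace above the call is made with e=0x0 and err=13 (EACCES); with the early return, such a call simply records nothing.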
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Tong Zhang <ztong@vt.edu>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: stable@vger.kernel.org # v4.4+
Fixes: 196581717d ("perf tools: Enhance parsing events tracepoint error output")
Link: http://lkml.kernel.org/r/1453809921-24596-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Here are some small USB fixes and new device ids for 4.5-rc2. Nothing
major here, full details are in the shortlog, and all of these have been
in linux-next successfully.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'usb-4.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB driver fixes from Greg KH:
"Here are some small USB fixes and new device ids for 4.5-rc2. Nothing
major here, full details are in the shortlog, and all of these have
been in linux-next successfully"
* tag 'usb-4.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: option: fix Cinterion AHxx enumeration
USB: mxu11x0: fix memory leak on usb_serial private data
USB: serial: ftdi_sio: add support for Yaesu SCU-18 cable
USB: serial: option: Adding support for Telit LE922
USB: serial: visor: fix crash on detecting device without write_urbs
USB: visor: fix null-deref at probe
USB: cp210x: add ID for IAI USB to RS485 adaptor
usb: hub: do not clear BOS field during reset device
cdc-acm:exclude Samsung phone 04e8:685d
usb: cdc-acm: send zero packet for intel 7260 modem
usb: cdc-acm: handle unlinked urb in acm read callback
Here are some small tty/serial driver fixes for 4.5-rc2.
They resolve a number of reported problems (the ioctl one specifically
has been pointed out by numerous people) and one patch adds some new
device ids for the 8250_pci driver. All have been in linux-next
successfully.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'tty-4.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some small tty/serial driver fixes for 4.5-rc2.
They resolve a number of reported problems (the ioctl one specifically
has been pointed out by numerous people) and one patch adds some new
device ids for the 8250_pci driver. All have been in linux-next
successfully"
* tag 'tty-4.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: 8250_pci: Add Intel Broadwell ports
staging/speakup: Use tty_ldisc_ref() for paste kworker
n_tty: Fix unsafe reference to "other" ldisc
tty: Fix unsafe ldisc reference via ioctl(TIOCGETD)
tty: Retry failed reopen if tty teardown in-progress
tty: Wait interruptibly for tty lock on reopen