Commit graph

535096 commits

Author SHA1 Message Date
Wang Nan
d325d7887b perf tools: Auto detecting kernel build directory
This patch detects kernel build directory by checking the existence of
include/generated/autoconf.h.

clang working directory is changed to kbuild directory if it is found,
to help user use relative include path. Following patch will detect
kernel include directory, which contains relative include patch so this
workdir changing is needed.

Users are allowed to set 'kbuild-dir = ""' manually to disable this
checking.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/n/tip-owyfwfbemrjn0tlj6tgk2nf5@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:57:16 -03:00
Wang Nan
4cea3a9cb3 perf tools: Call clang to compile C source to object code
This is the core patch for supporting eBPF on-the-fly compiling, does
the following work:

 1. Search clang compiler using search_program().

 2. Run command template defined in llvm-bpf-cmd-template option in
    [llvm] config section using read_from_pipe(). Patch of clang and
    source code path is injected into shell command using environment
    variable using force_set_env().

  Commiter notice:

  When building with DEBUG=1 we get a compiler error that gets fixed with
  the same approach described in commit b236512280:

    perf kmem: Fix compiler warning about may be accessing uninitialized variable

    The last argument to strtok_r doesn't need to be initialized, its
    just a placeholder to make this routine reentrant, but gcc doesn't know
    about that and complains, breaking the build, fix it by setting it to
    NULL.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/n/1436445342-1402-14-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:59 -03:00
Wang Nan
aa61fd05ca perf tools: Introduce llvm config options
This patch introduces [llvm] config section with 5 options. Following
patches will use then to config llvm dynamica compiling.

'llvm-utils.[ch]' is introduced in this patch for holding all
llvm/clang related stuffs.

Example:

  [llvm]
        # Path to clang. If omit, search it from $PATH.
	clang-path = "/path/to/clang"

        # Cmdline template. Following line shows its default value.
        # Environment variable is used to passing options.
        #
        # *NOTE*: -D__KERNEL__ MUST appears before $CLANG_OPTIONS,
        # so user have a chance to use -U__KERNEL__ in $CLANG_OPTIONS
        # to cancel it.
	clang-bpf-cmd-template = "$CLANG_EXEC -D__KERNEL__ $CLANG_OPTIONS \
				  $KERNEL_INC_OPTIONS -Wno-unused-value \
				  -Wno-pointer-sign -working-directory \
				  $WORKING_DIR  -c $CLANG_SOURCE -target \
				  bpf -O2 -o -"

        # Options passed to clang, will be passed to cmdline by
        # $CLANG_OPTIONS.
	clang-opt = "-Wno-unused-value -Wno-pointer-sign"

        # kbuild directory. If not set, use /lib/modules/`uname -r`/build.
        # If set to "" deliberately, skip kernel header auto-detector.
	kbuild-dir = "/path/to/kernel/build"

        # Options passed to 'make' when detecting kernel header options.
	kbuild-opts = "ARCH=x86_64"

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1437477214-149684-1-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:59 -03:00
Wang Nan
9a208effd1 bpf tools: Link all bpf objects onto a list
To allow enumeration of all bpf_objects, keep them in a list (hidden to
caller). bpf_object__for_each_safe() is introduced to do this iteration.
It is safe even user close the object during iteration.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-23-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:59 -03:00
Wang Nan
aa9b1ac33c bpf tools: Introduce accessors for struct bpf_program
This patch introduces accessors for user of libbpf to retrieve section
name and fd of a opened/loaded eBPF program. 'struct bpf_prog_handler'
is used for that purpose. Accessors of programs section name and file
descriptor are provided. Set/get private data are also impelmented.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lkml.kernel.org/r/1435716878-189507-21-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:59 -03:00
Wang Nan
55cffde2e1 bpf tools: Load eBPF programs in object files into kernel
This patch utilizes previous introduced bpf_load_program to load
programs in the ELF file into kernel. Result is stored in 'fd' field in
'struct bpf_program'.

During loading, it allocs a log buffer and free it before return.  Note
that that buffer is not passed to bpf_load_program() if the first
loading try is successful. Doesn't use a statically allocated log buffer
to avoid potention multi-thread problem.

Instructions collected during opening is cleared after loading.

load_program() is created for loading a 'struct bpf_insn' array into
kernel, bpf_program__load() calls it. By this design we have a function
loads instructions into kernel. It will be used by further patches,
which creates different instances from a program and load them into
kernel.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-20-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:59 -03:00
Wang Nan
7bf98369a7 bpf tools: Introduce bpf_load_program() to bpf.c
bpf_load_program() can be used to load bpf program into kernel. To make
loading faster, first try to load without logbuf. Try again with logbuf
if the first try failed.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-19-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:58 -03:00
Wang Nan
8a47a6c522 bpf tools: Relocate eBPF programs
If an eBPF program accesses a map, LLVM generates a load instruction
which loads an absolute address into a register, like this:

  ld_64   r1, <MCOperand Expr:(mymap)>
  ...
  call    2

That ld_64 instruction will be recorded in relocation section.
To enable the usage of that map, relocation must be done by replacing
the immediate value by real map file descriptor so it can be found by
eBPF map functions.

This patch to the relocation work based on information collected by
patches:

'bpf tools: Collect symbol table from SHT_SYMTAB section',
'bpf tools: Collect relocation sections from SHT_REL sections'
and
'bpf tools: Record map accessing instructions for each program'.

For each instruction which needs relocation, it inject corresponding
file descriptor to imm field. As a part of protocol, src_reg is set to
BPF_PSEUDO_MAP_FD to notify kernel this is a map loading instruction.

This is the final part of map relocation patch. The principle of map
relocation is described in commit message of 'bpf tools: Collect symbol
table from SHT_SYMTAB section'.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-18-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:58 -03:00
Wang Nan
52d3352e79 bpf tools: Create eBPF maps defined in an object file
This patch creates maps based on 'map' section in object file using
bpf_create_map(), and stores the fds into an array in 'struct
bpf_object'.

Previous patches parse ELF object file and collects required data, but
doesn't play with the kernel. They belong to the 'opening' phase. This
patch is the first patch in 'loading' phase. The 'loaded' field is
introduced in 'struct bpf_object' to avoid loading an object twice,
because the loading phase clears resources collected during the opening
which becomes useless after loading. In this patch, maps_buf is cleared.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-17-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:58 -03:00
Wang Nan
e3ed2fef22 bpf tools: Add bpf.c/h for common bpf operations
This patch introduces bpf.c and bpf.h, which hold common functions
issuing bpf syscall. The goal of these two files is to hide syscall
completely from user. Note that bpf.c and bpf.h deal with kernel
interface only. Things like structure of 'map' section in the ELF object
is not cared by of bpf.[ch].

We first introduce bpf_create_map().

Note that, since functions in bpf.[ch] are wrapper of sys_bpf, they
don't use OO style naming.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-16-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:58 -03:00
Wang Nan
340909152a bpf tools: Record map accessing instructions for each program
This patch records the indices of instructions which are needed to be
relocated. That information is saved in the 'reloc_desc' field in
'struct bpf_program'. In the loading phase (this patch takes effect in
the opening phase), the collected instructions will be replaced by map
loading instructions.

Since we are going to close the ELF file and clear all data at the end
of the 'opening' phase, the ELF information will no longer be valid in
the 'loading' phase. We have to locate the instructions before maps are
loaded, instead of directly modifying the instruction.

'struct bpf_map_def' is introduced in this patch to let us know how many
maps are defined in the object.

This is the third part of map relocation. The principle of map relocation
is described in commit message of 'bpf tools: Collect symbol table from
SHT_SYMTAB section'.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-15-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:58 -03:00
Wang Nan
b62f06e81b bpf tools: Collect relocation sections from SHT_REL sections
This patch collects relocation sections into 'struct object'.  Such
sections are used for connecting maps to bpf programs. 'reloc' field in
'struct bpf_object' is introduced for storing such information.

This patch simply store the data into 'reloc' field. Following patch
will parse them to know the exact instructions which are needed to be
relocated.

Note that the collected data will be invalid after ELF object file is
closed.

This is the second patch related to map relocation. The first one is
'bpf tools: Collect symbol table from SHT_SYMTAB section'. The
principle of map relocation is described in its commit message.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-14-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:57 -03:00
Wang Nan
a5b8bd47dc bpf tools: Collect eBPF programs from their own sections
This patch collects all programs in an object file into an array of
'struct bpf_program' for further processing. That structure is for
representing each eBPF program. 'bpf_prog' should be a better name, but
it has been used by linux/filter.h. Although it is a kernel space name,
I still prefer to call it 'bpf_program' to prevent possible confusion.

bpf_object__add_program() creates a new 'struct bpf_program' object.
It first init a variable in stack using bpf_program__init(), then if
success, enlarges obj->programs array and copy the new object in.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-13-git-send-email-wangnan0@huawei.com
[ Made bpf_object__add_program() propagate the error (-EINVAL or -ENOMEM) ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:57 -03:00
Wang Nan
bec7d68cb5 bpf tools: Collect symbol table from SHT_SYMTAB section
This patch collects symbols section. This section is useful when linking
BPF maps.

What 'bpf_map_xxx()' functions actually require are map's file
descriptors (and the internal verifier converts fds into pointers to
'struct bpf_map'), which we don't know when compiling. Therefore, we
should make compiler generate a 'ldr_64 r1, <imm>' instruction, and
fill the 'imm' field with the actual file descriptor when loading in
libbpf.

BPF programs should be written in this way:

 struct bpf_map_def SEC("maps") my_map = {
    .type = BPF_MAP_TYPE_HASH,
    .key_size = sizeof(unsigned long),
    .value_size = sizeof(unsigned long),
    .max_entries = 1000000,
 };

 SEC("my_func=sys_write")
 int my_func(void *ctx)
 {
     ...
     bpf_map_update_elem(&my_map, &key, &value, BPF_ANY);
     ...
 }

Compiler should convert '&my_map' into a 'ldr_64, r1, <imm>'
instruction, where imm should be the address of 'my_map'. According to
the address, libbpf knows which map it actually referenced, and then
fills the imm field with the 'fd' of that map created by it.

However, since we never really 'link' the object file, the imm field is
only a record in relocation section. Therefore libbpf should do the
relocation:

 1. In relocation section (type == SHT_REL), positions of each such
    'ldr_64' instruction are recorded with a reference of an entry in
    symbol table (SHT_SYMTAB);

 2. From records in symbol table we can find the indics of map
    variables.

Libbpf first record SHT_SYMTAB and positions of each instruction which
required bu such operation. Then create file descriptor. Finally, after
map creation complete, replace the imm field.

This is the first patch of BPF map related stuff. It records SHT_SYMTAB
into object's efile field for further use.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-12-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:57 -03:00
Wang Nan
0b3d1efade bpf tools: Collect map definitions from 'maps' section
If maps are used by eBPF programs, corresponding object file(s) should
contain a section named 'map'. Which contains map definitions. This
patch copies the data of the whole section. Map data parsing should be
acted just before map loading.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-11-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:57 -03:00
Wang Nan
cb1e5e9619 bpf tools: Collect version and license from ELF sections
Expand bpf_obj_elf_collect() to collect license and kernel version
information in eBPF object file. eBPF object file should have a section
named 'license', which contains a string. It should also have a section
named 'version', contains a u32 LINUX_VERSION_CODE.

bpf_obj_validate() is introduced to validate object file after loaded.
Currently it only check existence of 'version' section.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-10-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:57 -03:00
Wang Nan
296036653a bpf tools: Iterate over ELF sections to collect information
bpf_obj_elf_collect() is introduced to iterate over each elf sections to
collection information in eBPF object files. This function will futher
enhanced to collect license, kernel version, programs, configs and map
information.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-9-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:56 -03:00
Wang Nan
cc4228d57c bpf tools: Check endianness and make libbpf fail early
Check endianness according to EHDR. Code is taken from
tools/perf/util/symbol-elf.c.

Libbpf doesn't magically convert missmatched endianness. Even if we swap
eBPF instructions to correct byte order, we are unable to deal with
endianness in code logical generated by LLVM.

Therefore, libbpf should simply reject missmatched ELF object, and let
LLVM to create good code.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-8-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:56 -03:00
Wang Nan
6c956392b0 bpf tools: Read eBPF object from buffer
To support dynamic compiling, this patch allows caller to pass a
in-memory buffer to libbpf by bpf_object__open_buffer(). libbpf calls
elf_memory() to open it as ELF object file.

Because __bpf_object__open() collects all required data and won't need
that buffer anymore, libbpf uses that buffer directly instead of clone a
new buffer. Caller of libbpf can free that buffer or use it do other
things after bpf_object__open_buffer() return.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-7-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:56 -03:00
Wang Nan
1a5e3fb1e9 bpf tools: Open eBPF object file and do basic validation
This patch defines basic interface of libbpf. 'struct bpf_object' will
be the handler of each object file. Its internal structure is hide to
user. eBPF object files are compiled by LLVM as ELF format. In this
patch, libelf is used to open those files, read EHDR and do basic
validation according to e_type and e_machine.

All elf related staffs are grouped together and reside in efile field of
'struct bpf_object'. bpf_object__elf_finish() is introduced to clear it.

After all eBPF programs in an object file are loaded, related ELF
information is useless. Close the object file and free those memory.

The zfree() and zclose() functions are introduced to ensure setting NULL
pointers and negative file descriptors after resources are released.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-6-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:56 -03:00
Wang Nan
b3f59d66e2 bpf tools: Allow caller to set printing function
By libbpf_set_print(), users of libbpf are allowed to register he/she
own debug, info and warning printing functions. Libbpf will use those
functions to print messages. If not provided, default info and warning
printing functions are fprintf(stderr, ...); default debug printing
is NULL.

This API is designed to be used by perf, enables it to register its own
logging functions to make all logs uniform, instead of separated
logging level control.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-5-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:56 -03:00
Wang Nan
1b76c13e4b bpf tools: Introduce 'bpf' library and add bpf feature check
This is the first patch of libbpf. The goal of libbpf is to create a
standard way for accessing eBPF object files. This patch creates
'Makefile' and 'Build' for it, allows 'make' to build libbpf.a and
libbpf.so, 'make install' to put them into proper directories.
Most part of Makefile is borrowed from traceevent.

Before building, it checks the existence of libelf in Makefile, and deny
to build if not found. Instead of throwing an error if libelf not found,
the error raises in a phony target "elfdep". This design is to ensure
'make clean' still workable even if libelf is not found.

Because libbpf requires 'kern_version' field set for 'union bpf_attr'
(bpfdep" is used for that dependency), Kernel BPF API is also checked
by intruducing a new feature check 'bpf' into tools/build/feature,
which checks the existence and version of linux/bpf.h. When building
libbpf, it searches that file from include/uapi/linux in kernel source
tree (controlled by FEATURE_CHECK_CFLAGS-bpf). Since it searches kernel
source tree it reside, installing of newest kernel headers is not
required, except we are trying to port these files to an old kernel.

To avoid checking that file when perf building, the newly introduced
'bpf' feature check doesn't added into FEATURE_TESTS and
FEATURE_DISPLAY by default in tools/build/Makefile.feature, but added
into libbpf's specific.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Bcc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-4-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:56 -03:00
Daniel Vetter
209e4dbc8d drm/vblank: Use u32 consistently for vblank counters
In

commit 99264a61df
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Apr 15 19:34:43 2015 +0200

    drm/vblank: Fixup and document timestamp update/read barriers

I've switched vblank->count from atomic_t to unsigned long and
accidentally created an integer comparison bug in
drm_vblank_count_and_time since vblanke->count might overflow the u32
local copy and hence the retry loop never succeed.

Fix this by consistently using u32.

Cc: Michel Dänzer <michel@daenzer.net>
Reported-by: Michel Dänzer <michel@daenzer.net>
Reviewed-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2015-08-07 14:35:53 +02:00
Takashi Iwai
6ac7ada210 ASoC: Fixes for v4.2
There are a couple of small driver specific fixes here but the
 overwhelming bulk of these changes are fixes to the topology ABI that
 has been newly introduced in v4.2.  Once this makes it into a release we
 will have to firm this up but for now getting enhancements in before
 they've made it into a release is the most expedient thing.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVxJsBAAoJECTWi3JdVIfQHUIH/2NeQyG7bL3TSIm0qK9RdmaT
 UrWyGm6AwhOrg9j9jnZkXag1v4Fsmo4nnslaDbmCKaWJPs7ol94NuWJK7z3He3nr
 47yxl2m8OhxsS7aBDACa4gJFQce+JSjKs1ADnKR7QzY9c1/nTAWLO/LE/I5Ca18u
 tZg77hviip2ftfLYdNXDZvz3HzuCcRnIouc4izTs16DXwU3aYpqcctFrcqU3nNmp
 /+5zPwIoLA7KDwPqdYhSPNRuCnFtjVbI8hTzq+aWFlxFBZYlKmSCTSwpCN2aQDgF
 1oCOP4dlv7s9Xavd2TOMJ8g3Fuyb/gjX2DhIEm7QCYxwBxdkpwm0na6gMrRJqsg=
 =NEOq
 -----END PGP SIGNATURE-----

Merge tag 'asoc-fix-v4.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v4.2

There are a couple of small driver specific fixes here but the
overwhelming bulk of these changes are fixes to the topology ABI that
has been newly introduced in v4.2.  Once this makes it into a release we
will have to firm this up but for now getting enhancements in before
they've made it into a release is the most expedient thing.
2015-08-07 13:53:41 +02:00
Haozhong Zhang
d7add05458 KVM: x86: Use adjustment in guest cycles when handling MSR_IA32_TSC_ADJUST
When kvm_set_msr_common() handles a guest's write to
MSR_IA32_TSC_ADJUST, it will calcuate an adjustment based on the data
written by guest and then use it to adjust TSC offset by calling a
call-back adjust_tsc_offset(). The 3rd parameter of adjust_tsc_offset()
indicates whether the adjustment is in host TSC cycles or in guest TSC
cycles. If SVM TSC scaling is enabled, adjust_tsc_offset()
[i.e. svm_adjust_tsc_offset()] will first scale the adjustment;
otherwise, it will just use the unscaled one. As the MSR write here
comes from the guest, the adjustment is in guest TSC cycles. However,
the current kvm_set_msr_common() uses it as a value in host TSC
cycles (by using true as the 3rd parameter of adjust_tsc_offset()),
which can result in an incorrect adjustment of TSC offset if SVM TSC
scaling is enabled. This patch fixes this problem.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Cc: stable@vger.linux.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-08-07 13:28:03 +02:00
Paolo Bonzini
18c3626e3d KVM: x86: zero IDT limit on entry to SMM
The recent BlackHat 2015 presentation "The Memory Sinkhole"
mentions that the IDT limit is zeroed on entry to SMM.

This is not documented, and must have changed some time after 2010
(see http://www.ssi.gouv.fr/uploads/IMG/pdf/IT_Defense_2010_final.pdf).
KVM was not doing it, but the fix is easy.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-08-07 12:46:32 +02:00
Vineet Gupta
1097163870 ARCv2: spinlock/rwlock/atomics: reduce 1 instruction in exponential backoff
The increment of delay counter was 2 instructions:
Arithmatic Shfit Left (ASL) + set to 1 on overflow

This can be done in 1 using ROtate Left (ROL)

Suggested-by: Nigel Topham <ntopham@synopsys.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-07 13:56:16 +05:30
Jason Wang
48900cb6af virtio-net: drop NETIF_F_FRAGLIST
virtio declares support for NETIF_F_FRAGLIST, but assumes
that there are at most MAX_SKB_FRAGS + 2 fragments which isn't
always true with a fraglist.

A longer fraglist in the skb will make the call to skb_to_sgvec overflow
the sg array, leading to memory corruption.

Drop NETIF_F_FRAGLIST so we only get what we can handle.

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-07 00:13:10 -07:00
Mathieu Olivari
4f7eb70f7b stmmac: dwmac-ipq806x: fix static checker warning
The patch b1c17215d7: "stmmac: add ipq806x glue layer", leads to the
following static checker warning:

.../stmmac/dwmac-ipq806x.c:314 ipq806x_gmac_probe()
warn: double left shift '1 << (1 << gmac->id)'

The NSS_COMMON_CLK_SRC_CTRL_OFFSET macro is used once as an offset, and
once as a mask, which is a bug indeed. We'll fix it by defining the
offset as the real offset value and computing the mask from it when
required.

Tested on IPQ806x ref designs AP148 & DB149.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mathieu Olivari <mathieu@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-07 00:12:04 -07:00
Ingo Molnar
1354ac6ad8 perf/core improvements and fixes:
User visible:
 
 - IPC and cycle accounting in 'perf annotate' (Andi Kleen)
 
 - Display cycles in branch sort mode in 'perf report' (Andi Kleen)
 
 - Add total time column to 'perf trace' syscall stats summary (Milian Woff)
 
 Infrastructure:
 
 - PMU helpers to use in Intel PT (Adrian Hunter)
 
 - Fix perf-with-kcore script not to split args with spaces (Adrian Hunter)
 
 - Add empty Build files for some more architectures (Ben Hutchings)
 
 - Move 'perf stat' config variables to a struct to allow using some
   of its functions in more places (Jiri Olsa)
 
 - Add DWARF register names for 'xtensa' arch (Max Filippov)
 
 - Implement BPF programs attached to uprobes (Wang Nan)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVxA9EAAoJENZQFvNTUqpAP88P/0I8l7DJrD4e2PpwSIuwUPII
 kq8fYMJ4OpR2XiGjyny68iZmnASIon+cV6AoidZ27eqG/+qmhzgv9nCjMlPUIpAT
 KcilxUXjOc2xba8nUdrNRKHKdcxcvp5iuw1dXkfJuf5U5l7cSTv5tko6vDaA6ngH
 bpmn8wa73ajRwtTErgBSJAwQVMPzo9Ods/FLeZK6t0hYNYNN9ISp7pq+0RhEnNVb
 gtlE1/DgGccsTs9NDWQqi3bmvCVsVMhaeWLDyCBjx/cwkwuhdhYHAfs8Llmse+51
 7adaqFQ7BZMS/8wXCUwCNnIMBBURpQodW/3H//GQ8CtBSNRt+EX8u0zHs+aJj/NR
 JqlRVOxhFFJU3E/67HDnU1IM9ANQYZq2JomQ1B+PJTUBZvUaBQtKfFlfKhI36o21
 2Sv/fsOjcZLJePBPeUVgjCmBvc0vAUBHPN23wHMyP8o6I6NTmTb3LvomZGaO7Af5
 HuebGfd92ahVPT1/h3y5lVDnjiNYikoNKJdDh8JiTTbuj8LtvhHN/o5AeAP3Ig2H
 kJEWMbSuDCdUPRGYeW4z3aDDP0/vxEH8+kXWoTSwORVZcXXbg38eoYcOPqfQQWQt
 80+Rf3sTNt8RCoKWJ0AS+nS/S2HWHrJ5G4DIdc1ldm+zL6ElkOkPVm1W7EqvCBd6
 tZP13miwhMxritYGX7pM
 =b2ho
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

User visible changes:

  - IPC and cycle accounting in 'perf annotate'. (Andi Kleen)

  - Display cycles in branch sort mode in 'perf report'. (Andi Kleen)

  - Add total time column to 'perf trace' syscall stats summary. (Milian Woff)

Infrastructure changes:

  - PMU helpers to use in Intel PT. (Adrian Hunter)

  - Fix perf-with-kcore script not to split args with spaces. (Adrian Hunter)

  - Add empty Build files for some more architectures. (Ben Hutchings)

  - Move 'perf stat' config variables to a struct to allow using some
    of its functions in more places. (Jiri Olsa)

  - Add DWARF register names for 'xtensa' arch. (Max Filippov)

  - Implement BPF programs attached to uprobes. (Wang Nan)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-07 09:11:30 +02:00
WingMan Kwok
866b8b18e3 net: netcp: fix unused interface rx buffer size configuration
Prior to this patch, rx buffer size for each rx queue
of an interface is configurable through dts bindings.
But for an interface, the first rx queue's rx buffer
size is always the usual MTU size (plus usual overhead)
and page size for the remaining rx queues (if they are
enabled by specifying a non-zero rx queue depth dts
binding of the corresponding interface).  This patch
removes the rx buffer size configuration capability.

Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
Acked-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-07 00:02:36 -07:00
Ian Campbell
22f54bf932 net: thunderx: remove effective "default y" from Kconfig if ARCH_THUNDER=y
As well as for kernels built only for ThunderX ARCH_THUNDERX is also enabled
for kernels which support multiple platforms (such as distro kernels). Thus
"default ARCH_THUNDER" is inappropriate.

I believe default m is equally frowned upon, so remove the line completely
rather than "default m if ARCH_THUNDER".

Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
Cc: Sunil Goutham <sgoutham@cavium.com>
Cc: Robert Richter <rric@kernel.org>
Cc: Derek Chickles <derek.chickles@caviumnetworks.com>
Cc: Satanand Burla <satananda.burla@caviumnetworks.com>
Cc: Felix Manlunas <felix.manlunas@caviumnetworks.com>
Cc: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-arm-kernel@lists.infradead.org
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-07 00:01:37 -07:00
Ivan Vecera
7ebc482202 r8169: enforce RX_MULTI_EN on rtl8168ep/8111ep chips
Enforcing this flag in RxConfig for the mentioned chips fixes netdev
watchdog issues prepended with AMD IOMMU message(s) like:
AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x001d address=0x0000000000003000 flags=0x0050]

Note that this flag is also set in Realtek's own driver for these chips.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Tested-by: Alexander Lindqvist <alexander@bitspace.se>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-07 00:00:10 -07:00
Nikolay Aleksandrov
786c2077ec bridge: netlink: account for the IFLA_BRPORT_PROXYARP_WIFI attribute size and policy
The attribute size wasn't accounted for in the get_slave_size() callback
(br_port_get_slave_size) when it was introduced, so fix it now. Also add
a policy entry for it in br_port_policy.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: 842a9ae08a ("bridge: Extend Proxy ARP design to allow optional rules for Wi-Fi")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 23:54:42 -07:00
Nikolay Aleksandrov
355b9f9df1 bridge: netlink: account for the IFLA_BRPORT_PROXYARP attribute size and policy
The attribute size wasn't accounted for in the get_slave_size() callback
(br_port_get_slave_size) when it was introduced, so fix it now. Also add
a policy entry for it in br_port_policy.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: 958501163d ("bridge: Add support for IEEE 802.11 Proxy ARP")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 23:54:42 -07:00
David S. Miller
50e18af16f iwlwifi:
* a fix for the stuck TFD queue mechanism - it was producing
   noisy false alarms
 * a fix for the NIC prepare flow that prevented the driver
   from being able to access the device on certain systems
 * a fix for the scan prority handling which allows the
   regular scan to run even if a scheduled scan is already
   running
 
 rsi:
 
 * fix firmware load DMA regression
 
 b43:
 
 * fix extpa_gain check for 2GHz
 
 rtlwifi:
 
 * fix NULL dereference when PCI driver used as an AP
 * add missing module parameter declaration for rtl8723be_mod_params.msi_support
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQEcBAABAgAGBQJVwOsmAAoJEG4XJFUm622bWSAIAJDlW0ZKnxI+tmsbU2CYuoQS
 yfiK+oTuQE5+eB+5bNaJqSI2QSzVo11JBxnf4014wu+/gjKy2OHQ48ufXbkFfHB2
 t2o+rIx7WL5zvoy67fIifgIQSdg542e68xPc8Wz6MV58szuYscBT78dy2ZWvthqB
 S6DK8K+ohHzHYMV5fw65xkcmsXB8BEKEoUckm8ZxjsLF4Pj+ZzAgpqf+i85JYkFM
 LwktfKRxg2un9u51IhBCPZUUxe7MhLgZbSuHSP/7ltxWIGctseMAFMlCMrmi9+2B
 zXPR+2EMWLy0rgmQ04/sG6LjNdW6tmF1rCtcuKnS7hiOngubAj3UEj9d5r98/LI=
 =SqUX
 -----END PGP SIGNATURE-----

Merge tag 'wireless-drivers-for-davem-2015-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
iwlwifi:

* a fix for the stuck TFD queue mechanism - it was producing
  noisy false alarms
* a fix for the NIC prepare flow that prevented the driver
  from being able to access the device on certain systems
* a fix for the scan prority handling which allows the
  regular scan to run even if a scheduled scan is already
  running

rsi:

* fix firmware load DMA regression

b43:

* fix extpa_gain check for 2GHz

rtlwifi:

* fix NULL dereference when PCI driver used as an AP
* add missing module parameter declaration for rtl8723be_mod_params.msi_support
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 23:53:34 -07:00
Oleg Nesterov
7ba8bd75dd net: pktgen: don't abuse current->state in pktgen_thread_worker()
Commit 1fbe4b46ca "net: pktgen: kill the Wait for kthread_stop
code in pktgen_thread_worker()" removed (in particular) the final
__set_current_state(TASK_RUNNING) and I didn't notice the previous
set_current_state(TASK_INTERRUPTIBLE). This triggers the warning
in __might_sleep() after return.

Afaics, we can simply remove both set_current_state()'s, and we
could do this a long ago right after ef87979c27 "pktgen: better
scheduler friendliness" which changed pktgen_thread_worker() to
use wait_event_interruptible_timeout().

Reported-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 23:52:44 -07:00
Ross Lagerwall
57b229063a xen/netback: Wake dealloc thread after completing zerocopy work
Waking the dealloc thread before decrementing inflight_packets is racy
because it means the thread may go to sleep before inflight_packets is
decremented. If kthread_stop() has already been called, the dealloc
thread may wait forever with nothing to wake it. Instead, wake the
thread only after decrementing inflight_packets.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 23:42:48 -07:00
Herbert Xu
a0a2a66024 net: Fix skb_set_peeked use-after-free bug
The commit 738ac1ebb9 ("net: Clone
skb before setting peeked flag") introduced a use-after-free bug
in skb_recv_datagram.  This is because skb_set_peeked may create
a new skb and free the existing one.  As it stands the caller will
continue to use the old freed skb.

This patch fixes it by making skb_set_peeked return the new skb
(or the old one if unchanged).

Fixes: 738ac1ebb9 ("net: Clone skb before setting peeked flag")
Reported-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Brenden Blanco <bblanco@plumgrid.com>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 21:55:47 -07:00
Linus Torvalds
49d7c6559b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Pull sparc fix from David Miller:
 "FPU register corruption bug fix"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Fix userspace FPU register corruptions.
2015-08-07 05:28:24 +03:00
Linus Torvalds
8664b90bae Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
 "21 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (21 commits)
  writeback: fix initial dirty limit
  mm/memory-failure: set PageHWPoison before migrate_pages()
  mm: check __PG_HWPOISON separately from PAGE_FLAGS_CHECK_AT_*
  mm/memory-failure: give up error handling for non-tail-refcounted thp
  mm/memory-failure: fix race in counting num_poisoned_pages
  mm/memory-failure: unlock_page before put_page
  ipc: use private shmem or hugetlbfs inodes for shm segments.
  mm: initialize hotplugged pages as reserved
  ocfs2: fix shift left overflow
  kthread: export kthread functions
  fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()
  lib/iommu-common.c: do not use 0xffffffffffffffffl for computing align_mask
  mm/slub: allow merging when SLAB_DEBUG_FREE is set
  signalfd: fix information leak in signalfd_copyinfo
  signal: fix information leak in copy_siginfo_to_user
  signal: fix information leak in copy_siginfo_from_user32
  ocfs2: fix BUG in ocfs2_downconvert_thread_do_work()
  fs, file table: reinit files_stat.max_files after deferred memory initialisation
  mm, meminit: replace rwsem with completion
  mm, meminit: allow early_pfn_to_nid to be used during runtime
  ...
2015-08-07 05:20:40 +03:00
David S. Miller
44922150d8 sparc64: Fix userspace FPU register corruptions.
If we have a series of events from userpsace, with %fprs=FPRS_FEF,
like follows:

ETRAP
	ETRAP
		VIS_ENTRY(fprs=0x4)
		VIS_EXIT
		RTRAP (kernel FPU restore with fpu_saved=0x4)
	RTRAP

We will not restore the user registers that were clobbered by the FPU
using kernel code in the inner-most trap.

Traps allocate FPU save slots in the thread struct, and FPU using
sequences save the "dirty" FPU registers only.

This works at the initial trap level because all of the registers
get recorded into the top-level FPU save area, and we'll return
to userspace with the FPU disabled so that any FPU use by the user
will take an FPU disabled trap wherein we'll load the registers
back up properly.

But this is not how trap returns from kernel to kernel operate.

The simplest fix for this bug is to always save all FPU register state
for anything other than the top-most FPU save area.

Getting rid of the optimized inner-slot FPU saving code ends up
making VISEntryHalf degenerate into plain VISEntry.

Longer term we need to do something smarter to reinstate the partial
save optimizations.  Perhaps the fundament error is having trap entry
and exit allocate FPU save slots and restore register state.  Instead,
the VISEntry et al. calls should be doing that work.

This bug is about two decades old.

Reported-by: James Y Knight <jyknight@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 19:13:25 -07:00
Lucas Stach
14d2b7c1a9 net: fec: fix initial runtime PM refcount
The clocks are initially active and thus the device is marked active.
This still keeps the PM refcount at 0, the pm_runtime_put_autosuspend()
call at the end of probe then leaves us with an invalid refcount of -1,
which in turn leads to the device staying in suspended state even though
netdev open had been called.

Fix this by initializing the refcount to be coherent with the initial
device status.

Fixes:
8fff755e9f (net: fec: Ensure clocks are enabled while using mdio bus)

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 18:53:25 -07:00
Linus Torvalds
a58997e1a6 Merge branch 'drm-fixes-4.2' of git://people.freedesktop.org/~agd5f/linux
Pull amdgpu fixes from Alex Deucher:
 "Just a few amdgpu fixes to make sure we report the proper firmware
  information and number of render buffers to userspace and a typo in a
  debugging function"

[ Pulling directly from Alex since Dave Airlie is on vacation  - Linus ]

* 'drm-fixes-4.2' of git://people.freedesktop.org/~agd5f/linux:
  drm/amdgpu: set fw_version and feature_version for smu fw loading
  drm/amdgpu: add feature version for SDMA ucode
  drm/amdgpu: add feature version for RLC and MEC v2
  drm/amdgpu: increment queue when iterating on this variable.
  drm/amdgpu: fix rb setting for CZ
2015-08-07 04:51:14 +03:00
Linus Torvalds
ebc90be6b9 Merge branch 'drm-tda998x-fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull TDA998x i2c driver fixes from Russell King:
 "This fixes the double-checksumming of the AVI infoframe which was
  resulting in the checksum always being zero.  It went unnoticed as
  none of my HDMI devices had a problem with this"

[ Pulling directly from rmk since Dave Airlie is on vacation  - Linus ]

* 'drm-tda998x-fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  drm/i2c: tda998x: fix bad checksum of the HDMI AVI infoframe
2015-08-07 04:48:46 +03:00
Rabin Vincent
a50fcb512d writeback: fix initial dirty limit
The initial value of global_wb_domain.dirty_limit set by
writeback_set_ratelimit() is zeroed out by the memset in
wb_domain_init().

Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:42 +03:00
Naoya Horiguchi
4491f71260 mm/memory-failure: set PageHWPoison before migrate_pages()
Now page freeing code doesn't consider PageHWPoison as a bad page, so by
setting it before completing the page containment, we can prevent the
error page from being reused just after successful page migration.

I added TTU_IGNORE_HWPOISON for try_to_unmap() to make sure that the
page table entry is transformed into migration entry, not to hwpoison
entry.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dean Nelson <dnelson@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:42 +03:00
Naoya Horiguchi
f4c18e6f7b mm: check __PG_HWPOISON separately from PAGE_FLAGS_CHECK_AT_*
The race condition addressed in commit add05cecef ("mm: soft-offline:
don't free target page in successful page migration") was not closed
completely, because that can happen not only for soft-offline, but also
for hard-offline.  Consider that a slab page is about to be freed into
buddy pool, and then an uncorrected memory error hits the page just
after entering __free_one_page(), then VM_BUG_ON_PAGE(page->flags &
PAGE_FLAGS_CHECK_AT_PREP) is triggered, despite the fact that it's not
necessary because the data on the affected page is not consumed.

To solve it, this patch drops __PG_HWPOISON from page flag checks at
allocation/free time.  I think it's justified because __PG_HWPOISON
flags is defined to prevent the page from being reused, and setting it
outside the page's alloc-free cycle is a designed behavior (not a bug.)

For recent months, I was annoyed about BUG_ON when soft-offlined page
remains on lru cache list for a while, which is avoided by calling
put_page() instead of putback_lru_page() in page migration's success
path.  This means that this patch reverts a major change from commit
add05cecef about the new refcounting rule of soft-offlined pages, so
"reuse window" revives.  This will be closed by a subsequent patch.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dean Nelson <dnelson@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:42 +03:00
Naoya Horiguchi
98ed2b0052 mm/memory-failure: give up error handling for non-tail-refcounted thp
"non anonymous thp" case is still racy with freeing thp, which causes
panic due to put_page() for refcount-0 page.  It seems that closing up
this race might be hard (and/or not worth doing,) so let's give up the
error handling for this case.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dean Nelson <dnelson@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:41 +03:00
Naoya Horiguchi
a209ef09af mm/memory-failure: fix race in counting num_poisoned_pages
When memory_failure() is called on a page which are just freed after
page migration from soft offlining, the counter num_poisoned_pages is
raised twi= ce.  So let's fix it with using TestSetPageHWPoison.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dean Nelson <dnelson@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:41 +03:00