samples/bpf currently cherry-picks object files from tools/lib/bpf
to link against. Just compile the full library and link statically
against it.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Both tools/lib/bpf/libbpf.h and samples/bpf/bpf_load.h define their
own version of struct bpf_map_def. The version in bpf_load.h has
more fields. libbpf does not support inner maps and its definition
of struct bpf_map_def lacks the related fields. Rename the definition
in bpf_load.h (samples/bpf) to avoid conflicts.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Ask the kernel to include sample time in each even instead of
reading the clock. This is also more accurate because our
clock reading was done when user space would dump the buffer,
not when sample was produced.
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Sync the header from include/uapi/linux/bpf.h which was updated to add
fib lookup helper function. This fixes selftests/bpf build failure.
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
'|& ...' is a bash 4.0+ construct which is not guaranteed to be available
when using '$(shell ...)' in a Makefile. Fall back to the more portable
'2>&1 | ...'.
Fixes the following warning during compilation:
/bin/sh: 1: Syntax error: "&" unexpected
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The HW doesn't support matching on frag first/later, return error if we are
asked to offload that.
Fixes: 3f7d0eb42d ("net/mlx5e: Offload TC matching on packets being IP fragments")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The host side reporting of VF vport statistics didn't include the VF
RDMA traffic.
Fixes: 3b751a2a41 ("net/mlx5: E-Switch, Introduce get vf statistics")
Signed-off-by: Adi Nissim <adin@mellanox.com>
Reported-by: Ariel Almog <ariela@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Some platforms require IRQs to be free'd in the shutdown path. Otherwise
they will fail to be reallocated after a kexec.
Fixes: 8812c24d28 ("net/mlx5: Add fast unload support in shutdown flow")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Adds a "pci=noats" boot parameter. When supplied, all ATS related
functions fail immediately and the IOMMU is configured to not use
device-IOTLB.
Any function that checks for ATS capabilities directly against the devices
should also check this flag. Currently, such functions exist only in IOMMU
drivers, and they are covered by this patch.
The motivation behind this patch is the existence of malicious devices.
Lots of research has been done about how to use the IOMMU as protection
from such devices. When ATS is supported, any I/O device can access any
physical address by faking device-IOTLB entries. Adding the ability to
ignore these entries lets sysadmins enhance system security.
Signed-off-by: Gil Kupfer <gilkup@cs.technion.ac.il>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Fix the kernel call initiation to set the minimum security level for kernel
initiated calls (such as from kAFS) from the sockopt value.
Fixes: 19ffa01c9c ("rxrpc: Use structs to hold connection params and protocol info")
Signed-off-by: David Howells <dhowells@redhat.com>
AF_RXRPC tries to turn on IP_RECVERR and IP_MTU_DISCOVER on the UDP socket
it just opened for communications with the outside world, regardless of the
type of socket. Unfortunately, this doesn't work with an AF_INET6 socket.
Fix this by turning on IPV6_RECVERR and IPV6_MTU_DISCOVER instead if the
socket is of the AF_INET6 family.
Without this, kAFS server and address rotation doesn't work correctly
because the algorithm doesn't detect received network errors.
Fixes: 75b54cb57c ("rxrpc: Add IPv6 support")
Signed-off-by: David Howells <dhowells@redhat.com>
The expect_rx_by call timeout is supposed to be set when a call is started
to indicate that we need to receive a packet by that point. This is
currently put back every time we receive a packet, but it isn't started
when we first send a packet. Without this, the call may wait forever if
the server doesn't deign to reply.
Fix this by setting the timeout upon a successful UDP sendmsg call for the
first DATA packet. The timeout is initiated only for initial transmission
and not for subsequent retries as we don't want the retry mechanism to
extend the timeout indefinitely.
Fixes: a158bdd324 ("rxrpc: Fix call timeouts")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
David Ahern says:
====================
Provide a helper for doing a FIB and neighbor lookup in the kernel
tables from an XDP program. The helper provides a fastpath for forwarding
packets. If the packet is a local delivery or for any reason is not a
simple lookup and forward, the packet is expected to continue up the stack
for full processing.
The response from a FIB and neighbor lookup is either the egress index
with the bpf_fib_lookup struct filled in with dmac and gateway or
0 meaning the packet should continue up the stack. In time we can
revisit this to return the FIB lookup result errno if it is one of the
special RTN_'s such as RTN_BLACKHOLE (-EINVAL) so that the XDP
programs can do an early drop if desired.
Patches 1-6 do some more refactoring to IPv6 with the end goal of
extracting a FIB lookup function that aligns with fib_lookup for IPv4,
basically returning a fib6_info without creating a dst based entry.
Patch 7 adds lookup functions to the ipv6 stub. These are needed since
bpf is built into the kernel and ipv6 may not be built or loaded.
Patch 8 adds the bpf helper and 9 adds a sample program.
v3
- remove ETH_ALEN and in6_addr from uapi header
v2
- removed pkt_access from bpf_func_proto as noticed by Daniel
- added check in that IPv6 forwarding is enabled
- added DaveM's ack on patches 1-7 and 9 based on v1 response and
fact that no changes were made to them in v2
v1
- updated commit messages and cover letter
- added comment to sample program noting lack of verification on
egress device supporting XDP
RFC v2
- fixed use of foward helper from cls_act as noted by Daniel
- in patch 1 rename fib6_lookup_1 as well for consistency
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Simple example of fast-path forwarding. It has a serious flaw
in not verifying the egress device index supports XDP forwarding.
If the egress device does not packets are dropped.
Take this only as a simple example of fast-path forwarding.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Provide a helper for doing a FIB and neighbor lookup in the kernel
tables from an XDP program. The helper provides a fastpath for forwarding
packets. If the packet is a local delivery or for any reason is not a
simple lookup and forward, the packet continues up the stack.
If it is to be forwarded, the forwarding can be done directly if the
neighbor is already known. If the neighbor does not exist, the first
few packets go up the stack for neighbor resolution. Once resolved, the
xdp program provides the fast path.
On successful lookup the nexthop dmac, current device smac and egress
device index are returned.
The API supports IPv4, IPv6 and MPLS protocols, but only IPv4 and IPv6
are implemented in this patch. The API includes layer 4 parameters if
the XDP program chooses to do deep packet inspection to allow compare
against ACLs implemented as FIB rules.
Header rewrite is left to the XDP program.
The lookup takes 2 flags:
- BPF_FIB_LOOKUP_DIRECT to do a lookup that bypasses FIB rules and goes
straight to the table associated with the device (expert setting for
those looking to maximize throughput)
- BPF_FIB_LOOKUP_OUTPUT to do a lookup from the egress perspective.
Default is an ingress lookup.
Initial performance numbers collected by Jesper, forwarded packets/sec:
Full stack XDP FIB lookup XDP Direct lookup
IPv4 1,947,969 7,074,156 7,415,333
IPv6 1,728,000 6,165,504 7,262,720
These number are single CPU core forwarding on a Broadwell
E5-1650 v4 @ 3.60GHz.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add stubs to retrieve a handle to an IPv6 FIB table, fib6_get_table,
a stub to do a lookup in a specific table, fib6_table_lookup, and
a stub for a full route lookup.
The stubs are needed for core bpf code to handle the case when the
IPv6 module is not builtin.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Similar to IPv4, IPv6 should use the FIB lookup result in the
tracepoint.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add IPv6 equivalent to fib_lookup. Does a fib lookup, including rules,
but returns a FIB entry, fib6_info, rather than a dst based rt6_info.
fib6_lookup is any where from 140% (MULTIPLE_TABLES config disabled)
to 60% faster than any of the dst based lookup methods (without custom
rules) and 25% faster with custom rules (e.g., l3mdev rule).
Since the lookup function has a completely different signature,
fib6_rule_action is split into 2 paths: the existing one is
renamed __fib6_rule_action and a new one for the fib6_info path
is added. fib6_rule_action decides which to call based on the
lookup_ptr. If it is fib6_table_lookup then the new path is taken.
Caller must hold rcu lock as no reference is taken on the returned
fib entry.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Move source address lookup from fib6_rule_action to a helper. It will be
used in a later patch by a second variant for fib6_rule_action.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
ip6_pol_route is used for ingress and egress FIB lookups. Refactor it
moving the table lookup into a separate fib6_table_lookup that can be
invoked separately and export the new function.
ip6_pol_route now calls fib6_table_lookup and uses the result to generate
a dst based rt6_info.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Rename rt6_multipath_select to fib6_multipath_select and export it.
A later patch wants access to it similar to IPv4's fib_select_path.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Rename fib6_lookup to fib6_node_lookup to better reflect what it
returns. The fib6_lookup name will be used in a later patch for
an IPv6 equivalent to IPv4's fib_lookup.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
For me, as a reader whose mother language isn't English, the
old words bring a little difficulty to catch the meaning, this
patch rewords the subsection in a more clarificatory way.
This patch also add blank lines as separator at two places
to improve readability.
Signed-off-by: Wang YanQing <udknight@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEENrCndlB/VnAEWuH5k9IU1zQoZfEFAlr0dbcTHG1rbEBwZW5n
dXRyb25peC5kZQAKCRCT0hTXNChl8Vj0CAC3JNk7QXU+WIEwtdKZU8GW1z+gBtXb
xpaidY91djsj/2L3xzgIF+gRL6BHFQnK+0ylqmHsk38QVijl6SsWp8LPOy35u1wN
yGdlEZNvnajguWENUr8cnNAtkICa7b1JR6Eyqt8ZY5Ugns2G+js6tqX3FCxkpu2I
ZRMheSJ6tQcw1SjTQgC6rhsYFipSxOEdqNzdLDo3K4Ttmb2osoHnBcVZllUwZeTu
iSUktcCjrbv24JNkyf1HE5wt8X3zT5nnSiAPs6JW3xwcS/kw4QOsZsFx7JwoVssB
oulQ1YGz/uT13rPh40MsPNyiG65BpQCCu3JKmgJGF4Kbmd8CO3NyzA17
=G0cT
-----END PGP SIGNATURE-----
Merge tag 'linux-can-fixes-for-4.17-20180510' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
this is a pull request for net/master consisting of 2 patches.
Both patches are from Lukas Wunner and fix two problems found in the
hi311x CAN driver under high load situations.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Trivial fix to spelling mistake in DP_VERBOSE message text
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add sg table initialization to fix a BUG_ON encountered when enabling
CONFIG_DEBUG_SG.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fix the wrong conversion where 1 Mbps was converted to
1024 Kbps.
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel says:
====================
mlxsw: Support VLAN devices in mirroring offloads
Petr says:
When offloading "tc action mirred mirror", there are several scenarios
where VLAN devices can show up, that mlxsw can offload on Spectrum
machines.
I) A direct mirror to a VLAN device on top of a front-panel port device
(commonly referred to as "RSPAN")
II) VLAN device in egress path of a packet when resolving a mirror to
gretap or ip6gretap netdevice.
Specifically in the latter case, the following are the cases that can be
offloaded:
IIa) VLAN device directly above a physical device.
IIb) A VLAN-unaware bridge where the egress device is as in IIa.
IIc) VLAN device on top of a VLAN-aware bridge where the egress device
is a physical device.
This patch set implements all the above cases.
First, in patch #1, br_vlan_get_info() is extended to allow bridge
master argument.
Case I is then implemented in patches #2 and #3, case II in patch #4.
Note that handling of VLAN protocol is not included. In case I, mirrored
packets may end up being double-tagged, and it might be reasonable for
the outer tag to be an 802.1ad. However, the protocol type configuration
would have to be put on the same VLAN netdevice that represents normal
VLAN traffic, and mlxsw currently ignores this setting in that case. Thus
this support was left out and the encapsulation always uses 802.1q
protocol.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When mirroring to a gretap or ip6gretap device, allow the underlay
packet path to include VLAN devices. The following configurations are
supported in underlay:
- vlan over phys
- vlan-unaware bridge where the egress device is vlan over phys
- vlan over vlan-aware bridge where the egress device is phys
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Offload "tc action mirred mirror" to a device that is a vlan device on
top of a front-panel port device. The hardware encapsulates the mirrored
packets in a VLAN tag. That includes the case that the mirrored traffic
is already VLAN-tagged--in that case the monitor traffic will be
double-tagged, just like in the software path.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add MLXSW_REG_MPAT_SPAN_TYPE_REMOTE_ETH to support VLAN-encapsulated
port mirroring.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mirroring offload in mlxsw needs to check that a given VLAN is allowed
to ingress the bridge device. br_vlan_get_info() is the function that is
used for this, however currently it only supports bridge port devices.
Extend it to support bridge masters as well.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In Commit 1f45f78f8e ("sctp: allow GSO frags to access the chunk too"),
it held the chunk in sctp_ulpevent_make_rcvmsg to access it safely later
in recvmsg. However, it also added sctp_chunk_put in fail_mark err path,
which is only triggered before holding the chunk.
syzbot reported a use-after-free crash happened on this err path, where
it shouldn't call sctp_chunk_put.
This patch simply removes this call.
Fixes: 1f45f78f8e ("sctp: allow GSO frags to access the chunk too")
Reported-by: syzbot+141d898c5f24489db4aa@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The infrastructure that applies PCI quirks was buried in the middle of the
quirks themselves (at one time it was probably at the end of the file, but
new quirks tend to be added at the end of the file). Move it all to the
top of the file so it's easy to find. No functional change intended.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
If an error occurs, 'mlx4_en_destroy_netdev()' is called.
It then calls 'mlx4_en_free_resources()' which does the needed resources
cleanup.
So, doing some explicit kfree in the error handling path would lead to
some double kfree.
Simplify code to avoid such a case.
Fixes: 67f8b1dcb9 ("net/mlx4_en: Refactor the XDP forwarding rings scheme")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This version has some suggestions by Eric Dumazet:
- Use a local variable for the mark in IPv6 instead of ctl_sk to avoid SMP
races.
- Use the more elegant "IP4_REPLY_MARK(net, skb->mark) ?: sk->sk_mark"
statement.
- Factorize code as sk_fullsock() check is not necessary.
Aidan McGurn from Openwave Mobility systems reported the following bug:
"Marked routing is broken on customer deployment. Its effects are large
increase in Uplink retransmissions caused by the client never receiving
the final ACK to their FINACK - this ACK misses the mark and routes out
of the incorrect route."
Currently marks are added to sk_buffs for replies when the "fwmark_reflect"
sysctl is enabled. But not for TW sockets that had sk->sk_mark set via
setsockopt(SO_MARK..).
Fix this in IPv4/v6 by adding tw->tw_mark for TIME_WAIT sockets. Copy the the
original sk->sk_mark in __inet_twsk_hashdance() to the new tw->tw_mark location.
Then progate this so that the skb gets sent with the correct mark. Do the same
for resets. Give the "fwmark_reflect" sysctl precedence over sk->sk_mark so that
netfilter rules are still honored.
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
INET_CSK_DEBUG is always set and only is used for 2 pr_debug calls.
EXPORT_SYMBOL(inet_csk_timer_bug_msg) is only used by these 2
pr_debug calls and is also unnecessary as the exported string can
be used directly by these calls.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The location of the dt bindings file is wrong: it was probably
badly renamed by some script.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
The original shell script works, but:
1) it is too slow;
2) it is hard to exclude rejex patterns
Convert it to perl.
Here, the new version is able to check the entire tree in
less than a second (after cached):
real 0m0,284s
user 0m0,668s
sys 0m0,778s
The old version takes more than a minute to complete (also
after cached):
real 1m17,905s
user 0m25,583s
sys 0m55,334s
It also produce less false-positives (if any).
The new script also contains an auto-fix mode.
Usually, file references get lost when they're moved to some other
place and/or renamed to .rst.
Add an experimental mode to auto-fix those.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Add a blank line to avoid this Sphinx warning:
./drivers/w1/w1_io.c:197: WARNING: Definition list ends without a blank line; unexpected unindent.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Trivial patch to adjust the text formatting to wrap at 80 columns. No
actual content has changed.
Signed-off-by: Justin Skists <justin.skists@juzza.co.uk>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
The cgroup-v2.txt is already in ReST format. So, move it to the
admin-guide, where it belongs.
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
The removal of this file appears to have been premature; it's not a feature
enabled by Kconfig, but it's a arch-level feature regardless. Put it back
for now until some happy future time when we decide how we really want to
document such features.
This reverts commit 2bef69a385.
Signed-off-by: Jonathan Corbet <corbet@lwn.net>