Refactor code to avoid separate indirections for single-path
and multipath transports. All transports (both single and mp-capable)
will get a pointer to the rds_conn_path, and can trivially derive
the rds_connection from the ->cp_conn.
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On 64bits kernels, device stats are 64bits wide, not 32bits.
Fixes: 80105befdb ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid recursions of dev_queue_xmit() to the wrong net device when
frames are unprotected, since at that time skb->dev still points to
our own macsec dev and unlike macsec_encrypt_finish() dev pointer
doesn't get updated to real underlying device.
Fixes: c09440f7dc ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau says:
====================
cgroup: bpf: cgroup2 membership test on skb
This series is to implement a bpf-way to
check the cgroup2 membership of a skb (sk_buff).
It is similar to the feature added in netfilter:
c38c4597e4 ("netfilter: implement xt_cgroup cgroup2 path match")
The current target is the tc-like usage.
v3:
- Remove WARN_ON_ONCE(!rcu_read_lock_held())
- Stop BPF_MAP_TYPE_CGROUP_ARRAY usage in patch 2/4
- Avoid mounting bpf fs manually in patch 4/4
- Thanks for Daniel's review and the above suggestions
- Check CONFIG_SOCK_CGROUP_DATA instead of CONFIG_CGROUPS. Thanks to
the kbuild bot's report.
Patch 2/4 only needs CONFIG_CGROUPS while patch 3/4 needs
CONFIG_SOCK_CGROUP_DATA. Since a single bpf cgrp2 array alone is
not useful for now, CONFIG_SOCK_CGROUP_DATA is also used in
patch 2/4. We can fine tune it later if we find other use cases
for the cgrp2 array.
- Return EAGAIN instead of ENOENT if the cgrp2 array entry is
NULL. It is to distinguish these two cases: 1) the userland has
not populated this array entry yet. or 2) not finding cgrp2 from the skb.
- Be-lated thanks to Alexei and Tejun on reviewing v1 and giving advice on
this work.
v2:
- Fix two return cases in cgroup_get_from_fd()
- Fix compilation errors when CONFIG_CGROUPS is not used:
- arraymap.c: avoid registering BPF_MAP_TYPE_CGROUP_ARRAY
- filter.c: tc_cls_act_func_proto() returns NULL on BPF_FUNC_skb_in_cgroup
- Add comments to BPF_FUNC_skb_in_cgroup and cgroup_get_from_fd()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
test_cgrp2_array_pin.c:
A userland program that creates a bpf_map (BPF_MAP_TYPE_GROUP_ARRAY),
pouplates/updates it with a cgroup2's backed fd and pins it to a
bpf-fs's file. The pinned file can be loaded by tc and then used
by the bpf prog later. This program can also update an existing pinned
array and it could be useful for debugging/testing purpose.
test_cgrp2_tc_kern.c:
A bpf prog which should be loaded by tc. It is to demonstrate
the usage of bpf_skb_in_cgroup.
test_cgrp2_tc.sh:
A script that glues the test_cgrp2_array_pin.c and
test_cgrp2_tc_kern.c together. The idea is like:
1. Load the test_cgrp2_tc_kern.o by tc
2. Use test_cgrp2_array_pin.c to populate a BPF_MAP_TYPE_CGROUP_ARRAY
with a cgroup fd
3. Do a 'ping -6 ff02::1%ve' to ensure the packet has been
dropped because of a match on the cgroup
Most of the lines in test_cgrp2_tc.sh is the boilerplate
to setup the cgroup/bpf-fs/net-devices/netns...etc. It is
not bulletproof on errors but should work well enough and
give enough debug info if things did not go well.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds a bpf helper, bpf_skb_in_cgroup, to decide if a skb->sk
belongs to a descendant of a cgroup2. It is similar to the
feature added in netfilter:
commit c38c4597e4 ("netfilter: implement xt_cgroup cgroup2 path match")
The user is expected to populate a BPF_MAP_TYPE_CGROUP_ARRAY
which will be used by the bpf_skb_in_cgroup.
Modifications to the bpf verifier is to ensure BPF_MAP_TYPE_CGROUP_ARRAY
and bpf_skb_in_cgroup() are always used together.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a BPF_MAP_TYPE_CGROUP_ARRAY and its bpf_map_ops's implementations.
To update an element, the caller is expected to obtain a cgroup2 backed
fd by open(cgroup2_dir) and then update the array with that fd.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper function to get a cgroup2 from a fd. It will be
stored in a bpf array (BPF_MAP_TYPE_CGROUP_ARRAY) which will
be introduced in the later patch.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to commit 9b368814b3 ("net: fix bridge multicast packet checksum validation")
we need to fixup the checksum for CHECKSUM_COMPLETE when
pushing skb on RX path. Otherwise we get similar splats.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
People who use PACKET_FANOUT_HASH want a symmetric hash, meaning that
they want packets going in both directions on a flow to hash to the
same bucket.
The core kernel SKB hash became non-symmetric when the ipv6 flow label
and other entities were incorporated into the standard flow hash order
to increase entropy.
But there are no users of PACKET_FANOUT_HASH who want an assymetric
hash, they all want a symmetric one.
Therefore, use the flow dissector to compute a flat symmetric hash
over only the protocol, addresses and ports. This hash does not get
installed into and override the normal skb hash, so this change has
no effect whatsoever on the rest of the stack.
Reported-by: Eric Leblond <eric@regit.org>
Tested-by: Eric Leblond <eric@regit.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
Further robustify putting BPF progs
This series addresses a potential issue reported to us by Jann Horn
with regards to putting progs. First patch moves progs generally under
RCU destruction and second patch refactors getting of progs to simplify
code a bit. For details, please see individual patches. Note, we think
that addressing this one in net-next should be sufficient.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Since bpf_prog_get() and program type check is used in a couple of places,
refactor this into a small helper function that we can make use of. Since
the non RO prog->aux part is not used in performance critical paths and a
program destruction via RCU is rather very unlikley when doing the put, we
shouldn't have an issue just doing the bpf_prog_get() + prog->type != type
check, but actually not taking the ref at all (due to being in fdget() /
fdput() section of the bpf fd) is even cleaner and makes the diff smaller
as well, so just go for that. Callsites are changed to make use of the new
helper where possible.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jann Horn reported following analysis that could potentially result
in a very hard to trigger (if not impossible) UAF race, to quote his
event timeline:
- Set up a process with threads T1, T2 and T3
- Let T1 set up a socket filter F1 that invokes another filter F2
through a BPF map [tail call]
- Let T1 trigger the socket filter via a unix domain socket write,
don't wait for completion
- Let T2 call PERF_EVENT_IOC_SET_BPF with F2, don't wait for completion
- Now T2 should be behind bpf_prog_get(), but before bpf_prog_put()
- Let T3 close the file descriptor for F2, dropping the reference
count of F2 to 2
- At this point, T1 should have looked up F2 from the map, but not
finished executing it
- Let T3 remove F2 from the BPF map, dropping the reference count of
F2 to 1
- Now T2 should call bpf_prog_put() (wrong BPF program type), dropping
the reference count of F2 to 0 and scheduling bpf_prog_free_deferred()
via schedule_work()
- At this point, the BPF program could be freed
- BPF execution is still running in a freed BPF program
While at PERF_EVENT_IOC_SET_BPF time it's only guaranteed that the perf
event fd we're doing the syscall on doesn't disappear from underneath us
for whole syscall time, it may not be the case for the bpf fd used as
an argument only after we did the put. It needs to be a valid fd pointing
to a BPF program at the time of the call to make the bpf_prog_get() and
while T2 gets preempted, F2 must have dropped reference to 1 on the other
CPU. The fput() from the close() in T3 should also add additionally delay
to the reference drop via exit_task_work() when bpf_prog_release() gets
called as well as scheduling bpf_prog_free_deferred().
That said, it makes nevertheless sense to move the BPF prog destruction
generally after RCU grace period to guarantee that such scenario above,
but also others as recently fixed in ceb5607035 ("bpf, perf: delay release
of BPF prog after grace period") with regards to tail calls won't happen.
Integrating bpf_prog_free_deferred() directly into the RCU callback is
not allowed since the invocation might happen from either softirq or
process context, so we're not permitted to block. Reviewing all bpf_prog_put()
invocations from eBPF side (note, cBPF -> eBPF progs don't use this for
their destruction) with call_rcu() look good to me.
Since we don't know whether at the time of attaching the program, we're
already part of a tail call map, we need to use RCU variant. However, due
to this, there won't be severely more stress on the RCU callback queue:
situations with above bpf_prog_get() and bpf_prog_put() combo in practice
normally won't lead to releases, but even if they would, enough effort/
cycles have to be put into loading a BPF program into the kernel already.
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here are a few small staging and iio driver fixes for 4.7-rc6.
Nothing major here, just a number of small fixes, all have been in
linux-next for a while, and the full details are in the shortlog.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iFYEABECABYFAld2lKYPHGdyZWdAa3JvYWguY29tAAoJEDFH1A3bLfsp7EkAn2tK
KAXXdlnEs9jud1JteSphbTwFAJ4zqSIZXrNMS1vsBgujQiLP6r1xXA==
=eYY6
-----END PGP SIGNATURE-----
Merge tag 'staging-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging and IIO fixes from Greg KH:
"Here are a few small staging and iio driver fixes for 4.7-rc6.
Nothing major here, just a number of small fixes, all have been in
linux-next for a while, and the full details are in the shortlog"
* tag 'staging-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
iio:ad7266: Fix probe deferral for vref
iio:ad7266: Fix support for optional regulators
iio:ad7266: Fix broken regulator error handling
iio: accel: kxsd9: fix the usage of spi_w8r8()
staging: iio: accel: fix error check
staging: iio: ad5933: fix order of cycle conditions
staging: iio: fix ad7606_spi regression
iio: inv_mpu6050: Fix use-after-free in ACPI code
Here are two tty fixes for some reported issues. One resolves a crash
in devpts, and the other resolves a problem with the fbcon cursor blink
causing lockups.
Both have been in linux-next with no reported problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iFYEABECABYFAld2lcUPHGdyZWdAa3JvYWguY29tAAoJEDFH1A3bLfspo5MAnjdF
aMlMp7JeHrv9KoEa+lP6rOT6AJ4rrc/tQ0zHCksxMqEDLPNwjnVPbg==
=iFXo
-----END PGP SIGNATURE-----
Merge tag 'tty-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty fixes from Greg KH:
"Here are two tty fixes for some reported issues. One resolves a crash
in devpts, and the other resolves a problem with the fbcon cursor
blink causing lockups.
Both have been in linux-next with no reported problems"
* tag 'tty-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
devpts: fix null pointer dereference on failed memory allocation
tty: vt: Fix soft lockup in fbcon cursor blink timer.
Here are a number of small USB and PHY driver fixes for 4.7-rc6.
Nothing major here, all are described in the shortlog below. All have
been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iFYEABECABYFAld2ljcPHGdyZWdAa3JvYWguY29tAAoJEDFH1A3bLfspN+sAn28Z
+GbDzyY5XwZ7Qs5/5uPYoGONAKCObDpgZr1A+/EMvDd57gfHWmFqTw==
=uYl3
-----END PGP SIGNATURE-----
Merge tag 'usb-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB and PHY fixes from Greg KH:
"Here are a number of small USB and PHY driver fixes for 4.7-rc6.
Nothing major here, all are described in the shortlog below. All have
been in linux-next with no reported issues"
* tag 'usb-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: don't free bandwidth_mutex too early
USB: EHCI: declare hostpc register as zero-length array
phy-sun4i-usb: Fix irq free conditions to match request conditions
phy: bcm-ns-usb2: checking the wrong variable
phy-sun4i-usb: fix missing __iomem *
phy: phy-sun4i-usb: Fix optional gpios failing probe
phy: rockchip-dp: fix return value check in rockchip_dp_phy_probe()
phy: rcar-gen3-usb2: fix unexpected repeat interrupts of VBUS change
usb: common: otg-fsm: add license to usb-otg-fsm
Three fixes:
* Fix use of smp_processor_id() in preemptible code in the IOVA
allocation code. This got introduced with the scalability
improvements in this release cycle.
* A VT-d fix for out-of-bounds access of the iommu->domains
array. The bug showed during suspend/resume.
* AMD IOMMU fix to print the correct device id in the ACPI
parsing code.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJXdoyGAAoJECvwRC2XARrj4s4QANfX5U108HnHvSNZL7dOhQlQ
RkghYuAgK3Bzgo89WNoNOVat+Q0t+sdqz+4aktbEydrHXgn+8EOLfId1ixzBb0+O
SWQ1myTtfCMM0quQTG8KMPZli18aEFyWY0P+K/OGZd8YXU9u6xoKArlz8AyBxfIF
DBuHbYdc646aO763MpkX1WSjnagBEBT/bp90k26smHGoqJNBroXXAo5KEhFdXxNB
CARM99R9PjatnggU/xSXaUXv3oC+J2Qo1MgWxrARQOIJPkIXuAzS8fgItvUgcxnx
NZeUpSRBRRe+za0H3vFC7K6D8HlTQ7lhlODzP82gTzzRyG7N5A4+n7zu9p1qeT8w
fi7Hak7tjp+xQsYp2xDVcKsL3ghLVjqmDrjCBLKGRX/7CeEOIG7eTKV9hsxVUO+q
Bvjx+Lc+XKuDdLJmhGCueFxZDfBJJBLDSP7HBFRHdhVuovvhxvALik5Hd3pgC+iz
08zcHzyiLVYKI58t3K3q1OTKo0cIlsk6SWPJHFUR/B3ghcGT9eLgQjXVQIC+jaAD
RK2/6yl/zfg1mZv5tyX7kGBGB2FfUXQ48eCR/TZnzh2AmEK1P10kc6XJUXfj/hrV
eJkKnuh1FaRG/PnHzxic/T20aAYhFjp8fmKDqUgoJla8q6h73LZSOfRqgoAknDTa
0+EASaDbIsU9nHtHxEz8
=ogRx
-----END PGP SIGNATURE-----
Merge tag 'iommu-fixes-v4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
"Three fixes:
- Fix use of smp_processor_id() in preemptible code in the IOVA
allocation code. This got introduced with the scalability
improvements in this release cycle.
- A VT-d fix for out-of-bounds access of the iommu->domains array.
The bug showed during suspend/resume.
- AMD IOMMU fix to print the correct device id in the ACPI parsing
code"
* tag 'iommu-fixes-v4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Initialize devid variable before using it
iommu/vt-d: Fix overflow of iommu->domains array
iommu/iova: Disable preemption around use of this_cpu_ptr()
For some reason, the FRAME_RELEASE message handling for the
default queue ended up being in the only/default queue for
non-RSS devices; fix that and handle FRAME_RELEASE properly
on the default queue for RSS devices.
Fixes: 585a6fccf5 ("iwlwifi: mvm: infrastructure for frame-release message")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Add support CSA countdown offloading. When CSA starts, the driver
specifies the offsets to the eCSA and CSA IEs in the beacon template
command and the fw performs the countdown.
The fw notifies the driver when the channel switch flow
should be performed.
Beacon sent notifications are not used anymore.
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Add a warning in case packet didn't end up in the HW
destined queue.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
We now have 9000 devices that support multiple frames in
a single RB. Enable it.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
For 9000 devices we can have PCIe bus for discrete
devices and IOSF bus for integrated devices.
PCIe supports maximum transfer size of 128B while IOSF
bus supports maximum transfer size of 64B.
Configure RB size accordingly.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Integrated 9000 devices have a bug with shadow registers
value retention.
If driver writes RBD registers while MAC is asleep the
values are stored in shadow registers to be copied whenever
MAC wakes up.
However, in 9000 devices a MAC wakeup is not triggered
and when the bus powers down due to inactivity the shadow
values and dirty bits are lost.
Turn on the chicken-bits that cause MAC wakeup for RX-related
values as well when the device is in D0.
When the device is in low power mode turn the RX wakeup chicken
bits off since driver is idle and this W/A is not needed.
Remove previous W/A which was ineffective.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Support queue removal in DQA mode in iwl_mvm_rm_sta() also when
the device isn't a STA connected to an AP.
Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
rx_phy notification is no longer sent in devices with
multiple rx queues.
All the needed data is now set in the metadata - update
code accordingly to reflect all the features as in the
previous RX path.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
In multiple RX queues architecture, the RX_PHY notification
is no longer useful as it is received in the default queue
even for packets that are received on RSS queue, and cannot
be accessed without locking.
All the needed data is in the new RX packet metadata and
firmware will no longer send this notification for 9000
devices. Remove support of it.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Ucode capability bit 26 indicates support for UAPSD on P2P interface
even with a simultaneous BSS station interface, as long as both
interfaces are in the same binding. Change the name of the
capability bit to reflect that.
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
This option was removed in commit 47dcf0cb10 ("[NET]: Rethink mark field
in struct flowi").
Signed-off-by: Moritz Sichert <moritz+linux@sichert.me>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
There are code duplications of a masked ethernet address comparison here
so make it a separate function instead.
Miscellanea:
o Neaten alignment of FWINV macro uses to make it clearer for the reader
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
No need for a special case to handle NF_INET_POST_ROUTING, this is
basically the same handling as for prerouting, input, forward.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
(Another one for the f_path debacle.)
ltp fcntl33 testcase caused an Oops in selinux_file_send_sigiotask.
The reason is that generic_add_lease() used filp->f_path.dentry->inode
while all the others use file_inode(). This makes a difference for files
opened on overlayfs since the former will point to the overlay inode the
latter to the underlying inode.
So generic_add_lease() added the lease to the overlay inode and
generic_delete_lease() removed it from the underlying inode. When the file
was released the lease remained on the overlay inode's lock list, resulting
in use after free.
Reported-by: Eryu Guan <eguan@redhat.com>
Fixes: 4bacc9c923 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Somehow we ended up without leading spaces here, fix that.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
We cannot trust NSSN for AMSDU sub-frames that are not the
last.
The reason is that NSSN advances on the first sub-frame,
and may cause the reorder buffer to advance before all the
sub-frames arrive.
Example:
Reorder buffer contains SN 0 & 2.
We receive AMSDU with SN 1 and NSSN for first sub frame 3.
The result us that driver releases SN 0,1, 2.
When sub-frame 1 arrives - reorder buffer is already ahead and
it will be dropped.
If the last sub-frame is not on this queue - we will get frame
release notification with up to date NSSN.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
The new hardware that supports multiple queue also
de-aggregates A-MSDUs. This means that we can advertise
the maximal size of A-MSDUs regardless of the receive
buffer's size.
In order to be able to forcefully use a lower A-MSDU size,
add a default value for the module parameter. Pre-9000
will have a default of 4K, and 9000 will have 12K.
Setting the amsdu_size module parameter to 4K will limit
the A-MSDU on 9000 as well.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Saeed Mahameed says:
====================
Mellanox 100G mlx5 resiliency and xmit path fixes
This series provides two set of fixes to the mlx5 driver:
- Resiliency fixes for reset flow and internal pci errors
- xmit path fixes
Please consider queuing those patches for -stable (4.6).
Reset flow fixes for core driver:
- Add more commands to the list of error simulated commands
when pci errors occur
- Avoid calling sleeping function by the health poll thread
- Fix incorrect page count when in internal error
- Fix timeout in wait vital for VFs
- Deadlock fix and Timeout handling in commands interface
Reset flow and resiliency fixes for mlx5e netdev driver:
- Handle RQ flush in error cases
- Implement ndo_tx_timeout callback
- Timeout if SQ doesn't flush during close
- Log link state changes
- Validate BW weight values of ETS
xmit path fixes:
- Fix wrong fallback assumption in select queue callback
- Account for all L2 headers when copying headers into inline segment
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add Link UP/Down prints to kernel log when link state changes
Signed-off-by: Shaker Daibes <shakerd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Valid weight assigned to ETS TClass values are 1-100
Fixes: 08fb1dacdd ('net/mlx5e: Support DCBNL IEEE ETS')
Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The default fallback function used by mlx5e select queue can return
any TX queues in range [0..dev->num_real_tx_queues).
The current implementation assumes that the fallback function returns
a number in the range [0.. number of channels). Actually
dev->num_real_tx_queues = (number of channels) * dev->num_tc;
which is more than the expected range if num_tc is configured and could
lead to crashes.
To fix this we test if num_tc is not configured we can safely return the
fallback suggestion, if not we will reciprocal_scale the fallback
result and normalize it to the desired range.
Fixes: 08fb1dacdd ('net/mlx5e: Support DCBNL IEEE ETS')
Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reported-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ConnectX4-Lx uses an inline wqe mode that currently defaults to
requiring the entire L2 header be included in the wqe.
This patch fixes mlx5e_get_inline_hdr_size() to account for
all L2 headers (VLAN, QinQ, etc) using skb_network_offset(skb).
Fixes: e586b3b0ba ("net/mlx5: Ethernet Datapath files")
Signed-off-by: Matthew Finlay <matt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a timeout to avoid an infinite loop waiting for RQ's to flush. This
occurs during AER/EEH and will also happen if the device stops posting
completions due to internal error or reset, or if moving the RQ to the
error state fails. Also cleanup posted receive resources when closing
the RQ.
Fixes: f62b8bb8f2 ('net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality')
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add callback to handle TX timeouts.
Fixes: f62b8bb8f2 ('net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality')
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid an infinite loop by timing out waiting for the SQ to flush. Also
clean up the TX descriptors if that happens.
Fixes: f62b8bb8f2 ('net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality')
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current implementation does not handle timeout in case of command
with callback request, and this can lead to deadlock if the command
doesn't get fw response.
Add delayed callback timeout work before posting the command to fw.
In case of real fw command completion we will cancel the delayed work.
In case of fw command timeout the callback timeout handler will be
called and it will simulate fw completion with timeout error.
Fixes: e126ba97db ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call command completion handler in case of timeout when working in
interrupts mode.
Avoid flushing the commands workqueue after acquiring the semaphores to
prevent a potential deadlock.
Fixes: e126ba97db ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The device ID for VFs is in a different location than PFs. This results
in the poll always timing out for VFs. There's no good way to read the
VF device ID without using the PF's configuration space. Switch to waiting
for the health poll to start incrementing. Also remove the 1s sleep
at the beginning.
fixes: 89d44f0a6c ('net/mlx5_core: Add pci error handlers to mlx5_core
driver')
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>