linux-pinenote

Author	SHA1	Message	Date
Yuchung Cheng	ce3cf4ec03	tcp: record TLP and ER timer stats in v6 stats The v6 tcp stats scan does not provide TLP and ER timer information correctly like the v4 version. This patch fixes that. Fixes: `6ba8a3b19e` ("tcp: Tail loss probe (TLP)") Fixes: `eed530b6c6` ("tcp: early retransmit") Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 17:12:22 -07:00
Daniel Borkmann	92c075dbde	net: sched: fix tc_should_offload for specific clsact classes When offloading classifiers such as u32 or flower to hardware, and the qdisc is clsact (TC_H_CLSACT), then we need to differentiate its classes, since not all of them handle ingress, therefore we must leave those in software path. Add a .tcf_cl_offload() callback, so we can generically handle them, tested on ixgbe. Fixes: `10cbc68434` ("net/sched: cls_flower: Hardware offloaded filters statistics support") Fixes: `5b33f48842` ("net/flower: Introduce hardware offload support") Fixes: `a1b7c5fd7f` ("net: sched: add cls_u32 offload hooks for netdevs") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 16:59:53 -07:00
WANG Cong	a03e6fe569	act_police: fix a crash during removal The police action is using its own code to initialize tcf hash info, which made us forget to initialize a->hinfo correctly. Fix this by calling the helper function tcf_hash_create() directly. This patch fixed the following crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000028 IP: [<ffffffff810c099f>] __lock_acquire+0xd3/0xf91 PGD d3c34067 PUD d3e18067 PMD 0 Oops: 0000 [#1] SMP CPU: 2 PID: 853 Comm: tc Not tainted 4.6.0+ #87 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff8800d3e28040 ti: ffff8800d3f6c000 task.ti: ffff8800d3f6c000 RIP: 0010:[<ffffffff810c099f>] [<ffffffff810c099f>] __lock_acquire+0xd3/0xf91 RSP: 0000:ffff88011b203c80 EFLAGS: 00010002 RAX: 0000000000000046 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000028 RBP: ffff88011b203d40 R08: 0000000000000001 R09: 0000000000000000 R10: ffff88011b203d58 R11: ffff88011b208000 R12: 0000000000000001 R13: ffff8800d3e28040 R14: 0000000000000028 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88011b200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000028 CR3: 00000000d4be1000 CR4: 00000000000006e0 Stack: ffff8800d3e289c0 0000000000000046 000000001b203d60 ffffffff00000000 0000000000000000 ffff880000000000 0000000000000000 ffffffff00000000 ffffffff8187142c ffff88011b203ce8 ffff88011b203ce8 ffffffff8101dbfc Call Trace: <IRQ> [<ffffffff8187142c>] ? __tcf_hash_release+0x77/0xd1 [<ffffffff8101dbfc>] ? native_sched_clock+0x1a/0x35 [<ffffffff8101dbfc>] ? native_sched_clock+0x1a/0x35 [<ffffffff810a9604>] ? sched_clock_local+0x11/0x78 [<ffffffff810bf6a1>] ? mark_lock+0x24/0x201 [<ffffffff810c1dbd>] lock_acquire+0x120/0x1b4 [<ffffffff810c1dbd>] ? lock_acquire+0x120/0x1b4 [<ffffffff8187142c>] ? __tcf_hash_release+0x77/0xd1 [<ffffffff81aad89f>] _raw_spin_lock_bh+0x3c/0x72 [<ffffffff8187142c>] ? __tcf_hash_release+0x77/0xd1 [<ffffffff8187142c>] __tcf_hash_release+0x77/0xd1 [<ffffffff81871a27>] tcf_action_destroy+0x49/0x7c [<ffffffff81870b1c>] tcf_exts_destroy+0x20/0x2d [<ffffffff8189273b>] u32_destroy_key+0x1b/0x4d [<ffffffff81892788>] u32_delete_key_freepf_rcu+0x1b/0x1d [<ffffffff810de3b8>] rcu_process_callbacks+0x610/0x82e [<ffffffff8189276d>] ? u32_destroy_key+0x4d/0x4d [<ffffffff81ab0bc1>] __do_softirq+0x191/0x3f4 Fixes: `ddf97ccdd7` ("net_sched: add network namespace support for tc actions") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 16:38:59 -07:00
David S. Miller	34fe76abbe	Merge branch 'net-sched-fast-stats' Eric Dumazet says: ==================== net: sched: faster stats gathering A while back, I sent one RFC patch using lockless stats gathering on 64bit arches. This patch series does it more cleanly, using a seqcount. Since qdisc/class stats are written at dequeue() time, we can ask the dequeue to change the seqcount, so that stats readers can avoid taking the root qdisc lock, and instead the typical read_seqcount_{begin\|retry} guarded loop. This does not change fast path costs, as the seqcount increments are not more expensive than the bit manipulation, and allows readers to not freeze the fast path anymore. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 16:37:14 -07:00
Eric Dumazet	edb09eb17e	net: sched: do not acquire qdisc spinlock in qdisc/class stats dump Large tc dumps (tc -s {qdisc\|class} sh dev ethX) done by Google BwE host agent [1] are problematic at scale : For each qdisc/class found in the dump, we currently lock the root qdisc spinlock in order to get stats. Sampling stats every 5 seconds from thousands of HTB classes is a challenge when the root qdisc spinlock is under high pressure. Not only the dumps take time, they also slow down the fast path (queue/dequeue packets) by 10 % to 20 % in some cases. An audit of existing qdiscs showed that sch_fq_codel is the only qdisc that might need the qdisc lock in fq_codel_dump_stats() and fq_codel_dump_class_stats() In v2 of this patch, I now use the Qdisc running seqcount to provide consistent reads of packets/bytes counters, regardless of 32/64 bit arches. I also changed rate estimators to use the same infrastructure so that they no longer need to lock root qdisc lock. [1] http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43838.pdf Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Kevin Athey <kda@google.com> Cc: Xiaotian Pei <xiaotian@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 16:37:14 -07:00
Eric Dumazet	f9eb8aea2a	net_sched: transform qdisc running bit into a seqcount Instead of using a single bit (__QDISC___STATE_RUNNING) in sch->__state, use a seqcount. This adds lockdep support, but more importantly it will allow us to sample qdisc/class statistics without having to grab qdisc root lock. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 16:37:13 -07:00
Eric Dumazet	aafddbf0cf	fq_codel: return non zero qlen in class dumps We properly scan the flow list to count number of packets, but John passed 0 to gnet_stats_copy_queue() so we report a zero value to user space instead of the result. Fixes: `6401585366` ("net: sched: restrict use of qstats qlen") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: John Fastabend <john.r.fastabend@intel.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 16:28:11 -07:00
David S. Miller	064d5e6f8e	Merge branch 'u32-hwoffload-fixes' Jakub Kicinski says: ==================== cls_u32 hardware offload fixes This set fixes two small issues with error codes I noticed in cls_u32. Second patch could be viewed as user space API change but that portion of API is not part of any release, yet. Compile tested only. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 16:27:15 -07:00
Jakub Kicinski	d47a0f387f	net: cls_u32: be more strict about skip-sw flag Return an error if user requested skip-sw and the underlying hardware cannot handle tc offloads (or offloads are disabled). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 16:27:14 -07:00
Jakub Kicinski	1a0f7d2984	net: cls_u32: fix error code for invalid flags 'err' variable is not set in this test, we would return whatever previous test set 'err' to. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 16:27:14 -07:00
Colin Ian King	7b01b8e847	gtp: #define _UAPI_LINUX_GTP_H_ and not _UAPI_LINUX_GTP_H__ Fix clang build warning: ./include/uapi/linux/gtp.h:1:9: warning: '_UAPI_LINUX_GTP_H_' is used as a header guard here, followed by #define of a different macro [-Wheader-guard] fix by defining _UAPI_LINUX_GTP_H_ and not _UAPI_LINUX_GTP_H__ Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 16:25:49 -07:00
Linus Torvalds	2051877c4c	Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "This finally removes the CLK_IS_ROOT flag by picking up the last few stragglers that didn't get merged by anyone this time around. Better to do it now than wait for another one to pop up. There's also a minor maintainers update and a Kconfig fix" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: nxp: Select MFD_SYSCON for creg driver MAINTAINERS: Add file patterns for clock device tree bindings clk: Remove CLK_IS_ROOT flag clk: microchip: Remove CLK_IS_ROOT powerpc/512x: clk: Remove CLK_IS_ROOT vexpress/spc: Remove CLK_IS_ROOT	2016-06-07 16:24:44 -07:00
David S. Miller	64151ae36e	Merge branch 'be2net-noncrit-fixes' Sathya Perla says: ==================== be2net: patch set Hi David, the following patch set contains three non-critical fixes that can go into the net-next tree. Patch 1 fixes the logic for provisioning queue pairs on VFs to take into account the limit on number of TXQs too as in some profiles the number of TXQs is less than that of RXQs. Patch 2 enables WoL support from shutdown on Skyhawk. Patch 3 enhances the logic for provisioning queue pairs on VFs on SR-IOV over multi-partition configs. Each PF (partition) on a port has to compute the number of RSS tables its VFs can use. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 16:18:20 -07:00
Somnath Kotur	de2b1e0366	be2net: Fix provisioning of RSS for VFs in multi-partition configurations Currently, we do not distribute queue resources to enable RSS for VFs in multi-channel/partition configurations. Fix this by having each PF (SRIOV capable) calculate its share of the 15 RSS Policy Tables available per port before provisioning resources for all the VFs. This proportional share calculation is done based on division of the PF's MAX VFs with the Total MAX VFs on that port. It also needs to learn about the number of NIC PFs on the port and subtract that from the 15 RSS Policy Tables on the port. Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 16:18:20 -07:00
Sriharsha Basavapatna	45f13df75f	be2net: Enable Wake-On-LAN from shutdown for Skyhawk Skyhawk does support wake-up from ACPI shutdown state - S5, provided the platform supports it (like Auxiliary power source etc). The changes listed below are done to fix this. 1) There's no need to defer the HW configuration of WOL to be_suspend(). Remove this in be_suspend() and move it to be_set_wol() ethtool function so it is configured directly in the context of ethtool. This automatically takes care of the shutdown case. 2) The driver incorrectly uses WOL_CAP field in the FW response to get_acpi_wol_cap() command, to determine if WOL is enabled. Instead the driver must rely on the macaddr field in the response to infer WOL state. 3) In be_get_config() during init, if we find that WOL is enabled in FW, call pci_enable_wake() to enable pmcsr.pme_en bit. This is needed to support persistent WOL configuration provided by the FW in some platforms. 4) Remove code in be_set_wol() that writes to PCICFG_PM_CONTROL_OFFSET to set pme_en bit; pci_enable_wake() sets that. Fixes: `028991e49` ("Enabling Wake-on-LAN is not supported in S5 state") Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Sathya Perla <sathya.perla@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 16:18:19 -07:00
Suresh Reddy	b9263cbf21	be2net: use max-TXQs limit too while provisioning VF queue pairs When the PF driver provisions resources for VFs, it currently only looks at max RSS queues available to calculate the number of VF queue pairs. This logic breaks when there are less number of TX-queues than RSS-queues. This patch fixes this problem by using the max-TXQs available in the PF-pool in the calculations. As a part of this change the be_calculate_vf_qs() routine is renamed as be_calculate_vf_res() and the code that calculates limits on other related resources is moved here to contain all resource calculation code inside one routine. Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com> Signed-off-by: Sathya Perla <sathya.perla@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 16:18:19 -07:00
Colin Ian King	9f647a6de9	net: fec: fix spelling mistakes and add missing newline trivial fix to spelling mistakes and add missing newline in pr_err messages Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 16:15:59 -07:00
David S. Miller	71743ffa15	Merge branch 'bnxt_en-fixes' Michael Chan says: ==================== bnxt_en: Bug fixes. Fix a race condition and VLAN rx acceleration logic. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 16:02:04 -07:00
Michael Chan	8852ddb4dc	bnxt_en: Simplify VLAN receive logic. Since both CTAG and STAG rx acceleration must be enabled together, we only need to check one feature flag (NETIF_F_HW_VLAN_CTAG_RX) before calling __vlan_hwaccel_put_tag(). Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 16:02:03 -07:00
Michael Chan	5a9f6b238e	bnxt_en: Enable and disable RX CTAG and RX STAG VLAN acceleration together. The hardware can only be set to strip or not strip both the VLAN CTAG and STAG. It cannot strip one and not strip the other. Add logic to bnxt_fix_features() to toggle both feature flags when the user is toggling one of them. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 16:02:03 -07:00
Michael Chan	b9a8460a08	bnxt_en: Fix tx push race condition. Set the is_push flag in the software BD before the tx data is pushed to the chip. It is possible to get the tx interrupt as soon as the tx data is pushed. The tx handler will not handle the event properly if the is_push flag is not set and it will crash. Signed-off-by: Michael Chan <michael.chan@broadocm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 16:02:03 -07:00
Zhao Qiang	c19b6d246a	drivers/net: support hdlc function for QE-UCC The driver adds hdlc support for Freescale QUICC Engine. It supports NMSI and TSA mode. Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 15:56:31 -07:00
Zhao Qiang	35ef1c20fd	fsl/qe: Add QE TDM lib QE has a module to support TDM, and some other protocols supported by QE are based on TDM. Add a qe-tdm lib; this lib provides functions that the protocols using TDM can call to configure QE-TDM. Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 15:56:31 -07:00
Zhao Qiang	19163ac312	fsl/qe: Make regs resouce_size_t Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 15:56:31 -07:00
Zhao Qiang	bb8b2062af	fsl/qe: setup clock source for TDM mode Add tdm clock configuration in both qe clock system and ucc fast controller. Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 15:56:30 -07:00
Zhao Qiang	68f047e3d6	fsl/qe: add rx_sync and tx_sync for TDM mode Rx_sync and tx_sync are used by QE-TDM mode, add them to struct ucc_fast_info. Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 15:56:30 -07:00
Jamal Hadi Salim	0b0f43fe2e	net sched: indentation and other OCD stylistic fixes Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com>	2016-06-07 15:53:54 -07:00
David S. Miller	be11991368	Merge branch 'sch-action-tstamp' Jamal Hadi Salim says: ==================== net sched action timestamp improvements Various aggregations of duplicated code, fixes and introduction of firstused timestamp v2: add const for source time info per suggestion from Cong ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 15:53:44 -07:00
Jamal Hadi Salim	48d8ee1694	net sched actions: aggregate dumping of actions timeinfo Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 15:53:43 -07:00
Jamal Hadi Salim	53eb440f4a	net sched actions: introduce timestamp for firsttime use Useful to know when the action was first used for accounting (and debugging) Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 15:53:43 -07:00
Jamal Hadi Salim	9c4a4e488b	net sched: actions use tcf_lastuse_update for consistency Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 15:53:43 -07:00
Amir Vadai	e69985c67c	net/sched: cls_flower: Introduce support in SKIP SW flag In order to make a filter processed only by hardware, skip_sw flag should be supplied. This is an addition to the already existing skip_hw flag (filter will be processed by software only). If no flag is specified, filter will be processed by both software and hardware. If only hardware offloaded filters exist, fl_classify() will return without doing anything. A following userspace patch will be sent once kernel patch is accepted. Example: tc filter add dev enp0s9 protocol ip prio 20 parent ffff: \ flower \ ip_proto 6 \ indev enp0s9 \ skip_sw \ action skbedit mark 0x1234 Signed-off-by: Amir Vadai <amirva@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 15:49:53 -07:00
David S. Miller	919f274fd6	Merge branch 'qed-iov-fw-reqs' Yuval Mintz says: ==================== qed: IOV series - relax firmware requirements In order for VFs to work, current implementation demands that the VF's required storm firmware would be exactly the version that was loaded by the PF, which is a very harsh requirement. This patch series is intended to relax this - the recently submitted firmware is intended to be forward/backward compatible in its fastpath [slowpath is configured by PF on behalf of VF], and so VFs would only be required to have the same major fastpath HSI in order to work. Most of the other patches in this series extend current forward compatibility of driver to reduce chance of breaking PF/VF compatibility in the future. A few are unrelated IOV changes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 15:40:12 -07:00
Yuval Mintz	54fdd80f6f	qed: PF to reply to unknown messages If a future VF would send the PF an unknown message, the PF today would not send a reply. This would have 2 bad effects: a. VF would have to timeout on the request. b. If VF were to send an additional message to PF, firmware would mark it as malicious. Instead, if there's some valid reply-address on the message - let the PF answer and tell the VF it doesn't know the message. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 15:40:12 -07:00
Yuval Mintz	8246d0b48b	qed: PF enforce MAC limitation of VFs The only limitation relating to MACs the PF enforces today on its VFs is in case it has a forced-unicast MAC address for them, in which case they can't configure other unicast addresses. Specifically, the PF isn't enforcing the number of MAC addresses a VF can configure regardless of the number of such filters agreed upon by PF and VF during the acquisition process. PF's shadow-config is now extended to also contain information about its VFs' unicast addresses configuration, allowing such enforcement. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 15:40:12 -07:00
Yuval Mintz	5040acf537	qed: Move doorbell calculation from VF to PF Today, the VF is aware of its queues context-ids, and calculates the doorbell address when opening its queues on its own. The configuration of doorbells in HW can sometime in the future be changed by the PF [hw has several configurable features that might affect doorbell addresses, e.g., dpm support], this would break compatibility with older VFs as their calculated doorbell addresses would be incorrect for such a configuration. In order to avoid such a backward compatibility failure, let the PF make the calculation of the doorbell offset based on the context-id, and pass that to the VF. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 15:40:11 -07:00
Yuval Mintz	41086467d6	qed: Make PF more robust against malicious VF There are several requests the VF can make toward the PF which the driver would pass to firmware without checking the validity first - specifically, opening queues and updating vports. Such configurations might cause the firmware to assert. This adds validation of the legality of said configurations on the PF side before passing it onward via ramrod to firmware. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 15:40:11 -07:00
Yuval Mintz	1cf2b1a971	qed: PF-VF resource negotiation One of the goals of the vf's first message to the PF [acquire] is to learn about the number of resources available to it [macs, vlans, etc.]. This is done via negotiation - the VF requires a set of resources, which the PF either approves or disapproves and sends a smaller set of resources as alternative. In this latter case, the VF is then expected to either abort the probe or re-send the acquire message with fewer required resources. While this infrastructure exists since the initial submission of qed SRIOV support, it's in fact completely inoperative - PF isn't really looking into the resources the VF has asked for and is never going to reply to the VF that it lacks resources. This patch addresses this flow, fixing it and allowing the PF and VF to actually agree on a set of resources. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 15:40:11 -07:00
Yuval Mintz	1fe614d10f	qed: Relax VF firmware requirements The current driver requires an exact match between VF and PF storm firmware; any difference would fail the VF acquire message, causing the VF probe to be aborted. While there are still dependencies between the two, the recent FW submission has relaxed the match requirement - instead of an exact match, there's now a 'fastpath' HSI major/minor scheme, where VFs and PFs that match in their major number can co-exist even if their minor is different. In order to accommodate this change some changes in the vf-start init flow had to be made, as the VF start ramrod now has to be sent only after PF learns which fastpath HSI its VF is requiring. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 15:40:11 -07:00
Eric Dumazet	3bcb846ca4	net: get rid of spin_trylock() in net_tx_action() Note: Tom Herbert posted almost same patch 3 months back, but for different reasons. The reasons we want to get rid of this spin_trylock() are : 1) Under high qdisc pressure, the spin_trylock() has almost no chance to succeed. 2) We loop multiple times in softirq handler, eventually reaching the max retry count (10), and we schedule ksoftirqd. Since we want to adhere more strictly to ksoftirqd being waked up in the future (https://lwn.net/Articles/687617/), better avoid spurious wakeups. 3) calls to __netif_reschedule() dirty the cache line containing q->next_sched, slowing down the owner of qdisc. 4) RT kernels can not use the spin_trylock() here. With help of busylock, we get the qdisc spinlock fast enough, and the trylock trick brings only performance penalty. Depending on qdisc setup, I observed a gain of up to 19 % in qdisc performance (1016600 pps instead of 853400 pps, using prio+tbf+fq_codel) ("mpstat -I SCPU 1" is much happier now) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <tom@herbertland.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 15:32:03 -07:00
Wu Fengguang	fa54cc70ed	rxrpc: fix ptr_ret.cocci warnings net/rxrpc/rxkad.c:1165:1-3: WARNING: PTR_ERR_OR_ZERO can be used Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR Generated by: scripts/coccinelle/api/ptr_ret.cocci CC: David Howells <dhowells@redhat.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 15:30:21 -07:00
David S. Miller	29a36611e9	Merge branch 'rds-packet-assembly-fixes' Sowmini Varadhan says: ==================== RDS: TCP: socket locking RDS packet assembly fixes This three part patchset fixes bugs in synchronization between rds_tcp_accept_one() and the rds-tcp send/recv path. Patch 1 ensures that the lock_sock() is taken appropriately and the RDS datagram reassembly state is reset to synchronize with the receive path. Patch 2 ensures that partially sent RDS datagrams will get retransmitted after rds_tcp_accept_one() switches sockets. Patch 3 fixes a race window which would prematurely re-enable rds_send_xmit() before the rds_tcp_connection setup has been completed in rds_tcp_accept_one(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 15:10:16 -07:00
Sowmini Varadhan	9c79440e2c	RDS: TCP: fix race windows in send-path quiescence by rds_tcp_accept_one() The send path needs to be quiesced before resetting callbacks from rds_tcp_accept_one(), and commit `eb19284026` ("RDS:TCP: Synchronize rds_tcp_accept_one with rds_send_xmit when resetting t_sock") achieves this using the c_state and RDS_IN_XMIT bit following the pattern used by rds_conn_shutdown(). However this leaves the possibility of a race window as shown in the sequence below take t_conn_lock in rds_tcp_conn_connect send outgoing syn to peer drop t_conn_lock in rds_tcp_conn_connect incoming from peer triggers rds_tcp_accept_one, conn is marked CONNECTING wait for RDS_IN_XMIT to quiesce any rds_send_xmit threads call rds_tcp_reset_callbacks [.. race-window where incoming syn-ack can cause the conn to be marked UP from rds_tcp_state_change ..] lock_sock called from rds_tcp_reset_callbacks, and we set t_sock to null As soon as the conn is marked UP in the race-window above, rds_send_xmit() threads will proceed to rds_tcp_xmit and may encounter a null-pointer deref on the t_sock. Given that rds_tcp_state_change() is invoked in softirq context, whereas rds_tcp_reset_callbacks() is in workq context, and testing for RDS_IN_XMIT after lock_sock could result in a deadlock with tcp_sendmsg, this commit fixes the race by using a new c_state, RDS_TCP_RESETTING, which will prevent a transition to RDS_CONN_UP from rds_tcp_state_change(). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 15:10:15 -07:00
Sowmini Varadhan	0b6f760cff	RDS: TCP: Retransmit half-sent datagrams when switching sockets in rds_tcp_reset_callbacks When we switch a connection's sockets in rds_tcp_reset_callbacks, any partially sent datagram must be retransmitted on the new socket so that the receiver can correctly reassemble the RDS datagram. Use rds_send_reset() which is designed for this purpose. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 15:10:15 -07:00
Sowmini Varadhan	335b48d980	RDS: TCP: Add/use rds_tcp_reset_callbacks to reset tcp socket safely When rds_tcp_accept_one() has to replace the existing tcp socket with a newer tcp socket (duelling-syn resolution), it must lock_sock() to suppress the rds_tcp_data_recv() path while callbacks are being changed. Also, existing RDS datagram reassembly state must be reset, so that the next datagram on the new socket does not have corrupted state. Similarly when resetting the newly accepted socket, appropriate locks and synchronization is needed. This commit ensures correct synchronization by invoking kernel_sock_shutdown to reset a newly accepted sock, and by taking appropriate lock_sock()s (for old and new sockets) when resetting existing callbacks. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 15:10:15 -07:00
Eric Dumazet	80e509db54	fq_codel: fix NET_XMIT_CN behavior My prior attempt to fix the backlogs of parents failed. If we return NET_XMIT_CN, our parents wont increase their backlog, so our qdisc_tree_reduce_backlog() should take this into account. v2: Florian Westphal pointed out that we could drop the packet, so we need to save qdisc_pkt_len(skb) in a temp variable before calling fq_codel_drop() Fixes: `9d18562a22` ("fq_codel: add batch ability to fq_codel_drop()") Fixes: `2ccccf5fb4` ("net_sched: update hierarchical backlog too") Reported-by: Stas Nichiporovich <stasn77@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 14:49:56 -07:00
Daniel Borkmann	5b6c1b4d46	bpf, trace: use READ_ONCE for retrieving file ptr In bpf_perf_event_read() and bpf_perf_event_output(), we must use READ_ONCE() for fetching the struct file pointer, which could get updated concurrently, so we must prevent the compiler from potential refetching. We already do this with tail calls for fetching the related bpf_prog, but not so on stored perf events. Semantics for both are the same with regards to updates. Fixes: `a43eec3042` ("bpf: introduce bpf_perf_event_output() helper") Fixes: `35578d7984` ("bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 14:48:03 -07:00
Jason Wang	8241a1e466	vhost_net: stop polling socket during rx processing We don't stop polling the rx socket during rx processing, which leads to unnecessary wakeups from underlying net devices (e.g. sock_def_readable() from tun). Rx is slowed down as a result. This patch avoids this by stopping socket polling during rx processing. A small drawback is that this introduces some overhead in the light load case because of the extra start/stop polling, but single netperf TCP_RR does not notice any change. In a super heavy load case, e.g. using pktgen to inject packets to the guest, we get about 8.8% improvement on pps: before: ~1240000 pkt/s after: ~1350000 pkt/s Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 14:46:11 -07:00
Linus Torvalds	43c082e727	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull userns fixes from Eric Biederman: "This contains two small but significant fixes to fs/namespace.c. The first adds a filesystem refcount drop on error. The second corrects a test in fs_fully_visible which could be abused to allow mounting of proc or sysfs, when that should not be allowed. To keep myself honest I have tested to ensure the incorrect test in fs_fully_visible actually allows improper mounting of proc before the fix and that when fixed the improper mounting is not allowed" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: mnt: fs_fully_visible test the proper mount for MNT_LOCKED mnt: If fs_fully_visible fails call put_filesystem.	2016-06-07 10:04:35 -07:00
Erez Shitrit	61c78eea95	IB/IPoIB: Don't update neigh validity for unresolved entries ipoib_neigh_get unconditionally updates the "alive" variable member on any packet send. This prevents the neighbor garbage collection from cleaning out a dead neighbor entry if we are still queueing packets for it. If the queue for this neighbor is full, then don't update the alive timestamp. That way the neighbor can time out even if packets are still being queued as long as none of them are being sent. Fixes: `b63b70d877` ("IPoIB: Use a private hash table for path lookup in xmit path") Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-07 10:49:48 -04:00
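
The seqcount-guarded stats read described in commits f9eb8aea2a and edb09eb17e above can be illustrated with a small user-space analogue. This is only a sketch under stated assumptions, not the kernel code: `struct seq_stats`, `stats_add()` and `stats_read()` are hypothetical names, C11 atomics stand in for the kernel's `seqcount_t` and `read_seqcount_{begin|retry}` helpers, and a single writer is assumed (in the kernel, the dequeue path plays that role).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for a qdisc's stats plus its seqcount. */
struct seq_stats {
    atomic_uint seq;            /* odd while the writer is mid-update */
    _Atomic uint64_t bytes;
    _Atomic uint64_t packets;
};

/* Writer side (single writer assumed): bump the sequence to an odd value,
 * update the counters, then bump it back to an even value. */
static void stats_add(struct seq_stats *s, uint64_t len)
{
    atomic_fetch_add(&s->seq, 1);       /* sequence becomes odd */
    atomic_fetch_add(&s->bytes, len);
    atomic_fetch_add(&s->packets, 1);
    atomic_fetch_add(&s->seq, 1);       /* sequence becomes even again */
}

/* Reader side: takes no lock. It retries if the sequence was odd when
 * sampled (an update was in progress) or changed while it was reading. */
static void stats_read(struct seq_stats *s, uint64_t *bytes, uint64_t *packets)
{
    unsigned int start;

    do {
        start = atomic_load(&s->seq);
        *bytes = atomic_load(&s->bytes);
        *packets = atomic_load(&s->packets);
    } while ((start & 1) || atomic_load(&s->seq) != start);
}
```

The sketch uses sequentially consistent atomics for simplicity; the point it shares with the commits is that the reader never blocks the writer, so a stats dump does not stall the packet fast path.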


602392 commits