linux-pinenote

Author	SHA1	Message	Date
Jarno Rajahalme	394e910e90	openvswitch: Update the CT state key only after nf_conntrack_in(). Only a successful nf_conntrack_in() call can effect a connection state change, so it suffices to update the key only after the nf_conntrack_in() returns. This change is needed for the later NAT patches. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-14 23:47:28 +01:00
Jarno Rajahalme	9f13ded8d3	openvswitch: Add commentary to conntrack.c This makes the code easier to understand and the following patches more focused. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-14 23:47:27 +01:00
Jarno Rajahalme	264619055b	netfilter: Allow calling into nat helper without skb_dst. NAT checksum recalculation code assumes existence of skb_dst, which becomes a problem for a later patch in the series ("openvswitch: Interface with NAT."). Simplify this by removing the check on skb_dst, as the checksum will be dealt with later in the stack. Suggested-by: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-14 23:47:27 +01:00
Jarno Rajahalme	bfa3f9d7f3	netfilter: Remove IP_CT_NEW_REPLY definition. Remove the definition of IP_CT_NEW_REPLY from the kernel as it does not make sense. This allows the definition of IP_CT_NUMBER to be simplified as well. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-14 23:47:27 +01:00
Linus Torvalds	d37a14bb5f	Merge branch 'core-resources-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull ram resource handling changes from Ingo Molnar: "Core kernel resource handling changes to support NVDIMM error injection. This tree introduces a new I/O resource type, IORESOURCE_SYSTEM_RAM, for System RAM while keeping the current IORESOURCE_MEM type bit set for all memory-mapped ranges (including System RAM) for backward compatibility. With this resource flag it no longer takes a strcmp() loop through the resource tree to find "System RAM" resources. The new resource type is then used to extend ACPI/APEI error injection facility to also support NVDIMM" * 'core-resources-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ACPI/EINJ: Allow memory error injection to NVDIMM resource: Kill walk_iomem_res() x86/kexec: Remove walk_iomem_res() call with GART type x86, kexec, nvdimm: Use walk_iomem_res_desc() for iomem search resource: Add walk_iomem_res_desc() memremap: Change region_intersects() to take @flags and @desc arm/samsung: Change s3c_pm_run_res() to use System RAM type resource: Change walk_system_ram() to use System RAM type drivers: Initialize resource entry to zero xen, mm: Set IORESOURCE_SYSTEM_RAM to System RAM kexec: Set IORESOURCE_SYSTEM_RAM for System RAM arch: Set IORESOURCE_SYSTEM_RAM flag for System RAM ia64: Set System RAM type and descriptor x86/e820: Set System RAM type and descriptor resource: Add I/O resource descriptor resource: Handle resource flags properly resource: Add System RAM resource type	2016-03-14 15:15:51 -07:00
Doug Ledford	76b0640279	Merge branches 'ib_core', 'ib_ipoib', 'srpt', 'drain-cq-v4' and 'net/9p' into k.o/for-4.6	2016-03-14 17:42:57 -04:00
Bryn M. Reeves	98dbc9c6c6	dm: fix rq_end_stats() NULL pointer in dm_requeue_original_request() An "old" (.request_fn) DM 'struct request' stores a pointer to the associated 'struct dm_rq_target_io' in rq->special. dm_requeue_original_request(), previously named dm_requeue_unmapped_original_request(), called dm_unprep_request() to reset rq->special to NULL. But rq_end_stats() would go on to hit a NULL pointer deference because its call to tio_from_request() returned NULL. Fix this by calling rq_end_stats() _before_ dm_unprep_request() Signed-off-by: Bryn M. Reeves <bmr@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Fixes: `e262f34741` ("dm stats: add support for request-based DM devices") Cc: stable@vger.kernel.org # 4.2+	2016-03-14 17:04:34 -04:00
David S. Miller	005bcea933	Merge branch 'dsa-finers-bridging-control' Vivien Didelot says: ==================== net: dsa: finer bridging control This patchset renames the bridging routines of the DSA layer, make the unbridging routine return void, and rework the DSA netdev notifier handler, similar to what the Mellanox Spectrum driver does. Changes RFC -> v1: - drop unused NETDEV_PRECHANGEUPPER case - add Andrew's Tested-by tag ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 16:05:32 -04:00
Vivien Didelot	6debb68a2d	net: dsa: refine netdev event notifier Rework the netdev event handler, similar to what the Mellanox Spectrum driver does, to easily welcome more events later (for example NETDEV_PRECHANGEUPPER) and use netdev helpers (such as netif_is_bridge_master). Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 16:05:32 -04:00
Vivien Didelot	16bfa7024e	net: dsa: make port_bridge_leave return void netdev_upper_dev_unlink() which notifies NETDEV_CHANGEUPPER, returns void, as well as del_nbp(). So there's no advantage to catch an eventual error from the port_bridge_leave routine at the DSA level. Make this routine void for the DSA layer and its existing drivers. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 16:05:31 -04:00
Vivien Didelot	71327a4e7d	net: dsa: rename port_*_bridge routines Rename DSA port_join_bridge and port_leave_bridge routines to respectively port_bridge_join and port_bridge_leave in order to respect an implicit Port::Bridge namespace. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 16:05:31 -04:00
Maciej S. Szmigiero	1e1589ad8b	mISDN: Support DR6 indication in mISDNipac driver According to figure 39 in PEB3086 data sheet, version 1.4 this indication replaces DR when layer 1 transition source state is F6. This fixes mISDN layer 1 getting stuck in F6 state in TE mode on Dialogic Diva 2.02 card (and possibly others) when NT deactivates it. Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Acked-by: Karsten Keil <keil@b1-systems.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 15:51:29 -04:00
Maciej S. Szmigiero	464be1e0be	mISDN: Order IPAC register defines It looks like IPAC/ISAC chips register defines weren't in any particular order. Order them by their number to make it easier to spot holes. Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Acked-by: Karsten Keil <keil@b1-systems.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 15:51:29 -04:00
Sergei Shtylyov	4fa8c3cc70	sh_eth: kill useless initializers Some of the local variable intializers in the driver turned out to be pointless, kill 'em. Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 15:50:27 -04:00
James Bottomley	7ee7895c93	MAINTAINERS: use new email address for James Bottomley The @odin.com one has been bouncing for a while now, so replace with new Employer email. Signed-off-by: James Bottomley <jejb@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2016-03-14 12:49:38 -07:00
David S. Miller	d98654a7f3	Merge branch 'mvneta-fixes' Gregory CLEMENT says: ==================== Few mvneta fixes In this second version I split the last patch in two parts as requested. For the record the initial cover letter was: "here is a patch set of few fixes. Without the first one, a kernel configured with debug features ended to hang when the driver is built as a module and is removed. This is quite is annoying for debugging! The second patch fix a forgotten flag at the initial submission of the driver. The third patch is only really a cosmetic one so I have no problem to not apply it for 4.5 and wait for 4.6. I really would like to see the first one applied for 4.5 and for the second I let you judge if it something needed for now or that should wait the next release." ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 15:48:53 -04:00
Dmitri Epshtein	a3703fb31a	net: mvneta: replace magic numbers by existing macros Some literal values are actually already defined by macros, so let's use them. [gregory.clement@free-electrons.com: split intial commit in two individual changes] Signed-off-by: Dmitri Epshtein <dima@marvell.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 15:48:52 -04:00
Dmitri Epshtein	0838abb3c0	net: mvneta: fix error messages in mvneta_port_down function This commit corrects error printing when shutting down the port. [gregory.clement@free-electrons.com: split initial commit in two individual changes] Signed-off-by: Dmitri Epshtein <dima@marvell.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 15:48:52 -04:00
Dmitri Epshtein	928b6519af	net: mvneta: enable change MAC address when interface is up Function eth_prepare_mac_addr_change() is called as part of MAC address change. This function check if interface is running. To enable change MAC address when interface is running: IFF_LIVE_ADDR_CHANGE flag must be set to dev->priv_flags field Fixes: `c5aff18204` ("net: mvneta: driver for Marvell Armada 370/XP network unit") Cc: stable@vger.kernel.org Signed-off-by: Dmitri Epshtein <dima@marvell.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 15:48:51 -04:00
Gregory CLEMENT	1c2722a975	net: mvneta: Fix spinlock usage In the previous patch, the spinlock was not initialized. While it didn't cause any trouble yet it could be a problem to use it uninitialized. The most annoying part was the critical section protected by the spinlock in mvneta_stop(). Some of the functions could sleep as pointed when activated CONFIG_DEBUG_ATOMIC_SLEEP. Actually, in mvneta_stop() we only need to protect the is_stopped flagged, indeed the code of the notifier for CPU online is protected by the same spinlock, so when we get the lock, the notifer work is done. Reported-by: Patrick Uiterwijk <patrick@puiterwijk.org> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 15:48:51 -04:00
Florian Westphal	8626c56c82	bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict Zefir Kurtisi reported kernel panic with an openwrt specific patch. However, it turns out that mainline has a similar bug waiting to happen. Once NF_HOOK() returns the skb is in undefined state and must not be used. Moreover, the okfn must consume the skb to support async processing (NF_QUEUE). Current okfn in this spot doesn't consume it and caller assumes that NF_HOOK return value tells us if skb was freed or not, but thats wrong. It "works" because no in-tree user registers a NFPROTO_BRIDGE hook at LOCAL_IN that returns STOLEN or NF_QUEUE verdicts. Once we add NF_QUEUE support for nftables bridge this will break -- NF_QUEUE holds the skb for async processing, caller will erronoulsy return RX_HANDLER_PASS and on reinject netfilter will access free'd skb. Fix this by pushing skb up the stack in the okfn instead. NB: It also seems dubious to use LOCAL_IN while bypassing PRE_ROUTING completely in this case but this is how its been forever so it seems preferable to not change this. Cc: Felix Fietkau <nbd@openwrt.org> Cc: Zefir Kurtisi <zefir.kurtisi@neratec.com> Signed-off-by: Florian Westphal <fw@strlen.de> Tested-by: Zefir Kurtisi <zefir.kurtisi@neratec.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 15:46:41 -04:00
David S. Miller	180a2c542c	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2016-03-12 Here's the last bluetooth-next pull request for the 4.6 kernel. - New USB ID for AR3012 in btusb - New BCM2E55 ACPI ID - Buffer overflow fix for the Add Advertising command - Support for a new Bluetooth LE limited privacy mode - Fix for firmware activation in btmrvl_sdio - Cleanups to mac802154 & 6lowpan code Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 15:44:56 -04:00
David S. Miller	f7f58ae0f8	Merge branch 'dsa-cleanups' Andrew Lunn says: ==================== DSA cleanup and fixes The RFC patchset for re-architecturing DSA probing contains a few standalone patches, either clean up or fixes. This pulls them out for submission. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 15:43:12 -04:00
Andrew Lunn	5bcbe0f35f	phy: fixed: Fix removal of phys. The fixed phys delete function simply removed the fixed phy from the internal linked list and freed the memory. It however did not unregister the associated phy device. This meant it was still possible to find the phy device on the mdio bus. Make fixed_phy_del() an internal function and add a fixed_phy_unregister() to unregisters the phy device and then uses fixed_phy_del() to free resources. Modify DSA to use this new API function, so we don't leak phys. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 15:43:11 -04:00
Andrew Lunn	ec777e6b42	dsa: dsa: Fix freeing of fixed-phys from user ports. All ports types can have a fixed PHY associated with it. Remove the check which limits removal to only CPU and DSA ports. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 15:43:11 -04:00
Andrew Lunn	3a44514ff9	dsa: Destroy fixed link phys after the phy has been disconnected The phy is disconnected from the slave in dsa_slave_destroy(). Don't destroy fixed link phys until after this, since there can be fixed linked phys connected to ports. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 15:43:11 -04:00
Andrew Lunn	b71be352f7	dsa: slave: Don't reference NULL pointer during phy_disconnect When the phy is disconnected, the parent pointer to the netdev it was attached to is set to NULL. The code then tries to suspend the phy, but dsa_slave_fixed_link_update needs the parent pointer to determine which switch the phy is connected to. So it dereferenced a NULL pointer. Check for this condition. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 15:43:10 -04:00
Andrew Lunn	ca3dfa51e6	dsa: Rename mv88e6123_61_65 to mv88e6123 to be consistent All the drivers support multiple chips, but mv88e6123_61_65 is the only one that reflects this in its naming. Change it to be consistent with the other drivers. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 15:43:10 -04:00
David S. Miller	dc91d7efdc	Merge branch 'of_mdio-checks' Sergei Shtylyov says: ==================== of_mdio: use IS_ERR_OR_NULL() and PTR_ERR_OR_ZERO() Here's the set of 3 patches against DaveM's 'net-next.git' repo. They deal with some error checks in the device tree MDIO code... ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 15:31:59 -04:00
Sergei Shtylyov	5189b1d82f	of_mdio: use PTR_ERR_OR_ZERO() PTR_ERR_OR_ZERO() is open coded in of_phy_register_fixed_link(), so just call it directly... Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Zapolskiy <vz@mleia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 15:31:58 -04:00
Sergei Shtylyov	ac044b902e	of_mdio: use IS_ERR_OR_NULL() IS_ERR_OR_NULL() is open coded in of_mdiobus_register_phy(), so just call it directly... Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 15:31:58 -04:00
Sergei Shtylyov	a3478bae7a	of_mdio: mdio_device_create() never returns NULL mdio_device_create() never returns NULL, thus checking for it in of_mdiobus_register_device() is pointless... Suggested-by: Vladimir Zapolskiy <vz@mleia.com> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 15:31:58 -04:00
David S. Miller	1f3a1c5466	Merge branch 'thunderx-phy' David Daney says: ==================== net/phy: Improvements to Cavium Thunder MDIO code. Changes from v1: - In 1/3 Add back check for non-OF objects in bgx_init_of_phy(). It is probably not necessary, but better safe than sorry... The firmware on many Cavium Thunder systems configures the MDIO bus hardware to be probed as a PCI device. In order to use the MDIO bus drivers in this configuration, we must add PCI probing to the driver. There are two parts to this set of three patches: 1) Cleanup the PHY probing code in thunder_bgx.c to handle the case where there is no PHY attached to a port, as well as being more robust in the face of driver loading order by use of -EPROBE_DEFER. 2) Split mdio-octeon.c into two drivers, one with platform probing, and the other with PCI probing. Common code is shared between the two. Tested on several different Thunder and OCTEON systems, also compile tested on x86_64. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 15:27:23 -04:00
David Daney	379d7ac7ca	phy: mdio-thunder: Add driver for Cavium Thunder SoC MDIO buses. The Cavium Thunder SoCs have multiple MIDO buses that are part of a single PCI device. To model this in the device tree we call the PCI parent device a "cavium,thunder-8890-mdio-nexus", it has several children, one for each MDIO bus. The MDIO bus hardware is identical to that found in the OCTEON SoCs, so we use that code for things that are not part of the PCI driver probe/remove Signed-off-by: David Daney <david.daney@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 15:27:22 -04:00
David Daney	1eefee901f	phy: mdio-octeon: Refactor into two files/modules A follow-on patch uses PCI probing to find the Thunder MDIO hardware. In preparation for this, split out the common code into a new file mdio-cavium.c, which will be used by both the existing OCTEON driver, and the new Thunder PCI based driver. As part of the refactoring simplify the struct cavium_mdiobus by removing fields that are only ever used in the probe function and can just as well be local variables. Use readq/writeq in preference to readq_relaxed/writeq_relaxed as the relaxed form was an optimization for an early chip revision, and the MDIO drivers are not performance bottlenecks that need optimization in the first place. Signed-off-by: David Daney <david.daney@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 15:27:22 -04:00
David Daney	5fc7cf1794	net: thunderx: Cleanup PHY probing code. Remove the call to force the octeon-mdio driver to be loaded. Allow the standard driver loading mechanisms to load the PHY drivers, and use -EPROBE_DEFER to cause the BGX driver to be probed only after the PHY drivers are available. Reorder the setting of MAC addresses and PHY probing to allow BGX LMACs with no attached PHY to still be assigned a MAC address. Signed-off-by: David Daney <david.daney@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 15:27:22 -04:00
Anna-Maria Gleixner	0df83e7a6b	net: mvneta: Add missing hotplug notifier transition The mvneta_percpu_notifier() hotplug callback lacks handling of the CPU_DOWN_FAILED case. That means, if CPU_DOWN_PREPARE failes, the driver is not well configured on the CPU. Add handling for CPU_DOWN_FAILED[_FROZEN] hotplug notifier transition to setup the driver. Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: netdev@vger.kernel.org Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 15:22:10 -04:00
Igal Liberman	7c82a7b998	fsl/fman: fix dtsec_set_tx_pause_frames Fix a bug introduced in `e06a03b` (fsl/fman: fix the pause_time test) When pause_time is set to '0' - pause frames are disabled and there's no need to apply dTSEC-A003 Errata workaround. Signed-off-by: Igal Liberman <igal.liberman@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 15:03:10 -04:00
Florian Fainelli	387178ec26	Documentation: networking: phy.txt: Add missing functions Some new development in PHYLIB added new function pointers to the struct phy_driver, document these. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 15:00:04 -04:00
Chuck Lever	2fa8f88d88	xprtrdma: Use new CQ API for RPC-over-RDMA client send CQs Calling ib_poll_cq() to sort through WCs during a completion is a common pattern amongst RDMA consumers. Since commit `14d3a3b249` ("IB: add a proper completion queue abstraction"), WC sorting can be handled by the IB core. By converting to this new API, xprtrdma is made a better neighbor to other RDMA consumers, as it allows the core to schedule the delivery of completions more fairly amongst all active consumers. Because each ib_cqe carries a pointer to a completion method, the core can now post its own operations on a consumer's QP, and handle the completions itself, without changes to the consumer. Send completions were previously handled entirely in the completion upcall handler (ie, deferring to a process context is unneeded). Thus IB_POLL_SOFTIRQ is a direct replacement for the current xprtrdma send code path. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-03-14 14:56:08 -04:00
Chuck Lever	c882a655b7	xprtrdma: Use an anonymous union in struct rpcrdma_mw Clean up: Make code more readable. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-03-14 14:56:06 -04:00
Chuck Lever	552bf22528	xprtrdma: Use new CQ API for RPC-over-RDMA client receive CQs Calling ib_poll_cq() to sort through WCs during a completion is a common pattern amongst RDMA consumers. Since commit `14d3a3b249` ("IB: add a proper completion queue abstraction"), WC sorting can be handled by the IB core. By converting to this new API, xprtrdma is made a better neighbor to other RDMA consumers, as it allows the core to schedule the delivery of completions more fairly amongst all active consumers. Because each ib_cqe carries a pointer to a completion method, the core can now post its own operations on a consumer's QP, and handle the completions itself, without changes to the consumer. xprtrdma's reply processing is already handled in a work queue, but there is some initial order-dependent processing that is done in the soft IRQ context before a work item is scheduled. IB_POLL_SOFTIRQ is a direct replacement for the current xprtrdma receive code path. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-03-14 14:56:04 -04:00
Chuck Lever	23826c7aea	xprtrdma: Serialize credit accounting again Commit `fe97b47cd6` ("xprtrdma: Use workqueue to process RPC/RDMA replies") replaced the reply tasklet with a workqueue that allows RPC replies to be processed in parallel. Thus the credit values in RPC-over-RDMA replies can be applied in a different order than in which the server sent them. To fix this, revert commit `eba8ff660b` ("xprtrdma: Move credit update to RPC reply handler"). Reverting is done by hand to accommodate code changes that have occurred since then. Fixes: `fe97b47cd6` ("xprtrdma: Use workqueue to process . . .") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-03-14 14:56:01 -04:00
Chuck Lever	59aa1f9a3c	xprtrdma: Properly handle RDMA_ERROR replies These are shorter than RPCRDMA_HDRLEN_MIN, and they need to complete the waiting RPC. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-03-14 14:55:59 -04:00
Chuck Lever	d8647f2d3c	rpcrdma: Add RPCRDMA_HDRLEN_ERR Error headers are shorter than either RDMA_MSG or RDMA_NOMSG. Since HDRLEN_MIN is already used in several other places that would be annoying to change, add RPCRDMA_HDRLEN_ERR for the one or two spots where the shorter length is needed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-03-14 14:55:57 -04:00
Chuck Lever	b892a699ce	xprtrdma: Do not wait if ib_post_send() fails If ib_post_send() in ro_unmap_sync() fails, the WRs have not been posted, no completions will fire, and wait_for_completion() will wait forever. Skip the wait in that case. To ensure the MRs are invalid, disconnect. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-03-14 14:55:55 -04:00
Chuck Lever	821c791a0b	xprtrdma: Segment head and tail XDR buffers on page boundaries A single memory allocation is used for the pair of buffers wherein the RPC client builds an RPC call message and decodes its matching reply. These buffers are sized based on the maximum possible size of the RPC call and reply messages for the operation in progress. This means that as the call buffer increases in size, the start of the reply buffer is pushed farther into the memory allocation. RPC requests are growing in size. It used to be that both the call and reply buffers fit inside a single page. But these days, thanks to NFSv4 (and especially security labels in NFSv4.2) the maximum call and reply sizes are large. NFSv4.0 OPEN, for example, now requires a 6KB allocation for a pair of call and reply buffers, and NFSv4 LOOKUP is not far behind. As the maximum size of a call increases, the reply buffer is pushed far enough into the buffer's memory allocation that a page boundary can appear in the middle of it. When the maximum possible reply size is larger than the client's RDMA receive buffers (currently 1KB), the client has to register a Reply chunk for the server to RDMA Write the reply into. The logic in rpcrdma_convert_iovs() assumes that xdr_buf head and tail buffers would always be contained on a single page. It supplies just one segment for the head and one for the tail. FMR, for example, registers up to a page boundary (only a portion of the reply buffer in the OPEN case above). But without additional segments, it doesn't register the rest of the buffer. When the server tries to write the OPEN reply, the RDMA Write fails with a remote access error since the client registered only part of the Reply chunk. rpcrdma_convert_iovs() must split the XDR buffer into multiple segments, each of which are guaranteed not to contain a page boundary. That way fmr_op_map is given the proper number of segments to register the whole reply buffer. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-03-14 14:55:53 -04:00
Chuck Lever	af0f16e825	xprtrdma: Clean up dprintk format string containing a newline Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-03-14 14:55:50 -04:00
Chuck Lever	4d97291b6f	xprtrdma: Clean up physical_op_map() physical_op_unmap{_sync} don't use mr_nsegs, so don't bother to set it in physical_op_map. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-03-14 14:55:47 -04:00
Chuck Lever	66cede5279	xprtrdma: Clean up unused RPCRDMA_INLINE_PAD_THRESH macro Fixes: `b3221d6a53` ('xprtrdma: Remove logic that constructs...') Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-03-14 14:55:44 -04:00

... 13 14 15 16 17 ...

586866 commits