Commit graph

586866 commits

Author SHA1 Message Date
Jarno Rajahalme
394e910e90 openvswitch: Update the CT state key only after nf_conntrack_in().
Only a successful nf_conntrack_in() call can effect a connection state
change, so it suffices to update the key only after the
nf_conntrack_in() returns.

This change is needed for the later NAT patches.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-14 23:47:28 +01:00
Jarno Rajahalme
9f13ded8d3 openvswitch: Add commentary to conntrack.c
This makes the code easier to understand and the following patches
more focused.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-14 23:47:27 +01:00
Jarno Rajahalme
264619055b netfilter: Allow calling into nat helper without skb_dst.
NAT checksum recalculation code assumes existence of skb_dst, which
becomes a problem for a later patch in the series ("openvswitch:
Interface with NAT.").  Simplify this by removing the check on
skb_dst, as the checksum will be dealt with later in the stack.

Suggested-by: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-14 23:47:27 +01:00
Jarno Rajahalme
bfa3f9d7f3 netfilter: Remove IP_CT_NEW_REPLY definition.
Remove the definition of IP_CT_NEW_REPLY from the kernel as it does
not make sense.  This allows the definition of IP_CT_NUMBER to be
simplified as well.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-14 23:47:27 +01:00
Linus Torvalds
d37a14bb5f Merge branch 'core-resources-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull ram resource handling changes from Ingo Molnar:
 "Core kernel resource handling changes to support NVDIMM error
  injection.

  This tree introduces a new I/O resource type, IORESOURCE_SYSTEM_RAM,
  for System RAM while keeping the current IORESOURCE_MEM type bit set
  for all memory-mapped ranges (including System RAM) for backward
  compatibility.

  With this resource flag it no longer takes a strcmp() loop through the
  resource tree to find "System RAM" resources.

  The new resource type is then used to extend ACPI/APEI error injection
  facility to also support NVDIMM"

* 'core-resources-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  ACPI/EINJ: Allow memory error injection to NVDIMM
  resource: Kill walk_iomem_res()
  x86/kexec: Remove walk_iomem_res() call with GART type
  x86, kexec, nvdimm: Use walk_iomem_res_desc() for iomem search
  resource: Add walk_iomem_res_desc()
  memremap: Change region_intersects() to take @flags and @desc
  arm/samsung: Change s3c_pm_run_res() to use System RAM type
  resource: Change walk_system_ram() to use System RAM type
  drivers: Initialize resource entry to zero
  xen, mm: Set IORESOURCE_SYSTEM_RAM to System RAM
  kexec: Set IORESOURCE_SYSTEM_RAM for System RAM
  arch: Set IORESOURCE_SYSTEM_RAM flag for System RAM
  ia64: Set System RAM type and descriptor
  x86/e820: Set System RAM type and descriptor
  resource: Add I/O resource descriptor
  resource: Handle resource flags properly
  resource: Add System RAM resource type
2016-03-14 15:15:51 -07:00
Doug Ledford
76b0640279 Merge branches 'ib_core', 'ib_ipoib', 'srpt', 'drain-cq-v4' and 'net/9p' into k.o/for-4.6 2016-03-14 17:42:57 -04:00
Bryn M. Reeves
98dbc9c6c6 dm: fix rq_end_stats() NULL pointer in dm_requeue_original_request()
An "old" (.request_fn) DM 'struct request' stores a pointer to the
associated 'struct dm_rq_target_io' in rq->special.

dm_requeue_original_request(), previously named
dm_requeue_unmapped_original_request(), called dm_unprep_request() to
reset rq->special to NULL.  But rq_end_stats() would go on to hit a NULL
pointer deference because its call to tio_from_request() returned NULL.

Fix this by calling rq_end_stats() _before_ dm_unprep_request()

Signed-off-by: Bryn M. Reeves <bmr@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Fixes: e262f34741 ("dm stats: add support for request-based DM devices")
Cc: stable@vger.kernel.org # 4.2+
2016-03-14 17:04:34 -04:00
David S. Miller
005bcea933 Merge branch 'dsa-finers-bridging-control'
Vivien Didelot says:

====================
net: dsa: finer bridging control

This patchset renames the bridging routines of the DSA layer, make the
unbridging routine return void, and rework the DSA netdev notifier handler,
similar to what the Mellanox Spectrum driver does.

Changes RFC -> v1:
  - drop unused NETDEV_PRECHANGEUPPER case
  - add Andrew's Tested-by tag
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 16:05:32 -04:00
Vivien Didelot
6debb68a2d net: dsa: refine netdev event notifier
Rework the netdev event handler, similar to what the Mellanox Spectrum
driver does, to easily welcome more events later (for example
NETDEV_PRECHANGEUPPER) and use netdev helpers (such as
netif_is_bridge_master).

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 16:05:32 -04:00
Vivien Didelot
16bfa7024e net: dsa: make port_bridge_leave return void
netdev_upper_dev_unlink() which notifies NETDEV_CHANGEUPPER, returns
void, as well as del_nbp(). So there's no advantage to catch an eventual
error from the port_bridge_leave routine at the DSA level.

Make this routine void for the DSA layer and its existing drivers.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 16:05:31 -04:00
Vivien Didelot
71327a4e7d net: dsa: rename port_*_bridge routines
Rename DSA port_join_bridge and port_leave_bridge routines to
respectively port_bridge_join and port_bridge_leave in order to respect
an implicit Port::Bridge namespace.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 16:05:31 -04:00
Maciej S. Szmigiero
1e1589ad8b mISDN: Support DR6 indication in mISDNipac driver
According to figure 39 in PEB3086 data sheet, version 1.4 this indication
replaces DR when layer 1 transition source state is F6.

This fixes mISDN layer 1 getting stuck in F6 state in TE mode on
Dialogic Diva 2.02 card (and possibly others) when NT deactivates it.

Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Acked-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:51:29 -04:00
Maciej S. Szmigiero
464be1e0be mISDN: Order IPAC register defines
It looks like IPAC/ISAC chips register defines weren't in any particular
order.

Order them by their number to make it easier to spot holes.

Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Acked-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:51:29 -04:00
Sergei Shtylyov
4fa8c3cc70 sh_eth: kill useless initializers
Some of the local variable intializers in the driver turned out to be
pointless,  kill 'em.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:50:27 -04:00
James Bottomley
7ee7895c93 MAINTAINERS: use new email address for James Bottomley
The @odin.com one has been bouncing for a while now, so replace with
new Employer email.

Signed-off-by: James Bottomley <jejb@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2016-03-14 12:49:38 -07:00
David S. Miller
d98654a7f3 Merge branch 'mvneta-fixes'
Gregory CLEMENT says:

====================
Few mvneta fixes

In this second version I split the last patch in two parts as
requested.

For the record the initial cover letter was:
"here is a patch set of few fixes. Without the first one, a kernel
configured with debug features ended to hang when the driver is built
as a module and is removed. This is quite is annoying for debugging!

The second patch fix a forgotten flag at the initial submission of the
driver.

The third patch is only really a cosmetic one so I have no problem to
not apply it for 4.5 and wait for 4.6.

I really would like to see the first one applied for 4.5 and for the
second I let you judge if it something needed for now or that should
wait the next release."
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:48:53 -04:00
Dmitri Epshtein
a3703fb31a net: mvneta: replace magic numbers by existing macros
Some literal values are actually already defined by macros, so let's use
them.

[gregory.clement@free-electrons.com: split intial commit in two
individual changes]
Signed-off-by: Dmitri Epshtein <dima@marvell.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:48:52 -04:00
Dmitri Epshtein
0838abb3c0 net: mvneta: fix error messages in mvneta_port_down function
This commit corrects error printing when shutting down the port.

[gregory.clement@free-electrons.com: split initial commit in two
individual changes]
Signed-off-by: Dmitri Epshtein <dima@marvell.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:48:52 -04:00
Dmitri Epshtein
928b6519af net: mvneta: enable change MAC address when interface is up
Function eth_prepare_mac_addr_change() is called as part of MAC
address change. This function check if interface is running.
To enable change MAC address when interface is running:
IFF_LIVE_ADDR_CHANGE flag must be set to dev->priv_flags field

Fixes: c5aff18204 ("net: mvneta: driver for Marvell Armada 370/XP
network unit")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitri Epshtein <dima@marvell.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:48:51 -04:00
Gregory CLEMENT
1c2722a975 net: mvneta: Fix spinlock usage
In the previous patch, the spinlock was not initialized. While it didn't
cause any trouble yet it could be a problem to use it uninitialized.

The most annoying part was the critical section protected by the spinlock
in mvneta_stop(). Some of the functions could sleep as pointed when
activated CONFIG_DEBUG_ATOMIC_SLEEP. Actually, in mvneta_stop() we only
need to protect the is_stopped flagged, indeed the code of the notifier
for CPU online is protected by the same spinlock, so when we get the
lock, the notifer work is done.

Reported-by: Patrick Uiterwijk <patrick@puiterwijk.org>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:48:51 -04:00
Florian Westphal
8626c56c82 bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict
Zefir Kurtisi reported kernel panic with an openwrt specific patch.
However, it turns out that mainline has a similar bug waiting to happen.

Once NF_HOOK() returns the skb is in undefined state and must not be
used.   Moreover, the okfn must consume the skb to support async
processing (NF_QUEUE).

Current okfn in this spot doesn't consume it and caller assumes that
NF_HOOK return value tells us if skb was freed or not, but thats wrong.

It "works" because no in-tree user registers a NFPROTO_BRIDGE hook at
LOCAL_IN that returns STOLEN or NF_QUEUE verdicts.

Once we add NF_QUEUE support for nftables bridge this will break --
NF_QUEUE holds the skb for async processing, caller will erronoulsy
return RX_HANDLER_PASS and on reinject netfilter will access free'd skb.

Fix this by pushing skb up the stack in the okfn instead.

NB: It also seems dubious to use LOCAL_IN while bypassing PRE_ROUTING
completely in this case but this is how its been forever so it seems
preferable to not change this.

Cc: Felix Fietkau <nbd@openwrt.org>
Cc: Zefir Kurtisi <zefir.kurtisi@neratec.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Tested-by: Zefir Kurtisi <zefir.kurtisi@neratec.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:46:41 -04:00
David S. Miller
180a2c542c Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2016-03-12

Here's the last bluetooth-next pull request for the 4.6 kernel.

 - New USB ID for AR3012 in btusb
 - New BCM2E55 ACPI ID
 - Buffer overflow fix for the Add Advertising command
 - Support for a new Bluetooth LE limited privacy mode
 - Fix for firmware activation in btmrvl_sdio
 - Cleanups to mac802154 & 6lowpan code

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:44:56 -04:00
David S. Miller
f7f58ae0f8 Merge branch 'dsa-cleanups'
Andrew Lunn says:

====================
DSA cleanup and fixes

The RFC patchset for re-architecturing DSA probing contains a few
standalone patches, either clean up or fixes. This pulls them out for
submission.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:43:12 -04:00
Andrew Lunn
5bcbe0f35f phy: fixed: Fix removal of phys.
The fixed phys delete function simply removed the fixed phy from the
internal linked list and freed the memory. It however did not
unregister the associated phy device. This meant it was still possible
to find the phy device on the mdio bus.

Make fixed_phy_del() an internal function and add a
fixed_phy_unregister() to unregisters the phy device and then uses
fixed_phy_del() to free resources.

Modify DSA to use this new API function, so we don't leak phys.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:43:11 -04:00
Andrew Lunn
ec777e6b42 dsa: dsa: Fix freeing of fixed-phys from user ports.
All ports types can have a fixed PHY associated with it. Remove the
check which limits removal to only CPU and DSA ports.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:43:11 -04:00
Andrew Lunn
3a44514ff9 dsa: Destroy fixed link phys after the phy has been disconnected
The phy is disconnected from the slave in dsa_slave_destroy(). Don't
destroy fixed link phys until after this, since there can be fixed
linked phys connected to ports.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:43:11 -04:00
Andrew Lunn
b71be352f7 dsa: slave: Don't reference NULL pointer during phy_disconnect
When the phy is disconnected, the parent pointer to the netdev it was
attached to is set to NULL. The code then tries to suspend the phy,
but dsa_slave_fixed_link_update needs the parent pointer to determine
which switch the phy is connected to. So it dereferenced a NULL
pointer. Check for this condition.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:43:10 -04:00
Andrew Lunn
ca3dfa51e6 dsa: Rename mv88e6123_61_65 to mv88e6123 to be consistent
All the drivers support multiple chips, but mv88e6123_61_65 is the
only one that reflects this in its naming. Change it to be consistent
with the other drivers.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:43:10 -04:00
David S. Miller
dc91d7efdc Merge branch 'of_mdio-checks'
Sergei Shtylyov says:

====================
of_mdio: use IS_ERR_OR_NULL() and PTR_ERR_OR_ZERO()

   Here's the set of 3 patches against DaveM's 'net-next.git' repo. They deal
with some error checks in the device tree MDIO code...
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:31:59 -04:00
Sergei Shtylyov
5189b1d82f of_mdio: use PTR_ERR_OR_ZERO()
PTR_ERR_OR_ZERO() is open coded in of_phy_register_fixed_link(), so just
call it directly...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:31:58 -04:00
Sergei Shtylyov
ac044b902e of_mdio: use IS_ERR_OR_NULL()
IS_ERR_OR_NULL() is open coded in of_mdiobus_register_phy(), so just call
it directly...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:31:58 -04:00
Sergei Shtylyov
a3478bae7a of_mdio: mdio_device_create() never returns NULL
mdio_device_create() never returns NULL, thus checking for it in
of_mdiobus_register_device() is pointless...

Suggested-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:31:58 -04:00
David S. Miller
1f3a1c5466 Merge branch 'thunderx-phy'
David Daney says:

====================
net/phy: Improvements to Cavium Thunder MDIO code.

Changes from v1:

 - In 1/3 Add back check for non-OF objects in bgx_init_of_phy().  It
   is probably not necessary, but better safe than sorry...

The firmware on many Cavium Thunder systems configures the MDIO bus
hardware to be probed as a PCI device.  In order to use the MDIO bus
drivers in this configuration, we must add PCI probing to the driver.

There are two parts to this set of three patches:

 1) Cleanup the PHY probing code in thunder_bgx.c to handle the case
    where there is no PHY attached to a port, as well as being more
    robust in the face of driver loading order by use of
    -EPROBE_DEFER.

 2) Split mdio-octeon.c into two drivers, one with platform probing,
 and the other with PCI probing.  Common code is shared between the
 two.

Tested on several different Thunder and OCTEON systems, also compile
tested on x86_64.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:27:23 -04:00
David Daney
379d7ac7ca phy: mdio-thunder: Add driver for Cavium Thunder SoC MDIO buses.
The Cavium Thunder SoCs have multiple MIDO buses that are part of a
single PCI device.  To model this in the device tree we call the PCI
parent device a "cavium,thunder-8890-mdio-nexus", it has several
children, one for each MDIO bus.

The MDIO bus hardware is identical to that found in the OCTEON SoCs,
so we use that code for things that are not part of the PCI driver
probe/remove

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:27:22 -04:00
David Daney
1eefee901f phy: mdio-octeon: Refactor into two files/modules
A follow-on patch uses PCI probing to find the Thunder MDIO hardware.
In preparation for this, split out the common code into a new file
mdio-cavium.c, which will be used by both the existing OCTEON driver,
and the new Thunder PCI based driver.

As part of the refactoring simplify the struct cavium_mdiobus by
removing fields that are only ever used in the probe function and can
just as well be local variables.

Use readq/writeq in preference to readq_relaxed/writeq_relaxed as the
relaxed form was an optimization for an early chip revision, and the
MDIO drivers are not performance bottlenecks that need optimization in
the first place.

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:27:22 -04:00
David Daney
5fc7cf1794 net: thunderx: Cleanup PHY probing code.
Remove the call to force the octeon-mdio driver to be loaded.  Allow
the standard driver loading mechanisms to load the PHY drivers, and
use -EPROBE_DEFER to cause the BGX driver to be probed only after the
PHY drivers are available.

Reorder the setting of MAC addresses and PHY probing to allow BGX
LMACs with no attached PHY to still be assigned a MAC address.

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:27:22 -04:00
Anna-Maria Gleixner
0df83e7a6b net: mvneta: Add missing hotplug notifier transition
The mvneta_percpu_notifier() hotplug callback lacks handling of the
CPU_DOWN_FAILED case. That means, if CPU_DOWN_PREPARE failes, the
driver is not well configured on the CPU.

Add handling for CPU_DOWN_FAILED[_FROZEN] hotplug notifier transition
to setup the driver.

Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:22:10 -04:00
Igal Liberman
7c82a7b998 fsl/fman: fix dtsec_set_tx_pause_frames
Fix a bug introduced in e06a03b (fsl/fman: fix the pause_time test)
When pause_time is set to '0' - pause frames are disabled and
there's no need to apply dTSEC-A003 Errata workaround.

Signed-off-by: Igal Liberman <igal.liberman@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:03:10 -04:00
Florian Fainelli
387178ec26 Documentation: networking: phy.txt: Add missing functions
Some new development in PHYLIB added new function pointers to the struct
phy_driver, document these.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:00:04 -04:00
Chuck Lever
2fa8f88d88 xprtrdma: Use new CQ API for RPC-over-RDMA client send CQs
Calling ib_poll_cq() to sort through WCs during a completion is a
common pattern amongst RDMA consumers. Since commit 14d3a3b249
("IB: add a proper completion queue abstraction"), WC sorting can
be handled by the IB core.

By converting to this new API, xprtrdma is made a better neighbor to
other RDMA consumers, as it allows the core to schedule the delivery
of completions more fairly amongst all active consumers.

Because each ib_cqe carries a pointer to a completion method, the
core can now post its own operations on a consumer's QP, and handle
the completions itself, without changes to the consumer.

Send completions were previously handled entirely in the completion
upcall handler (ie, deferring to a process context is unneeded).
Thus IB_POLL_SOFTIRQ is a direct replacement for the current
xprtrdma send code path.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-14 14:56:08 -04:00
Chuck Lever
c882a655b7 xprtrdma: Use an anonymous union in struct rpcrdma_mw
Clean up: Make code more readable.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-14 14:56:06 -04:00
Chuck Lever
552bf22528 xprtrdma: Use new CQ API for RPC-over-RDMA client receive CQs
Calling ib_poll_cq() to sort through WCs during a completion is a
common pattern amongst RDMA consumers. Since commit 14d3a3b249
("IB: add a proper completion queue abstraction"), WC sorting can
be handled by the IB core.

By converting to this new API, xprtrdma is made a better neighbor to
other RDMA consumers, as it allows the core to schedule the delivery
of completions more fairly amongst all active consumers.

Because each ib_cqe carries a pointer to a completion method, the
core can now post its own operations on a consumer's QP, and handle
the completions itself, without changes to the consumer.

xprtrdma's reply processing is already handled in a work queue, but
there is some initial order-dependent processing that is done in the
soft IRQ context before a work item is scheduled.

IB_POLL_SOFTIRQ is a direct replacement for the current xprtrdma
receive code path.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-14 14:56:04 -04:00
Chuck Lever
23826c7aea xprtrdma: Serialize credit accounting again
Commit fe97b47cd6 ("xprtrdma: Use workqueue to process RPC/RDMA
replies") replaced the reply tasklet with a workqueue that allows
RPC replies to be processed in parallel. Thus the credit values in
RPC-over-RDMA replies can be applied in a different order than in
which the server sent them.

To fix this, revert commit eba8ff660b ("xprtrdma: Move credit
update to RPC reply handler"). Reverting is done by hand to
accommodate code changes that have occurred since then.

Fixes: fe97b47cd6 ("xprtrdma: Use workqueue to process . . .")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-14 14:56:01 -04:00
Chuck Lever
59aa1f9a3c xprtrdma: Properly handle RDMA_ERROR replies
These are shorter than RPCRDMA_HDRLEN_MIN, and they need to
complete the waiting RPC.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-14 14:55:59 -04:00
Chuck Lever
d8647f2d3c rpcrdma: Add RPCRDMA_HDRLEN_ERR
Error headers are shorter than either RDMA_MSG or RDMA_NOMSG.

Since HDRLEN_MIN is already used in several other places that would
be annoying to change, add RPCRDMA_HDRLEN_ERR for the one or two
spots where the shorter length is needed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-14 14:55:57 -04:00
Chuck Lever
b892a699ce xprtrdma: Do not wait if ib_post_send() fails
If ib_post_send() in ro_unmap_sync() fails, the WRs have not been
posted, no completions will fire, and wait_for_completion() will
wait forever. Skip the wait in that case.

To ensure the MRs are invalid, disconnect.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-14 14:55:55 -04:00
Chuck Lever
821c791a0b xprtrdma: Segment head and tail XDR buffers on page boundaries
A single memory allocation is used for the pair of buffers wherein
the RPC client builds an RPC call message and decodes its matching
reply. These buffers are sized based on the maximum possible size
of the RPC call and reply messages for the operation in progress.

This means that as the call buffer increases in size, the start of
the reply buffer is pushed farther into the memory allocation.

RPC requests are growing in size. It used to be that both the call
and reply buffers fit inside a single page.

But these days, thanks to NFSv4 (and especially security labels in
NFSv4.2) the maximum call and reply sizes are large. NFSv4.0 OPEN,
for example, now requires a 6KB allocation for a pair of call and
reply buffers, and NFSv4 LOOKUP is not far behind.

As the maximum size of a call increases, the reply buffer is pushed
far enough into the buffer's memory allocation that a page boundary
can appear in the middle of it.

When the maximum possible reply size is larger than the client's
RDMA receive buffers (currently 1KB), the client has to register a
Reply chunk for the server to RDMA Write the reply into.

The logic in rpcrdma_convert_iovs() assumes that xdr_buf head and
tail buffers would always be contained on a single page. It supplies
just one segment for the head and one for the tail.

FMR, for example, registers up to a page boundary (only a portion of
the reply buffer in the OPEN case above). But without additional
segments, it doesn't register the rest of the buffer.

When the server tries to write the OPEN reply, the RDMA Write fails
with a remote access error since the client registered only part of
the Reply chunk.

rpcrdma_convert_iovs() must split the XDR buffer into multiple
segments, each of which are guaranteed not to contain a page
boundary. That way fmr_op_map is given the proper number of segments
to register the whole reply buffer.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-14 14:55:53 -04:00
Chuck Lever
af0f16e825 xprtrdma: Clean up dprintk format string containing a newline
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-14 14:55:50 -04:00
Chuck Lever
4d97291b6f xprtrdma: Clean up physical_op_map()
physical_op_unmap{_sync} don't use mr_nsegs, so don't bother to set
it in physical_op_map.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-14 14:55:47 -04:00
Chuck Lever
66cede5279 xprtrdma: Clean up unused RPCRDMA_INLINE_PAD_THRESH macro
Fixes: b3221d6a53 ('xprtrdma: Remove logic that constructs...')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-14 14:55:44 -04:00