linux-uconsole/include/linux/sunrpc
Benjamin Coddington 2440f3cebc SUNRPC: Always drop the XPRT_LOCK on XPRT_CLOSE_WAIT
This patch is only appropriate for stable kernels v4.16 - v4.19

Since commit 9b30889c54 ("SUNRPC: Ensure we always close the socket after
a connection shuts down"), and until commit c544577dad ("SUNRPC: Clean up
transport write space handling"), it is possible for the NFS client to spin
in the following tight loop:

269.964083: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=0 action=call_bind [sunrpc]
269.964083: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=0 action=call_connect [sunrpc]
269.964083: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=0 action=call_transmit [sunrpc]
269.964085: xprt_transmit: peer=[10.0.1.82]:2049 xid=0x761d3f77 status=-32
269.964085: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=-32 action=call_transmit_status [sunrpc]
269.964085: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=-32 action=call_status [sunrpc]
269.964085: rpc_call_status: task:43@0 status=-32

The issue is that the path through call_transmit_status does not release
the XPRT_LOCK when the transmit result is -EPIPE (status=-32 in the trace
above), so the socket cannot be properly shut down.

The commit below fixed this in mainline by unconditionally calling
xprt_end_transmit() and releasing the XPRT_LOCK after every pass through
call_transmit.  However, that commit as a whole is not appropriate for
stable kernels, because it was originally part of a series that modifies
the sunrpc code to use a different queueing model.  As a result, it
contains machinations that are not needed for a stable fix and that will
not make sense without a larger backport of the mainline series.

In this patch, we take a slightly modified piece of the mainline patch
below: release the XPRT_LOCK on a transmission error if we detect that
the transport is waiting to close.

commit c544577dad upstream
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Mon Sep 3 23:39:27 2018 -0400

    SUNRPC: Clean up transport write space handling

    Treat socket write space handling in the same way we now treat transport
    congestion: by denying the XPRT_LOCK until the transport signals that it
    has free buffer space.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
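
As a rough illustration of the behaviour this stable patch targets, here is
a minimal user-space sketch in C.  It is a toy model and not the kernel
code: struct toy_xprt, the TOY_* bits, and every toy_* function are
hypothetical names invented for this example, and the "closer" merely
stands in for whatever shuts the socket down.  It only shows why holding
the lock across an -EPIPE failure keeps the close from happening, and why
dropping the lock when the transport is waiting to close breaks the loop.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical state bits; they echo the commit text, not struct rpc_xprt. */
enum {
	TOY_XPRT_LOCKED     = 1u << 0,
	TOY_XPRT_CLOSE_WAIT = 1u << 1,
};

#define TOY_CLOSER 0	/* assumed id for the context that closes the socket */

struct toy_xprt {
	unsigned int state;
	int owner;	/* id of the lock holder, or -1 when unlocked */
};

/* Take the lock, or succeed trivially if this task already holds it. */
static bool toy_lock(struct toy_xprt *x, int task)
{
	if (x->state & TOY_XPRT_LOCKED)
		return x->owner == task;
	x->state |= TOY_XPRT_LOCKED;
	x->owner = task;
	return true;
}

static void toy_unlock(struct toy_xprt *x, int task)
{
	if ((x->state & TOY_XPRT_LOCKED) && x->owner == task) {
		x->state &= ~TOY_XPRT_LOCKED;
		x->owner = -1;
	}
}

/* The closer can only finish the shutdown if it can take the lock. */
static bool toy_try_close(struct toy_xprt *x)
{
	if (!(x->state & TOY_XPRT_CLOSE_WAIT) || !toy_lock(x, TOY_CLOSER))
		return false;
	x->state &= ~TOY_XPRT_CLOSE_WAIT;
	toy_unlock(x, TOY_CLOSER);
	return true;
}

/* One transmit pass: fails with -EPIPE while the transport awaits close. */
static int toy_transmit_pass(struct toy_xprt *x, int task, bool drop_lock)
{
	if (!toy_lock(x, task))
		return -EAGAIN;
	if (!(x->state & TOY_XPRT_CLOSE_WAIT)) {
		toy_unlock(x, task);
		return 0;
	}
	/* Modelled fix: release the lock so the close can proceed. */
	if (drop_lock)
		toy_unlock(x, task);
	return -EPIPE;
}

/* Returns the pass on which transmit succeeded, or -1 if still spinning. */
int toy_run(bool drop_lock, int max_passes)
{
	struct toy_xprt x = { TOY_XPRT_CLOSE_WAIT, -1 };
	int pass;

	assert(max_passes > 0);
	for (pass = 1; pass <= max_passes; pass++) {
		if (toy_transmit_pass(&x, 43, drop_lock) == 0)
			return pass;
		toy_try_close(&x);	/* closer runs between passes */
	}
	return -1;
}
```

In this model, toy_run(false, 10) never gets past the -EPIPE result because
task 43 keeps the lock and the closer is always refused, while
toy_run(true, 10) succeeds on its second pass once the closer has been able
to clear the close-wait state.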

The original discussion of the problem is here:

    https://lore.kernel.org/linux-nfs/20181212135157.4489-1-dwysocha@redhat.com/T/#t

This passes my usual cthon and xfstests on NFS as applied on v4.19 mainline.

Reported-by: Dave Wysochanski <dwysocha@redhat.com>
Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-15 08:10:13 +01:00
addr.h
auth.h net/sunrpc: Make rpc_auth_create_args a const 2018-07-30 13:19:41 -04:00
auth_gss.h
bc_xprt.h
cache.h
clnt.h NFSv4 client live hangs after live data migration recovery 2018-07-31 12:53:40 -04:00
debug.h
gss_api.h
gss_asn1.h
gss_err.h
gss_krb5.h
gss_krb5_enctypes.h
metrics.h sunrpc: Change rpc_print_iostats to rpc_clnt_show_stats and handle rpc_clnt clones 2018-07-31 12:53:35 -04:00
msg_prot.h
rpc_pipe_fs.h remove rpc_rmdir() 2018-04-16 14:20:26 -04:00
rpc_rdma.h xprtrdma: Add proper SPDX tags for NetApp-contributed source 2018-05-07 09:20:03 -04:00
sched.h
stats.h
svc.h sunrpc: use-after-free in svc_process_common() 2019-01-16 22:04:37 +01:00
svc_rdma.h svcrdma: Remove unused svc_rdma_op_ctxt 2018-05-11 15:48:57 -04:00
svc_xprt.h
svcauth.h sunrpc: Extract target name into svc_cred 2018-08-22 18:32:07 -04:00
svcauth_gss.h
svcsock.h
timer.h
types.h
xdr.h NFSv4; Clean up XDR encoding of type bitmap4 2018-04-10 16:06:22 -04:00
xprt.h SUNRPC: Always drop the XPRT_LOCK on XPRT_CLOSE_WAIT 2019-02-15 08:10:13 +01:00
xprtmultipath.h
xprtrdma.h xprtrdma: Add proper SPDX tags for NetApp-contributed source 2018-05-07 09:20:03 -04:00
xprtsock.h