Add fields to the rpc_procinfo struct that allow the display of a
human-readable name for each procedure in the rpc_iostats output.
Also fix it so that the NFSv4 stats are broken up correctly by
sub-procedure number. NFSv4 uses only two real RPC procedures:
NULL, and COMPOUND.
Test plan:
Mount with NFSv2, NFSv3, and NFSv4, and do "cat /proc/self/mountstats".
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS client now shows various RPC I/O metrics in /proc/self/mountstats.
Test plan:
Mount/umount while doing "cat /proc/self/mountstats", multiple iterations
of connectathon locking suite. Test with NFS version 2, 3, and 4.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add a simple mechanism for collecting stats in the RPC client. Stats are
tabulated during xprt_release. Note that per_cpu shenanigans are not
required here because the RPC client already serializes on the transport
write lock.
Test plan:
Compile kernel with CONFIG_NFS enabled. Basic performance regression
testing with high-speed networking and high performance server.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Account for various things that occur while an RPC task is executed.
Separate timers for RPC round trip and RPC execution time show how
long RPC requests wait in queue before being sent. Eventually these
will be accumulated at xprt_release time in one place where they can
be viewed from userland.
Test plan:
Compile kernel with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Monitor generic transport events. Add a transport switch callout to
format transport counters for export to user-land.
Test plan:
Compile kernel with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
RPC wait queue length will eventually be exported to userland via the RPC
iostats interface.
Test plan:
Compile kernel with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add a field in nfs_server to record a timestamp when a mount succeeds.
Report the number of seconds the file system has been mounted via
nfs_show_stats().
Test plan:
Mount an NFS file system, watch the mountstats reports and compare with
clock time.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Make an inode or an nfs_server struct available in the logic that handles
JUKEBOX/DELAY type errors so the NFS client can account for them.
This patch is split out from the main nfs iostat patch to highlight minor
architectural changes required to support this statistic.
Test plan:
None.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Invoke the byte and event counter macros where we want to count bytes and
events.
Clean-up: fix a possible NULL dereference in nfs_lock, and simplify
nfs_file_open.
Test-plan:
fsx and iozone on UP and SMP systems, with and without pre-emption. Watch
for memory overwrite bugs, and performance loss (significantly more CPU
required per op).
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add a per-superblock performance counter facility to the NFS client. This
facility mimics the counters available for block devices and for
networking. Expose these new counters via the new /proc/self/mountstats
interface.
Thanks to Andrew Morton and Trond Myklebust for their review and comments.
Test plan:
fsx and iozone on UP and SMP systems, with and without pre-emption. Watch
for memory overwrite bugs, and performance loss (significantly more CPU
required per op).
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Get rid of "lock" and "posix", and spell out "vers=".
Test plan:
None.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Sometimes it's important to know the exact RPC retransmit settings the
kernel is using for an NFS mount point. Add this facility to the NFS
client's show_options method.
Test plan:
Set various retransmit settings via the mount command, and check that the
settings are reflected in /proc/mounts.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Create a new file under /proc/self, called mountstats, where mounted file
systems can export information (configuration options, performance counters,
and so on). Use a mechanism similar to /proc/mounts and s_ops->show_options.
This mechanism does not violate namespace security, and is safe to use while
other processes are unmounting file systems.
Thanks to Mike Waychison for his review and comments.
Test-plan:
Test concurrent mount/unmount operations while cat'ing /proc/self/mountstats.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch adds server ip address to be printed out when "server
requires stronger authentication" error occured.
Signed-off-by: Levent Serinol <lserinol@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
read_cache_mtime is no longer used in nfs_inode. This patch removes
references of read_cache_mtime in the code comments.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
semaphore to mutex conversion.
the conversion was generated via scripts, and the result was validated
automatically via a script as well.
build and boot tested.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
this converts fs/nfs to kzalloc() usage.
compile tested with make allyesconfig
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
nfs4_open_revalidate: 'res' may be used uninitialized
nfs4_callback_compound: ‘hdr_res.nops’ may be used uninitialized
'op_nr’ may be used uninitialized
encode_getattr_res: ‘savep’ may be used uninitialized
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If not, we cannot guarantee that idmap->idmap_dentry, gss_auth->dentry and
clnt->cl_dentry are valid dentries.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch adds a request_module call to rpcauth_create which will try
to auto-load the kernel module for the requested authentication flavor.
For kernels with modular sunrpc, this reduces the admin overhead for
the user.
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
My previous "const static" vs "static const" cleanup missed a single case,
patch below takes care of it.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Ensure that we flush out writes in the case when someone calls utimes() in
order to set the file times.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently, nlm_alloc_call tests for a signal before it even tries to
allocate memory.
Fix it so that it tries at least once.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
I've been reading through fs/nfs/write.c trying to track down a bug
that seems to be related to pages loosing a refcount and getting
freed too early (you interested in detail??) and I spotted a little
bug which the following patch should fix.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently, there is no serialisation between NFS asynchronous writebacks
and truncation at the page level due to the fact that nfs_sync_inode()
cannot lock the pages that it is about to write out.
This means that it is possible to be flushing out data (and calling something
like set_page_writeback()) while the page cache is busy evicting the page.
Oops...
Use the hooks provided in try_to_release_page() to ensure that dirty pages
are always written back to storage before we evict them.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The nfs_open_context may live longer than the file descriptor that spawned
it, so it needs to carry a reference to the vfsmount. If not, then
generic_shutdown_super() may end up being called before reads and writes
have been flushed out.
Make a couple of functions static while we're at it...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6:
JFS: add uid, gid, and umask mount options
JFS: Take logsync lock before testing mp->lsn
JFS: kzalloc conversion
JFS: Add missing file from fa3241d24c
JFS: Use the kthread_ API
JFS: Fix regression. fsck complains if symlinks do not have INLINEEA attribute
JFS: ext2 inode attributes for jfs
JFS: semaphore to mutex conversion.
JFS: make buddy table static
JFS: Add back directory i_size calculations for legacy partitions
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (150 commits)
[PATCH] ipw2100: Update version ipw2100 stamp to 1.2.2
[PATCH] ipw2100: move mutex.h include from ipw2100.c to ipw2100.h
[PATCH] ipw2100: semaphore to mutexes conversion
[PATCH] ipw2100: Fix radiotap code gcc warning
[PATCH] ipw2100: add radiotap headers to packtes captured in monitor mode
[PATCH] ipw2x00: expend Copyright to 2006
[PATCH] drivers/net/wireless/ipw2200.c: fix an array overun
[PATCH] ieee80211: Don't update network statistics from off-channel packets.
[PATCH] ipw2200: Update ipw2200 version stamp to 1.1.1
[PATCH] ipw2200: switch to the new ipw2200-fw-3.0 image format
[PATCH] ipw2200: wireless extension sensitivity threshold support
[PATCH] ipw2200: Enables the "slow diversity" algorithm
[PATCH] ipw2200: Set a meaningful silence threshold value
[PATCH] ipw2200: export `debug' module param only if CONFIG_IPW2200_DEBUG
[PATCH] ipw2200: Change debug level for firmware error logging
[PATCH] ipw2200: Filter unsupported channels out in ad-hoc mode
[PATCH] ipw2200: Fix ipw_sw_reset() implementation inconsistent with comment
[PATCH] ipw2200: Fix rf_kill is activated after mode change with 'disable=1'
[PATCH] ipw2200: remove the WPA card associates to non-WPA AP checking
[PATCH] ipw2200: Add signal level to iwlist scan output
...
* 'block-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/block:
[PATCH] fix rmmod problems with elevator attributes, clean them up
[PATCH] elevator_t lifetime rules and sysfs fixes
[PATCH] noise removal: cfq-iosched.c
[PATCH] don't bother with refcounting for cfq_data
[PATCH] fix sysfs interaction and lifetime rules handling for queues
[PATCH] regularize blk_cleanup_queue() use
[PATCH] fix cfq_get_queue()/ioprio_set(2) races
[PATCH] deal with rmmod/put_io_context() races
[PATCH] stop elv_unregister() from rogering other iosched's data, fix locking
[PATCH] stop cfq from pinning queue down
[PATCH] make cfq_exit_queue() prune the cfq_io_context for that queue
[PATCH] fix the exclusion for ioprio_set()
[PATCH] keep sync and async cfq_queue separate
[PATCH] switch to use of ->key to get cfq_data by cfq_io_context
[PATCH] stop leaking cfq_data in cfq_set_request()
[PATCH] fix cfq hash lookups
[PATCH] fix locking in queue_requests_store()
[PATCH] fix double-free in blk_init_queue_node()
[PATCH] don't do exit_io_context() until we know we won't be doing any IO
Fix endianness handling of srq_limit: it is big-endian in the context
structure, so we need to swab it before returning it.
Also add support for srq_limit query for Tavor (non-MemFree) HCAs.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In neigh_add_path(), the queue of delayed packets can never be full,
because the queue is always freshly created and cannot be found by any
other code path. In fact, the test of the queue length is worse than
useless: if somehow the test ever triggered and path_rec_start() also
failed, then dev_kfree_skb_any() will be called twice on the same skb.
Fix this by deleting the useless test. Pointed out by Michael
S. Tsirkin <mst@mellanox.co.il>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
MemFree devices need to reserve one shared receive queue (SRQ) work
request for internal use, so the capacity returned from the create_srq
and query_srq methods should be srq->max - 1.
Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix an oopsable race debugged by Eli Cohen <eli@mellanox.co.il>:
After removing the port from port_list, ib_mad_port_close flushes
port_priv->wq before destroying the special QPs. This means that a
completion event could arrive, and queue a new work in this work queue
after flush.
This patch also removes an unnecessary flush_workqueue():
destroy_workqueue() already includes a flush.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix leak found by Coverity: in the SRP_OPT_DGID case,
srp_parse_options() didn't free the result of match_strdup().
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix bug found by coverity: the loop body never executed, because it
was doing for (i = 0; i < MTHCA_EQ_CMD; ++i), but MTHCA_EQ_CMD is 0.
The correct loop bound is MTHCA_NUM_EQ, to loop over all EQs.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix two bugs found by coverity:
- Memory leak in error path of alloc_group_attrs()
- Fencepost error in state_show(): the test should be < ARRAY_SIZE(),
not <= ARRAY_SIZE().
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Move ipoib_ib_dev_flush() to ipoib's workqueue. This keeps it ordered
with respect to other work scheduled by the ipoib driver. This fixes
problems with races, for example:
- ipoib_ib_dev_flush() has started running because of an IB event
- user does ifconfig ib0 down
- ipoib_mcast_stop_thread() gets called twice and waits for the same
completion twice
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix the IPoIB build (which is broken in net-2.6.17 because of my
screw-up, which left out this chunk in ipoib_multicast.c).
The neighbour destructor is now in neigh_params, so we don't
need to clear it in the ops structure.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The old code incorrectly used the primary P_Key index as the alternate
index too.
Signed-off-by: Ami Perlmutter <amip@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add support for sending and receiving large RMPP transfers. The old
code supports transfers only as large as a single contiguous kernel
memory allocation. This patch uses linked list of memory buffers when
sending and receiving data to avoid needing contiguous pages for
larger transfers.
Receive side: copy the arriving MADs in chunks instead of coalescing
to one large buffer in kernel space.
Send side: split a multipacket MAD buffer to a list of segments,
(multipacket_list) and send these using a gather list of size 2.
Also, save pointer to last sent segment, and retrieve requested
segments by walking list starting at last sent segment. Finally,
save pointer to last-acked segment. When retrying, retrieve
segments for resending relative to this pointer. When updating last
ack, start at this pointer.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add SCSI host attributes in sysfs that show the ID extension, IOC
GUID, service ID, P_Key and destination GID for each target port that
the SRP initiator connects to.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Move checking the state of a cm_id before modifying it when handling a
REP. This fixes a bug seen under MPI scale-up testing, where a NULL
timewait_info pointer is dereferenced if a request times out before a
REP is received.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Sinai (one-port PCI Express) HCAs get improved throughput for messages
bigger than 80 KB in DDR mode if memory keys are formatted in a
specific way. The enhancement only works if the memory key table is
smaller than 2^24 entries. For larger tables, the enhancement is off
and a warning is printed (to avoid silent performance loss).
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>