Create a new file under /proc/self, called mountstats, where mounted file
systems can export information (configuration options, performance counters,
and so on). Use a mechanism similar to /proc/mounts and s_ops->show_options.
This mechanism does not violate namespace security, and is safe to use while
other processes are unmounting file systems.
Thanks to Mike Waychison for his review and comments.
Test-plan:
Test concurrent mount/unmount operations while cat'ing /proc/self/mountstats.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch adds server ip address to be printed out when "server
requires stronger authentication" error occured.
Signed-off-by: Levent Serinol <lserinol@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
read_cache_mtime is no longer used in nfs_inode. This patch removes
references of read_cache_mtime in the code comments.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
semaphore to mutex conversion.
the conversion was generated via scripts, and the result was validated
automatically via a script as well.
build and boot tested.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
this converts fs/nfs to kzalloc() usage.
compile tested with make allyesconfig
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
nfs4_open_revalidate: 'res' may be used uninitialized
nfs4_callback_compound: ‘hdr_res.nops’ may be used uninitialized
'op_nr’ may be used uninitialized
encode_getattr_res: ‘savep’ may be used uninitialized
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If not, we cannot guarantee that idmap->idmap_dentry, gss_auth->dentry and
clnt->cl_dentry are valid dentries.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch adds a request_module call to rpcauth_create which will try
to auto-load the kernel module for the requested authentication flavor.
For kernels with modular sunrpc, this reduces the admin overhead for
the user.
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
My previous "const static" vs "static const" cleanup missed a single case,
patch below takes care of it.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Ensure that we flush out writes in the case when someone calls utimes() in
order to set the file times.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently, nlm_alloc_call tests for a signal before it even tries to
allocate memory.
Fix it so that it tries at least once.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
I've been reading through fs/nfs/write.c trying to track down a bug
that seems to be related to pages loosing a refcount and getting
freed too early (you interested in detail??) and I spotted a little
bug which the following patch should fix.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently, there is no serialisation between NFS asynchronous writebacks
and truncation at the page level due to the fact that nfs_sync_inode()
cannot lock the pages that it is about to write out.
This means that it is possible to be flushing out data (and calling something
like set_page_writeback()) while the page cache is busy evicting the page.
Oops...
Use the hooks provided in try_to_release_page() to ensure that dirty pages
are always written back to storage before we evict them.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The nfs_open_context may live longer than the file descriptor that spawned
it, so it needs to carry a reference to the vfsmount. If not, then
generic_shutdown_super() may end up being called before reads and writes
have been flushed out.
Make a couple of functions static while we're at it...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6:
JFS: add uid, gid, and umask mount options
JFS: Take logsync lock before testing mp->lsn
JFS: kzalloc conversion
JFS: Add missing file from fa3241d24c
JFS: Use the kthread_ API
JFS: Fix regression. fsck complains if symlinks do not have INLINEEA attribute
JFS: ext2 inode attributes for jfs
JFS: semaphore to mutex conversion.
JFS: make buddy table static
JFS: Add back directory i_size calculations for legacy partitions
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (150 commits)
[PATCH] ipw2100: Update version ipw2100 stamp to 1.2.2
[PATCH] ipw2100: move mutex.h include from ipw2100.c to ipw2100.h
[PATCH] ipw2100: semaphore to mutexes conversion
[PATCH] ipw2100: Fix radiotap code gcc warning
[PATCH] ipw2100: add radiotap headers to packtes captured in monitor mode
[PATCH] ipw2x00: expend Copyright to 2006
[PATCH] drivers/net/wireless/ipw2200.c: fix an array overun
[PATCH] ieee80211: Don't update network statistics from off-channel packets.
[PATCH] ipw2200: Update ipw2200 version stamp to 1.1.1
[PATCH] ipw2200: switch to the new ipw2200-fw-3.0 image format
[PATCH] ipw2200: wireless extension sensitivity threshold support
[PATCH] ipw2200: Enables the "slow diversity" algorithm
[PATCH] ipw2200: Set a meaningful silence threshold value
[PATCH] ipw2200: export `debug' module param only if CONFIG_IPW2200_DEBUG
[PATCH] ipw2200: Change debug level for firmware error logging
[PATCH] ipw2200: Filter unsupported channels out in ad-hoc mode
[PATCH] ipw2200: Fix ipw_sw_reset() implementation inconsistent with comment
[PATCH] ipw2200: Fix rf_kill is activated after mode change with 'disable=1'
[PATCH] ipw2200: remove the WPA card associates to non-WPA AP checking
[PATCH] ipw2200: Add signal level to iwlist scan output
...
* 'block-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/block:
[PATCH] fix rmmod problems with elevator attributes, clean them up
[PATCH] elevator_t lifetime rules and sysfs fixes
[PATCH] noise removal: cfq-iosched.c
[PATCH] don't bother with refcounting for cfq_data
[PATCH] fix sysfs interaction and lifetime rules handling for queues
[PATCH] regularize blk_cleanup_queue() use
[PATCH] fix cfq_get_queue()/ioprio_set(2) races
[PATCH] deal with rmmod/put_io_context() races
[PATCH] stop elv_unregister() from rogering other iosched's data, fix locking
[PATCH] stop cfq from pinning queue down
[PATCH] make cfq_exit_queue() prune the cfq_io_context for that queue
[PATCH] fix the exclusion for ioprio_set()
[PATCH] keep sync and async cfq_queue separate
[PATCH] switch to use of ->key to get cfq_data by cfq_io_context
[PATCH] stop leaking cfq_data in cfq_set_request()
[PATCH] fix cfq hash lookups
[PATCH] fix locking in queue_requests_store()
[PATCH] fix double-free in blk_init_queue_node()
[PATCH] don't do exit_io_context() until we know we won't be doing any IO
Fix endianness handling of srq_limit: it is big-endian in the context
structure, so we need to swab it before returning it.
Also add support for srq_limit query for Tavor (non-MemFree) HCAs.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In neigh_add_path(), the queue of delayed packets can never be full,
because the queue is always freshly created and cannot be found by any
other code path. In fact, the test of the queue length is worse than
useless: if somehow the test ever triggered and path_rec_start() also
failed, then dev_kfree_skb_any() will be called twice on the same skb.
Fix this by deleting the useless test. Pointed out by Michael
S. Tsirkin <mst@mellanox.co.il>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
MemFree devices need to reserve one shared receive queue (SRQ) work
request for internal use, so the capacity returned from the create_srq
and query_srq methods should be srq->max - 1.
Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix an oopsable race debugged by Eli Cohen <eli@mellanox.co.il>:
After removing the port from port_list, ib_mad_port_close flushes
port_priv->wq before destroying the special QPs. This means that a
completion event could arrive, and queue a new work in this work queue
after flush.
This patch also removes an unnecessary flush_workqueue():
destroy_workqueue() already includes a flush.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix leak found by Coverity: in the SRP_OPT_DGID case,
srp_parse_options() didn't free the result of match_strdup().
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix bug found by coverity: the loop body never executed, because it
was doing for (i = 0; i < MTHCA_EQ_CMD; ++i), but MTHCA_EQ_CMD is 0.
The correct loop bound is MTHCA_NUM_EQ, to loop over all EQs.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix two bugs found by coverity:
- Memory leak in error path of alloc_group_attrs()
- Fencepost error in state_show(): the test should be < ARRAY_SIZE(),
not <= ARRAY_SIZE().
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Move ipoib_ib_dev_flush() to ipoib's workqueue. This keeps it ordered
with respect to other work scheduled by the ipoib driver. This fixes
problems with races, for example:
- ipoib_ib_dev_flush() has started running because of an IB event
- user does ifconfig ib0 down
- ipoib_mcast_stop_thread() gets called twice and waits for the same
completion twice
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix the IPoIB build (which is broken in net-2.6.17 because of my
screw-up, which left out this chunk in ipoib_multicast.c).
The neighbour destructor is now in neigh_params, so we don't
need to clear it in the ops structure.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The old code incorrectly used the primary P_Key index as the alternate
index too.
Signed-off-by: Ami Perlmutter <amip@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add support for sending and receiving large RMPP transfers. The old
code supports transfers only as large as a single contiguous kernel
memory allocation. This patch uses linked list of memory buffers when
sending and receiving data to avoid needing contiguous pages for
larger transfers.
Receive side: copy the arriving MADs in chunks instead of coalescing
to one large buffer in kernel space.
Send side: split a multipacket MAD buffer to a list of segments,
(multipacket_list) and send these using a gather list of size 2.
Also, save pointer to last sent segment, and retrieve requested
segments by walking list starting at last sent segment. Finally,
save pointer to last-acked segment. When retrying, retrieve
segments for resending relative to this pointer. When updating last
ack, start at this pointer.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add SCSI host attributes in sysfs that show the ID extension, IOC
GUID, service ID, P_Key and destination GID for each target port that
the SRP initiator connects to.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Move checking the state of a cm_id before modifying it when handling a
REP. This fixes a bug seen under MPI scale-up testing, where a NULL
timewait_info pointer is dereferenced if a request times out before a
REP is received.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Sinai (one-port PCI Express) HCAs get improved throughput for messages
bigger than 80 KB in DDR mode if memory keys are formatted in a
specific way. The enhancement only works if the memory key table is
smaller than 2^24 entries. For larger tables, the enhancement is off
and a warning is printed (to avoid silent performance loss).
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The old code didn't convert from the kernel's enum correctly.
Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
According to the IB spec version 1.2, section 11.2.4.2, the current
table has a couple of mistakes where it allows the current QP state
(IB_QP_CUR_STATE) attribute. For the transitions:
RTS -> RTS: IB_QP_CUR_STATE should be allowed for all transports
SQD -> SQD: IB_QP_CUR_STATE should never be allowed
Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ipoib_mcast_stop_thread currently tests mcast->query and if it is
NULL, does not perform wait_for_completion on the mcast and frees the
mcast object directly.
However, since both operations are done without locking, it is
possible that ipoib_mcast_join_complete is in progress on this mcast
object and has set mcast->query to NULL already.
Solve this by:
- taking priv->lock before we change mcast->query in ipoib_mcast_join_complete,
and keeping it until we no longer need the mcast object
- taking priv->lock around mcast->query test in ipoib_mcast_stop_thread
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If posting receives in ipoib_ib_dev_open() fails, call
ipoib_ib_dev_stop() to move the device's QP back to the RESET state so
that we can try again later.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Use a named enum for the HCA's internal page size, rather than having
magic values of 4096 and shifts by 12 all over the code. Also, fix
one minor bug in EQ handling: only one HCA page is mapped to the HCA
during initialization, but a full kernel page is unmapped during
cleanup. This might cause problems when PAGE_SIZE != 4096.
Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Check that the alternate P_Key index is in range when setting the
alternate path for a QP. Also make a cosmetic touch up to the debug
message printed when the main P_Key index is out of range.
Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add support for IB_SEND_FENCE flag in post_send methods.
Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ipoib_mcast_send() tests mcast->ah twice. If this value is changed
between these two points, we leak an skb. However,
ipoib_mcast_join_finish() sets mcast->ah with no locking, so it could
race against ipoib_mcast_send().
As a solution, take priv->lock around assignment to mcast->ah thus
making sure ipoib_mcast_send() (which also takes priv->lock) is not in
flight.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Implement query_ah (except for AVs which are in HCA memory). This is
needed to implement RMPP duplicate session detection on sending side
(extraction of DGID/DLID and GRH flag from address handle).
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch is checks whether the HCA supports posting FW commands
through a doorbell page (user access region 0, or "UAR0"). If this is
supported, the driver maps UAR0 and uses it for FW commands. This can
be controlled by the value of a writable module parameter
fw_cmd_doorbell. When the parameter is 0, the commands are posted
through HCR using the old method; otherwise if HCA is capable commands
go through UAR0.
This use of UAR0 to post commands eliminates the need for polling the
"go" bit prior to posting a new command. Since reading from a PCI
device is much more expensive then issuing a posted write, it is
expected that issuing FW commands this way will provide better CPU
utilization.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Pass actual capacity of created SRQ back to userspace, so that
userspace can report accurate capacities. This requires an ABI bump,
to change struct ib_uverbs_create_srq_resp.
Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Have mthca's create_srq method return the actual capacity of the SRQ
that gets created. Also update comments in <rdma/ib_verbs.h> to
clarify that this is what is expected from ib_create_srq().
Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>