Commit graph

60063 commits

Author SHA1 Message Date
Herbert Xu
a7ab4b501f [TCPv4]: Improve BH latency in /proc/net/tcp
Currently the code for /proc/net/tcp disable BH while iterating
over the entire established hash table.  Even though we call
cond_resched_softirq for each entry, we still won't process
softirq's as regularly as we would otherwise do which results
in poor performance when the system is loaded near capacity.

This anomaly comes from the 2.4 code where this was all in a
single function and the local_bh_disable might have made sense
as a small optimisation.

The cost of each local_bh_disable is so small when compared
against the increased latency in keeping it disabled over a
large but mostly empty TCP established hash table that we
should just move it to the individual read_lock/read_unlock
calls as we do in inet_diag.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:06:20 -07:00
Jamal Hadi Salim
c716a81ab9 [NET_SCHED]: Cleanup readability of qdisc restart
Over the years this code has gotten hairier. Resulting in many long
discussions over long summer days and patches that get it wrong.
This patch helps tame that code so normal people will understand it.

Thanks to Thomas Graf, Peter J. waskiewicz Jr, and Patrick McHardy
for their valuable reviews.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:06:16 -07:00
Allan Stephens
05646c9110 [TIPC]: Optimize stream send routine to avoid fragmentation
This patch enhances TIPC's stream socket send routine so that
it avoids transmitting data in chunks that require fragmentation
and reassembly, thereby improving performance at both the
sending and receiving ends of the connection.

The "maximum packet size" hint that records MTU info allows
the socket to decide how big a chunk it should send; in the
event that the hint has become stale, fragmentation may still
occur, but the data will be passed correctly and the hint will
be updated in time for the following send.  Note: The 66060 byte
pseudo-MTU used for intra-node connections requires the send
routine to perform an additional check to ensure it does not
exceed TIPC"s limit of 66000 bytes of user data per chunk.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Jon Paul Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:06:12 -07:00
Allan Stephens
5eee6a6dc9 [TIPC]: Use standard socket "not implemented" routines
This patch modifies TIPC's socket API to utilize existing
generic routines to indicate unsupported operations, rather
than adding similar TIPC-specific routines.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Jon Paul Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:06:09 -07:00
Allan Stephens
f3ec75f627 [TIPC]: Improved support for Ethernet traffic filtering
This patch simplifies TIPC's Ethernet receive routine to take
advantage of information already present in each incoming sk_buff
indicating whether the packet was explicitly sent to the interface,
has been broadcast to all interfaces, or was picked up because the
interface is in promiscous mode.

This new approach also fixes the problem of TIPC accepting unwanted
traffic through UML's multicast-based Ethernet interfaces (which
deliver traffic in a promiscuous manner even if the interface is
not configured to be promiscuous).

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Jon Paul Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:06:02 -07:00
David S. Miller
e06e7c6158 [IPV4]: The scheduled removal of multipath cached routing support.
With help from Chris Wedgwood.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:05:57 -07:00
Mikko Rapeli
84950cf0ba [Bluetooth] Hangup TTY before releasing rfcomm_dev
The core problem is that RFCOMM socket layer ioctl can release
rfcomm_dev struct while RFCOMM TTY layer is still actively using
it. Calling tty_vhangup() is needed for a synchronous hangup before
rfcomm_dev is freed.

Addresses the oops at http://bugzilla.kernel.org/show_bug.cgi?id=7509

Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2007-07-11 07:01:26 +02:00
Robert P. J. Day
924f0e4a06 [Bluetooth] Remove the redundant non-seekable llseek method
Remove the llseek method given that the open method already calls
nonseekable_open().

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2007-07-11 06:53:45 +02:00
Sean Hefty
6164c8cd13 IB/cm: Send no match if a SIDR REQ does not match a listen
If a SIDR REQ does not match a listen, we should reply with status
value 1 (service ID not supported), rather than dropping through to
the default case of status 2 (rejected by service provider).

Doing this also fixes a bug where the cm_id_priv is removed from the
remote_sidr_table twice.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 21:52:28 -07:00
Sean Hefty
29c2731cbf IB/cm: Fix handling of duplicate SIDR REQs
Fix handling to duplicate SIDR REQs to avoid sending a reject if a
duplicate is detected.  Duplicates should just be silently discarded.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 21:51:43 -07:00
Sean Hefty
5d861be8c8 IB/cm: cm_msgs.h should include ib_cm.h
cm_msgs.h uses definitions from ib_cm.h.  Include it directly, rather
than depending on a specific include order.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 21:50:53 -07:00
Sean Hefty
1d84612649 IB/cm: Include HCA ACK delay in local ACK timeout
The IB CM should include the HCA ACK delay when calculating the local
ACK timeout value to use for RC QPs.  If the HCA ACK delay is large
enough relative to the packet life time, then if it is not taken into
account, the calculated timeout value ends up being too small, which
can result in "retry exceeded" errors.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 21:50:05 -07:00
Sean Hefty
24be6e81c7 IB/cm: Use spin_lock_irq() instead of spin_lock_irqsave() when possible
The ib_cm is a little over zealous about using spin_lock_irqsave,
when spin_lock_irq would do.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 21:47:29 -07:00
Sean Hefty
2aec5c602c IB/sa: Make sure SA queries use default P_Key
MADs sent to the SA should use the the default P_Key (0x7fff/0xffff).
There's no requirement that the default P_Key is stored at index 0 in
the local P_Key table, so add code to the sa_query module to look up
the index of the default P_Key when creating an address handle for the
SA (which is done any time the P_Key table might change), and use this
index for all SA queries.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 21:45:31 -07:00
Marcel Holtmann
babf4d42d0 [Bluetooth] Use hci_recv_fragment() within HCI USB driver
This patch modifies the HCI USB driver to use the new helper function
for reassembling HCI data packets and events.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2007-07-11 06:42:35 +02:00
Marcel Holtmann
ef222013fc [Bluetooth] Add hci_recv_fragment() helper function
Most drivers must handle fragmented HCI data packets and events. This
patch adds a generic function for their reassembly to the Bluetooth
core layer and thus allows to shrink the complexity of the drivers.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2007-07-11 06:42:04 +02:00
J. Bruce Fields
d8558f99fb sunrpc: drop BKL around wrap and unwrap
We don't need the BKL when wrapping and unwrapping; and experiments by Avishay
Traeger have found that permitting multiple encryption and decryption
operations to proceed in parallel can provide significant performance
improvements.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Avishay Traeger <atraeger@cs.sunysb.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:50 -04:00
Frank Filz
137d6acaa6 NFSv4: Make sure unlock is really an unlock when cancelling a lock
I ran into a curious issue when a lock is being canceled. The
cancellation results in a lock request to the vfs layer instead of an
unlock request. This is particularly insidious when the process that
owns the lock is exiting. In that case, sometimes the erroneous lock is
applied AFTER the process has entered zombie state, preventing the lock
from ever being released. Eventually other processes block on the lock
causing a slow degredation of the system. In the 2.6.16 kernel this was
investigated on, the problem is compounded by the fact that the cl_sem
is held while blocking on the vfs lock, which results in most processes
accessing the nfs file system in question hanging.

In more detail, here is how the situation occurs:

first _nfs4_do_setlk():

static int _nfs4_do_setlk(struct nfs4_state *state, int cmd, struct file_lock *fl, int reclaim)
...
        ret = nfs4_wait_for_completion_rpc_task(task);
        if (ret == 0) {
...
        } else
                data->cancelled = 1;

then nfs4_lock_release():

static void nfs4_lock_release(void *calldata)
...
        if (data->cancelled != 0) {
                struct rpc_task *task;
                task = nfs4_do_unlck(&data->fl, data->ctx, data->lsp,
                                data->arg.lock_seqid);

The problem is the same file_lock that was passed in to _nfs4_do_setlk()
gets passed to nfs4_do_unlck() from nfs4_lock_release(). So the type is
still F_RDLCK or FWRLCK, not F_UNLCK. At some point, when cancelling the
lock, the type needs to be changed to F_UNLCK. It seemed easiest to do
that in nfs4_do_unlck(), but it could be done in nfs4_lock_release().
The concern I had with doing it there was if something still needed the
original file_lock, though it turns out the original file_lock still
needs to be modified by nfs4_do_unlck() because nfs4_do_unlck() uses the
original file_lock to pass to the vfs layer, and a copy of the original
file_lock for the RPC request.

It seems like the simplest solution is to force all situations where
nfs4_do_unlck() is being used to result in an unlock, so with that in
mind, I made the following change:

Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:49 -04:00
Frank van Maarseveen
c98451bdb2 NLM: fix source address of callback to client
Use the destination address of the original NLM request as the
source address in callbacks to the client.

Signed-off-by: Frank van Maarseveen <frankvm@frankvm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:49 -04:00
Frank van Maarseveen
d3bc9a1deb SUNRPC client: add interface for binding to a local address
In addition to binding to a local privileged port the NFS client should
allow binding to a specific local address. This is used by the server
for callbacks. The patch adds the necessary interface.

Signed-off-by: Frank van Maarseveen <frankvm@frankvm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:49 -04:00
Frank van Maarseveen
a97476926e SUNRPC server: record the destination address of a request
Save the destination address of an incoming request over TCP like is
done already for UDP. It is necessary later for callbacks by the server.

Signed-off-by: Frank van Maarseveen <frankvm@frankvm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:49 -04:00
Frank van Maarseveen
96802a0951 SUNRPC: cleanup transport creation argument passing
Cleanup argument passing to functions for creating an RPC transport.

Signed-off-by: Frank van Maarseveen <frankvm@frankvm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:49 -04:00
Trond Myklebust
6f2e64d3e1 NFSv4: Make the NFS state model work with the nosharedcache mount option
Consider the case where the user has mounted the remote filesystem
server:/foo on the two local directories /bar and /baz using the
nosharedcache mount option. The files /bar/file and /baz/file are
represented by different inodes in the local namespace, but refer to the
same file /foo/file on the server.
Consider the case where a process opens both /bar/file and /baz/file, then
closes /bar/file: because the nfs4_state is not shared between /bar/file
and /baz/file, the kernel will see that the nfs4_state for /bar/file is no
longer referenced, so it will send off a CLOSE rpc call. Unless the
open_owners differ, then that CLOSE call will invalidate the open state on
/baz/file too.

Conclusion: we cannot share open state owners between two different
non-shared mount instances of the same filesystem.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:48 -04:00
Trond Myklebust
275a5d24bf NFS: Error when mounting the same filesystem with different options
Unless the user sets the NFS_MOUNT_NOSHAREDCACHE mount flag, we should
return EBUSY if the filesystem is already mounted on a superblock that
has set conflicting mount options.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:48 -04:00
Trond Myklebust
75180df2ed NFS: Add the mount option "nosharecache"
Prior to David Howell's mount changes in 2.6.18, users who mounted
different directories which happened to be from the same filesystem on the
server would get different super blocks, and hence could choose different
mount options. As long as there were no hard linked files that crossed from
one subtree to another, this was quite safe.
Post the changes, if the two directories are on the same filesystem (have
the same 'fsid'), they will share the same super block, and hence the same
mount options.

Add a flag to allow users to elect not to share the NFS super block with
another mount point, even if the fsids are the same. This will allow
users to set different mount options for the two different super blocks, as
was previously possible. It is still up to the user to ensure that there
are no cache coherency issues when doing this, however the default
behaviour will be to share super blocks whenever two paths result in
the same fsid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:48 -04:00
Chuck Lever
8007122520 NFS: Add support for mounting NFSv4 file systems with string options
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:48 -04:00
Chuck Lever
136d558ce7 NFS: Add final pieces to support in-kernel mount option parsing
Hook in final components required for supporting in-kernel mount option
parsing for NFSv2 and NFSv3 mounts.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:48 -04:00
Chuck Lever
0076d7b7ba NFS: Introduce generic mount client API
For NFSv2 and v3 mounts, the first step is to contact the server's MOUNTD
and request the file handle for the root of the mounted share.  Add a
function to the NFS client that handles this operation.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:47 -04:00
Chuck Lever
bf0fd7680f NFS: Add enums and match tables for mount option parsing
This generic infrastructure works for both NFS and NFSv4 mounts.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:47 -04:00
Chuck Lever
013a8c1ab5 NFS: Improve debugging output in NFS in-kernel mount client
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:47 -04:00
Chuck Lever
19207231c9 NFS: Clean up in-kernel NFS mount
Clean up white space and coding conventions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:46 -04:00
Chuck Lever
3ea97309e6 NFS: Remake nfsroot_mount as a permanent part of NFS client
In preparation for supporting NFSv2 and NFSv3 mount option handling in the
kernel NFS client, convert mount_clnt.c to be a permanent part of the NFS
client, instead of built only when CONFIG_ROOT_NFS is enabled.

In addition, we also replace the "struct sockaddr_in *" argument with
something more generic, to help support IPv6 at some later point.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:46 -04:00
Chuck Lever
43780b87fa SUNRPC: Add a convenient default for the hostname when calling rpc_create()
A couple of callers just use a stringified IP address for the rpc client's
hostname.  Move the logic for constructing this into rpc_create(), so it can
be shared.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:46 -04:00
Chuck Lever
45160d6275 SUNRPC: Rename rpcb_getport to be consistent with new rpcb_getport_sync name
Clean up, for consistency.  Rename rpcb_getport as rpcb_getport_async, to
match the naming scheme of rpcb_getport_sync.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:46 -04:00
Chuck Lever
cce63cd637 SUNRPC: Rename rpcb_getport_external routine
In preparation for handling NFS mount option parsing in the kernel,
rename rpcb_getport_external as rpcb_get_port_sync, and make it available
always (instead of only when CONFIG_ROOT_NFS is enabled).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:46 -04:00
Chuck Lever
f7fb558e50 SUNRPC: Allow rpcbind requests to be interrupted by a signal.
This allows NFS mount requests and RPC re-binding to be interruptible if the
server isn't responding.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:45 -04:00
Chuck Lever
f0768ebd09 NFS: Introduce nfs4_validate_mount_options
Refactor NFSv4 mount processing to break out mount data validation
in the same way it's broken out in the NFSv2/v3 mount path.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:45 -04:00
Chuck Lever
5df36e78da NFS: Clean up nfs_validate_mount_data
Move error handling code out of the main code path.  The switch statement
was also improperly indented, according to Documentation/CodingStyle.  This
prepares nfs_validate_mount_data for the addition of option string parsing.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:45 -04:00
Chuck Lever
f18289931d NFS: Add a new NFS debugging flag just for mount processing
Note to self: fix up /usr/sbin/rpcdebug too

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:45 -04:00
Chuck Lever
fc50d58fd0 NFS: Clean-up: Refactor IP address sanity checks in NFS client
NFS and NFSv4 mounts can now share server address sanity checking.  And, it
provides an easy mechanism for adding IPv6 address checking at some later
point.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:44 -04:00
Chuck Lever
4d81cd1611 NFS: Clean-up: fix a compiler warning in fs/nfs/super.c
/home/cel/linux/fs/nfs/super.c: In function 'nfs_pseudoflavour_to_name':
/home/cel/linux/fs/nfs/super.c:270: warning: comparison between signed and unsigned

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:44 -04:00
Chuck Lever
0655960f76 NFS: Clean up error handling in nfs_get_sb
The error return logic in nfs_get_sb now matches nfs4_get_sb, and is more maintainable.
A subsequent patch will take advantage of this simplification.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:44 -04:00
Chuck Lever
29eb981a3b NFS: Clean-up: Replace nfs_copy_user_string with strndup_user
The new string utility function strndup_user can be used instead of
nfs_copy_user_string, eliminating an unnecessary duplication of function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:44 -04:00
Chuck Lever
5680d48be8 NFS: Clean-up: Define macros for maximum host and export path name lengths
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:44 -04:00
Chuck Lever
9eaa67c6a5 NFS: Clean-up: use correct type when converting NFS blocks to local blocks
inode->i_blocks is a blkcnt_t these days, which can be a u64 or unsigned
long, depending on the setting of CONFIG_LSF.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:44 -04:00
Chuck Lever
433c92379d NFS: Clean up nfs_size_to_loff_t()
Use the same file size limit that lockd uses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:43 -04:00
Trond Myklebust
8bda4e4c98 NFSv4: Fix up stateid locking...
We really don't need to grab both the state->so_owner and the
inode->i_lock.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:43 -04:00
Trond Myklebust
1ac7e2fd35 NFSv4: Clean up the callers of nfs4_open_recover_helper()
Rely on nfs4_try_open_cached() when appropriate.

Also fix an RCU violation in _nfs4_do_open_reclaim()

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:43 -04:00
Trond Myklebust
6ee4126890 NFSv4: Don't call OPEN if we already have an open stateid for a file
If we already have a stateid with the correct open mode for a given file,
then we can reuse that stateid instead of re-issuing an OPEN call without
violating the close-to-open caching semantics.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:43 -04:00
Trond Myklebust
aac00a8d0a NFSv4: Check for the existence of a delegation in nfs4_open_prepare()
We should not be calling open() on an inode that has a delegation unless
we're doing a reclaim.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:43 -04:00