Commit graph

8655 commits

Author SHA1 Message Date
Al Viro
1076d17ac7 jbd/jbd2 NULL noise
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-30 14:18:41 -07:00
Linus Torvalds
af8be4e4b3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  [PATCH] mnt_expire is protected by namespace_sem, no need for vfsmount_lock
  [PATCH] do shrink_submounts() for all fs types
  [PATCH] sanitize locking in mark_mounts_for_expiry() and shrink_submounts()
  [PATCH] count ghost references to vfsmounts
  [PATCH] reduce stack footprint in namespace.c
2008-03-28 15:23:01 -07:00
Sven Schnelle
5214b729e1 afs: prevent double cell registration
kafs doesn't check if the cell already exists - so if you do an echo "add
newcell.org 1.2.3.4" >/proc/fs/afs/cells it will try to create this cell
again.  kobject will also complain about a double registration.  To prevent
such problems, return -EEXIST in that case.

Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-28 14:45:21 -07:00
Dmitri Monakhov
5b41e74ad1 vfs: fix data leak in nobh_write_end()
Current nobh_write_end() implementation ignore partial writes(copied < len)
case if page was fully mapped and simply mark page as Uptodate, which is
totally wrong because area [pos+copied, pos+len) wasn't updated explicitly in
previous write_begin call.  It simply contains garbage from pagecache and
result in data leakage.

#TEST_CASE_BEGIN:
~~~~~~~~~~~~~~~~
In fact issue triggered by classical testcase
	open("/mnt/test", O_RDWR|O_CREAT|O_TRUNC, 0666) = 3
	ftruncate(3, 409600)                    = 0
	writev(3, [{"a", 1}, {NULL, 4095}], 2)  = 1
##TESTCASE_SOURCE:
~~~~~~~~~~~~~~~~~
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <sys/mman.h>
#include <errno.h>
int main(int argc, char **argv)
{
	int fd,  ret;
	void* p;
	struct iovec iov[2];
	fd = open(argv[1], O_RDWR|O_CREAT|O_TRUNC, 0666);
	ftruncate(fd, 409600);
	iov[0].iov_base="a";
	iov[0].iov_len=1;
	iov[1].iov_base=NULL;
	iov[1].iov_len=4096;
	ret = writev(fd, iov, sizeof(iov)/sizeof(struct iovec));
	printf("writev  = %d, err = %d\n", ret, errno);
	return 0;
}
##TESTCASE RESULT:
~~~~~~~~~~~~~~~~~~
[root@ts63 ~]# mount | grep mnt2
/dev/mapper/test on /mnt2 type ext2 (rw,nobh)
[root@ts63 ~]#  /tmp/writev /mnt2/test
writev  = 1, err = 0
[root@ts63 ~]# hexdump -C /mnt2/test

00000000  61 65 62 6f 6f 74 00 00  f0 b9 b4 59 3a 00 00 00  |aeboot.....Y:...|
00000010  20 00 00 00 00 00 00 00  21 00 00 00 00 00 00 00  | .......!.......|
00000020  df df df df df df df df  df df df df df df df df  |................|
00000030  3a 00 00 00 2a 00 00 00  21 00 00 00 00 00 00 00  |:...*...!.......|
00000040  60 c0 8c 00 00 00 00 00  40 4a 8d 00 00 00 00 00  |`.......@J......|
00000050  00 00 00 00 00 00 00 00  41 00 00 00 00 00 00 00  |........A.......|
00000060  74 69 6d 65 20 64 64 20  69 66 3d 2f 64 65 76 2f  |time dd if=/dev/|
00000070  6c 6f 6f 70 30 20 20 6f  66 3d 2f 64 65 76 2f 6e  |loop0  of=/dev/n|
skip..
00000f50  00 00 00 00 00 00 00 00  31 00 00 00 00 00 00 00  |........1.......|
00000f60  6d 6b 66 73 2e 65 78 74  33 20 2f 64 65 76 2f 76  |mkfs.ext3 /dev/v|
00000f70  7a 76 67 2f 74 65 73 74  20 2d 62 34 30 39 36 00  |zvg/test -b4096.|
00000f80  a0 fe 8c 00 00 00 00 00  21 00 00 00 00 00 00 00  |........!.......|
00000f90  23 31 32 30 35 39 35 30  34 30 34 00 3a 00 00 00  |#1205950404.:...|
00000fa0  20 00 8d 00 00 00 00 00  21 00 00 00 00 00 00 00  | .......!.......|
00000fb0  d0 cf 8c 00 00 00 00 00  10 d0 8c 00 00 00 00 00  |................|
00000fc0  00 00 00 00 00 00 00 00  41 00 00 00 00 00 00 00  |........A.......|
00000fd0  6d 6f 75 6e 74 20 2f 64  65 76 2f 76 7a 76 67 2f  |mount /dev/vzvg/|
00000fe0  74 65 73 74 20 20 2f 76  7a 20 2d 6f 20 64 61 74  |test  /vz -o dat|
00000ff0  61 3d 77 72 69 74 65 62  61 63 6b 00 00 00 00 00  |a=writeback.....|
00001000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

As you can see file's page contains garbage from pagecache instead of zeros.
#TEST_CASE_END

Attached patch:
- Add sanity check BUG_ON in order to prevent incorrect usage by caller,
  This is function invariant because page can has buffers and in no zero
  *fadata pointer at the same time.
- Always attach buffers to page is it is partial write case.
- Always switch back to generic_write_end if page has buffers.
  This is reasonable because if page already has buffer then generic_write_begin
  was called previously.

Signed-off-by: Dmitri Monakhov <dmonakhov@openvz.org>
Reviewed-by: Nick Piggin <npiggin@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-28 14:45:21 -07:00
Al Viro
6758f953d0 [PATCH] mnt_expire is protected by namespace_sem, no need for vfsmount_lock
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-03-27 20:48:04 -04:00
Al Viro
c35038beca [PATCH] do shrink_submounts() for all fs types
... and take it out of ->umount_begin() instances.  Call with all locks
already taken (by do_umount()) and leave calling release_mounts() to
caller (it will do release_mounts() anyway, so we can just put into
the same list).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-03-27 20:47:58 -04:00
Al Viro
bcc5c7d2b6 [PATCH] sanitize locking in mark_mounts_for_expiry() and shrink_submounts()
... and fix a race on access of ->mnt_share et.al. without namespace_sem
in the latter.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-03-27 20:47:52 -04:00
Al Viro
7c4b93d826 [PATCH] count ghost references to vfsmounts
make propagate_mount_busy() exclude references from the vfsmounts
that had been isolated by umount_tree() and are just waiting for
release_mounts() to dispose of their ->mnt_parent/->mnt_mountpoint.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-03-27 20:47:46 -04:00
Al Viro
1a39068954 [PATCH] reduce stack footprint in namespace.c
A lot of places misuse struct nameidata when they need struct path.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-03-27 20:47:40 -04:00
Denis V. Lunev
0e5f8be138 [NETNS]: Compile NET /proc support only if CONFIG_NET is set.
This fix broken compilation for 'allnoconfig'. This was introduced by
Introduced by commit 1218854afa ("[NET]
NETNS: Omit seq_net_private->net without CONFIG_NET_NS.")

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-27 14:25:53 -07:00
YOSHIFUJI Hideaki
1218854afa [NET] NETNS: Omit seq_net_private->net without CONFIG_NET_NS.
Without CONFIG_NET_NS, no namespace other than &init_net exists,
no need to store net in seq_net_private.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-26 04:39:56 +09:00
Linus Torvalds
7ed7fe5e82 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  [PATCH] get stack footprint of pathname resolution back to relative sanity
  [PATCH] double iput() on failure exit in hugetlb
  [PATCH] double dput() on failure exit in tiny-shmem
  [PATCH] fix up new filp allocators
  [PATCH] check for null vfsmount in dentry_open()
  [PATCH] reiserfs: eliminate private use of struct file in xattr
  [PATCH] sanitize hppfs
  hppfs pass vfsmount to dentry_open()
  [PATCH] restore export of do_kern_mount()
2008-03-25 08:57:47 -07:00
Andrew Morton
815d2d50da driver core: debug for bad dev_attr_show() return value.
Try to find the culprit who caused
http://bugzilla.kernel.org/show_bug.cgi?id=10150

Cc: <balajirrao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-03-24 22:33:49 -07:00
Linus Torvalds
0d995b2b44 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  [CIFS] Fix mem leak on dfs referral
  [CIFS] file create with acl support enabled is slow
  [CIFS] Fix mtime on cp -p when file data cached but written out too late
  [CIFS] Fix build problem
  [CIFS] cifs: replace remaining __FUNCTION__ occurrences
  [CIFS]  DFS patch that connects inode with dfs handling ops
2008-03-22 17:05:31 -07:00
Hans Rosenfeld
f16278c679 Change pagemap output format to allow for future reporting of huge pages
Change pagemap output format to allow for future reporting of huge pages.

(Format comment and minor cleanups: mpm@selenic.com)

Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-22 17:03:10 -07:00
Steve French
04b6e6ec1a [CIFS] Fix mem leak on dfs referral
Signed-off-by: Igor Mammedov <niallain@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-03-22 22:57:44 +00:00
Linus Torvalds
7d3628b230 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (46 commits)
  [NET] ifb: set separate lockdep classes for queue locks
  [IPV6] KCONFIG: Fix description about IPV6_TUNNEL.
  [TCP]: Fix shrinking windows with window scaling
  netpoll: zap_completion_queue: adjust skb->users counter
  bridge: use time_before() in br_fdb_cleanup()
  [TG3]: Fix build warning on sparc32.
  MAINTAINERS: bluez-devel is subscribers-only
  audit: netlink socket can be auto-bound to pid other than current->pid (v2)
  [NET]: Fix permissions of /proc/net
  [SCTP]: Fix a race between module load and protosw access
  [NETFILTER]: ipt_recent: sanity check hit count
  [NETFILTER]: nf_conntrack_h323: logical-bitwise & confusion in process_setup()
  [RT2X00] drivers/net/wireless/rt2x00/rt2x00dev.c: remove dead code, fix warning
  [IPV4]: esp_output() misannotations
  [8021Q]: vlan_dev misannotations
  xfrm: ->eth_proto is __be16
  [IPV4]: ipv4_is_lbcast() misannotations
  [SUNRPC]: net/* NULL noise
  [SCTP]: fix misannotated __sctp_rcv_asconf_lookup()
  [PKT_SCHED]: annotate cls_u32
  ...
2008-03-21 07:57:45 -07:00
Andre Noll
4f42c288e6 [NET]: Fix permissions of /proc/net
commit e9720ac ([NET]: Make /proc/net a symlink on /proc/self/net (v3))
broke ganglia and probably other applications that read /proc/net/dev.

This is due to the change of permissions of /proc/net that was
introduced in that commit.

Before: dr-xr-xr-x 5 root root 0 Mar 19 11:30 /proc/net
After: dr-xr--r-- 5 root root 0 Mar 19 11:29 /proc/self/net

This patch restores the permissions to the old value which makes
ganglia happy again.

Pavel Emelyanov says:

	This also broke the postfix, as it was reported in bug #10286
	and described in detail by Benjamin.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-20 15:27:28 -07:00
Linus Torvalds
49ccf74aaf Merge branch 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  nfs: don't ignore return value from nfs_pageio_add_request
2008-03-20 11:59:34 -07:00
Andrew Morton
9df130392f fs/ufs/balloc.c: fix sparc64 printk warning
fs/ufs/balloc.c: In function `ufs_change_blocknr':
fs/ufs/balloc.c:317: warning: long long unsigned int format, long unsigned int arg (arg 2)
fs/ufs/balloc.c:317: warning: long long unsigned int format, long unsigned int arg (arg 3)

sector_t is u64 and we don't know what type the architecture uses to implement
u64.

Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-19 18:53:37 -07:00
Dave Young
08ca0db8aa zisofs: fix readpage() outside i_size
A read request outside i_size will be handled in do_generic_file_read().  So
we just return 0 to avoid getting -EIO as normal reading, let
do_generic_file_read do the rest.

At the same time we need unlock the page to avoid system stuck.

Fixes http://bugzilla.kernel.org/show_bug.cgi?id=10227

Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Acked-by: Jan Kara <jack@suse.cz>
Report-by: Christian Perle <chris@linuxinfotag.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-19 18:53:36 -07:00
Randy Dunlap
a6b91919e0 fs: fix kernel-doc notation warnings
Fix kernel-doc notation warnings in fs/.

Warning(mmotm-2008-0314-1449//fs/super.c:560): missing initial short description on line:
 *	mark_files_ro
Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line:
 *	lease_get_mtime
Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line:
 *	lease_get_mtime
Warning(mmotm-2008-0314-1449//fs/namei.c:1368): missing initial short description on line:
 * lookup_one_len:  filesystem helper to lookup single pathname component
Warning(mmotm-2008-0314-1449//fs/buffer.c:3221): missing initial short description on line:
 * bh_uptodate_or_lock: Test whether the buffer is uptodate
Warning(mmotm-2008-0314-1449//fs/buffer.c:3240): missing initial short description on line:
 * bh_submit_read: Submit a locked buffer for reading
Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:30): missing initial short description on line:
 * writeback_acquire: attempt to get exclusive writeback access to a device
Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:47): missing initial short description on line:
 * writeback_in_progress: determine whether there is writeback in progress
Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:58): missing initial short description on line:
 * writeback_release: relinquish exclusive writeback access against a device.
Warning(mmotm-2008-0314-1449//include/linux/jbd.h:351): contents before sections
Warning(mmotm-2008-0314-1449//include/linux/jbd.h:561): contents before sections
Warning(mmotm-2008-0314-1449//fs/jbd/transaction.c:1935): missing initial short description on line:
 * void journal_invalidatepage()

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-19 18:53:36 -07:00
Michael Halcrow
5366dc9fd1 eCryptfs: Swap dput() and mntput()
ecryptfs_d_release() is doing a mntput before doing the dput.  This patch
moves the dput before the mntput.

Thanks to Rajouri Jammu for reporting this.

Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Rajouri Jammu <rajouri.jammu@gmail.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-19 18:53:36 -07:00
Duane Griffin
d00256766a jbd2: correctly unescape journal data blocks
Fix a long-standing typo (predating git) that will cause data corruption if a
journal data block needs unescaping.  At the moment the wrong buffer head's
data is being unescaped.

To test this case mount a filesystem with data=journal, start creating and
deleting a bunch of files containing only JBD2_MAGIC_NUMBER (0xc03b3998), then
pull the plug on the device.  Without this patch the files will contain zeros
instead of the correct data after recovery.

Signed-off-by: Duane Griffin <duaneg@dghda.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-19 18:53:36 -07:00
Duane Griffin
439aeec639 jbd: correctly unescape journal data blocks
Fix a long-standing typo (predating git) that will cause data corruption if a
journal data block needs unescaping.  At the moment the wrong buffer head's
data is being unescaped.

To test this case mount a filesystem with data=journal, start creating and
deleting a bunch of files containing only JFS_MAGIC_NUMBER (0xc03b3998), then
pull the plug on the device.  Without this patch the files will contain zeros
instead of the correct data after recovery.

Signed-off-by: Duane Griffin <duaneg@dghda.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-19 18:53:36 -07:00
David Howells
4ebf89845b ROMFS: Fix up an error in iget removal
Fix up an error in iget removal in which romfs_lookup() making a successful
call to romfs_iget() continues through the negative/error handling (previously
the successful case jumped around the negative/error handling case):

 (1) inode is initialised to NULL at the top of the function, eliminating the
     need for specific negative-inode handling.  This means the positive
     success handling now flows straight through.

 (2) Rename the labels to be clearer about what they mean.

Also make romfs_lookup()'s result variable of type long so as to avoid
32-bit/64-bit conversions with PTR_ERR() and friends.

Based upon a report and patch from Adam Richter.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: "Adam J. Richter" <adam@yggdrasil.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-19 18:53:36 -07:00
Josef Bacik
c587f0c0a6 ext3: fix wrong gfp type under transaction
There are several places where we make allocations with GFP_KERNEL while under
a transaction, which could lead to an assertion panic or lockup if under
memory pressure.  This patch switches these problem areas to use GFP_NOFS to
keep these problems from happening.

Signed-off-by: Josef Bacik <jbacik@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-19 18:53:36 -07:00
Jan Kara
87cb055bc1 quota: add possibly missing iput() when quotaon and quotaoff races
We should always put inode we have reference to, even if quota was reenabled
in the mean time.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-19 18:53:35 -07:00
Randy Dunlap
0cf01f6685 jbd: fix jbd kernel-doc notation
Fix kernel-doc notation in jbd.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-19 18:53:35 -07:00
Quentin Barnes
6cb2a21049 aio: bad AIO race in aio_complete() leads to process hang
My group ran into a AIO process hang on a 2.6.24 kernel with the process
sleeping indefinitely in io_getevents(2) waiting for the last wakeup to come
and it never would.

We ran the tests on x86_64 SMP.  The hang only occurred on a Xeon box
("Clovertown") but not a Core2Duo ("Conroe").  On the Xeon, the L2 cache isn't
shared between all eight processors, but is L2 is shared between between all
two processors on the Core2Duo we use.

My analysis of the hang is if you go down to the second while-loop
in read_events(), what happens on processor #1:
	1) add_wait_queue_exclusive() adds thread to ctx->wait
	2) aio_read_evt() to check tail
	3) if aio_read_evt() returned 0, call [io_]schedule() and sleep

In aio_complete() with processor #2:
	A) info->tail = tail;
	B) waitqueue_active(&ctx->wait)
	C) if waitqueue_active() returned non-0, call wake_up()

The way the code is written, step 1 must be seen by all other processors
before processor 1 checks for pending events in step 2 (that were recorded by
step A) and step A by processor 2 must be seen by all other processors
(checked in step 2) before step B is done.

The race I believed I was seeing is that steps 1 and 2 were
effectively swapped due to the __list_add() being delayed by the L2
cache not shared by some of the other processors.  Imagine:
proc 2: just before step A
proc 1, step 1: adds to ctx->wait, but is not visible by other processors yet
proc 1, step 2: checks tail and sees no pending events
proc 2, step A: updates tail
proc 1, step 3: calls [io_]schedule() and sleeps
proc 2, step B: checks ctx->wait, but sees no one waiting, skips wakeup
                so proc 1 sleeps indefinitely

My patch adds a memory barrier between steps A and B.  It ensures that the
update in step 1 gets seen on processor 2 before continuing.  If processor 1
was just before step 1, the memory barrier makes sure that step A (update
tail) gets seen by the time processor 1 makes it to step 2 (check tail).

Before the patch our AIO process would hang virtually 100% of the time.  After
the patch, we have yet to see the process ever hang.

Signed-off-by: Quentin Barnes <qbarnes+linux@yahoo-inc.com>
Reviewed-by: Zach Brown <zach.brown@oracle.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: <stable@kernel.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ We should probably disallow that "if (waitqueue_active()) wake_up()"
  coding pattern, because it's so often buggy wrt memory ordering ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-19 18:53:35 -07:00
Chuck Lever
0490a54a00 lockd: introduce new function to encode private argument in SM_MON requests
Clean up: refactor the encoding of the opaque 16-byte private argument in
xdr_encode_mon().  This will be updated later to support IPv6 addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:01:10 -04:00
Chuck Lever
2ca7754d4c lockd: Fix up incorrect RPC buffer size calculations.
Switch to using the new mon_id encoder function.

Now that we've refactored the encoding of SM_MON requests, we've
discovered that the pre-computed buffer length maximums are
incorrect!

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:01:07 -04:00
Chuck Lever
ea72a7f170 lockd: document use of mon_id argument in SM_MON requests
Clean up: document the argument type that xdr_encode_common() is
marshalling by introducing a new function.  The new function will replace
xdr_encode_common() in just a sec.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:01:04 -04:00
Chuck Lever
850c95fd07 lockd: refactor SM_MON my_id argument encoder
Clean up: introduce a new XDR encoder specifically for the my_id
argument of SM_MON requests.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:01:01 -04:00
Chuck Lever
4969517431 lockd: refactor SM_MON mon_name argument encoder
Clean up: introduce a new XDR encoder specifically for the mon_name
argument of SM_MON requests.  This will be updated later to support IPv6
addresses in addition to IPv4 addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:57 -04:00
Chuck Lever
099bd05f27 lockd: Ensure NSM strings aren't longer than protocol allows
Introduce a special helper function to check the length of NSM strings
before they are placed on the wire.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:54 -04:00
Chuck Lever
eb18860e13 NLM: NLM protocol version numbers are u32
Clean up: RPC protocol version numbers are u32.  Make sure we use an
appropriate type for NLM version numbers when calling nlm_lookup_host().

Eliminates a harmless mixed sign comparison in nlm_host_lookup().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:47 -04:00
Chuck Lever
90d5b18061 NLM: LOCKD fails to load if CONFIG_SYSCTL is not set
Bruce Fields says:
"By the way, we've got another config-related nit here:

	http://bugzilla.linux-nfs.org/show_bug.cgi?id=156

You can build lockd without CONFIG_SYSCTL set, but then the module will
fail to load."

For now, disable the sysctl registration calls in lockd if CONFIG_SYSCTL
is not enabled.  This allows the kernel to build properly if PROC_FS or
SYSCTL is not enabled, but an NFS client is desired.

In the long run, we would like to be able to build the kernel with an
NFS client but without lockd.  This makes sense, for example, if you want
an NFSv4-only NFS client, as NFSv4 doesn't use NLM at all.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:44 -04:00
Chuck Lever
1e40316bc4 SUNRPC: Add a default setting for CONFIG_SUNRPC_BIND34
Most distros will want support for rpcbind protocols 3 and 4 to default off
until they have integrated user-space support for the new rpcbind daemon
which supports IPv6 RPC services.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:41 -04:00
Chuck Lever
327a299d8a SUNRPC: Update help Kconfig text
Clean up: refresh the help text for Kconfig items related to the sunrpc
module.  Remove obsolete URLs, and make the language consistent among
the options.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:38 -04:00
Chuck Lever
ecfc555a83 NFS: Always enable NFS direct I/O
Since O_DIRECT is a standard feature that is enabled in most distros,
eliminate the CONFIG_NFS_DIRECTIO build option, and change the
fs/nfs/Makefile to always build in the NFS direct I/O engine.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:34 -04:00
Chuck Lever
82d101d58a NFS: Show most mount options via nfs_show_options()
Display all mount options in /proc/mount which may be needed to reconstruct
a previous mount.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:29 -04:00
Chuck Lever
3f8400d1f1 NFS: Save the values of the "mount*=" mount options
Save the value of the mountproto= mountport= mountvers= and mountaddr=
options so that these values can be displayed later via
nfs_show_options().

This preserves the intent of the original mount options, should the file
system need to be remounted based on what's displayed in /proc/mounts.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:22 -04:00
Chuck Lever
f22d6d79fe NFS: Save the value of the "port=" mount option
During a remount based on the mount options displayed in /proc/mounts, we
want to preserve the original behavior of the mount request.  Let's save
the original setting of the "port=" mount option in the mount's nfs_server
structure.

This allows us to simplify the default behavior of port setting for NFSv4
mounts: by default, NFSv2/3 mounts first try an RPC bind to determine the
NFS server's port, unless the user specified the "port=" mount option;
Users can force the client to skip the RPC bind by explicitly specifying
"port=<value>".

NFSv4, by contrast, assumes the NFS server port is 2049 and skips the RPC
bind, unless the user specifies "port=".  Users can force an RPC bind for
NFSv4 by explicitly specifying "port=0".

I added a couple of extra comments to clarify this behavior.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:19 -04:00
Chuck Lever
78fa701f34 NFS: Fix up data types of fields in nfs_parsed_mount_options
Clean up: make data types of fields in nfs_parsed_mount_options more
consistent with other uses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:16 -04:00
Chuck Lever
2d76743227 NFS: numeric mount parameters are unsigned
Clean up: use %u instead of %d when displaying NFS mount options.

Nit: Fix reporting of "namlen=" option in nfs_show_mount_stats.  The mount
option is called "namlen" without the "e".

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:13 -04:00
Jeff Layton
7bda2cdf48 NFS: clean up short packet handling for NFSv4 readdir
Currently, the NFS readdir decoders have a workaround for buggy servers
that send an empty readdir response with the EOF bit unset. If the
server sends a malformed response in some cases, this workaround kicks
in and just returns an empty response rather than returning a proper
error to the caller.

This patch does 3 things:

1) have malformed responses with no entries return error (-EIO)

2) preserve existing workaround for servers that send empty
   responses with the EOF marker unset.

3) Add some comments to clarify the logic in decode_readdir().

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:10 -04:00
Jeff Layton
643f81115b NFS: clean up short packet handling for NFSv3 readdir
Currently, the NFS readdir decoders have a workaround for buggy servers
that send an empty readdir response with the EOF bit unset. If the
server sends a malformed response in some cases, this workaround kicks
in and just returns an empty response rather than returning a proper
error to the caller.

This patch does 3 things:

1) have malformed responses with no entries return error (-EIO)

2) preserve existing workaround for servers that send empty
   responses with the EOF marker unset.

3) Add some comments to clarify the logic in nfs3_xdr_readdirres().

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:06 -04:00
Jeff Layton
caa02bd540 NFS: clean up short packet handling for NFSv2 readdir
Currently, the NFS readdir decoders have a workaround for buggy servers
that send an empty readdir response with the EOF bit unset. If the
server sends a malformed response in some cases, this workaround kicks
in and just returns an empty response rather than returning a proper
error to the caller.

This patch does 3 things:

1) have malformed responses with no entries return error (-EIO)

2) preserve existing workaround for servers that send empty
   responses with the EOF marker unset.

3) Add some comments to clarify the logic in nfs_xdr_readdirres().

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:03 -04:00
Fred Isaman
4af68bffac nfs: remove duplicate initializations of nfs_read_data field
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 17:59:59 -04:00