Commit graph

22,184 commits

Author SHA1 Message Date
Al Viro
dfef6dcd35 unfuck proc_sysctl ->d_compare()
a) struct inode is not going to be freed under ->d_compare();
however, the thing PROC_I(inode)->sysctl points to just might.
Fortunately, it's enough to make freeing that sucker delayed,
provided that we don't step on its ->unregistering, clear
the pointer to it in PROC_I(inode) before dropping the reference
and check if it's NULL in ->d_compare().

b) I'm not sure that we *can* walk into NULL inode here (we recheck
dentry->seq between verifying that it's still hashed / fetching
dentry->d_inode and passing it to ->d_compare() and there's no
negative hashed dentries in /proc/sys/*), but if we can walk into
that, we really should not have ->d_compare() return 0 on it!
Said that, I really suspect that this check can be simply killed.
Nick?

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-08 02:22:27 -05:00
Ryusuke Konishi
be667377a8 nilfs2: record used amount of each checkpoint in checkpoint list
This records the number of used blocks per checkpoint in each
checkpoint entry of cpfile.  Even though userland tools can get the
block count via nilfs_get_cpinfo ioctl, it was not updated by the
nilfs2 kernel code.  This fixes the issue and makes it available for
userland tools to calculate used amount per checkpoint.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Jiro SEKIBA <jir@unicus.jp>
2011-03-08 14:58:31 +09:00
Ryusuke Konishi
ae191838b0 nilfs2: optimize rec_len functions
This is a similar change to those in ext2/ext3 codebase (commit
40a063f669 and a4ae309486, respectively).

The addition of 64k block capability in the rec_len_from_disk and
rec_len_to_disk functions added a bit of math overhead which slows
down file create workloads needlessly when the architecture cannot
even support 64k blocks.  This will cut the corner.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-08 14:58:30 +09:00
Ryusuke Konishi
4138ec2382 nilfs2: append blocksize info to warnings during loading super blocks
At present, the same warning message can be output twice when nilfs
detected a problem on super blocks:

 NILFS warning: broken superblock. using spare superblock.
 NILFS warning: broken superblock. using spare superblock.
 ...

This is because these super blocks are reloaded with the block size
written in a super block if it differs from the first block size, but
this repetition looks somewhat confusing.  So, we hint at what is
going on by appending block size information to those messages.

Reported-by: Wakko Warner <wakko@animx.eu.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-08 14:58:30 +09:00
Ryusuke Konishi
828b1c50ae nilfs2: add compat ioctl
The current FS_IOC_GETFLAGS/SETFLAGS/GETVERSION will fail if
application is 32 bit and kernel is 64 bit.

This issue is avoidable by adding compat_ioctl method.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-08 14:58:30 +09:00
Ryusuke Konishi
cde98f0f84 nilfs2: implement FS_IOC_GETFLAGS/SETFLAGS/GETVERSION
Add support for the standard attributes set via chattr and read via
lsattr.  These attributes are already in the flags value in the nilfs2
inode, but currently we don't have any ioctl commands that expose them
to the userland.

Collaterally, this adds the FS_IOC_GETVERSION ioctl for getting
i_generation, which allows users to list the file's generation number
with "lsattr -v".

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-08 14:58:30 +09:00
Ryusuke Konishi
b253a3e4f2 nilfs2: tighten restrictions on inode flags
Nilfs has few rectrictions on which flags may be set on which inodes
like ext2/3/4 filesystems used to be.  Specifically DIRSYNC may only
be set on directories and IMMUTABLE and APPEND may not be set on
links.  Tighten that to disallow TOPDIR being set on non-directories
and only NODUMP and NOATIME to be set on non-regular file,
non-directories.

This introduces a flags masking function like those of extN and uses
it during inode creation.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-08 14:58:29 +09:00
Ryusuke Konishi
32f4aeb315 nilfs2: mark S_NOATIME on inodes only if NOATIME attribute is set
At present, nilfs marks S_NOATIME flag on all inodes.  This restricts
nilfs_set_inode_flags function so that it marks S_NOATIME only if a
given inode has an FS_NOATIME_FL flag.

Although nilfs does not support atime yet, touch_atime() still safely
returns on IS_NOATIME check since MS_NOATIME is always set on sb.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-08 14:58:29 +09:00
Ryusuke Konishi
f0c9f242f9 nilfs2: use common file attribute macros
Replaces uses of own inode flags (i.e. NILFS_SECRM_FL, NILFS_UNRM_FL,
NILFS_COMPR_FL, and so forth) with common inode flags, and removes the
own flag declarations.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-08 14:58:29 +09:00
Ryusuke Konishi
9954e7af14 nilfs2: add free entries count only if clear bit operation succeeded
Three functions of the current persistent object allocator,
nilfs_palloc_commit_free_entry, nilfs_palloc_abort_alloc_entry, and
nilfs_palloc_freev functions unconditionally add a counter after doing
clear bit operation on a bitmap block.

If the clear bit operation overlapped, the counter will not add up.
This fixes the issue by making the counter operations conditional.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-08 14:58:04 +09:00
Ryusuke Konishi
25b18d39cc nilfs2: decrement inodes count only if raw inode was successfully deleted
This fixes the issue that inodes count will not add up after removal
of raw inodes fails.  Hence, this prevents possible under flow of the
inodes count.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-08 14:58:04 +09:00
James Morris
fe3fa43039 Merge branch 'master' of git://git.infradead.org/users/eparis/selinux into next 2011-03-08 11:38:10 +11:00
James Morris
1cc26bada9 Merge branch 'master'; commit 'v2.6.38-rc7' into next 2011-03-08 10:55:06 +11:00
Mi Jinlong
5ece3cafbd nfsd41: modify the members value of nfsd4_op_flags
The members of nfsd4_op_flags, (ALLOWED_WITHOUT_FH | ALLOWED_ON_ABSENT_FS)
equals to  ALLOWED_AS_FIRST_OP, maybe that's not what we want.

OP_PUTROOTFH with op_flags = ALLOWED_WITHOUT_FH | ALLOWED_ON_ABSENT_FS,
can't appears as the first operation with out SEQUENCE ops.

This patch modify the wrong value of ALLOWED_WITHOUT_FH etc which
was introduced by f9bb94c4.

Cc: stable@kernel.org
Reviewed-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-03-07 12:10:33 -05:00
Kevin Coffman
b0b0c0a26e nfsd: add proc file listing kernel's gss_krb5 enctypes
Add a new proc file which lists the encryption types supported
by the kernel's gss_krb5 code.

Newer MIT Kerberos libraries support the assertion of acceptor
subkeys.  This enctype information allows user-land (svcgssd)
to request that the Kerberos libraries limit the encryption
types that it uses when generating the subkeys.

Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-03-07 12:06:48 -05:00
Jesper Juhl
46d4cef9cf NFSD, VFS: Remove dead code in nfsd_rename()
Currently we have the following code in fs/nfsd/vfs.c::nfsd_rename() :

	...
	host_err = nfsd_break_lease(odentry->d_inode);
	if (host_err)
		goto out_drop_write;
	if (ndentry->d_inode) {
		host_err = nfsd_break_lease(ndentry->d_inode);
		if (host_err)
			goto out_drop_write;
	}
	if (host_err)
		goto out_drop_write;
	...

'host_err' is guaranteed to be 0 by the time we test 'ndentry->d_inode'.
If 'host_err' becomes != 0 inside the 'if' statement, then we goto
'out_drop_write'. So, after the 'if' statement there is no way that
'host_err' can be anything but 0, so the test afterwards is just dead
code.
This patch removes the dead code.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-03-07 12:05:14 -05:00
Shan Wei
35079582e7 nfsd: kill unused macro definition
These macros had never been used for several years.
So, remove them.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-03-07 12:05:09 -05:00
Namhyung Kim
f32cb53219 locks: use assign_type()
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-03-07 12:05:09 -05:00
J. Bruce Fields
32b007b4e1 nfsd4: fix bad pointer on failure to find delegation
In case of a nonempty list, the return on error here is obviously bogus;
it ends up being a pointer to the list head instead of to any valid
delegation on the list.

In particular, if nfsd4_delegreturn() hits this case, and you're quite unlucky,
then renew_client may oops, and it may take an embarassingly long time to
figure out why.  Facepalm.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
IP: [<ffffffff81292965>] nfsd4_delegreturn+0x125/0x200
...

Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-03-07 11:44:53 -05:00
Eric Sandeen
d7433142b6 ext3: Always set dx_node's fake_dirent explicitly.
(crossport of 1f7bebb9e9
by Andreas Schlick <schlick@lavabit.com>)

When ext3_dx_add_entry() has to split an index node, it has to ensure that
name_len of dx_node's fake_dirent is also zero, because otherwise e2fsck
won't recognise it as an intermediate htree node and consider the htree to
be corrupted.

CC: stable@kernel.org
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-03-07 17:20:58 +01:00
Chris Mason
31339acd07 Btrfs: deal with short returns from copy_from_user
When copy_from_user is only able to copy some of the bytes we requested,
we may end up creating a partially up to date page.  To avoid garbage in
the page, we need to treat a partial copy as a zero length copy.

This makes the rest of the file_write code drop the page and
retry the whole copy instead of marking the partially up to
date page as dirty.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
cc: stable@kernel.org
2011-03-07 11:10:24 -05:00
Chris Mason
b1bf862e9d Btrfs: fix regressions in copy_from_user handling
Commit 914ee295af fixed deadlocks in
btrfs_file_write where we would catch page faults on pages we had
locked.

But, there were a few problems:

1) The x86-32 iov_iter_copy_from_user_atomic code always fails to copy
data when the amount to copy is more than 4K and the offset to start
copying from is not page aligned.  The result was btrfs_file_write
looping forever retrying the iov_iter_copy_from_user_atomic

We deal with this by changing btrfs_file_write to drop down to single
page copies when iov_iter_copy_from_user_atomic starts returning failure.

2) The btrfs_file_write code was leaking delalloc reservations when
iov_iter_copy_from_user_atomic returned zero.  The looping above would
result in the entire filesystem running out of delalloc reservations and
constantly trying to flush things to disk.

3) btrfs_file_write will lock down page cache pages, make sure
any writeback is finished, do the copy_from_user and then release them.
Before the loop runs we check the first and last pages in the write to
see if they are only being partially modified.  If the start or end of
the write isn't aligned, we make sure the corresponding pages are
up to date so that we don't introduce garbage into the file.

With the copy_from_user changes, we're allowing the VM to reclaim the
pages after a partial update from copy_from_user, but we're not
making sure the page cache page is up to date when we loop around to
resume the write.

We deal with this by pushing the up to date checks down into the page
prep code.  This fits better with how the rest of file_write works.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Reported-by: Mitch Harder <mitch.harder@sabayonlinux.org>
cc: stable@kernel.org
2011-03-07 10:42:27 -05:00
Dave Chinner
9130090b5f xfs: kill support/debug.[ch]
The remaining functionality in debug.[ch] is effectively just assert
handling, conditional debug definitions and hex dumping. The hex
dumping and assert function can be moved into the new printk module,
while the rest can be moved into top-level header files. This allows
fs/xfs/support/debug.[ch] to be completely removed from the
codebase.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-03-07 10:09:35 +11:00
Dave Chinner
0b932cccbd xfs: Convert remaining cmn_err() callers to new API
Once converted, kill the remainder of the cmn_err() interface.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-03-07 10:08:35 +11:00
Dave Chinner
8221112b43 xfs: convert the quota debug prints to new API
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-03-07 10:07:35 +11:00
Dave Chinner
6d4a8ecb34 xfs: rename xfs_cmn_err_fsblock_zero()
The "cmn_err" part of the function name is no longer relevant. Rename
the function to xfs_alert_fsblock_zero() to match the new logging
API.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-03-07 10:06:35 +11:00
Dave Chinner
5348778699 xfs: convert xfs_fs_cmn_err to new error logging API
Continue to clean up the error logging code by converting all the
callers of xfs_fs_cmn_err() to the new API. Once done, remove the
unused old API function.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-03-07 10:05:35 +11:00
Dave Chinner
af34e09da4 xfs: kill xfs_fs_mount_cmn_err() macro
The xfs_fs_mount_cmn_err() hides a simple check as to whether the
mount path should output an error or not. Remove the macro and open
code the check.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-03-07 10:04:35 +11:00
Dave Chinner
65333b4c3d xfs: kill xfs_fs_repair_cmn_err() macro
In certain cases of inode corruption, the xfs_fs_repair_cmn_err()
macro is used to output an extra message in the corruption report.
That extra message is "unmount and run xfs_repair", which really
applies to any corruption report. Each case that this macro is
called (except one) a following call to xfs_corruption_error() is
made to optionally dump more information about the error.

Hence, move the output of "run xfs_repair" to xfs_corruption_error()
so that it is output on all corruption reports.  Also, convert the
callers of the repair macro that don't call xfs_corruption_error()
to call it, hence provide consiѕtent error reporting for all cases
where xfs_fs_repair_cmn_err() used to be called.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-03-07 10:03:35 +11:00
Dave Chinner
6a19d9393a xfs: convert xfs_cmn_err to xfs_alert_tag
Continue the conversion of the old cmn_err interface be converting
all the conditional panic tag errors to xfs_alert_tag() and then
removing xfs_cmn_err().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-03-07 10:02:35 +11:00
Dave Chinner
a0fa2b679e xfs: Convert xlog_warn to new logging interface
Convert the xfs log operations to use the new error logging
interfaces. This removes the xlog_{warn,panic} wrappers and makes
almost all errors emit the device they belong to instead of just
refering to "XFS".

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-03-07 10:01:35 +11:00
Dave Chinner
4f10700a2e xfs: Convert linux-2.6/ files to new logging interface
Convert the files in fs/xfs/linux-2.6/ to use the new xfs_<level>
logging format that replaces the old Irix inherited cmn_err()
interfaces. While there, also convert naked printk calls to use the
relevant xfs logging function to standardise output format.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-03-07 10:00:35 +11:00
Al Viro
31be83aeae omfs: make readdir stop when filldir says so
filldir returning an error does *not* mean "skip this entry, try the
next one"...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
2011-03-05 16:24:12 -05:00
Al Viro
d932805b3d omfs: merge unlink() and rmdir(), close leak in rename()
In case of directory-overwriting rename(), omfs forgot to mark the
victim doomed, so omfs_evict_inode() didn't free it.

We could fix that by calling omfs_rmdir() for directory victims
instead of doing omfs_unlink(), but it's easier to merge omfs_unlink()
and omfs_rmdir() instead.  Note that we have no hardlinks here.

It also makes the checks in omfs_rename() go away - they fold into
what omfs_remove() does when it runs into a directory.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
2011-03-05 16:24:01 -05:00
Al Viro
cdb26496db omfs: stop playing silly buggers with omfs_unlink() in ->rename()
Since omfs directories are hashes of inodes and name is part of
inode, we have to remove inode from old directory before we can
put it into new one / under new name.  So instead of
	bump i_nlink
	call omfs_unlink, which does
		omfs_delete_entry()
		decrement i_nlink and mark parent dirty in case of success
	decrement i_nlink if omfs_unlink failed and hadn't done it itself
let's just call omfs_delete_entry() and dirty the parent ourselves...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
2011-03-05 16:23:39 -05:00
Al Viro
013e4f4a28 omfs: rename() needs to mark old_inode dirty after ctime update
we *do* mark it dirty before, but it doesn't guarantee that we
don't get preempted just before assignment to ->i_ctime, with
inode getting written out before we get CPU back...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
2011-03-05 16:20:30 -05:00
Linus Torvalds
fb62c00a6d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: no .snap inside of snapped namespace
  libceph: fix msgr standby handling
  libceph: fix msgr keepalive flag
  libceph: fix msgr backoff
  libceph: retry after authorization failure
  libceph: fix handling of short returns from get_user_pages
  ceph: do not clear I_COMPLETE from d_release
  ceph: do not set I_COMPLETE
  Revert "ceph: keep reference to parent inode on ceph_dentry"
2011-03-05 10:43:22 -08:00
Mingming Cao
198868f35d ext4: Use single thread to perform DIO unwritten convertion
While running ext4 testing on multiple core, we found there are per
cpu ext4-dio-unwritten threads processing conversion from unwritten
extents to written for IOs completed from async direct IO patch.  Per
filesystem is enough, we don't need per cpu threads to work on
conversion.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
2011-03-05 11:52:45 -05:00
Matt Fleming
ae7eb8979c fs/locks.c: Remove stale FIXME left over from BKL conversion
The comment is no longer true as (now that the BKL conversion is
finished) a spinlock _is_ now used to protect file_lock_list,
blocked_list and inode->i_flock.

Signed-off-by: Matt Fleming <matt.fleming@linux.intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2011-03-05 10:55:59 +01:00
Neil Horman
e9e3d724e2 nfs4: Ensure that ACL pages sent over NFS were not allocated from the slab (v3)
The "bad_page()" page allocator sanity check was reported recently (call
chain as follows):

  bad_page+0x69/0x91
  free_hot_cold_page+0x81/0x144
  skb_release_data+0x5f/0x98
  __kfree_skb+0x11/0x1a
  tcp_ack+0x6a3/0x1868
  tcp_rcv_established+0x7a6/0x8b9
  tcp_v4_do_rcv+0x2a/0x2fa
  tcp_v4_rcv+0x9a2/0x9f6
  do_timer+0x2df/0x52c
  ip_local_deliver+0x19d/0x263
  ip_rcv+0x539/0x57c
  netif_receive_skb+0x470/0x49f
  :virtio_net:virtnet_poll+0x46b/0x5c5
  net_rx_action+0xac/0x1b3
  __do_softirq+0x89/0x133
  call_softirq+0x1c/0x28
  do_softirq+0x2c/0x7d
  do_IRQ+0xec/0xf5
  default_idle+0x0/0x50
  ret_from_intr+0x0/0xa
  default_idle+0x29/0x50
  cpu_idle+0x95/0xb8
  start_kernel+0x220/0x225
  _sinittext+0x22f/0x236

It occurs because an skb with a fraglist was freed from the tcp
retransmit queue when it was acked, but a page on that fraglist had
PG_Slab set (indicating it was allocated from the Slab allocator (which
means the free path above can't safely free it via put_page.

We tracked this back to an nfsv4 setacl operation, in which the nfs code
attempted to fill convert the passed in buffer to an array of pages in
__nfs4_proc_set_acl, which gets used by the skb->frags list in
xs_sendpages.  __nfs4_proc_set_acl just converts each page in the buffer
to a page struct via virt_to_page, but the vfs allocates the buffer via
kmalloc, meaning the PG_slab bit is set.  We can't create a buffer with
kmalloc and free it later in the tcp ack path with put_page, so we need
to either:

1) ensure that when we create the list of pages, no page struct has
   PG_Slab set

 or

2) not use a page list to send this data

Given that these buffers can be multiple pages and arbitrarily sized, I
think (1) is the right way to go.  I've written the below patch to
allocate a page from the buddy allocator directly and copy the data over
to it.  This ensures that we have a put_page free-able page for every
entry that winds up on an skb frag list, so it can be safely freed when
the frame is acked.  We do a put page on each entry after the
rpc_call_sync call so as to drop our own reference count to the page,
leaving only the ref count taken by tcp_sendpages.  This way the data
will be properly freed when the ack comes in

Successfully tested by myself to solve the above oops.

Note, as this is the result of a setacl operation that exceeded a page
of data, I think this amounts to a local DOS triggerable by an
uprivlidged user, so I'm CCing security on this as well.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
CC: security@kernel.org
CC: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-04 17:28:52 -08:00
Sage Weil
455cec0abf ceph: no .snap inside of snapped namespace
Otherwise you can do things like

# mkdir .snap/foo
# cd .snap/foo/.snap
# ls
<badness>

Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-04 12:25:09 -08:00
Al Viro
1858efd471 minimal fix for do_filp_open() race
failure exits on the no-O_CREAT side of do_filp_open() merge with
those of O_CREAT one; unfortunately, if do_path_lookup() returns
-ESTALE, we'll get out_filp:, notice that we are about to return
-ESTALE without having trying to create the sucker with LOOKUP_REVAL
and jump right into the O_CREAT side of code.  And proceed to try
and create a file.  Usually that'll fail with -ESTALE again, but
we can race and get that attempt of pathname resolution to succeed.

open() without O_CREAT really shouldn't end up creating files, races
or not.  The real fix is to rearchitect the whole do_filp_open(),
but for now splitting the failure exits will do.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-04 13:14:21 -05:00
Linus Torvalds
8336026942 Merge branch 'i_nlink' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'i_nlink' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  hfs: fix rename() over non-empty directory
  udf: fix i_nlink limit
  fix reiserfs mkdir() breakage
  exofs: i_nlink races in rename()
  nilfs2: i_nlink races in rename()
  minix: i_nlink races in rename()
  ufs: i_nlink races in rename()
  sysv: i_nlink races in rename()
2011-03-03 15:37:59 -08:00
Tao Ma
425fa41072 ext3: Fix an overflow in ext3_trim_fs.
In a bs=4096 volume, if we call FITRIM with the following parameter as
fstrim_range(start = 102400, len = 134144000, minlen = 10240), with the
following code:
if (len >= EXT3_BLOCKS_PER_GROUP(sb))
        len -= (EXT3_BLOCKS_PER_GROUP(sb) - first_block);
else
        last_block = first_block + len;

So if len < EXT3_BLOCKS_PER_GROUP while first_block + len >
EXT3_BLOCKS_PER_GROUP, last_block will be set to an overflow value
which exceeds EXT3_BLOCKS_PER_GROUP.

This patch fixes it and adjusts len and last_block accordingly.

Cc: Lukas Czerner <lczerner@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-03-04 00:34:15 +01:00
Eric Paris
ff36fe2c84 LSM: Pass -o remount options to the LSM
The VFS mount code passes the mount options to the LSM.  The LSM will remove
options it understands from the data and the VFS will then pass the remaining
options onto the underlying filesystem.  This is how options like the
SELinux context= work.  The problem comes in that -o remount never calls
into LSM code.  So if you include an LSM specific option it will get passed
to the filesystem and will cause the remount to fail.  An example of where
this is a problem is the 'seclabel' option.  The SELinux LSM hook will
print this word in /proc/mounts if the filesystem is being labeled using
xattrs.  If you pass this word on mount it will be silently stripped and
ignored.  But if you pass this word on remount the LSM never gets called
and it will be passed to the FS.  The FS doesn't know what seclabel means
and thus should fail the mount.  For example an ext3 fs mounted over loop

# mount -o loop /tmp/fs /mnt/tmp
# cat /proc/mounts | grep /mnt/tmp
/dev/loop0 /mnt/tmp ext3 rw,seclabel,relatime,errors=continue,barrier=0,data=ordered 0 0
# mount -o remount /mnt/tmp
mount: /mnt/tmp not mounted already, or bad option
# dmesg
EXT3-fs (loop0): error: unrecognized mount option "seclabel" or missing value

This patch passes the remount mount options to an new LSM hook.

Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
2011-03-03 16:12:27 -05:00
Linus Torvalds
4c7fd114c6 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: zero proper structure size for geometry calls
2011-03-03 12:44:22 -08:00
Linus Torvalds
c640e13f8e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
  nilfs2: fix regression that i-flag is not set on changeless checkpoints
2011-03-03 12:42:48 -08:00
Sage Weil
16a8b70a5a ceph: do not clear I_COMPLETE from d_release
First, this was racy anyway: d_release isn't called until well after the
dentry is unhashed.  Second, this runs afoul of the recent dcache change
that clears d_parent prior to calling d_release (949854d0), causing a NULL
pointer dereference.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-03 10:09:52 -08:00
Sage Weil
b545cc1505 ceph: do not set I_COMPLETE
Do not set the I_COMPLETE flag on directories until we resolve races with
dcache pruning.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-03 10:09:51 -08:00
Sage Weil
9bde178d05 Revert "ceph: keep reference to parent inode on ceph_dentry"
This reverts commit 97d79b403e.

This fails to account for d_parent changes due to rename or disconnected
dentries due to submounts or NFS reexports.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-03 10:09:50 -08:00