Commit graph

19,389 commits

Author SHA1 Message Date
stephen hemminger
1cc523271e seq_file: add RCU versions of new hlist/list iterators (v3)
Many usages of seq_file use RCU protected lists, so non RCU
iterators will not work safely.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-22 15:45:54 -08:00
Jens Axboe
f11cbd74c5 Merge branch 'master' into for-2.6.34 2010-02-22 13:48:51 +01:00
Ben Myers
978ebd97d1 xfs_export_operations.commit_metadata
This is the commit_metadata export operation for XFS.

- Takes one inode to be committed.

- Forces the log up to the lsn of the inode.

- Doesn't force the log if the inode doesn't have a pincount.

Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
[bfields@citi.umich.edu: trivial whitespace fix]
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-02-20 13:14:50 -08:00
Ben Myers
f501912a35 commit_metadata export operation replacing nfsd_sync_dir
- Add commit_metadata export_operation to allow the underlying filesystem to
decide how to commit an inode most efficiently.

- Usage of nfsd_sync_dir and write_inode_now has been replaced with the
commit_metadata function that takes a svc_fh.

- The commit_metadata function calls the commit_metadata export_op if it's
there, or else falls back to sync_inode instead of fsync and write_inode_now
because only metadata need be synced here.

- nfsd4_sync_rec_dir now uses vfs_fsync so that commit_metadata can be static

Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-02-20 13:13:44 -08:00
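The fallback described in f501912a35 (use the filesystem's commit_metadata export operation if it is there, otherwise a generic metadata sync) can be sketched in user-space C as an optional function pointer. This is only an illustration under stated assumptions: every demo_* name is hypothetical, and demo_generic_sync merely stands in for the generic sync_inode path; none of this is the nfsd code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode; records which path synced it. */
struct demo_inode {
    int synced_by;   /* 0 = not synced, 1 = filesystem op, 2 = generic fallback */
};

/* Hypothetical export operations table; the hook is optional. */
struct demo_export_ops {
    int (*commit_metadata)(struct demo_inode *inode);   /* may be NULL */
};

/* What a filesystem that knows a cheaper way to commit might provide. */
int demo_fs_commit(struct demo_inode *inode)
{
    inode->synced_by = 1;
    return 0;
}

/* Stand-in for the generic metadata-only sync used as the fallback. */
int demo_generic_sync(struct demo_inode *inode)
{
    inode->synced_by = 2;
    return 0;
}

/* Call the filesystem's hook if it is there, else fall back. */
int demo_commit_metadata(const struct demo_export_ops *ops,
                         struct demo_inode *inode)
{
    assert(inode != NULL);
    if (ops != NULL && ops->commit_metadata != NULL)
        return ops->commit_metadata(inode);
    return demo_generic_sync(inode);
}
```

Passing an ops table whose commit_metadata member is NULL takes the fallback path; a table that sets it to demo_fs_commit takes the filesystem path.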
David Howells
8f9941aecc CacheFiles: Fix a race in cachefiles_delete_object() vs rename
cachefiles_delete_object() can race with rename.  It gets the parent directory
of the object it's asked to delete, then locks it - but rename may have changed
the object's parent between the get and the completion of the lock.

However, if such a circumstance is detected, we abandon our attempt to delete
the object - since it's no longer in the index key path, it won't be seen
again by lookups of that key.  The assumption is that cachefilesd may have
culled it by renaming it to the graveyard for later destruction.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-20 10:06:35 -05:00
Jiro SEKIBA
0d561f12b4 nilfs2: add reader's lock for cno in nilfs_ioctl_sync
This adds reader's lock for the_nilfs->cno in nilfs_ioctl_sync,
for the_nilfs->cno should be protected by segctor_sem when reading.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-02-20 21:18:19 +09:00
Chuck Ebbert
aeaa5ccd64 vfs: don't call ima_file_check() unconditionally in nfsd_open()
commit 1e41568d73 ("Take ima_path_check()
in nfsd past dentry_open() in nfsd_open()") moved this code back to its
original location but missed the "else".

Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-20 00:47:31 -05:00
Yehuda Sadeh
bcd2cbd10c ceph: cleanup redundant code in handle_cap_grant
There is no state in local vars that requires us to loop after temporarily
dropping i_lock.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-19 14:41:10 -08:00
Yehuda Sadeh
c9af9fb68e ceph: don't truncate dirty pages in invalidate work thread
Instead of truncating the whole range of pages, we skip those
pages that are dirty or in the middle of writeback. Those pages
will be cleared later when the writeback completes.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-19 14:40:51 -08:00
Yehuda Sadeh
e63dc5c780 ceph: remove page upon writeback completion if lost cache cap
This page should have been removed earlier when the cache cap was
revoked, but a writeback was in flight, so it was skipped. We truncate
it here just as the writeback finishes, while it's still locked.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-19 14:34:18 -08:00
Sage Weil
5ecad6fd7b ceph: fix check for invalidate_mapping_pages success
We need to know whether there was any page left behind, and not the
return value (the total number of pages invalidated).  Look at the mapping
to see if we were successful or not.

Move it all into a helper to simplify the two callers.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-19 14:33:18 -08:00
Al Viro
7fee4868be Switch proc/self to nd_set_link()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-19 10:25:41 -05:00
Al Viro
ac278a9c50 fix LOOKUP_FOLLOW on automount "symlinks"
Make sure that automount "symlinks" are followed regardless of LOOKUP_FOLLOW;
it should have no effect on them.

Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-19 03:56:42 -05:00
Al Viro
c44dcc56d2 switch inotify_user to anon_inode
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-19 03:35:12 -05:00
Jiro SEKIBA
03f29365e8 nilfs2: delete unnecessary condition in load_segment_summary
This is a trivial patch to remove unnecessary condition.

load_segment_summary() checks crc of segment_summary OR crc of whole
log data blocks based on boolean argument full_check.  However,
callers of the function pass only 1 as full_check, which means only
whole log data blocks checking code is running all the time.

This patch deletes the condition and full_check argument and also
deletes enum 'NILFS_SEG_FAIL_CHECKSUM_SEGSUM' and corresponding case
clause, for it is no longer used.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-02-18 20:09:03 +09:00
Sage Weil
2c27c9a57c ceph: fix typo in ceph_queue_writeback debug output
Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-17 15:45:51 -08:00
Sage Weil
a17d6473cc ceph: v0.19 release
Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-17 13:56:07 -08:00
Sage Weil
4fc51be8fa ceph: use rbtree for pg pools; decode new osdmap format
Since we can now create and destroy pg pools, the pool ids will be sparse,
and an array no longer makes sense for looking up by pool id.  Use an
rbtree instead.

The OSDMap encoding also no longer has a max pool count (previously used to
allocate the array).  There is a new pool_max, that is the largest pool id
we've ever used, although we don't actually need it in the client.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-17 10:02:49 -08:00
Sage Weil
9794b146fa ceph: fix memory leak when destroying osdmap with pg_temp mappings
Also move _lookup_pg_mapping into a helper.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-17 10:02:48 -08:00
Sage Weil
7c1332b8cb ceph: fix iterate_caps removal race
We need to be able to iterate over all caps on a session with a
possibly slow callback on each cap.  To allow this, we used to
prevent cap reordering while we were iterating.  However, we were
not safe from races with removal: removing the 'next' cap would
make the next pointer from list_for_each_entry_safe be invalid,
and cause a lock up or similar badness.

Instead, we keep an iterator pointer in the session pointing to
the current cap.  As before, we avoid reordering.  For removal,
if the cap isn't the current cap we are iterating over, we are
fine.  If it is, we clear cap->ci (to mark the cap as pending
removal) but leave it in the session list.  In iterate_caps, we
can safely finish removal and get the next cap pointer.

While we're at it, clean up put_cap to not take a cap reservation
context, as it was never used.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-17 10:02:47 -08:00
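The deferred-removal scheme described in 7c1332b8cb can be sketched in user-space C roughly as follows. This is a simplified illustration, not the ceph code: the struct layouts, session_add, session_remove, session_iterate and the demo callback are all hypothetical, and locking is left out entirely. The session remembers the cap currently being visited; removing that cap only clears its ci pointer, and the iterator finishes the unlink after the callback returns.

```c
#include <assert.h>
#include <stddef.h>

struct cap {
    struct cap *prev, *next;
    void *ci;                 /* owner; NULL marks "pending removal" */
    int id;
};

struct session {
    struct cap *head;
    struct cap *iterating;    /* cap currently being visited, or NULL */
};

static void unlink_cap(struct session *s, struct cap *c)
{
    if (c->prev) c->prev->next = c->next; else s->head = c->next;
    if (c->next) c->next->prev = c->prev;
    c->prev = c->next = NULL;
}

/* Append a cap to the tail of the session list. */
void session_add(struct session *s, struct cap *c, void *owner)
{
    struct cap **pp = &s->head, *prev = NULL;
    while (*pp) { prev = *pp; pp = &(*pp)->next; }
    c->prev = prev; c->next = NULL; c->ci = owner;
    *pp = c;
}

/* Remove a cap; if it is the one being iterated over, only mark it. */
void session_remove(struct session *s, struct cap *c)
{
    c->ci = NULL;
    if (s->iterating == c)
        return;               /* left in the list; the iterator unlinks it */
    unlink_cap(s, c);
}

/* Visit every cap; the callback may remove any cap, including the
 * current one or the one that follows it. Returns the visit count. */
int session_iterate(struct session *s,
                    void (*cb)(struct session *, struct cap *))
{
    struct cap *c = s->head;
    int visited = 0;

    assert(s->iterating == NULL);   /* no nested iteration in this sketch */
    while (c) {
        struct cap *next;
        s->iterating = c;
        cb(s, c);
        visited++;
        next = c->next;       /* read only after the callback has run */
        if (c->ci == NULL)
            unlink_cap(s, c); /* finish the deferred removal */
        c = next;
    }
    s->iterating = NULL;
    return visited;
}

/* Demo callback: on cap 1, remove both itself and the following cap. */
static void drop_self_and_next(struct session *s, struct cap *c)
{
    if (c->id == 1) {
        struct cap *n = c->next;
        session_remove(s, c);
        if (n) session_remove(s, n);
    }
}

/* Build caps 0..3, iterate, and report how many caps remain linked. */
int demo_iterate(int *remaining)
{
    struct session s = { NULL, NULL };
    struct cap caps[4];
    struct cap *c;
    int owner = 0, i, visited, left = 0;

    for (i = 0; i < 4; i++) { caps[i].id = i; session_add(&s, &caps[i], &owner); }
    visited = session_iterate(&s, drop_self_and_next);
    for (c = s.head; c; c = c->next) left++;
    *remaining = left;
    return visited;
}
```

In this sketch the next pointer is read only after the callback returns, so removing the following cap cannot leave the iterator holding a stale pointer, which is the failure the commit message attributes to list_for_each_entry_safe.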
Sage Weil
85ccce43a3 ceph: clean up readdir caps reservation
Use a global counter for the minimum number of allocated caps instead of
hard coding a check against readdir_max.  This takes into account multiple
client instances, and avoids examining the superblock mount options when a
cap is dropped.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-17 10:02:43 -08:00
David S. Miller
2bb4646fce Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-02-16 22:09:29 -08:00
Sage Weil
5ce6e9dbe6 ceph: fix authentication races, auth_none oops
Call __validate_auth() under monc->mutex, and use helper for
initial hello so that the pending_auth flag is set.  This fixes
possible races in which we have an authentication request (hello
or otherwise) pending and send another one.  In particular, with
auth_none, we _never_ want to call ceph_build_auth() from
__validate_auth(), since the ->build_request() method is NULL.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-16 22:01:11 -08:00
Sage Weil
85ff03f6bf ceph: use rbtree for mon statfs requests
An rbtree is lighter weight, particularly given we will generally have
very few in-flight statfs requests.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-16 22:01:10 -08:00
Sage Weil
a105f00cf1 ceph: use rbtree for snap_realms
Switch from radix tree to rbtree for snap realms.  This is much more
appropriate given that realm keys are few and far between.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-16 22:01:09 -08:00
Sage Weil
44ca18f268 ceph: use rbtree for mds requests
The rbtree is a more appropriate data structure than a radix_tree.  It
avoids extra memory usage and simplifies the code.

It also fixes a bug where the debugfs 'mdsc' file wasn't including the
most recent mds request.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-16 22:01:08 -08:00
Sage Weil
91e45ce389 ceph: cancel delayed work when closing connection
This ensures that if/when we reopen the connection, we can requeue work on
the connection immediately, without waiting for an old timer to expire.
Queue new delayed work inside con->mutex to avoid any race.

This fixes problems with clients failing to reconnect to the MDS due to
the client_reconnect message arriving too late (due to waiting for an old
delayed work timeout to expire).

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-16 22:01:07 -08:00
Sage Weil
e2663ab60d ceph: allow connection to be reopened by fault callback
Fix the messenger to allow a ceph_con_open() during the fault callback.
Previously the work wasn't getting queued on the connection because the
fault path avoids requeued work (normally spurious).  Loop on reopening by
checking for the OPENING state bit.

This fixes OSD reconnects when a TCP connection drops.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-16 22:01:03 -08:00
Tejun Heo
003cb608a2 percpu: add __percpu sparse annotations to fs
Add __percpu sparse annotations to fs.

These annotations are to make sparse consider percpu variables to be
in a different address space and warn if accessed without going
through percpu accessors.  This patch doesn't affect normal builds.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Alex Elder <aelder@sgi.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
2010-02-17 11:17:38 +09:00
Eric W. Biederman
7c0ff870d1 sysfs: sysfs_sd_setattr set iattrs unconditionally
There is currently a bug in sysfs_sd_setattr inherited from
sysfs_setattr in 2.6.32 where the first time we set the attributes
on a sysfs file we allocate backing store but do not set the
backing store attributes, resulting in overly restrictive
permissions on sysfs files.

The fix is to simply modify the code so that it always executes
when we update the sysfs attributes, as we did in 2.6.31 and earlier.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Tested-by: Jean Delvare <khali@linux-fr.org>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-16 15:42:42 -08:00
Curt Wohlgemuth
73b50c1c92 ext4: Fix BUG_ON at fs/buffer.c:652 in no journal mode
Calls to ext4_handle_dirty_metadata should only pass in an inode
pointer for inode-specific metadata, and not for shared metadata
blocks such as inode table blocks, block group descriptors, the
superblock, etc.

The BUG_ON can get tripped when updating a special device (such as a
block device) that is opened (so that i_mapping is set in
fs/block_dev.c) and the file system is mounted in no journal mode.

Addresses-Google-Bug: #2404870

Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-02-16 15:06:29 -05:00
Linus Torvalds
0813e22d4e Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: btrfs_mark_extent_written uses the wrong slot
2010-02-15 19:56:21 -08:00
Chuck Lever
65d269538a NFS: Too many GETATTR and ACCESS calls after direct I/O
The cached read and write paths initialize fattr->time_start in their
setup procedures.  The value of fattr->time_start is propagated to
read_cache_jiffies by nfs_update_inode().  Subsequent calls to
nfs_attribute_timeout() will then use a good time stamp when
computing the attribute cache timeout, and squelch unneeded GETATTR
calls.

Since the direct I/O paths erroneously leave the inode's
fattr->time_start field set to zero, read_cache_jiffies for that inode
is set to zero after any direct read or write operation.  This
triggers an otw GETATTR or ACCESS call to update the file's attribute
and access caches properly, even when the NFS READ or WRITE replies
have usable post-op attributes.

Make sure the direct read and write setup code performs the same fattr
initialization as the cached I/O paths to prevent unnecessary GETATTR
calls.

This was likely introduced by commit 0e574af1 in 2.6.15, which appears
to add new nfs_fattr_init() call sites in the cached read and write
paths, but not in the equivalent places in fs/nfs/direct.c.  A
subsequent commit in the same series, 33801147, introduces the
fattr->time_start field.

Interestingly, the direct write reschedule path already has a call to
nfs_fattr_init() in the right place.

Reported-by: Quentin Barnes <qbarnes@yahoo-inc.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-15 19:53:43 -08:00
Linus Torvalds
0aa2ca9ae1 Merge branch 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing
* 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing:
  reiserfs: Fix softlockup while waiting on an inode
2010-02-15 19:51:45 -08:00
dingdinghua
ba869023ea jbd2: delay discarding buffers in journal_unmap_buffer
Delay discarding buffers in journal_unmap_buffer until
we know that the "add to orphan" operation has definitely been
committed; otherwise the log space of the committing transaction
may be freed and reused before the truncate gets committed, and
updates may get lost if a crash happens.

Signed-off-by: dingdinghua <dingdinghua@nrchpc.ac.cn>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-02-15 16:35:42 -05:00
Leonard Michlmayr
aca92ff6f5 ext4: correctly calculate number of blocks for fiemap
ext4_fiemap() rounds the length of the requested range down to
blocksize, which is not the true number of blocks that cover the
requested region.  This problem is especially impressive if the user
requests only the first byte of a file: not a single extent will be
reported.

We fix this by calculating the last block of the region and then
subtracting to find the number of blocks in the extents.

Signed-off-by: Leonard Michlmayr <leonard.michlmayr@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-03-04 17:07:28 -05:00
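The arithmetic behind aca92ff6f5 can be illustrated with a small stand-alone C sketch. The helper names and the blkbits parameter (log2 of the block size) are assumptions made for this example, not the ext4 code itself: one function shows the rounded-down length the message describes, the other counts blocks from the first and last block of the byte range.

```c
#include <assert.h>
#include <stdint.h>

/* The calculation described as wrong: the length rounded down to whole
 * blocks. A 1-byte request yields 0 blocks. */
uint64_t blocks_rounded_down(uint64_t len, unsigned blkbits)
{
    return len >> blkbits;
}

/* Hypothetical helper: number of blocks covering the byte range
 * [start, start + len), found by computing the last block of the
 * region and subtracting the first. */
uint64_t blocks_covering(uint64_t start, uint64_t len, unsigned blkbits)
{
    uint64_t first_blk, last_blk;

    assert(len > 0);
    first_blk = start >> blkbits;
    last_blk = (start + len - 1) >> blkbits;
    return last_blk - first_blk + 1;
}
```

With 4096-byte blocks (blkbits = 12), a request for the first byte of a file gives 0 blocks under the rounded-down calculation and 1 block under the first/last-block calculation; a 2-byte request starting at offset 4095 straddles a boundary and covers 2 blocks.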
Sage Weil
153a008bf7 ceph: reset osd connections after fault
A single osd connection fault (e.g. tcp disconnect) wasn't
reopening the connection, which causes all current and future
requests for that osd to hang.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-15 12:11:51 -08:00
Roel Kluin
9aaab0589b ext4: add missing error checking to ext4_expand_extra_isize_ea()
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
2010-02-15 14:26:16 -05:00
Eric Sandeen
12062dddda ext4: move __func__ into a macro for ext4_warning, ext4_error
Just a pet peeve of mine; we had a mishmash of calls with either __func__
or "function_name" and the latter tends to get out of sync.

I think it's easier to just hide the __func__ in a macro, and it'll
be consistent from then on.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-02-15 14:19:27 -05:00
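The idea in 12062dddda, hiding __func__ inside a macro so that callers never spell out the function name, might look like this in stand-alone C. All demo_* names are hypothetical and the message is written into a caller-supplied buffer instead of the kernel log; this is a sketch of the pattern, not the ext4 macros.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical logger: writes "<function>: <message>" into out. */
void demo_warning_impl(char *out, size_t outlen, const char *func,
                       const char *fmt, ...)
{
    va_list ap;
    size_t used;

    assert(out != NULL);
    if (outlen == 0)
        return;
    snprintf(out, outlen, "%s: ", func);
    used = strlen(out);               /* always < outlen after snprintf */
    va_start(ap, fmt);
    vsnprintf(out + used, outlen - used, fmt, ap);
    va_end(ap);
}

/* The macro supplies __func__, so call sites cannot get out of sync
 * with the name of the function they live in. */
#define demo_warning(out, outlen, ...) \
    demo_warning_impl((out), (outlen), __func__, __VA_ARGS__)

/* Example caller: the prefix comes from __func__, not a string literal. */
const char *demo_check_inode(char *out, size_t outlen, unsigned long ino)
{
    demo_warning(out, outlen, "bad inode %lu", ino);
    return out;
}
```

If demo_check_inode is later renamed, the prefix in its messages follows automatically, which is the consistency the commit message is after.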
Frederic Weisbecker
175359f89d reiserfs: Fix softlockup while waiting on an inode
When we wait for an inode through reiserfs_iget(), we hold
the reiserfs lock. And waiting for an inode may imply waiting
for its writeback. But the inode writeback path may also require
the reiserfs lock, which leads to a deadlock.

We just need to release the reiserfs lock from reiserfs_iget()
to fix this.

Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Cc: Chris Mason <chris.mason@oracle.com>
2010-02-14 19:07:56 +01:00
Jeremy Kerr
7c540d9e3d proc_devtree: fix THIS_MODULE without module.h
Commit e22f628395 introduced a build
breakage for ARM devtree work: the THIS_MODULE macro was added, but we
don't have module.h

This change adds the necessary #include to get THIS_MODULE defined.
While we could just replace it with NULL (PROC_FS is a bool, not a
tristate), using THIS_MODULE will prevent unexpected breakage if we
ever do compile this as a module.

Signed-off-by: Jeremy Kerr <jeremy.kerr@canonical.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Michal Simek <monstr@monstr.eu>
2010-02-14 07:13:41 -07:00
Sage Weil
6c5d1a49e5 ceph: fix msgr to keep sent messages until acked
The test was backwards from commit b3d1dbbd: keep the message if the
connection _isn't_ lossy.  This allows the client to continue when the
TCP connection drops for some reason (network glitch) but both ends
survive.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-13 20:29:31 -08:00
Julia Lawall
d67b1b0325 fs/xfs: Correct NULL test
Test the value that was just allocated rather than the previously tested one.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@r@
expression *x;
expression e;
identifier l;
@@

if (x == NULL || ...) {
    ... when forall
    return ...; }
... when != goto l;
    when != x = e
    when != &x
*x == NULL
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-02-13 13:22:53 -06:00
Ryusuke Konishi
d1c6b72a72 nilfs2: move iterator to write log into segment buffer
This moves iterator to submit write requests for a series of logs into
segbuf.c, and hides nilfs_segbuf_write() and nilfs_segbuf_wait() in
the file.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-02-13 12:26:03 +09:00
Ryusuke Konishi
e605f0a724 nilfs2: get rid of s_dirt flag use
This replaces s_dirt flag use in nilfs with a new flag added on the
nilfs object.  The s_dirt flag was used to indicate if
sop->write_super() should be called, however the current version of
nilfs does not use the callback.  Thus, it can be replaced with a
flag of its own.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Jiro SEKIBA <jir@unicus.jp>
2010-02-13 12:26:03 +09:00
Ryusuke Konishi
dcd7618695 nilfs2: get rid of nilfs_segctor_req struct
This will clean up nilfs_segctor_req struct and the obscure request
argument passed among private methods of segment constructor.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-02-13 12:26:03 +09:00
Jiro SEKIBA
086d1764b2 nilfs2: delete unnecessary condition in nilfs_dat_translate
This is a trivial patch to delete unnecessary condition in nilfs_dat_translate.

nilfs_dat_translate() will assign translated address to *blocknrp if blocknrp
is not NULL.  However the condition is unneeded, because all callers of
nilfs_dat_translate() pass blocknrp properly.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-02-13 12:26:03 +09:00
Ryusuke Konishi
fe5f171bb2 nilfs2: fix potential hang in nilfs_error on errors=remount-ro
nilfs_error() calls nilfs_detach_segment_constructor() if
errors=remount-ro option is specified, and this may lead to a hang due
to recursive locking of, for instance, nilfs->ns_segctor_sem and
others.

In this case, detaching segment constructor is not necessary because
read-only flag is set to the filesystem and further writes are
blocked.

This fixes the potential hang issue by removing the
nilfs_detach_segment_constructor() call from nilfs_error.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-02-13 12:26:03 +09:00
Ryusuke Konishi
7512487e6d nilfs2: use mnt_want_write in ioctls where write access is needed
A few nilfs2 ioctls need to ask for and then later release write
access to the mount in order to avoid potential write to read-only
mounts.

This adds the missing mnt_want_write and mnt_drop_write in
nilfs_ioctl_change_cpmode, nilfs_ioctl_delete_checkpoint, and
nilfs_ioctl_clean_segments.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-02-13 12:26:02 +09:00
Jiro SEKIBA
e902ec9906 nilfs2: issue discard request after cleaning segments
This adds a function to send discard requests for given array of
segment numbers, and calls the function when garbage collection
succeeded.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-02-13 12:26:02 +09:00