Commit graph

17,603 commits

Author SHA1 Message Date
Yehuda Sadeh
88d892a37f ceph: don't clobber write return value when using O_SYNC
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-23 14:26:36 -08:00
Sage Weil
a1ea787c7b ceph: fix client_request_forward decoding
The tid is in the message header, not body.  Broken since 6df058c0.

No need to look at next mds session; just mark the request and be done.
(The old error path was broken too, but now it's gone.)

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-23 14:26:36 -08:00
Sage Weil
2600d2dd50 ceph: drop messages on unregistered mds sessions; cleanup
Verify the mds session is currently registered before handling
incoming messages.  Clean up message handlers to pull mds out
of session->s_mds instead of less trustworthy src field.

Clean up con_{get,put} debug output.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-23 14:26:35 -08:00
Sage Weil
a6369741c4 ceph: fix comments, locking in destroy_inode
The destroy_inode path needs no inode locks since there are no
inode references.  Update __ceph_remove_cap comment to reflect
that it is called without cap->session->s_mutex in this case.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-23 14:26:35 -08:00
Alexander Beregalov
4ce1e9adab ceph: move dereference after NULL test
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-23 14:26:34 -08:00
Sage Weil
5b3a4db3e4 ceph: fix up unexpected message handling
Fix skipping of unexpected message types from osd, mon.

Clean up pr_info and debug output.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-23 14:26:29 -08:00
Steve French
96c03bccc7 [CIFS] Minor cleanup to EA patch
CC: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-02-23 20:51:43 +00:00
Jeff Layton
31c0519f7a cifs: merge CIFSSMBQueryEA with CIFSSMBQAllEAs
Add an "ea_name" parameter to CIFSSMBQAllEAs. When it's set make it
behave like CIFSSMBQueryEA does now. The current callers of
CIFSSMBQueryEA are converted to use CIFSSMBQAllEAs, and the old
CIFSSMBQueryEA function is removed.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-02-23 20:47:32 +00:00
Jeff Layton
0cd126b504 cifs: verify lengths of QueryAllEAs reply
Make sure the lengths in a QUERY_ALL_EAS reply don't make the parser walk
off the end of the SMB.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-02-23 20:47:11 +00:00
Jeff Layton
e529614ad0 cifs: increase maximum buffer size in CIFSSMBQAllEAs
It's 4000 now, but there's no reason to limit it to that. We should be
able to handle a response up to CIFSMaxBufSize.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-02-23 20:46:49 +00:00
Jeff Layton
6e462b9f2c cifs: rename name_len to list_len in CIFSSMBQAllEAs
...for clarity and so we can reuse the name for the real name_len.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-02-23 20:46:27 +00:00
Jeff Layton
f0d3868b78 cifs: clean up indentation in CIFSSMBQAllEAs
Add a label that we can goto on error, and reduce some of the
if/then/else indentation in this function.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-02-23 20:45:52 +00:00
Jeff Layton
370b41911c cifs: add parens around smb_var in BCC macros
...to remove ambiguity about how these values are interpreted when
passing in more complex values as arguments.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-02-23 20:45:21 +00:00
Michael Neuling
a17e18790a fs/exec.c: fix initial stack reservation
803bf5ec25 ("fs/exec.c: restrict initial
stack space expansion to rlimit") attempts to limit the initial stack to
20*PAGE_SIZE.  Unfortunately, in attempting ensure the stack is not
reduced in size, we ended up not changing the stack at all.

This size reduction check is not necessary as the expand_stack call does
this already.

This caused a regression in UML resulting in most guest processes being
killed.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jouni Malinen <j@w1.fi>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-22 19:50:34 -08:00
stephen hemminger
1cc523271e seq_file: add RCU versions of new hlist/list iterators (v3)
Many usages of seq_file use RCU protected lists, so non RCU
iterators will not work safely.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-22 15:45:54 -08:00
Jens Axboe
f11cbd74c5 Merge branch 'master' into for-2.6.34 2010-02-22 13:48:51 +01:00
Ben Myers
978ebd97d1 xfs_export_operations.commit_metadata
This is the commit_metadata export operation for XFS.

- Takes one inode to be committed.

- Forces the log up to the lsn of the inode.

- Doesn't force the log if the inode doesn't have a pincount.

Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
[bfields@citi.umich.edu: trivial whitespace fix]
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-02-20 13:14:50 -08:00
Ben Myers
f501912a35 commit_metadata export operation replacing nfsd_sync_dir
- Add commit_metadata export_operation to allow the underlying filesystem to
decide how to commit an inode most efficiently.

- Usage of nfsd_sync_dir and write_inode_now has been replaced with the
commit_metadata function that takes a svc_fh.

- The commit_metadata function calls the commit_metadata export_op if it's
there, or else falls back to sync_inode instead of fsync and write_inode_now
because only metadata need be synced here.

- nfsd4_sync_rec_dir now uses vfs_fsync so that commit_metadata can be static

Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-02-20 13:13:44 -08:00
David Howells
8f9941aecc CacheFiles: Fix a race in cachefiles_delete_object() vs rename
cachefiles_delete_object() can race with rename.  It gets the parent directory
of the object it's asked to delete, then locks it - but rename may have changed
the object's parent between the get and the completion of the lock.

However, if such a circumstance is detected, we abandon our attempt to delete
the object - since it's no longer in the index key path, it won't be seen
again by lookups of that key.  The assumption is that cachefilesd may have
culled it by renaming it to the graveyard for later destruction.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-20 10:06:35 -05:00
Jiro SEKIBA
0d561f12b4 nilfs2: add reader's lock for cno in nilfs_ioctl_sync
This adds reader's lock for the_nilfs->cno in nilfs_ioctl_sync,
for the_nilfs->cno should be proctected by segctor_sem when reading.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-02-20 21:18:19 +09:00
Chuck Ebbert
aeaa5ccd64 vfs: don't call ima_file_check() unconditionally in nfsd_open()
commit 1e41568d73 ("Take ima_path_check()
in nfsd past dentry_open() in nfsd_open()") moved this code back to its
original location but missed the "else".

Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-20 00:47:31 -05:00
Yehuda Sadeh
bcd2cbd10c ceph: cleanup redundant code in handle_cap_grant
There is no state in local vars that requires us to loop after temporarily
dropping i_lock.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-19 14:41:10 -08:00
Yehuda Sadeh
c9af9fb68e ceph: don't truncate dirty pages in invalidate work thread
Instead of truncating the whole range of pages, we skip those
pages that are dirty or in the middle of writeback. Those pages
will be cleared later when the writeback completes.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-19 14:40:51 -08:00
Yehuda Sadeh
e63dc5c780 ceph: remove page upon writeback completion if lost cache cap
This page should have been removed earlier when the cache cap was
revoked, but a writeback was in flight, so it was skipped. We truncate
it here just as the writeback finishes, while it's still locked.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-19 14:34:18 -08:00
Sage Weil
5ecad6fd7b ceph: fix check for invalidate_mapping_pages success
We need to know whether there was any page left behind, and not the
return value (the total number of pages invalidated).  Look at the mapping
to see if we were successful or not.

Move it all into a helper to simplify the two callers.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-19 14:33:18 -08:00
Al Viro
7fee4868be Switch proc/self to nd_set_link()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-19 10:25:41 -05:00
Al Viro
ac278a9c50 fix LOOKUP_FOLLOW on automount "symlinks"
Make sure that automount "symlinks" are followed regardless of LOOKUP_FOLLOW;
it should have no effect on them.

Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-19 03:56:42 -05:00
Al Viro
c44dcc56d2 switch inotify_user to anon_inode
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-19 03:35:12 -05:00
Jiro SEKIBA
03f29365e8 nilfs2: delete unnecessary condition in load_segment_summary
This is a trivial patch to remove unnecessary condition.

load_segment_summary() checks crc of segment_summary OR crc of whole
log data blocks based on boolean argument full_check.  However,
callers of the function pass only 1 as full_check, which means only
whole log data blocks checking code is running all the time.

This patch deletes the condition and full_check argument and also
deletes enum 'NILFS_SEG_FAIL_CHECKSUM_SEGSUM' and corresponding case
clause, for it is nolonger used anymore.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-02-18 20:09:03 +09:00
Sage Weil
2c27c9a57c ceph: fix typo in ceph_queue_writeback debug output
Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-17 15:45:51 -08:00
Sage Weil
a17d6473cc ceph: v0.19 release
Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-17 13:56:07 -08:00
Sage Weil
4fc51be8fa ceph: use rbtree for pg pools; decode new osdmap format
Since we can now create and destroy pg pools, the pool ids will be sparse,
and an array no longer makes sense for looking up by pool id.  Use an
rbtree instead.

The OSDMap encoding also no longer has a max pool count (previously used to
allocate the array).  There is a new pool_max, that is the largest pool id
we've ever used, although we don't actually need it in the client.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-17 10:02:49 -08:00
Sage Weil
9794b146fa ceph: fix memory leak when destroying osdmap with pg_temp mappings
Also move _lookup_pg_mapping into a helper.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-17 10:02:48 -08:00
Sage Weil
7c1332b8cb ceph: fix iterate_caps removal race
We need to be able to iterate over all caps on a session with a
possibly slow callback on each cap.  To allow this, we used to
prevent cap reordering while we were iterating.  However, we were
not safe from races with removal: removing the 'next' cap would
make the next pointer from list_for_each_entry_safe be invalid,
and cause a lock up or similar badness.

Instead, we keep an iterator pointer in the session pointing to
the current cap.  As before, we avoid reordering.  For removal,
if the cap isn't the current cap we are iterating over, we are
fine.  If it is, we clear cap->ci (to mark the cap as pending
removal) but leave it in the session list.  In iterate_caps, we
can safely finish removal and get the next cap pointer.

While we're at it, clean up put_cap to not take a cap reservation
context, as it was never used.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-17 10:02:47 -08:00
Sage Weil
85ccce43a3 ceph: clean up readdir caps reservation
Use a global counter for the minimum number of allocated caps instead of
hard coding a check against readdir_max.  This takes into account multiple
client instances, and avoids examining the superblock mount options when a
cap is dropped.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-17 10:02:43 -08:00
David S. Miller
2bb4646fce Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-02-16 22:09:29 -08:00
Sage Weil
5ce6e9dbe6 ceph: fix authentication races, auth_none oops
Call __validate_auth() under monc->mutex, and use helper for
initial hello so that the pending_auth flag is set.  This fixes
possible races in which we have an authentication request (hello
or otherwise) pending and send another one.  In particular, with
auth_none, we _never_ want to call ceph_build_auth() from
__validate_auth(), since the ->build_request() method is NULL.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-16 22:01:11 -08:00
Sage Weil
85ff03f6bf ceph: use rbtree for mon statfs requests
An rbtree is lighter weight, particularly given we will generally have
very few in-flight statfs requests.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-16 22:01:10 -08:00
Sage Weil
a105f00cf1 ceph: use rbtree for snap_realms
Switch from radix tree to rbtree for snap realms.  This is much more
appropriate given that realm keys are few and far between.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-16 22:01:09 -08:00
Sage Weil
44ca18f268 ceph: use rbtree for mds requests
The rbtree is a more appropriate data structure than a radix_tree.  It
avoids extra memory usage and simplifies the code.

It also fixes a bug where the debugfs 'mdsc' file wasn't including the
most recent mds request.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-16 22:01:08 -08:00
Sage Weil
91e45ce389 ceph: cancel delayed work when closing connection
This ensures that if/when we reopen the connection, we can requeue work on
the connection immediately, without waiting for an old timer to expire.
Queue new delayed work inside con->mutex to avoid any race.

This fixes problems with clients failing to reconnect to the MDS due to
the client_reconnect message arriving too late (due to waiting for an old
delayed work timeout to expire).

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-16 22:01:07 -08:00
Sage Weil
e2663ab60d ceph: allow connection to be reopened by fault callback
Fix the messenger to allow a ceph_con_open() during the fault callback.
Previously the work wasn't getting queued on the connection because the
fault path avoids requeued work (normally spurious).  Loop on reopening by
checking for the OPENING state bit.

This fixes OSD reconnects when a TCP connection drops.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-16 22:01:03 -08:00
Tejun Heo
003cb608a2 percpu: add __percpu sparse annotations to fs
Add __percpu sparse annotations to fs.

These annotations are to make sparse consider percpu variables to be
in a different address space and warn if accessed without going
through percpu accessors.  This patch doesn't affect normal builds.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Alex Elder <aelder@sgi.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
2010-02-17 11:17:38 +09:00
Eric W. Biederman
7c0ff870d1 sysfs: sysfs_sd_setattr set iattrs unconditionally
There is currently a bug in sysfs_sd_setattr inherited from
sysfs_setattr in 2.6.32 where the first time we set the attributes
on a sysfs file we allocate backing store but do not set the
backing store attributes.  Resulting in overly restrictive
permissions on sysfs files.

The fix is to simply modify the code so that it always executes
when we update the sysfs attributes, as we did in 2.6.31 and earlier.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Tested-by: Jean Delvare <khali@linux-fr.org>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-16 15:42:42 -08:00
Curt Wohlgemuth
73b50c1c92 ext4: Fix BUG_ON at fs/buffer.c:652 in no journal mode
Calls to ext4_handle_dirty_metadata should only pass in an inode
pointer for inode-specific metadata, and not for shared metadata
blocks such as inode table blocks, block group descriptors, the
superblock, etc.

The BUG_ON can get tripped when updating a special device (such as a
block device) that is opened (so that i_mapping is set in
fs/block_dev.c) and the file system is mounted in no journal mode.

Addresses-Google-Bug: #2404870

Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-02-16 15:06:29 -05:00
Linus Torvalds
0813e22d4e Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: btrfs_mark_extent_written uses the wrong slot
2010-02-15 19:56:21 -08:00
Chuck Lever
65d269538a NFS: Too many GETATTR and ACCESS calls after direct I/O
The cached read and write paths initialize fattr->time_start in their
setup procedures.  The value of fattr->time_start is propagated to
read_cache_jiffies by nfs_update_inode().  Subsequent calls to
nfs_attribute_timeout() will then use a good time stamp when
computing the attribute cache timeout, and squelch unneeded GETATTR
calls.

Since the direct I/O paths erroneously leave the inode's
fattr->time_start field set to zero, read_cache_jiffies for that inode
is set to zero after any direct read or write operation.  This
triggers an otw GETATTR or ACCESS call to update the file's attribute
and access caches properly, even when the NFS READ or WRITE replies
have usable post-op attributes.

Make sure the direct read and write setup code performs the same fattr
initialization as the cached I/O paths to prevent unnecessary GETATTR
calls.

This was likely introduced by commit 0e574af1 in 2.6.15, which appears
to add new nfs_fattr_init() call sites in the cached read and write
paths, but not in the equivalent places in fs/nfs/direct.c.  A
subsequent commit in the same series, 33801147, introduces the
fattr->time_start field.

Interestingly, the direct write reschedule path already has a call to
nfs_fattr_init() in the right place.

Reported-by: Quentin Barnes <qbarnes@yahoo-inc.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-15 19:53:43 -08:00
Linus Torvalds
0aa2ca9ae1 Merge branch 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing
* 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing:
  reiserfs: Fix softlockup while waiting on an inode
2010-02-15 19:51:45 -08:00
dingdinghua
ba869023ea jbd2: delay discarding buffers in journal_unmap_buffer
Delay discarding buffers in journal_unmap_buffer until
we know that "add to orphan" operation has definitely been
committed, otherwise the log space of committing transation
may be freed and reused before truncate get committed, updates
may get lost if crash happens.

Signed-off-by: dingdinghua <dingdinghua@nrchpc.ac.cn>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-02-15 16:35:42 -05:00
Leonard Michlmayr
aca92ff6f5 ext4: correctly calculate number of blocks for fiemap
ext4_fiemap() rounds the length of the requested range down to
blocksize, which is is not the true number of blocks that cover the
requested region.  This problem is especially impressive if the user
requests only the first byte of a file: not a single extent will be
reported.

We fix this by calculating the last block of the region and then
subtract to find the number of blocks in the extents.

Signed-off-by: Leonard Michlmayr <leonard.michlmayr@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-03-04 17:07:28 -05:00