Commit graph

22178 commits

Author SHA1 Message Date
Al Viro
f374ed5fa8 do_last: kill a rudiment of old ->d_revalidate() workaround
There used to be time when ->d_revalidate() couldn't return an error.
So intents code had lookup_instantiate_filp() stash ERR_PTR(error)
in nd->intent.open.filp and had it checked after lookup_hash(), to
catch the otherwise silent failures.  That had been introduced by
commit 4af4c52f34.  These days
->d_revalidate() can and does propagate errors back to callers
explicitly, so this check isn't needed anymore.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:27 -04:00
Al Viro
6c0d46c493 fold __open_namei_create() and open_will_truncate() into do_last()
... and clean up a bit more

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:27 -04:00
Al Viro
ca344a894b do_last: unify may_open() call and everyting after it
We have a bunch of diverging codepaths in do_last(); some of
them converge, but the case of having to create a new file
duplicates large part of common tail of the rest and exits
separately.  Massage them so that they could be merged.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:27 -04:00
Al Viro
9b44f1b392 move may_open() from __open_name_create() to do_last()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:26 -04:00
Al Viro
0f9d1a10c3 expand finish_open() in its only caller
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:26 -04:00
Al Viro
5a202bcd75 sanitize pathname component hash calculation
Lift it to lookup_one_len() and link_path_walk() resp. into the
same place where we calculated default hash function of the same
name.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:26 -04:00
Al Viro
6a96ba5441 kill __lookup_one_len()
only one caller left

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:26 -04:00
Al Viro
fe2d35ff0d switch non-create side of open() to use of do_last()
Instead of path_lookupat() doing trailing symlink resolution,
use the same scheme as on the O_CREAT side.  Walk with
LOOKUP_PARENT, then (in do_last()) look the final component
up, then either open it or return error or, if it's a symlink,
give the symlink back to path_openat() to be resolved there.

The really messy complication here is RCU.  We don't want to drop
out of RCU mode before the final lookup, since we don't want to
bounce parent directory ->d_count without a good reason.

Result is _not_ pretty; later in the series we'll clean it up.
For now we are roughly back where we'd been before the revert
done by Nick's series - top-level logics of path_openat() is
cleaned up, do_last() does actual opening, symlink resolution is
done uniformly.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:26 -04:00
Al Viro
70e9b35711 get rid of nd->file
Don't stash the struct file * used as starting point of walk in nameidata;
pass file ** to path_init() instead.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:26 -04:00
Al Viro
951361f954 get rid of the last LOOKUP_RCU dependencies in link_path_walk()
New helper: terminate_walk().  An error has happened during pathname
resolution and we either drop nd->path or terminate RCU, depending
the mode we had been in.  After that, nd is essentially empty.
Switch link_path_walk() to using that for cleanup.

Now the top-level logics in link_path_walk() is back to sanity.  RCU
dependencies are in the lower-level functions.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:26 -04:00
Al Viro
a7472baba2 make nameidata_dentry_drop_rcu_maybe() always leave RCU mode
Now we have do_follow_link() guaranteed to leave without dangling RCU
and the next step will get LOOKUP_RCU logics completely out of
link_path_walk().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:25 -04:00
Al Viro
ef7562d528 make handle_dots() leave RCU mode on error
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:25 -04:00
Al Viro
4455ca6223 clear RCU on all failure exits from link_path_walk()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:25 -04:00
Al Viro
9856fa1b28 pull handling of . and .. into inlined helper
getting LOOKUP_RCU checks out of link_path_walk()...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:25 -04:00
Al Viro
7bc055d1d5 kill out_dput: in link_path_walk()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:25 -04:00
Al Viro
13aab428a7 separate -ESTALE/-ECHILD retries in do_filp_open() from real work
new helper: path_openat().  Does what do_filp_open() does, except
that it tries only the walk mode (RCU/normal/force revalidation)
it had been told to.

Both create and non-create branches are using path_lookupat() now.
Fixed the double audit_inode() in non-create branch.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:25 -04:00
Al Viro
47c805dc2d switch do_filp_open() to struct open_flags
take calculation of open_flags by open(2) arguments into new helper
in fs/open.c, move filp_open() over there, have it and do_sys_open()
use that helper, switch exec.c callers of do_filp_open() to explicit
(and constant) struct open_flags.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:25 -04:00
Al Viro
c3e380b0b3 Collect "operation mode" arguments of do_last() into a structure
No point messing with passing shitloads of "operation mode" arguments
to do_open() one by one, especially since they are not going to change
during do_filp_open().  Collect them into a struct, fill it and pass
to do_last() by reference.

Make sure that lookup intent flags are correctly set and removed - we
want them for do_last(), but they make no sense for __do_follow_link().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:25 -04:00
Al Viro
f1afe9efc8 clean up the failure exits after __do_follow_link() in do_filp_open()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:24 -04:00
Al Viro
36f3b4f690 pull security_inode_follow_link() into __do_follow_link()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:24 -04:00
Al Viro
086e183a64 pull dropping RCU on success of link_path_walk() into path_lookupat()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:24 -04:00
Al Viro
16c2cd7179 untangle the "need_reval_dot" mess
instead of ad-hackery around need_reval_dot(), do the following:
set a flag (LOOKUP_JUMPED) in the beginning of path, on absolute
symlink traversal, on ".." and on procfs-style symlinks.  Clear on
normal components, leave unchanged on ".".  Non-nested callers of
link_path_walk() call handle_reval_path(), which checks that flag
is set and that fs does want the final revalidate thing, then does
->d_revalidate().  In link_path_walk() all the return_reval stuff
is gone.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:24 -04:00
Al Viro
fe479a580d merge component type recognition
no need to do it in three places...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:24 -04:00
Al Viro
e41f7d4ee5 merge path_init and path_init_rcu
Actual dependency on whether we want RCU or not is in 3 small areas
(as it ought to be) and everything around those is the same in both
versions.  Since each function has only one caller and those callers
are on two sides of if (flags & LOOKUP_RCU), it's easier and cleaner
to merge them and pull the checks inside.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:24 -04:00
Al Viro
ee0827cd6b sanitize path_walk() mess
New helper: path_lookupat().  Basically, what do_path_lookup() boils to
modulo -ECHILD/-ESTALE handler.  path_walk* family is gone; vfs_path_lookup()
is using link_path_walk() directly, do_path_lookup() and do_filp_open()
are using path_lookupat().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:24 -04:00
Al Viro
52094c8a06 take RCU-dependent stuff around exec_permission() into a new helper
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:23 -04:00
Al Viro
c9c6cac0c2 kill path_lookup()
all remaining callers pass LOOKUP_PARENT to it, so
flags argument can die; renamed to kern_path_parent()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:23 -04:00
Steven Whitehouse
c618e87a5f GFS2: Update to AIL list locking
The previous patch missed a couple of places where the AIL list
needed locking, so this fixes up those places, plus a comment
is corrected too.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Dave Chinner <dchinner@redhat.com>
2011-03-14 12:40:29 +00:00
Al Viro
c44ed965be compat breakage in preadv() and pwritev()
Fix for a dumb preadv()/pwritev() compat bug - unlike the native
variants, the compat_...  ones forget to check FMODE_P{READ,WRITE}, so
e.g.  on pipe the native preadv() will fail with -ESPIPE and compat one
will act as readv() and succeed.

Not critical, but it's a clear bug with trivial fix, so IMO it's OK for
-final.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-13 16:29:07 -07:00
Al Viro
586ce098a2 compat breakage in preadv() and pwritev()
Fix for a dumb preadv()/pwritev() compat bug - unlike the native
variants, compat_... ones forget to check FMODE_P{READ,WRITE}, so e.g.
on pipe the native preadv() will fail with -ESPIPE and compat one will
act as readv() and succeed.  Not critical, but it's a clear bug with trivial
fix.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-13 19:21:26 -04:00
Linus Torvalds
0e5b88cd99 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: break out of shrink_delalloc earlier
  btrfs: fix not enough reserved space
  btrfs: fix dip leak
  Btrfs: make sure not to return overlapping extents to fiemap
  Btrfs: deal with short returns from copy_from_user
  Btrfs: fix regressions in copy_from_user handling
2011-03-13 16:00:49 -07:00
Chris Mason
36e39c40b3 Btrfs: break out of shrink_delalloc earlier
Josef had changed shrink_delalloc to exit after three shrink
attempts, which wasn't quite enough because new writers could
race in and steal free space.

But it also fixed deadlocks and stalls as we tried to recover
delalloc reservations.  The code was tweaked to loop 1024
times, and would reset the counter any time a small amount
of progress was made.  This was too drastic, and with a
lot of writers we can end up stuck in shrink_delalloc forever.

The shrink_delalloc loop is fairly complex because the caller is looping
too, and the caller will go ahead and force a transaction commit to make
sure we reclaim space.

This reworks things to exit shrink_delalloc when we've forced some
writeback and the delalloc reservations have gone down.  This means
the writeback has not just started but has also finished at
least some of the metadata changes required to reclaim delalloc
space.

If we've got this wrong, we're returning ENOSPC too early, which
is a big improvement over the current behavior of hanging the machine.

Test 224 in xfstests hammers on this nicely, and with 1000 writers
trying to fill a 1GB drive we get our first ENOSPC at 93% full.  The
other writers are able to continue until we get 100%.

This is a worst case test for btrfs because the 1000 writers are doing
small IO, and the small FS size means we don't have a lot of room
for metadata chunks.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-03-12 07:08:42 -05:00
Alex Elder
0c9ba97318 xfs: don't name variables "panic"
The new xfs_alert_tag() used a variable named "panic",
and that is to be avoided.  Rename it.

Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2011-03-11 16:34:51 -06:00
Rob Landley
c5cb09b6f8 Cleanup: Factor out some cut-and-paste code.
Factor out some cut-and-paste code in options parsing.
Saves about 800 bytes on x86-64.

Signed-off-by: Rob Landley <rlandley@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:39:28 -05:00
Rob Landley
c12bacec45 cleanup: save 60 lines/100 bytes by combining two mostly duplicate functions.
Eliminate two mostly duplicate functions (nfs_parse_simple_hostname()
and nfs_parse_protected_hostname()) and instead just make the calling
function (nfs_parse_devname()) do everything.

Signed-off-by: Rob Landley <rlandley@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:39:28 -05:00
Konstantin Khlebnikov
7ec10f26e1 NFS: account direct-io into task io accounting
Account NFS direct-io reads and writes into Task I/O Accounting.
Do it before complition to handle aio.

NFS have unusual direct-io implementation,
thus accounting in generic code does not work.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:39:27 -05:00
Trond Myklebust
b064eca2cf NFSv4: Send unmapped uid/gids to the server when using auth_sys
The new behaviour is enabled using the new module parameter
'nfs4_disable_idmapping'.

Note that if the server rejects an unmapped uid or gid, then
the client will automatically switch back to using the idmapper.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:39:27 -05:00
Trond Myklebust
3ddeb7c5c6 NFSv4: Propagate the error NFS4ERR_BADOWNER to nfs4_do_setattr
This will be required in order to switch uid/gid mapping back on if the
admin has tried to disable it.

Note that we also propagate NFS4ERR_BADNAME at the same time, in order to
work around a Linux server bug.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:39:27 -05:00
Trond Myklebust
e4fd72a17d NFSv4: cleanup idmapper functions to take an nfs_server argument
...instead of the nfs_client.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:39:26 -05:00
Trond Myklebust
f0b851689a NFSv4: Send unmapped uid/gids to the server if the idmapper fails
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:39:26 -05:00
Trond Myklebust
5cf36cfdc8 NFSv4: If the server sends us a numeric uid/gid then accept it
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:39:26 -05:00
Benny Halevy
75247affd7 NFSv4.1: reject zero layout with zeroed stripe unit
Allowing stripe_unit==0 causes the client to crash later on
when dividing by zero.

Reported-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:45 -05:00
Fred Isaman
36fe432d33 NFSv4.1: Clear lseg pointer in ->doio function
Now that we have access to the pointer, clear it immediately after
the put, instead of in caller.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:45 -05:00
Fred Isaman
c76069bda0 NFSv4.1: rearrange ->doio args
This will make it possible to clear the lseg pointer in the same
function as it is put, instead of in the caller nfs_pageio_doio().

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:44 -05:00
Fred Isaman
a69aef1496 NFSv4.1: pnfs filelayout driver write
Allows the pnfs filelayout driver to write to the data servers.

Note that COMMIT to data servers will be implemented in a future
patch.  To avoid improper behavior, for the moment any WRITE to a data
server that would also require a COMMIT to the data server is sent
NFS_FILE_SYNC.

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:44 -05:00
Fred Isaman
7ffd10640d NFSv4.1: remove GETATTR from ds writes
Any WRITE compound directed to a data server needs to have the
GETATTR calls suppressed.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:44 -05:00
Andy Adamson
0382b74409 NFSv4.1: implement generic pnfs layer write switch
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:44 -05:00
Fred Isaman
44b83799a9 NFSv4.1: trigger LAYOUTGET for writes
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:44 -05:00
Fred Isaman
5053aa568d NFSv4.1: Send lseg down into nfs_write_rpcsetup
We grab the lseg sent in from the doio function and attach it to
each struct nfs_write_data created.  This is how the lseg will be
sent to the layout driver.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:44 -05:00
Fred Isaman
b029bc9b08 NFSv4.1: add callback to nfs4_write_done
Add callback that pnfs layout driver can use to do its own handling
of data server WRITE response.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:43 -05:00