linux-pinenote

Author	SHA1	Message	Date
Brian Foster	8018ec083c	xfs: mark all internal workqueues as freezable Workqueues must be explicitly set as freezable to ensure they are frozen in the assocated part of the hibernation/suspend sequence. Freezing of workqueues and kernel threads is important to ensure that modifications are not made on-disk after the hibernation image has been created. Otherwise, the in-memory state can become inconsistent with what is on disk and eventually lead to filesystem corruption. We have reports of free space btree corruptions that occur immediately after restore from hibernate that suggest the xfs-eofblocks workqueue could be causing such problems if it races with hibernation. Mark all of the internal XFS workqueues as freezable to ensure nothing changes on-disk once the freezer infrastructure freezes kernel threads and creates the hibernation image. Signed-off-by: Brian Foster <bfoster@redhat.com> Reported-by: Carlos E. R. <carlos.e.r@opensuse.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-09-09 11:44:46 +10:00
Jeff Layton	0c0e0d3c09	nfs: revert "nfs4: queue free_lock_state job submission to nfsiod" This reverts commit `49a4bda22e`. Christoph reported an oops due to the above commit: generic/089 242s ...[ 2187.041239] general protection fault: 0000 [#1] SMP [ 2187.042899] Modules linked in: [ 2187.044000] CPU: 0 PID: 11913 Comm: kworker/0:1 Not tainted 3.16.0-rc6+ #1151 [ 2187.044287] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 2187.044287] Workqueue: nfsiod free_lock_state_work [ 2187.044287] task: ffff880072b50cd0 ti: ffff88007a4ec000 task.ti: ffff88007a4ec000 [ 2187.044287] RIP: 0010:[<ffffffff81361ca6>] [<ffffffff81361ca6>] free_lock_state_work+0x16/0x30 [ 2187.044287] RSP: 0018:ffff88007a4efd58 EFLAGS: 00010296 [ 2187.044287] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88007a947ac0 RCX: 8000000000000000 [ 2187.044287] RDX: ffffffff826af9e0 RSI: ffff88007b093c00 RDI: ffff88007b093db8 [ 2187.044287] RBP: ffff88007a4efd58 R08: ffffffff832d3e10 R09: 000001c40efc0000 [ 2187.044287] R10: 0000000000000000 R11: 0000000000059e30 R12: ffff88007fc13240 [ 2187.044287] R13: ffff88007fc18b00 R14: ffff88007b093db8 R15: 0000000000000000 [ 2187.044287] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 [ 2187.044287] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 2187.044287] CR2: 00007f93ec33fb80 CR3: 0000000079dc2000 CR4: 00000000000006f0 [ 2187.044287] Stack: [ 2187.044287] ffff88007a4efdd8 ffffffff810cc877 ffffffff810cc80d ffff88007fc13258 [ 2187.044287] 000000007a947af0 0000000000000000 ffffffff8353ccc8 ffffffff82b6f3d0 [ 2187.044287] 0000000000000000 ffffffff82267679 ffff88007a4efdd8 ffff88007fc13240 [ 2187.044287] Call Trace: [ 2187.044287] [<ffffffff810cc877>] process_one_work+0x1c7/0x490 [ 2187.044287] [<ffffffff810cc80d>] ? process_one_work+0x15d/0x490 [ 2187.044287] [<ffffffff810cd569>] worker_thread+0x119/0x4f0 [ 2187.044287] [<ffffffff810fbbad>] ? trace_hardirqs_on+0xd/0x10 [ 2187.044287] [<ffffffff810cd450>] ? init_pwq+0x190/0x190 [ 2187.044287] [<ffffffff810d3c6f>] kthread+0xdf/0x100 [ 2187.044287] [<ffffffff810d3b90>] ? __init_kthread_worker+0x70/0x70 [ 2187.044287] [<ffffffff81d9873c>] ret_from_fork+0x7c/0xb0 [ 2187.044287] [<ffffffff810d3b90>] ? __init_kthread_worker+0x70/0x70 [ 2187.044287] Code: 0f 1f 44 00 00 31 c0 5d c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 8d b7 48 fe ff ff 48 8b 87 58 fe ff ff 48 89 e5 48 8b 40 30 <48> 8b 00 48 8b 10 48 89 c7 48 8b 92 90 03 00 00 ff 52 28 5d c3 [ 2187.044287] RIP [<ffffffff81361ca6>] free_lock_state_work+0x16/0x30 [ 2187.044287] RSP <ffff88007a4efd58> [ 2187.103626] ---[ end trace 0f11326d28e5d8fa ]--- The original reason for this patch was because the fl_release_private operation couldn't sleep. With commit `ed9814d858` (locks: defer freeing locks in locks_delete_lock until after i_lock has been dropped), this is no longer a problem so we can revert this patch. Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-08 17:00:32 -07:00
Cong Wang	21e81002f9	nfs: fix kernel warning when removing proc entry I saw the following kernel warning: [ 1852.321222] ------------[ cut here ]------------ [ 1852.326527] WARNING: CPU: 0 PID: 118 at fs/proc/generic.c:521 remove_proc_entry+0x154/0x16b() [ 1852.335630] remove_proc_entry: removing non-empty directory 'fs/nfsfs', leaking at least 'volumes' [ 1852.344084] CPU: 0 PID: 118 Comm: kworker/u8:2 Not tainted 3.16.0+ #540 [ 1852.350036] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 1852.354992] Workqueue: netns cleanup_net [ 1852.358701] 0000000000000000 ffff880116f2fbd0 ffffffff819c03e9 ffff880116f2fc18 [ 1852.366474] ffff880116f2fc08 ffffffff810744ee ffffffff811e0e6e ffff8800d4e96238 [ 1852.373507] ffffffff81dbe665 ffff8800d46a5948 0000000000000005 ffff880116f2fc68 [ 1852.380224] Call Trace: [ 1852.381976] [<ffffffff819c03e9>] dump_stack+0x4d/0x66 [ 1852.385495] [<ffffffff810744ee>] warn_slowpath_common+0x7a/0x93 [ 1852.389869] [<ffffffff811e0e6e>] ? remove_proc_entry+0x154/0x16b [ 1852.393987] [<ffffffff8107457b>] warn_slowpath_fmt+0x4c/0x4e [ 1852.397999] [<ffffffff811e0e6e>] remove_proc_entry+0x154/0x16b [ 1852.402034] [<ffffffff8129c73d>] nfs_fs_proc_net_exit+0x53/0x56 [ 1852.406136] [<ffffffff812a103b>] nfs_net_exit+0x12/0x1d [ 1852.409774] [<ffffffff81785bc9>] ops_exit_list+0x44/0x55 [ 1852.413529] [<ffffffff81786389>] cleanup_net+0xee/0x182 [ 1852.417198] [<ffffffff81088c9e>] process_one_work+0x209/0x40d [ 1852.502320] [<ffffffff81088bf7>] ? process_one_work+0x162/0x40d [ 1852.587629] [<ffffffff810890c1>] worker_thread+0x1f0/0x2c7 [ 1852.673291] [<ffffffff81088ed1>] ? process_scheduled_works+0x2f/0x2f [ 1852.759470] [<ffffffff8108e079>] kthread+0xc9/0xd1 [ 1852.843099] [<ffffffff8109427f>] ? finish_task_switch+0x3a/0xce [ 1852.926518] [<ffffffff8108dfb0>] ? __kthread_parkme+0x61/0x61 [ 1853.008565] [<ffffffff819cbeac>] ret_from_fork+0x7c/0xb0 [ 1853.076477] [<ffffffff8108dfb0>] ? __kthread_parkme+0x61/0x61 [ 1853.140653] ---[ end trace 69c4c6617f78e32d ]--- It looks wrong that we add "/proc/net/nfsfs" in nfs_fs_proc_net_init() while remove "/proc/fs/nfsfs" in nfs_fs_proc_net_exit(). Fixes: commit `65b38851a1` (NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes) Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Cc: Dan Aloni <dan@kernelim.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> [Trond: replace uses of remove_proc_entry() with remove_proc_subtree() as suggested by Al Viro] Cc: stable@vger.kernel.org # 3.4.x : `65b38851a1`: NFS: Fix /proc/fs/nfsfs/servers Cc: stable@vger.kernel.org # 3.4.x Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-08 16:41:36 -07:00
Linus Torvalds	8c68face55	Merge branch 'for_linus_urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bugfix from Ted Ts'o. [ Hmm. It's possible we should make kfree() aware of error pointers, and use IS_ERR_OR_NULL rather than a NULL check. But in the meantime this is obviously the right fix. - Linus ] * 'for_linus_urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: avoid trying to kfree an ERR_PTR pointer	2014-09-08 15:51:01 -07:00
Linus Torvalds	861b7102b5	Merge branch 'for-3.17' of git://linux-nfs.org/~bfields/linux Pull nfsd bugfixes from Bruce Fields: "A couple minor nfsd bugfixes" * 'for-3.17' of git://linux-nfs.org/~bfields/linux: lockd: fix rpcbind crash on lockd startup failure nfsd4: fix rd_dircount enforcement	2014-09-08 15:18:06 -07:00
Chris Mason	b0d5d10f41	Btrfs: use insert_inode_locked4 for inode creation Btrfs was inserting inodes into the hash table before we had fully set the inode up on disk. This leaves us open to rare races that allow two different inodes in memory for the same [root, inode] pair. This patch fixes things by using insert_inode_locked4 to insert an I_NEW inode and unlock_new_inode when we're ready for the rest of the kernel to use the inode. It also makes sure to init the operations pointers on the inode before going into the error handling paths. Signed-off-by: Chris Mason <clm@fb.com> Reported-by: Al Viro <viro@zeniv.linux.org.uk>	2014-09-08 13:56:45 -07:00
Filipe Manana	49dae1bc1c	Btrfs: fix fsync data loss after a ranged fsync While we're doing a full fsync (when the inode has the flag BTRFS_INODE_NEEDS_FULL_SYNC set) that is ranged too (covers only a portion of the file), we might have ordered operations that are started before or while we're logging the inode and that fall outside the fsync range. Therefore when a full ranged fsync finishes don't remove every extent map from the list of modified extent maps - as for some of them, that fall outside our fsync range, their respective ordered operation hasn't finished yet, meaning the corresponding file extent item wasn't inserted into the fs/subvol tree yet and therefore we didn't log it, and we must let the next fast fsync (one that checks only the modified list) see this extent map and log a matching file extent item to the log btree and wait for its ordered operation to finish (if it's still ongoing). A test case for xfstests follows. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-09-08 13:56:43 -07:00
Dan Carpenter	c47ca32d3a	Btrfs: kfree()ing ERR_PTRs The "inherit" in btrfs_ioctl_snap_create_v2() and "vol_args" in btrfs_ioctl_rm_dev() are ERR_PTRs so we can't call kfree() on them. These kind of bugs are "One Err Bugs" where there is just one error label that does everything. I could set the "inherit = NULL" and keep the single out label but it ends up being more complicated that way. It makes the code simpler to re-order the unwind so it's in the mirror order of the allocation and introduce some new error labels. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-09-08 13:56:42 -07:00
J. Bruce Fields	7c17705e77	lockd: fix rpcbind crash on lockd startup failure Nikita Yuschenko reported that booting a kernel with init=/bin/sh and then nfs mounting without portmap or rpcbind running using a busybox mount resulted in: # mount -t nfs 10.30.130.21:/opt /mnt svc: failed to register lockdv1 RPC service (errno 111). lockd_up: makesock failed, error=-111 Unable to handle kernel paging request for data at address 0x00000030 Faulting instruction address: 0xc055e65c Oops: Kernel access of bad area, sig: 11 [#1] MPC85xx CDS Modules linked in: CPU: 0 PID: 1338 Comm: mount Not tainted 3.10.44.cge #117 task: cf29cea0 ti: cf35c000 task.ti: cf35c000 NIP: c055e65c LR: c0566490 CTR: c055e648 REGS: cf35dad0 TRAP: 0300 Not tainted (3.10.44.cge) MSR: 00029000 <CE,EE,ME> CR: 22442488 XER: 20000000 DEAR: 00000030, ESR: 00000000 GPR00: c05606f4 cf35db80 cf29cea0 cf0ded80 cf0dedb8 00000001 1dec3086 00000000 GPR08: 00000000 c07b1640 00000007 1dec3086 22442482 100b9758 00000000 10090ae8 GPR16: 00000000 000186a5 00000000 00000000 100c3018 bfa46edc 100b0000 bfa46ef0 GPR24: cf386ae0 c07834f0 00000000 c0565f88 00000001 cf0dedb8 00000000 cf0ded80 NIP [c055e65c] call_start+0x14/0x34 LR [c0566490] __rpc_execute+0x70/0x250 Call Trace: [cf35db80] [00000080] 0x80 (unreliable) [cf35dbb0] [c05606f4] rpc_run_task+0x9c/0xc4 [cf35dbc0] [c0560840] rpc_call_sync+0x50/0xb8 [cf35dbf0] [c056ee90] rpcb_register_call+0x54/0x84 [cf35dc10] [c056f24c] rpcb_register+0xf8/0x10c [cf35dc70] [c0569e18] svc_unregister.isra.23+0x100/0x108 [cf35dc90] [c0569e38] svc_rpcb_cleanup+0x18/0x30 [cf35dca0] [c0198c5c] lockd_up+0x1dc/0x2e0 [cf35dcd0] [c0195348] nlmclnt_init+0x2c/0xc8 [cf35dcf0] [c015bb5c] nfs_start_lockd+0x98/0xec [cf35dd20] [c015ce6c] nfs_create_server+0x1e8/0x3f4 [cf35dd90] [c0171590] nfs3_create_server+0x10/0x44 [cf35dda0] [c016528c] nfs_try_mount+0x158/0x1e4 [cf35de20] [c01670d0] nfs_fs_mount+0x434/0x8c8 [cf35de70] [c00cd3bc] mount_fs+0x20/0xbc [cf35de90] [c00e4f88] vfs_kern_mount+0x50/0x104 [cf35dec0] [c00e6e0c] do_mount+0x1d0/0x8e0 [cf35df10] [c00e75ac] SyS_mount+0x90/0xd0 [cf35df40] [c000ccf4] ret_from_syscall+0x0/0x3c The addition of svc_shutdown_net() resulted in two calls to svc_rpcb_cleanup(); the second is no longer necessary and crashes when it calls rpcb_register_call with clnt=NULL. Reported-by: Nikita Yushchenko <nyushchenko@dev.rtsoft.ru> Fixes: `679b033df4` "lockd: ensure we tear down any live sockets when socket creation fails during lockd_up" Cc: stable@vger.kernel.org Acked-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-08 12:03:32 -04:00
J. Bruce Fields	aee3776441	nfsd4: fix rd_dircount enforcement Commit `3b29970909` "nfsd4: enforce rd_dircount" totally misunderstood rd_dircount; it refers to total non-attribute bytes returned, not number of directory entries returned. Bring the code into agreement with RFC 3530 section 14.2.24. Cc: stable@vger.kernel.org Fixes: `3b29970909` "nfsd4: enforce rd_dircount" Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-08 12:02:03 -04:00
Tejun Heo	018a17bdc8	bdi: reimplement bdev_inode_switch_bdi() A block_device may be attached to different gendisks and thus different bdis over time. bdev_inode_switch_bdi() is used to switch the associated bdi. The function assumes that the inode could be dirty and transfers it between bdis if so. This is a bit nasty in that it reaches into bdi internals. This patch reimplements the function so that it writes out the inode if dirty. This is a lot simpler and can be implemented without exposing bdi internals. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-09-08 10:00:43 -06:00
Tejun Heo	ff9ea32381	block, bdi: an active gendisk always has a request_queue associated with it bdev_get_queue() returns the request_queue associated with the specified block_device. blk_get_backing_dev_info() makes use of bdev_get_queue() to determine the associated bdi given a block_device. All the callers of bdev_get_queue() including blk_get_backing_dev_info() assume that bdev_get_queue() may return NULL and implement NULL handling; however, bdev_get_queue() requires the passed in block_device is opened and attached to its gendisk. Because an active gendisk always has a valid request_queue associated with it, bdev_get_queue() can never return NULL and neither can blk_get_backing_dev_info(). Make it clear that neither of the two functions can return NULL and remove NULL handling from all the callers. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Chris Mason <clm@fb.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-09-08 10:00:35 -06:00
Artem Bityutskiy	ba29e721eb	UBIFS: fix free log space calculation Hu (hujianyang <hujianyang@huawei.com>) discovered an issue in the 'empty_log_bytes()' function, which calculates how many bytes are left in the log: " If 'c->lhead_lnum + 1 == c->ltail_lnum' and 'c->lhead_offs == c->leb_size', 'h' would equalent to 't' and 'empty_log_bytes()' would return 'c->log_bytes' instead of 0. " At this point it is not clear what would be the consequences of this, and whether this may lead to any problems, but this patch addresses the issue just in case. Cc: stable@vger.kernel.org Tested-by: hujianyang <hujianyang@huawei.com> Reported-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-09-08 15:55:28 +03:00
Artem Bityutskiy	052c28073f	UBIFS: fix a race condition Hu (hujianyang@huawei.com) discovered a race condition which may lead to a situation when UBIFS is unable to mount the file-system after an unclean reboot. The problem is theoretical, though. In UBIFS, we have the log, which basically a set of LEBs in a certain area. The log has the tail and the head. Every time user writes data to the file-system, the UBIFS journal grows, and the log grows as well, because we append new reference nodes to the head of the log. So the head moves forward all the time, while the log tail stays at the same position. At any time, the UBIFS master node points to the tail of the log. When we mount the file-system, we scan the log, and we always start from its tail, because this is where the master node points to. The only occasion when the tail of the log changes is the commit operation. The commit operation has 2 phases - "commit start" and "commit end". The former is relatively short, and does not involve much I/O. During this phase we mostly just build various in-memory lists of the things which have to be written to the flash media during "commit end" phase. During the commit start phase, what we do is we "clean" the log. Indeed, the commit operation will index all the data in the journal, so the entire journal "disappears", and therefore the data in the log become unneeded. So we just move the head of the log to the next LEB, and write the CS node there. This LEB will be the tail of the new log when the commit operation finishes. When the "commit start" phase finishes, users may write more data to the file-system, in parallel with the ongoing "commit end" operation. At this point the log tail was not changed yet, it is the same as it had been before we started the commit. The log head keeps moving forward, though. The commit operation now needs to write the new master node, and the new master node should point to the new log tail. After this the LEBs between the old log tail and the new log tail can be unmapped and re-used again. And here is the possible problem. We do 2 operations: (a) We first update the log tail position in memory (see 'ubifs_log_end_commit()'). (b) And then we write the master node (see the big lock of code in 'do_commit()'). But nothing prevents the log head from moving forward between (a) and (b), and the log head may "wrap" now to the old log tail. And when the "wrap" happens, the contends of the log tail gets erased. Now a power cut happens and we are in trouble. We end up with the old master node pointing to the old tail, which was erased. And replay fails because it expects the master node to point to the correct log tail at all times. This patch merges the abovementioned (a) and (b) operations by moving the master node change code to the 'ubifs_log_end_commit()' function, so that it runs with the log mutex locked, which will prevent the log from being changed benween operations (a) and (b). Cc: stable@vger.kernel.org # `07e19df` UBIFS: remove mst_mutex Cc: stable@vger.kernel.org Reported-by: hujianyang <hujianyang@huawei.com> Tested-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-09-08 15:55:02 +03:00
Tejun Heo	a34375ef9e	percpu-refcount: add @gfp to percpu_ref_init() Percpu allocator now supports allocation mask. Add @gfp to percpu_ref_init() so that !GFP_KERNEL allocation masks can be used with percpu_refs too. This patch doesn't make any functional difference. v2: blk-mq conversion was missing. Updated. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kent Overstreet <koverstreet@google.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Li Zefan <lizefan@huawei.com> Cc: Nicholas A. Bellinger <nab@linux-iscsi.org> Cc: Jens Axboe <axboe@kernel.dk>	2014-09-08 09:51:30 +09:00
Tejun Heo	908c7f1949	percpu_counter: add @gfp to percpu_counter_init() Percpu allocator now supports allocation mask. Add @gfp to percpu_counter_init() so that !GFP_KERNEL allocation masks can be used with percpu_counters too. We could have left percpu_counter_init() alone and added percpu_counter_init_gfp(); however, the number of users isn't that high and introducing _gfp variants to all percpu data structures would be quite ugly, so let's just do the conversion. This is the one with the most users. Other percpu data structures are a lot easier to convert. This patch doesn't make any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Jan Kara <jack@suse.cz> Acked-by: "David S. Miller" <davem@davemloft.net> Cc: x86@kernel.org Cc: Jens Axboe <axboe@kernel.dk> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org>	2014-09-08 09:51:29 +09:00
Paul E. McKenney	bde6c3aa99	rcu: Provide cond_resched_rcu_qs() to force quiescent states in long loops RCU-tasks requires the occasional voluntary context switch from CPU-bound in-kernel tasks. In some cases, this requires instrumenting cond_resched(). However, there is some reluctance to countenance unconditionally instrumenting cond_resched() (see http://lwn.net/Articles/603252/), so this commit creates a separate cond_resched_rcu_qs() that may be used in place of cond_resched() in locations prone to long-duration in-kernel looping. This commit currently instruments only RCU-tasks. Future possibilities include also instrumenting RCU, RCU-bh, and RCU-sched in order to reduce IPI usage. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-09-07 16:27:20 -07:00
Linus Torvalds	9142eadefe	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull filesystem fixes from Al Viro: "Several bugfixes (all of them -stable fodder). Alexey's one deals with double mutex_lock() in UFS (apparently, nobody has tried to test "ufs: sb mutex merge + mutex_destroy" on something like file creation/removal on ufs). Mine deal with two kinds of umount bugs, in umount propagation and in handling of automounted submounts, both resulting in bogus transient EBUSY from umount" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ufs: fix deadlocks introduced by sb mutex merge fix EBUSY on umount() from MNT_SHRINKABLE get rid of propagate_umount() mistakenly treating slaves as busy.	2014-09-07 10:59:58 -07:00
Alexey Khoroshilov	9ef7db7f38	ufs: fix deadlocks introduced by sb mutex merge Commit `0244756edc` ("ufs: sb mutex merge + mutex_destroy") introduces deadlocks in ufs_new_inode() and ufs_free_inode(). Most callers of that functions acqure the mutex by themselves and ufs_{new,free}_inode() do that via lock_ufs(), i.e we have an unavoidable double lock. The patch proposes to resolve the issue by making sure that ufs_{new,free}_inode() are not called with the mutex held. Found by Linux Driver Verification project (linuxtesting.org). Cc: stable@vger.kernel.org # 3.16 Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-09-07 13:26:39 -04:00
Linus Torvalds	11e9739813	xfs: fixes for v3.17-rc3 Fix: - a direct IO read/buffered read data corruption - the associated fallout from the DIO data corruption fix - collapse range bugs that are potential data corruption issues. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJUCkM+AAoJEK3oKUf0dfodUBgP+gJu50/XV4TFRLPlCRxhvN61 371i3ASls1y7ivhj40NzgbDDAZHM2q8Zqwd//318dFViHhWQDlH/1ga06kscRpZX d8cQEbFHApgUGQL5Gdq2l2hvAzYa75H0m6cq3jveyrN2rscjCSmAXwtlcEmx3AR6 TnCpxuVL5asjEGZYb0KfQACq//rASHJbhukpo1gB4ccZ0boWHOVf5SxuS4remzs9 y+rlPFNl5RD/WVdnJSvu9zu/nP6op3Ax5r7jZanoKbisKHfd7QOa+k65+Vz0Vq9G kxgfhz+yLfkOvcktq+41e1lVBln7fCIlcO9m3b53uxWPx5cla323893UiGYsA4F/ j/gGlh1qaT6C/1M1JBWDLDx931S78XiR1Y+WbtAU1PO+GuO0IEap3+iqtS2+oNAv OrpThLOgqTspK6MJeToCzdn2lRT2BJpcKwxIyDK8g+p9N6qCpyw3DfiKyu0wipGH D2D3mtE6drSHNaSceFAz8CrQvPOR7Ygj92QGpGSfkohxap9h6VJR/wNp/oGnpmN0 qgcxTrHvx3kw1hXssB4gjh6fBDnOUkac0isqxdow22Qt529t9sIanzMBvz+JxHQF zeqeFSh96lOXmB7UFBU+QyOhbDp3cJWChHrtY3Esw/+FmG6fxEy8z6pdZsiJYELr 5tka2richPD+gyXzcZwP =tRQo -----END PGP SIGNATURE----- Merge tag 'xfs-for-linus-3.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs Pull xfs fixes from Dave Chinner: "The fixes all address recently discovered data corruption issues. The original Direct IO issue was discovered by Chris Mason @ Facebook on a production workload which mixed buffered reads with direct reads and writes IO to the same file. The fix for that exposed other issues with page invalidation (exposed by millions of fsx operations) failing due to dirty buffers beyond EOF. Finally, the collapse_range code could also cause problems due to racing writeback changing the extent map while it was being shifted around. The commits for that problem are simple mitigation fixes that prevent the problem from occuring. A more robust fix for 3.18 that addresses the underlying problem is currently being worked on by Brian. Summary of fixes: - a direct IO read/buffered read data corruption - the associated fallout from the DIO data corruption fix - collapse range bugs that are potential data corruption issues" * tag 'xfs-for-linus-3.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: xfs: trim eofblocks before collapse range xfs: xfs_file_collapse_range is delalloc challenged xfs: don't log inode unless extent shift makes extent modifications xfs: use ranged writeback and invalidation for direct IO xfs: don't zero partial page cache pages during O_DIRECT writes xfs: don't zero partial page cache pages during O_DIRECT writes xfs: don't dirty buffers beyond EOF	2014-09-06 12:13:17 -07:00
Anton Altaparmakov	10096fb108	Export sync_filesystem() for modular ->remount_fs() use This patch changes sync_filesystem() to be EXPORT_SYMBOL(). The reason this is needed is that starting with 3.15 kernel, due to Theodore Ts'o's commit `02b9984d64` ("fs: push sync_filesystem() down to the file system's remount_fs()"), all file systems that have dirty data to be written out need to call sync_filesystem() from their ->remount_fs() method when remounting read-only. As this is now a generically required function rather than an internal only function it should be EXPORT_SYMBOL() so that all file systems can call it. Signed-off-by: Anton Altaparmakov <aia21@cantab.net> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-09-05 08:16:21 -07:00
Gioh Kim	a49058fab2	jbd/jbd2: use non-movable memory for the jbd superblock Sicne the jbd/jbd2 superblock is not released until the file system is unmounted, allocate the buffer cache from the non-moveable area to allow page migration and CMA allocations to more easily succeed. Signed-off-by: Gioh Kim <gioh.kim@lge.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2014-09-04 22:36:35 -04:00
Gioh Kim	a8ac900b81	ext4: use non-movable memory for the ext4 superblock Since the ext4 superblock is not released until the file system is unmounted, allocate the buffer cache entry for the ext4 superblock out of the non-moveable are to allow page migrations and thus CMA allocations to more easily succeed if the CMA area is limited. Signed-off-by: Gioh Kim <gioh.kim@lge.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2014-09-04 22:36:15 -04:00
Gioh Kim	3b5e6454aa	fs/buffer.c: support buffer cache allocations with gfp modifiers A buffer cache is allocated from movable area because it is referred for a while and released soon. But some filesystems are taking buffer cache for a long time and it can disturb page migration. New APIs are introduced to allocate buffer cache with user specific flag. _gfp APIs are for user want to set page allocation flag for page cache allocation. And _unmovable APIs are for the user wants to allocate page cache from non-movable area. Signed-off-by: Gioh Kim <gioh.kim@lge.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2014-09-04 22:04:42 -04:00
Linus Torvalds	b7fece1be8	Merge git://git.kvack.org/~bcrl/aio-fixes Pull aio bugfixes from Ben LaHaise: "Two small fixes" * git://git.kvack.org/~bcrl/aio-fixes: aio: block exit_aio() until all context requests are completed aio: add missing smp_rmb() in read_events_ring	2014-09-04 16:08:55 -07:00
Theodore Ts'o	d26e2c4d72	ext4: renumber EXT4_EX_* flags to avoid flag aliasing problems Suggested-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2014-09-04 18:09:29 -04:00
Jan Kara	0e5ecf0a76	jbd2: optimize jbd2_log_do_checkpoint() a bit When we discover written out buffer in transaction checkpoint list we don't have to recheck validity of a transaction. Either this is the last buffer in a transaction - and then we are done - or this isn't and then we can just take another buffer from the checkpoint list without dropping j_list_lock. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2014-09-04 18:09:29 -04:00
Theodore Ts'o	dc6e8d669c	jbd2: don't call get_bh() before calling __jbd2_journal_remove_checkpoint() The __jbd2_journal_remove_checkpoint() doesn't require an elevated b_count; indeed, until the jh structure gets released by the call to jbd2_journal_put_journal_head(), the bh's b_count is elevated by virtue of the existence of the jh structure. Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2014-09-04 18:09:22 -04:00
Theodore Ts'o	754cfed6bb	ext4: drop the EXT4_STATE_DELALLOC_RESERVED flag Having done a full regression test, we can now drop the DELALLOC_RESERVED state flag. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2014-09-04 18:08:22 -04:00
Theodore Ts'o	e3cf5d5d9a	ext4: prepare to drop EXT4_STATE_DELALLOC_RESERVED The EXT4_STATE_DELALLOC_RESERVED flag was originally implemented because it was too hard to make sure the mballoc and get_block flags could be reliably passed down through all of the codepaths that end up calling ext4_mb_new_blocks(). Since then, we have mb_flags passed down through most of the code paths, so getting rid of EXT4_STATE_DELALLOC_RESERVED isn't as tricky as it used to. This commit plumbs in the last of what is required, and then adds a WARN_ON check to make sure we haven't missed anything. If this passes a full regression test run, we can then drop EXT4_STATE_DELALLOC_RESERVED. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2014-09-04 18:07:25 -04:00
Theodore Ts'o	a521100231	ext4: pass allocation_request struct to ext4_(alloc,splice)_branch Instead of initializing the allocation_request structure in ext4_alloc_branch(), set it up in ext4_ind_map_blocks(), and then pass it to ext4_alloc_branch() and ext4_splice_branch(). This allows ext4_ind_map_blocks to pass flags in the allocation request structure without having to add Yet Another argument to ext4_alloc_branch(). Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2014-09-04 18:06:25 -04:00
Gu Zheng	6098b45b32	aio: block exit_aio() until all context requests are completed It seems that exit_aio() also needs to wait for all iocbs to complete (like io_destroy), but we missed the wait step in current implemention, so fix it in the same way as we did in io_destroy. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Cc: stable@vger.kernel.org	2014-09-04 16:54:47 -04:00
Al Viro	0b93a92be4	udf: saner calling conventions for udf_new_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jan Kara <jack@suse.cz>	2014-09-04 21:37:41 +02:00
Al Viro	b231509616	udf: fix the udf_iget() vs. udf_new_inode() races Currently udf_iget() (triggered by NFS) can race with udf_new_inode() leading to two inode structures with the same inode number: nfsd: iget_locked() creates inode nfsd: try to read from disk, block on that. udf_new_inode(): allocate inode with that inumber udf_new_inode(): insert it into icache, set it up and dirty udf_write_inode(): write inode into buffer cache nfsd: get CPU again, look into buffer cache, see nice and sane on-disk inode, set the in-core inode from it Fix the problem by putting inode into icache in locked state (I_NEW set) and unlocking it only after it's fully set up. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jan Kara <jack@suse.cz>	2014-09-04 21:37:41 +02:00
Al Viro	d2be51cb34	udf: merge the pieces inserting a new non-directory object into directory boilerplate code in udf_{create,mknod,symlink} taken to new helper symlink case converted to unique id calculated by udf_new_inode() - no point finding a new one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jan Kara <jack@suse.cz>	2014-09-04 21:37:40 +02:00
Jan Kara	470cca56c3	udf: Set i_generation field Currently UDF doesn't initialize i_generation in any way and thus NFS can easily get reallocated inodes from stale file handles. Luckily UDF already has a unique object identifier associated with each inode - i_unique. Use that for initialization of i_generation. Signed-off-by: Jan Kara <jack@suse.cz>	2014-09-04 21:37:40 +02:00
Jan Kara	4071b91362	udf: Properly detect stale inodes NFS can easily ask for inodes that are already deleted. Currently UDF happily returns such inodes which is a bug. Return -ESTALE if udf_read_inode() is asked to read deleted inode. Signed-off-by: Jan Kara <jack@suse.cz>	2014-09-04 21:37:39 +02:00
Jan Kara	6d3d5e860a	udf: Make udf_read_inode() and udf_iget() return error Currently __udf_read_inode() wasn't returning anything and we found out whether we succeeded reading inode by checking whether inode is bad or not. udf_iget() returned NULL on failure and inode pointer otherwise. Make these two functions properly propagate errors up the call stack and use the return value in callers. Signed-off-by: Jan Kara <jack@suse.cz>	2014-09-04 21:36:35 +02:00
Jan Kara	c03aa9f6e1	udf: Avoid infinite loop when processing indirect ICBs We did not implement any bound on number of indirect ICBs we follow when loading inode. Thus corrupted medium could cause kernel to go into an infinite loop, possibly causing a stack overflow. Fix the possible stack overflow by removing recursion from __udf_read_inode() and limit number of indirect ICBs we follow to avoid infinite loops. Signed-off-by: Jan Kara <jack@suse.cz>	2014-09-04 14:12:29 +02:00
Jan Kara	bb7720a0b4	udf: Fold udf_fill_inode() into __udf_read_inode() There's no good reason to separate these since udf_fill_inode() is called only from __udf_read_inode() and both do part of the same thing. Signed-off-by: Jan Kara <jack@suse.cz>	2014-09-04 13:32:50 +02:00
Jan Kara	8a70ee3307	udf: Avoid dir link count to go negative If we are writing back inode of unlinked directory, its link count ends up being (u16)-1. Although the inode is deleted, udf_iget() can load the inode when NFS uses stale file handle and get confused. Signed-off-by: Jan Kara <jack@suse.cz>	2014-09-04 11:47:51 +02:00
Jaegeuk Kim	4081363fbe	f2fs: introduce F2FS_I_SB, F2FS_M_SB, and F2FS_P_SB This patch adds three inline functions to clean up dirty casting codes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-03 17:37:13 -07:00
Kinglong Mee	027bc41a3e	NFSD: Put export if prepare_creds() fail Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-03 17:43:04 -04:00
Kinglong Mee	13c82e8eb5	NFSD: Full checking of authentication name Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-03 17:43:03 -04:00
Kinglong Mee	48c348b09c	NFSD: Fix bad using of return value from qword_get Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-03 17:43:02 -04:00
Kinglong Mee	15d176c195	NFSD: Fix a memory leak if nfsd4_recdir_load fail Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-03 17:43:01 -04:00
Kinglong Mee	c2236f141e	NFSD: Reset creds after mnt_want_write_file() fail Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-03 17:43:01 -04:00
Kinglong Mee	8519f994e5	NFSD: Put file after ima_file_check fail in nfsd_open() Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-03 17:43:00 -04:00
Linus Torvalds	70c8038dd6	Merge tag 'for-f2fs-3.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs bug fixes from Jaegeuk Kim: "This series includes patches to: - fix recovery routines - fix bugs related to inline_data/xattr - fix when casting the dentry names - handle EIO or ENOMEM correctly - fix memory leak - fix lock coverage" * tag 'for-f2fs-3.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (28 commits) f2fs: reposition unlock_new_inode to prevent accessing invalid inode f2fs: fix wrong casting for dentry name f2fs: simplify by using a literal f2fs: truncate stale block for inline_data f2fs: use macro for code readability f2fs: introduce need_do_checkpoint for readability f2fs: fix incorrect calculation with total/free inode num f2fs: remove rename and use rename2 f2fs: skip if inline_data was converted already f2fs: remove rewrite_node_page f2fs: avoid double lock in truncate_blocks f2fs: prevent checkpoint during roll-forward f2fs: add WARN_ON in f2fs_bug_on f2fs: handle EIO not to break fs consistency f2fs: check s_dirty under cp_mutex f2fs: unlock_page when node page is redirtied out f2fs: introduce f2fs_cp_error for readability f2fs: give a chance to mount again when encountering errors f2fs: trigger release_dirty_inode in f2fs_put_super f2fs: don't skip checkpoint if there is no dirty node pages ...	2014-09-03 10:10:28 -07:00
Theodore Ts'o	a9cfcd63e8	ext4: avoid trying to kfree an ERR_PTR pointer Thanks to Dan Carpenter for extending smatch to find bugs like this. (This was found using a development version of smatch.) Fixes: `36de928641` Reported-by: Dan Carpenter <dan.carpenter@oracle.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2014-09-03 09:37:30 -04:00

... 25 26 27 28 29 ...

38,980 commits