Commit graph

19,389 commits

Author SHA1 Message Date
Frederic Weisbecker
0523676d3f reiserfs: Relax reiserfs lock while freeing the journal
Keeping the reiserfs lock while freeing the journal on
umount path triggers a lock inversion between bdev->bd_mutex
and the reiserfs lock.

We don't need the reiserfs lock at this stage. The filesystem
is not usable anymore, and there are no more pending commits,
everything got flushed (even this operation was done in parallel
and didn't required the reiserfs lock from the current process).

This fixes the following lockdep report:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.32-atom #172
-------------------------------------------------------
umount/3904 is trying to acquire lock:
 (&bdev->bd_mutex){+.+.+.}, at: [<c10de2c2>] __blkdev_put+0x22/0x160

but task is already holding lock:
 (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c1143279>] reiserfs_write_lock+0x29/0x40

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&REISERFS_SB(s)->lock){+.+.+.}:
       [<c105ea7f>] __lock_acquire+0x11ff/0x19e0
       [<c105f2c8>] lock_acquire+0x68/0x90
       [<c140199b>] mutex_lock_nested+0x5b/0x340
       [<c1143229>] reiserfs_write_lock_once+0x29/0x50
       [<c111c485>] reiserfs_get_block+0x85/0x1620
       [<c10e1040>] do_mpage_readpage+0x1f0/0x6d0
       [<c10e1640>] mpage_readpages+0xc0/0x100
       [<c1119b89>] reiserfs_readpages+0x19/0x20
       [<c108f1ec>] __do_page_cache_readahead+0x1bc/0x260
       [<c108f2b8>] ra_submit+0x28/0x40
       [<c1087e3e>] filemap_fault+0x40e/0x420
       [<c109b5fd>] __do_fault+0x3d/0x430
       [<c109d47e>] handle_mm_fault+0x12e/0x790
       [<c1022a65>] do_page_fault+0x135/0x330
       [<c1403663>] error_code+0x6b/0x70
       [<c10ef9ca>] load_elf_binary+0x82a/0x1a10
       [<c10ba130>] search_binary_handler+0x90/0x1d0
       [<c10bb70f>] do_execve+0x1df/0x250
       [<c1001746>] sys_execve+0x46/0x70
       [<c1002fa5>] syscall_call+0x7/0xb

-> #2 (&mm->mmap_sem){++++++}:
       [<c105ea7f>] __lock_acquire+0x11ff/0x19e0
       [<c105f2c8>] lock_acquire+0x68/0x90
       [<c109b1ab>] might_fault+0x8b/0xb0
       [<c11b8f52>] copy_to_user+0x32/0x70
       [<c10c3b94>] filldir64+0xa4/0xf0
       [<c1109116>] sysfs_readdir+0x116/0x210
       [<c10c3e1d>] vfs_readdir+0x8d/0xb0
       [<c10c3ea9>] sys_getdents64+0x69/0xb0
       [<c1002ec4>] sysenter_do_call+0x12/0x32

-> #1 (sysfs_mutex){+.+.+.}:
       [<c105ea7f>] __lock_acquire+0x11ff/0x19e0
       [<c105f2c8>] lock_acquire+0x68/0x90
       [<c140199b>] mutex_lock_nested+0x5b/0x340
       [<c110951c>] sysfs_addrm_start+0x2c/0xb0
       [<c1109aa0>] create_dir+0x40/0x90
       [<c1109b1b>] sysfs_create_dir+0x2b/0x50
       [<c11b2352>] kobject_add_internal+0xc2/0x1b0
       [<c11b2531>] kobject_add_varg+0x31/0x50
       [<c11b25ac>] kobject_add+0x2c/0x60
       [<c1258294>] device_add+0x94/0x560
       [<c11036ea>] add_partition+0x18a/0x2a0
       [<c110418a>] rescan_partitions+0x33a/0x450
       [<c10de5bf>] __blkdev_get+0x12f/0x2d0
       [<c10de76a>] blkdev_get+0xa/0x10
       [<c11034b8>] register_disk+0x108/0x130
       [<c11a87a9>] add_disk+0xd9/0x130
       [<c12998e5>] sd_probe_async+0x105/0x1d0
       [<c10528af>] async_thread+0xcf/0x230
       [<c104bfd4>] kthread+0x74/0x80
       [<c1003aab>] kernel_thread_helper+0x7/0x3c

-> #0 (&bdev->bd_mutex){+.+.+.}:
       [<c105f176>] __lock_acquire+0x18f6/0x19e0
       [<c105f2c8>] lock_acquire+0x68/0x90
       [<c140199b>] mutex_lock_nested+0x5b/0x340
       [<c10de2c2>] __blkdev_put+0x22/0x160
       [<c10de40a>] blkdev_put+0xa/0x10
       [<c113ce22>] free_journal_ram+0xd2/0x130
       [<c113ea18>] do_journal_release+0x98/0x190
       [<c113eb2a>] journal_release+0xa/0x10
       [<c1128eb6>] reiserfs_put_super+0x36/0x130
       [<c10b776f>] generic_shutdown_super+0x4f/0xe0
       [<c10b7825>] kill_block_super+0x25/0x40
       [<c11255df>] reiserfs_kill_sb+0x7f/0x90
       [<c10b7f4a>] deactivate_super+0x7a/0x90
       [<c10cccd8>] mntput_no_expire+0x98/0xd0
       [<c10ccfcc>] sys_umount+0x4c/0x310
       [<c10cd2a9>] sys_oldumount+0x19/0x20
       [<c1002ec4>] sysenter_do_call+0x12/0x32

other info that might help us debug this:

2 locks held by umount/3904:
 #0:  (&type->s_umount_key#30){+++++.}, at: [<c10b7f45>] deactivate_super+0x75/0x90
 #1:  (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c1143279>] reiserfs_write_lock+0x29/0x40

stack backtrace:
Pid: 3904, comm: umount Not tainted 2.6.32-atom #172
Call Trace:
 [<c13ff903>] ? printk+0x18/0x1a
 [<c105d33a>] print_circular_bug+0xca/0xd0
 [<c105f176>] __lock_acquire+0x18f6/0x19e0
 [<c108b66f>] ? free_pcppages_bulk+0x1f/0x250
 [<c105f2c8>] lock_acquire+0x68/0x90
 [<c10de2c2>] ? __blkdev_put+0x22/0x160
 [<c10de2c2>] ? __blkdev_put+0x22/0x160
 [<c140199b>] mutex_lock_nested+0x5b/0x340
 [<c10de2c2>] ? __blkdev_put+0x22/0x160
 [<c105c932>] ? mark_held_locks+0x62/0x80
 [<c10afe12>] ? kfree+0x92/0xd0
 [<c10de2c2>] __blkdev_put+0x22/0x160
 [<c105cc3b>] ? trace_hardirqs_on+0xb/0x10
 [<c10de40a>] blkdev_put+0xa/0x10
 [<c113ce22>] free_journal_ram+0xd2/0x130
 [<c113ea18>] do_journal_release+0x98/0x190
 [<c113eb2a>] journal_release+0xa/0x10
 [<c1128eb6>] reiserfs_put_super+0x36/0x130
 [<c1050596>] ? up_write+0x16/0x30
 [<c10b776f>] generic_shutdown_super+0x4f/0xe0
 [<c10b7825>] kill_block_super+0x25/0x40
 [<c10f41e0>] ? vfs_quota_off+0x0/0x20
 [<c11255df>] reiserfs_kill_sb+0x7f/0x90
 [<c10b7f4a>] deactivate_super+0x7a/0x90
 [<c10cccd8>] mntput_no_expire+0x98/0xd0
 [<c10ccfcc>] sys_umount+0x4c/0x310
 [<c10cd2a9>] sys_oldumount+0x19/0x20
 [<c1002ec4>] sysenter_do_call+0x12/0x32

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
2010-01-02 01:56:54 +01:00
Frederic Weisbecker
27026a05bb reiserfs: Fix reiserfs lock <-> i_mutex dependency inversion on xattr
While deleting the xattrs of an inode, we hold the reiserfs lock
and grab the inode->i_mutex of the targeted inode and the root
private xattr directory.

Later on, we may relax the reiserfs lock for various reasons, this
creates inverted dependencies.

We can remove the reiserfs lock -> i_mutex dependency by relaxing
the former before calling open_xa_dir(). This is fine because the
lookup and creation of xattr private directories done in
open_xa_dir() are covered by the targeted inode mutexes. And deeper
operations in the tree are still done under the write lock.

This fixes the following lockdep report:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.32-atom #173
-------------------------------------------------------
cp/3204 is trying to acquire lock:
 (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c11432b9>] reiserfs_write_lock_once+0x29/0x50

but task is already holding lock:
 (&sb->s_type->i_mutex_key#4/3){+.+.+.}, at: [<c1141e18>] open_xa_dir+0xd8/0x1b0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&sb->s_type->i_mutex_key#4/3){+.+.+.}:
       [<c105ea7f>] __lock_acquire+0x11ff/0x19e0
       [<c105f2c8>] lock_acquire+0x68/0x90
       [<c1401a2b>] mutex_lock_nested+0x5b/0x340
       [<c1141d83>] open_xa_dir+0x43/0x1b0
       [<c1142722>] reiserfs_for_each_xattr+0x62/0x260
       [<c114299a>] reiserfs_delete_xattrs+0x1a/0x60
       [<c111ea1f>] reiserfs_delete_inode+0x9f/0x150
       [<c10c9c32>] generic_delete_inode+0xa2/0x170
       [<c10c9d4f>] generic_drop_inode+0x4f/0x70
       [<c10c8b07>] iput+0x47/0x50
       [<c10c0965>] do_unlinkat+0xd5/0x160
       [<c10c0a00>] sys_unlink+0x10/0x20
       [<c1002ec4>] sysenter_do_call+0x12/0x32

-> #0 (&REISERFS_SB(s)->lock){+.+.+.}:
       [<c105f176>] __lock_acquire+0x18f6/0x19e0
       [<c105f2c8>] lock_acquire+0x68/0x90
       [<c1401a2b>] mutex_lock_nested+0x5b/0x340
       [<c11432b9>] reiserfs_write_lock_once+0x29/0x50
       [<c1117012>] reiserfs_lookup+0x62/0x140
       [<c10bd85f>] __lookup_hash+0xef/0x110
       [<c10bf21d>] lookup_one_len+0x8d/0xc0
       [<c1141e2a>] open_xa_dir+0xea/0x1b0
       [<c1141fe5>] xattr_lookup+0x15/0x160
       [<c1142476>] reiserfs_xattr_get+0x56/0x2a0
       [<c1144042>] reiserfs_get_acl+0xa2/0x360
       [<c114461a>] reiserfs_cache_default_acl+0x3a/0x160
       [<c111789c>] reiserfs_mkdir+0x6c/0x2c0
       [<c10bea96>] vfs_mkdir+0xd6/0x180
       [<c10c0c10>] sys_mkdirat+0xc0/0xd0
       [<c10c0c40>] sys_mkdir+0x20/0x30
       [<c1002ec4>] sysenter_do_call+0x12/0x32

other info that might help us debug this:

2 locks held by cp/3204:
 #0:  (&sb->s_type->i_mutex_key#4/1){+.+.+.}, at: [<c10bd8d6>] lookup_create+0x26/0xa0
 #1:  (&sb->s_type->i_mutex_key#4/3){+.+.+.}, at: [<c1141e18>] open_xa_dir+0xd8/0x1b0

stack backtrace:
Pid: 3204, comm: cp Not tainted 2.6.32-atom #173
Call Trace:
 [<c13ff993>] ? printk+0x18/0x1a
 [<c105d33a>] print_circular_bug+0xca/0xd0
 [<c105f176>] __lock_acquire+0x18f6/0x19e0
 [<c105d3aa>] ? check_usage+0x6a/0x460
 [<c105f2c8>] lock_acquire+0x68/0x90
 [<c11432b9>] ? reiserfs_write_lock_once+0x29/0x50
 [<c11432b9>] ? reiserfs_write_lock_once+0x29/0x50
 [<c1401a2b>] mutex_lock_nested+0x5b/0x340
 [<c11432b9>] ? reiserfs_write_lock_once+0x29/0x50
 [<c11432b9>] reiserfs_write_lock_once+0x29/0x50
 [<c1117012>] reiserfs_lookup+0x62/0x140
 [<c105ccca>] ? debug_check_no_locks_freed+0x8a/0x140
 [<c105cbe4>] ? trace_hardirqs_on_caller+0x124/0x170
 [<c10bd85f>] __lookup_hash+0xef/0x110
 [<c10bf21d>] lookup_one_len+0x8d/0xc0
 [<c1141e2a>] open_xa_dir+0xea/0x1b0
 [<c1141fe5>] xattr_lookup+0x15/0x160
 [<c1142476>] reiserfs_xattr_get+0x56/0x2a0
 [<c1144042>] reiserfs_get_acl+0xa2/0x360
 [<c10ca2e7>] ? new_inode+0x27/0xa0
 [<c114461a>] reiserfs_cache_default_acl+0x3a/0x160
 [<c1402eb7>] ? _spin_unlock+0x27/0x40
 [<c111789c>] reiserfs_mkdir+0x6c/0x2c0
 [<c10c7cb8>] ? __d_lookup+0x108/0x190
 [<c105c932>] ? mark_held_locks+0x62/0x80
 [<c1401c8d>] ? mutex_lock_nested+0x2bd/0x340
 [<c10bd17a>] ? generic_permission+0x1a/0xa0
 [<c11788fe>] ? security_inode_permission+0x1e/0x20
 [<c10bea96>] vfs_mkdir+0xd6/0x180
 [<c10c0c10>] sys_mkdirat+0xc0/0xd0
 [<c10505c6>] ? up_read+0x16/0x30
 [<c1002fd8>] ? restore_all_notrace+0x0/0x18
 [<c10c0c40>] sys_mkdir+0x20/0x30
 [<c1002ec4>] sysenter_do_call+0x12/0x32

v2: Don't drop reiserfs_mutex_lock_nested_safe() as we'll still
    need it later

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
2010-01-02 01:54:47 +01:00
Frederic Weisbecker
c4a62ca362 reiserfs: Warn on lock relax if taken recursively
When we relax the reiserfs lock to avoid creating unwanted
dependencies against others locks while grabbing these,
we want to ensure it has not been taken recursively, otherwise
the lock won't be really relaxed. Only its depth will be decreased.
The unwanted dependency would then actually happen.

To prevent from that, add a reiserfs_lock_check_recursive() call
in the places that need it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
2010-01-02 01:54:37 +01:00
Frederic Weisbecker
0719d34347 reiserfs: Fix reiserfs lock <-> i_xattr_sem dependency inversion
i_xattr_sem depends on the reiserfs lock. But after we grab
i_xattr_sem, we may relax/relock the reiserfs lock while waiting
on a freezed filesystem, creating a dependency inversion between
the two locks.

In order to avoid the i_xattr_sem -> reiserfs lock dependency, let's
create a reiserfs_down_read_safe() that acts like
reiserfs_mutex_lock_safe(): relax the reiserfs lock while grabbing
another lock to avoid undesired dependencies induced by the
heivyweight reiserfs lock.

This fixes the following warning:

[  990.005931] =======================================================
[  990.012373] [ INFO: possible circular locking dependency detected ]
[  990.013233] 2.6.33-rc1 #1
[  990.013233] -------------------------------------------------------
[  990.013233] dbench/1891 is trying to acquire lock:
[  990.013233]  (&REISERFS_SB(s)->lock){+.+.+.}, at: [<ffffffff81159505>] reiserfs_write_lock+0x35/0x50
[  990.013233]
[  990.013233] but task is already holding lock:
[  990.013233]  (&REISERFS_I(inode)->i_xattr_sem){+.+.+.}, at: [<ffffffff8115899a>] reiserfs_xattr_set_handle+0x8a/0x470
[  990.013233]
[  990.013233] which lock already depends on the new lock.
[  990.013233]
[  990.013233]
[  990.013233] the existing dependency chain (in reverse order) is:
[  990.013233]
[  990.013233] -> #1 (&REISERFS_I(inode)->i_xattr_sem){+.+.+.}:
[  990.013233]        [<ffffffff81063afc>] __lock_acquire+0xf9c/0x1560
[  990.013233]        [<ffffffff8106414f>] lock_acquire+0x8f/0xb0
[  990.013233]        [<ffffffff814ac194>] down_write+0x44/0x80
[  990.013233]        [<ffffffff8115899a>] reiserfs_xattr_set_handle+0x8a/0x470
[  990.013233]        [<ffffffff81158e30>] reiserfs_xattr_set+0xb0/0x150
[  990.013233]        [<ffffffff8115a6aa>] user_set+0x8a/0x90
[  990.013233]        [<ffffffff8115901a>] reiserfs_setxattr+0xaa/0xb0
[  990.013233]        [<ffffffff810e2596>] __vfs_setxattr_noperm+0x36/0xa0
[  990.013233]        [<ffffffff810e26bc>] vfs_setxattr+0xbc/0xc0
[  990.013233]        [<ffffffff810e2780>] setxattr+0xc0/0x150
[  990.013233]        [<ffffffff810e289d>] sys_fsetxattr+0x8d/0xa0
[  990.013233]        [<ffffffff81002dab>] system_call_fastpath+0x16/0x1b
[  990.013233]
[  990.013233] -> #0 (&REISERFS_SB(s)->lock){+.+.+.}:
[  990.013233]        [<ffffffff81063e30>] __lock_acquire+0x12d0/0x1560
[  990.013233]        [<ffffffff8106414f>] lock_acquire+0x8f/0xb0
[  990.013233]        [<ffffffff814aba77>] __mutex_lock_common+0x47/0x3b0
[  990.013233]        [<ffffffff814abebe>] mutex_lock_nested+0x3e/0x50
[  990.013233]        [<ffffffff81159505>] reiserfs_write_lock+0x35/0x50
[  990.013233]        [<ffffffff811340e5>] reiserfs_prepare_write+0x45/0x180
[  990.013233]        [<ffffffff81158bb6>] reiserfs_xattr_set_handle+0x2a6/0x470
[  990.013233]        [<ffffffff81158e30>] reiserfs_xattr_set+0xb0/0x150
[  990.013233]        [<ffffffff8115a6aa>] user_set+0x8a/0x90
[  990.013233]        [<ffffffff8115901a>] reiserfs_setxattr+0xaa/0xb0
[  990.013233]        [<ffffffff810e2596>] __vfs_setxattr_noperm+0x36/0xa0
[  990.013233]        [<ffffffff810e26bc>] vfs_setxattr+0xbc/0xc0
[  990.013233]        [<ffffffff810e2780>] setxattr+0xc0/0x150
[  990.013233]        [<ffffffff810e289d>] sys_fsetxattr+0x8d/0xa0
[  990.013233]        [<ffffffff81002dab>] system_call_fastpath+0x16/0x1b
[  990.013233]
[  990.013233] other info that might help us debug this:
[  990.013233]
[  990.013233] 2 locks held by dbench/1891:
[  990.013233]  #0:  (&sb->s_type->i_mutex_key#12){+.+.+.}, at: [<ffffffff810e2678>] vfs_setxattr+0x78/0xc0
[  990.013233]  #1:  (&REISERFS_I(inode)->i_xattr_sem){+.+.+.}, at: [<ffffffff8115899a>] reiserfs_xattr_set_handle+0x8a/0x470
[  990.013233]
[  990.013233] stack backtrace:
[  990.013233] Pid: 1891, comm: dbench Not tainted 2.6.33-rc1 #1
[  990.013233] Call Trace:
[  990.013233]  [<ffffffff81061639>] print_circular_bug+0xe9/0xf0
[  990.013233]  [<ffffffff81063e30>] __lock_acquire+0x12d0/0x1560
[  990.013233]  [<ffffffff8115899a>] ? reiserfs_xattr_set_handle+0x8a/0x470
[  990.013233]  [<ffffffff8106414f>] lock_acquire+0x8f/0xb0
[  990.013233]  [<ffffffff81159505>] ? reiserfs_write_lock+0x35/0x50
[  990.013233]  [<ffffffff8115899a>] ? reiserfs_xattr_set_handle+0x8a/0x470
[  990.013233]  [<ffffffff814aba77>] __mutex_lock_common+0x47/0x3b0
[  990.013233]  [<ffffffff81159505>] ? reiserfs_write_lock+0x35/0x50
[  990.013233]  [<ffffffff81159505>] ? reiserfs_write_lock+0x35/0x50
[  990.013233]  [<ffffffff81062592>] ? mark_held_locks+0x72/0xa0
[  990.013233]  [<ffffffff814ab81d>] ? __mutex_unlock_slowpath+0xbd/0x140
[  990.013233]  [<ffffffff810628ad>] ? trace_hardirqs_on_caller+0x14d/0x1a0
[  990.013233]  [<ffffffff814abebe>] mutex_lock_nested+0x3e/0x50
[  990.013233]  [<ffffffff81159505>] reiserfs_write_lock+0x35/0x50
[  990.013233]  [<ffffffff811340e5>] reiserfs_prepare_write+0x45/0x180
[  990.013233]  [<ffffffff81158bb6>] reiserfs_xattr_set_handle+0x2a6/0x470
[  990.013233]  [<ffffffff81158e30>] reiserfs_xattr_set+0xb0/0x150
[  990.013233]  [<ffffffff814abcb4>] ? __mutex_lock_common+0x284/0x3b0
[  990.013233]  [<ffffffff8115a6aa>] user_set+0x8a/0x90
[  990.013233]  [<ffffffff8115901a>] reiserfs_setxattr+0xaa/0xb0
[  990.013233]  [<ffffffff810e2596>] __vfs_setxattr_noperm+0x36/0xa0
[  990.013233]  [<ffffffff810e26bc>] vfs_setxattr+0xbc/0xc0
[  990.013233]  [<ffffffff810e2780>] setxattr+0xc0/0x150
[  990.013233]  [<ffffffff81056018>] ? sched_clock_cpu+0xb8/0x100
[  990.013233]  [<ffffffff8105eded>] ? trace_hardirqs_off+0xd/0x10
[  990.013233]  [<ffffffff810560a3>] ? cpu_clock+0x43/0x50
[  990.013233]  [<ffffffff810c6820>] ? fget+0xb0/0x110
[  990.013233]  [<ffffffff810c6770>] ? fget+0x0/0x110
[  990.013233]  [<ffffffff81002ddc>] ? sysret_check+0x27/0x62
[  990.013233]  [<ffffffff810e289d>] sys_fsetxattr+0x8d/0xa0
[  990.013233]  [<ffffffff81002dab>] system_call_fastpath+0x16/0x1b

Reported-and-tested-by: Christian Kujau <lists@nerdbynature.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
2010-01-02 01:54:04 +01:00
Theodore Ts'o
9d0be50230 ext4: Calculate metadata requirements more accurately
In the past, ext4_calc_metadata_amount(), and its sub-functions
ext4_ext_calc_metadata_amount() and ext4_indirect_calc_metadata_amount()
badly over-estimated the number of metadata blocks that might be
required for delayed allocation blocks.  This didn't matter as much
when functions which managed the reserved metadata blocks were more
aggressive about dropping reserved metadata blocks as delayed
allocation blocks were written, but unfortunately they were too
aggressive.  This was fixed in commit 0637c6f, but as a result the
over-estimation by ext4_calc_metadata_amount() would lead to reserving
2-3 times the number of pending delayed allocation blocks as
potentially required metadata blocks.  So if there are 1 megabytes of
blocks which have been not yet been allocation, up to 3 megabytes of
space would get reserved out of the user's quota and from the file
system free space pool until all of the inode's data blocks have been
allocated.

This commit addresses this problem by much more accurately estimating
the number of metadata blocks that will be required.  It will still
somewhat over-estimate the number of blocks needed, since it must make
a worst case estimate not knowing which physical blocks will be
needed, but it is much more accurate than before.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-01-01 02:41:30 -05:00
Theodore Ts'o
ee5f4d9cdf ext4: Fix accounting of reserved metadata blocks
Commit 0637c6f had a typo which caused the reserved metadata blocks to
not be released correctly.   Fix this.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-01-01 02:36:15 -05:00
Steve French
6a5fa2362b [CIFS] Add support for TCP_NODELAY
mount option sockopt=TCP_NODELAY helpful for faster networks
boosting performance.  Kernel bugzilla bug number 14032.

Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-01-01 01:28:43 +00:00
Linus Torvalds
c03f6bfc9f Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  [CIFS] Enable mmap on forcedirectio mounts
  cifs: NULL out tcon, pSesInfo, and srvTcp pointers when chasing DFS referrals
2009-12-31 15:17:26 -08:00
Tao Ma
86470e98cc ocfs2: Handle O_DIRECT when writing to a refcounted cluster.
In case of writing to a refcounted cluster with O_DIRECT,
we need to fall back to buffer write. And when it is finished,
we need to flush the page and the journal as we did for other
O_DIRECT writes.

This patch fix oss bug 1191.
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1191

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Tested-by: Tristan Ye <tristan.ye@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-12-30 19:53:35 -08:00
Linus Torvalds
1f11abc966 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: Patch up how we claim metadata blocks for quota purposes
  ext4: Ensure zeroout blocks have no dirty metadata
  ext4: return correct wbc.nr_to_write in ext4_da_writepages
  ext4: Update documentation to correct the inode_readahead_blks option name
  jbd2: don't use __GFP_NOFAIL in journal_init_common()
  ext4: flush delalloc blocks when space is low
  fs-writeback: Add helper function to start writeback if idle
  ext4: Eliminate potential double free on error path
  ext4: fix unsigned long long printk warning in super.c
  ext4, jbd2: Add barriers for file systems with exernal journals
  ext4: replace BUG() with return -EIO in ext4_ext_get_blocks
  ext4: add module aliases for ext2 and ext3
  ext4: Don't ask about supporting ext2/3 in ext4 if ext4 is not configured
  ext4: remove unused #include <linux/version.h>
2009-12-30 13:25:56 -08:00
Serge E. Hallyn
7ea6600148 generic_permission: MAY_OPEN is not write access
generic_permission was refusing CAP_DAC_READ_SEARCH-enabled
processes from opening DAC-protected files read-only, because
do_filp_open adds MAY_OPEN to the open mask.

Ignore MAY_OPEN.  After this patch, CAP_DAC_READ_SEARCH is
again sufficient to open(fname, O_RDONLY) on a file to which
DAC otherwise refuses us read permission.

Reported-by: Mike Kazantsev <mk.fraggod@gmail.com>
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Tested-by: Mike Kazantsev <mk.fraggod@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-30 12:35:44 -08:00
Theodore Ts'o
0637c6f413 ext4: Patch up how we claim metadata blocks for quota purposes
As reported in Kernel Bugzilla #14936, commit d21cd8f triggered a BUG
in the function ext4_da_update_reserve_space() found in
fs/ext4/inode.c.  The root cause of this BUG() was caused by the fact
that ext4_calc_metadata_amount() can severely over-estimate how many
metadata blocks will be needed, especially when using direct
block-mapped files.

In addition, it can also badly *under* estimate how much space is
needed, since ext4_calc_metadata_amount() assumes that the blocks are
contiguous, and this is not always true.  If the application is
writing blocks to a sparse file, the number of metadata blocks
necessary can be severly underestimated by the functions
ext4_da_reserve_space(), ext4_da_update_reserve_space() and
ext4_da_release_space().  This was the cause of the dq_claim_space
reports found on kerneloops.org.

Unfortunately, doing this right means that we need to massively
over-estimate the amount of free space needed.  So in some cases we
may need to force the inode to be written to disk asynchronously in
to avoid spurious quota failures.

http://bugzilla.kernel.org/show_bug.cgi?id=14936

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-30 14:20:45 -05:00
Aneesh Kumar K.V
515f41c33a ext4: Ensure zeroout blocks have no dirty metadata
This fixes a bug (found by Curt Wohlgemuth) in which new blocks
returned from an extent created with ext4_ext_zeroout() can have dirty
metadata still associated with them.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-29 23:39:06 -05:00
Frederic Weisbecker
98ea3f50bc reiserfs: Fix remaining in-reclaim-fs <-> reclaim-fs-on locking inversion
Commit 500f5a0bf5
(reiserfs: Fix possible recursive lock) fixed a vmalloc under reiserfs
lock that triggered a lockdep warning because of a
IN-FS-RECLAIM <-> RECLAIM-FS-ON locking dependency inversion.

But this patch has ommitted another vmalloc call in the same path
that allocates the journal. Relax the lock for this one too.

Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
2009-12-29 22:34:59 +01:00
Richard Kennedy
2faf2e19dd ext4: return correct wbc.nr_to_write in ext4_da_writepages
When ext4_da_writepages increases the nr_to_write in writeback_control
then it must always re-base the return value.  Originally there was a
(misguided) attempt prevent wbc.nr_to_write from going negative.  In
fact, it's necessary to allow nr_to_write to be negative so that
wb_writeback() can correctly calculate how many pages were actually
written.  

Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-25 15:46:07 -05:00
Tobias Klauser
33e189bd57 nilfs2: Storage class should be before const qualifier
The C99 specification states in section 6.11.5:

The placement of a storage-class specifier other than at the beginning
of the declaration specifiers in a declaration is an obsolescent
feature.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-12-25 13:01:50 +09:00
Jiro SEKIBA
5ee5814832 nilfs2: trivial coding style fix
This is a trivial style fix patch to mend errors/warnings
reported by "checkpatch.pl --file".

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-12-25 13:01:50 +09:00
Linus Torvalds
45e62974fb Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
  ocfs2/trivial: Use le16_to_cpu for a disk value in xattr.c
  ocfs2/trivial: Use proper mask for 2 places in hearbeat.c
  Ocfs2: Let ocfs2 support fiemap for symlink and fast symlink.
  Ocfs2: Should ocfs2 support fiemap for S_IFDIR inode?
  ocfs2: Use FIEMAP_EXTENT_SHARED
  fiemap: Add new extent flag FIEMAP_EXTENT_SHARED
  ocfs2: replace u8 by __u8 in ocfs2_fs.h
  ocfs2: explicit declare uninitialized var in user_cluster_connect()
  ocfs2-devel: remove redundant OCFS2_MOUNT_POSIX_ACL check in ocfs2_get_acl_nolock()
  ocfs2: return -EAGAIN instead of EAGAIN in dlm
  ocfs2/cluster: Make fence method configurable - v2
  ocfs2: Set MS_POSIXACL on remount
  ocfs2: Make acl use the default
  ocfs2: Always include ACL support
2009-12-24 12:59:11 -08:00
Tao Ma
8ff6af881d ocfs2/trivial: Use le16_to_cpu for a disk value in xattr.c
In ocfs2_value_metas_in_xattr_header, we should Use
le16_to_cpu for ocfs2_extent_list.l_next_free_rec.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-12-23 17:52:16 -08:00
Tao Ma
b31d308ddc ocfs2/trivial: Use proper mask for 2 places in hearbeat.c
I just noticed today that there are 2 places of "mlog(0,...)"
in  fs/ocfs2/cluster/heartbeat.c, but actually have no default
mask prefix in that file.
So change them to mlog(ML_HEARTBEAT,...).

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-12-23 17:52:13 -08:00
Tristan Ye
86239d59e2 Ocfs2: Let ocfs2 support fiemap for symlink and fast symlink.
For fast symlink, it can be treated the same as inlined files since
the data extent we want to return of both case all were stored in
metadata block. For symlink, it can be simply treated the same as we
did for regular files.

Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
Acked-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-12-23 17:52:09 -08:00
Linus Torvalds
f793067eb9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
  devtmpfs: unlock mutex in case of string allocation error
  Driver core: export platform_device_register_data as a GPL symbol
  driver core: Prevent reference to freed memory on error path
  Driver-core: Fix bogus 0 error return in device_add()
  Driver core: driver_attribute parameters can often be const*
  Driver core: bin_attribute parameters can often be const*
  Driver core: device_attribute parameters can often be const*
  Doc/stable rules: add new cherry-pick logic
  vfs: get_sb_single() - do not pass options twice
  devtmpfs: Convert dirlock to a mutex
2009-12-23 13:35:03 -08:00
Linus Torvalds
849254e9bb Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
  ocfs2: Set i_nlink properly during reflink.
  ocfs2: Add reflinked file's inode to inode hash eariler.
  ocfs2: refcounttree.c cleanup.
  ocfs2: Find proper end cpos for a leaf refcount block.
2009-12-23 13:33:07 -08:00
Sage Weil
93cea5bebf ceph: use ceph_pagelist for mds reconnect message; change encoding (protocol change)
Use the ceph_pagelist to encode the MDS reconnect message.  We change the
message encoding (protocol change!) at the same time to make our life
easier (we don't know how many snaprealms we have when we start encoding).

An empty message implies the session is closed/does not exist.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-23 12:21:51 -08:00
Sage Weil
58bb3b374b ceph: support ceph_pagelist for message payload
The ceph_pagelist is a simple list of whole pages, strung together via
their lru list_head.  It facilitates encoding to a "buffer" of unknown
size.  Allow its use in place of the ceph_msg page vector.

This will be used to fix the huge buffer preallocation woes of MDS
reconnection.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-23 12:12:31 -08:00
Phil Carmody
66ecb92be9 Driver core: bin_attribute parameters can often be const*
Many struct bin_attribute descriptors are purely read-only
structures, and there's no need to change them. Therefore
make the promise not to, which will let those descriptors
be put in a ro section.

Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-23 11:23:43 -08:00
Kay Sievers
9329d1beae vfs: get_sb_single() - do not pass options twice
Filesystem code usually destroys the option buffer while
parsing it. This leads to errors when the same buffer is
passed twice. In case we fill a new superblock do not call
remount.

This is needed to quite a warning that the debugfs code
causes every boot.

Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-23 11:23:43 -08:00
Sage Weil
04a419f908 ceph: add feature bits to connection handshake (protocol change)
Define supported and required feature set.  Fail connection if the server
requires features we do not support (TAG_FEATURES), or if the server does
not support features we require.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-23 09:30:21 -08:00
Sage Weil
6df058c025 ceph: include transaction id in ceph_msg_header (protocol change)
Many (most?) message types include a transaction id.  By including it in
the fixed size header, we always have it available even when we are unable
to allocate memory for the (larger, variable sized) message body.  This
will allow us to error out the appropriate request instead of (silently)
dropping the reply.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-23 08:17:22 -08:00
Sage Weil
0cf90ab5b0 ceph: more informative msgpool errors
Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-23 08:17:21 -08:00
Sage Weil
350b1c32ea ceph: control access to page vector for incoming data
When we issue an OSD read, we specify a vector of pages that the data is to
be read into.  The request may be sent multiple times, to multiple OSDs, if
the osdmap changes, which means we can get more than one reply.

Only read data into the page vector if the reply is coming from the
OSD we last sent the request to.  Keep track of which connection is using
the vector by taking a reference.  If another connection was already
using the vector before and a new reply comes in on the right connection,
revoke the pages from the other connection.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-23 08:17:20 -08:00
Sage Weil
ec302645f4 ceph: use connection mutex to protect read and write stages
Use a single mutex (previously out_mutex) to protect both read and write
activity from concurrent ceph_con_* calls.  Drop the mutex when doing
callbacks to avoid nested locking (the callback may need to call something
like ceph_con_close).

Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-23 08:17:19 -08:00
Sage Weil
529cfcc46f ceph: unregister canceled/timed out osd requests
Canceled or timed out osd requests were getting left in the request list
and never deallocated (until umount).  Unregister if they are canceled
(control-c) or time out.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-23 08:17:19 -08:00
Sage Weil
e0e3271074 ceph: only unregister registered bdi
Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-23 08:17:18 -08:00
Sage Weil
5dacf09121 ceph: do not touch_caps while iterating over caps list
Avoid confusing iterate_session_caps(), flag the session while we are
iterating so that __touch_cap does not rearrange items on the list.

All other modifiers of session->s_caps do so under the protection of
s_mutex.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-23 08:17:14 -08:00
Andrew Morton
3ebfdf885a jbd2: don't use __GFP_NOFAIL in journal_init_common()
It triggers the warning in get_page_from_freelist(), and it isn't
appropriate to use __GFP_NOFAIL here anyway.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=14843

Reported-by: Christian Casteyde <casteyde.christian@free.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-23 08:05:15 -05:00
Eric Sandeen
c8afb44682 ext4: flush delalloc blocks when space is low
Creating many small files in rapid succession on a small
filesystem can lead to spurious ENOSPC; on a 104MB filesystem:

for i in `seq 1 22500`; do
    echo -n > $SCRATCH_MNT/$i
    echo XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > $SCRATCH_MNT/$i
done

leads to ENOSPC even though after a sync, 40% of the fs is free
again.

This is because we reserve worst-case metadata for delalloc writes,
and when data is allocated that worst-case reservation is not
usually needed.

When freespace is low, kicking off an async writeback will start
converting that worst-case space usage into something more realistic,
almost always freeing up space to continue.

This resolves the testcase for me, and survives all 4 generic
ENOSPC tests in xfstests.

We'll still need a hard synchronous sync to squeeze out the last bit,
but this fixes things up to a large degree.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-23 07:58:12 -05:00
Eric Sandeen
17bd55d037 fs-writeback: Add helper function to start writeback if idle
ext4, at least, would like to start pushing on writeback if it starts
to get close to ENOSPC when reserving worst-case blocks for delalloc
writes.  Writing out delalloc data will convert those worst-case
predictions into usually smaller actual usage, freeing up space
before we hit ENOSPC based on this speculation.

Thanks to Jens for the suggestion for the helper function,
& the naming help.

I've made the helper return status on whether writeback was
started even though I don't plan to use it in the ext4 patch;
it seems like it would be potentially useful to test this
in some cases.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Jan Kara <jack@suse.cz>
2009-12-23 07:57:07 -05:00
Julia Lawall
d3533d72e7 ext4: Eliminate potential double free on error path
b_entry_name and buffer are initially NULL, are initialized within a loop
to the result of calling kmalloc, and are freed at the bottom of this loop.
The loop contains gotos to cleanup, which also frees b_entry_name and
buffer.  Some of these gotos are before the reinitializations of
b_entry_name and buffer.  To maintain the invariant that b_entry_name and
buffer are NULL at the top of the loop, and thus acceptable arguments to
kfree, these variables are now set to NULL after the kfrees.

This seems to be the simplest solution.  A more complicated solution
would be to introduce more labels in the error handling code at the end of
the function.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@r@
identifier E;
expression E1;
iterator I;
statement S;
@@

*kfree(E);
... when != E = E1
    when != I(E,...) S
    when != &E
*kfree(E);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-23 07:52:31 -05:00
Andrew Morton
a6b43e3825 ext4: fix unsigned long long printk warning in super.c
sparc64 allmodconfig:

fs/ext4/super.c: In function `lifetime_write_kbytes_show':
fs/ext4/super.c:2174: warning: long long unsigned int format, long unsigned int arg (arg 4)
fs/ext4/super.c:2174: warning: long long unsigned int format, long unsigned int arg (arg 4)

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-23 07:48:08 -05:00
Jan Kara
869835dfad quota: Improve checking of quota file header
When we are asked for vfsv0 quota format and the file is in vfsv1
format (or vice versa), refuse to use the quota file. Also return
with error when we don't like the header of quota file.

Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-23 13:44:13 +01:00
Yin Kangkai
765f836190 jbd: jbd-debug and jbd2-debug should be writable
jbd-debug and jbd2-debug is currently read-only (S_IRUGO), which is not
correct. Make it writable so that we can start debuging.

Signed-off-by: Yin Kangkai <kangkai.yin@intel.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-23 13:44:13 +01:00
Dmitry Monakhov
39bc680a81 ext4: fix sleep inside spinlock issue with quota and dealloc (#14739)
Unlock i_block_reservation_lock before vfs_dq_reserve_block().
This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=14739

CC: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-23 13:44:12 +01:00
Dmitry Monakhov
d21cd8f163 ext4: Fix potential quota deadlock
We have to delay vfs_dq_claim_space() until allocation context destruction.
Currently we have following call-trace:
ext4_mb_new_blocks()
  /* task is already holding ac->alloc_semp */
 ->ext4_mb_mark_diskspace_used
    ->vfs_dq_claim_space()  /*  acquire dqptr_sem here. Possible deadlock */
 ->ext4_mb_release_context() /* drop ac->alloc_semp here */

Let's move quota claiming to ext4_da_update_reserve_space()

 =======================================================
 [ INFO: possible circular locking dependency detected ]
 2.6.32-rc7 #18
 -------------------------------------------------------
 write-truncate-/3465 is trying to acquire lock:
  (&s->s_dquot.dqptr_sem){++++..}, at: [<c025e73b>] dquot_claim_space+0x3b/0x1b0

 but task is already holding lock:
  (&meta_group_info[i]->alloc_sem){++++..}, at: [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #3 (&meta_group_info[i]->alloc_sem){++++..}:
        [<c017d04b>] __lock_acquire+0xd7b/0x1260
        [<c017d5ea>] lock_acquire+0xba/0xd0
        [<c0527191>] down_read+0x51/0x90
        [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370
        [<c02d0c1c>] ext4_mb_free_blocks+0x46c/0x870
        [<c029c9d3>] ext4_free_blocks+0x73/0x130
        [<c02c8cfc>] ext4_ext_truncate+0x76c/0x8d0
        [<c02a8087>] ext4_truncate+0x187/0x5e0
        [<c01e0f7b>] vmtruncate+0x6b/0x70
        [<c022ec02>] inode_setattr+0x62/0x190
        [<c02a2d7a>] ext4_setattr+0x25a/0x370
        [<c022ee81>] notify_change+0x151/0x340
        [<c021349d>] do_truncate+0x6d/0xa0
        [<c0221034>] may_open+0x1d4/0x200
        [<c022412b>] do_filp_open+0x1eb/0x910
        [<c021244d>] do_sys_open+0x6d/0x140
        [<c021258e>] sys_open+0x2e/0x40
        [<c0103100>] sysenter_do_call+0x12/0x32

 -> #2 (&ei->i_data_sem){++++..}:
        [<c017d04b>] __lock_acquire+0xd7b/0x1260
        [<c017d5ea>] lock_acquire+0xba/0xd0
        [<c0527191>] down_read+0x51/0x90
        [<c02a5787>] ext4_get_blocks+0x47/0x450
        [<c02a74c1>] ext4_getblk+0x61/0x1d0
        [<c02a7a7f>] ext4_bread+0x1f/0xa0
        [<c02bcddc>] ext4_quota_write+0x12c/0x310
        [<c0262d23>] qtree_write_dquot+0x93/0x120
        [<c0261708>] v2_write_dquot+0x28/0x30
        [<c025d3fb>] dquot_commit+0xab/0xf0
        [<c02be977>] ext4_write_dquot+0x77/0x90
        [<c02be9bf>] ext4_mark_dquot_dirty+0x2f/0x50
        [<c025e321>] dquot_alloc_inode+0x101/0x180
        [<c029fec2>] ext4_new_inode+0x602/0xf00
        [<c02ad789>] ext4_create+0x89/0x150
        [<c0221ff2>] vfs_create+0xa2/0xc0
        [<c02246e7>] do_filp_open+0x7a7/0x910
        [<c021244d>] do_sys_open+0x6d/0x140
        [<c021258e>] sys_open+0x2e/0x40
        [<c0103100>] sysenter_do_call+0x12/0x32

 -> #1 (&sb->s_type->i_mutex_key#7/4){+.+...}:
        [<c017d04b>] __lock_acquire+0xd7b/0x1260
        [<c017d5ea>] lock_acquire+0xba/0xd0
        [<c0526505>] mutex_lock_nested+0x65/0x2d0
        [<c0260c9d>] vfs_load_quota_inode+0x4bd/0x5a0
        [<c02610af>] vfs_quota_on_path+0x5f/0x70
        [<c02bc812>] ext4_quota_on+0x112/0x190
        [<c026345a>] sys_quotactl+0x44a/0x8a0
        [<c0103100>] sysenter_do_call+0x12/0x32

 -> #0 (&s->s_dquot.dqptr_sem){++++..}:
        [<c017d361>] __lock_acquire+0x1091/0x1260
        [<c017d5ea>] lock_acquire+0xba/0xd0
        [<c0527191>] down_read+0x51/0x90
        [<c025e73b>] dquot_claim_space+0x3b/0x1b0
        [<c02cb95f>] ext4_mb_mark_diskspace_used+0x36f/0x380
        [<c02d210a>] ext4_mb_new_blocks+0x34a/0x530
        [<c02c83fb>] ext4_ext_get_blocks+0x122b/0x13c0
        [<c02a5966>] ext4_get_blocks+0x226/0x450
        [<c02a5ff3>] mpage_da_map_blocks+0xc3/0xaa0
        [<c02a6ed6>] ext4_da_writepages+0x506/0x790
        [<c01de272>] do_writepages+0x22/0x50
        [<c01d766d>] __filemap_fdatawrite_range+0x6d/0x80
        [<c01d7b9b>] filemap_flush+0x2b/0x30
        [<c02a40ac>] ext4_alloc_da_blocks+0x5c/0x60
        [<c029e595>] ext4_release_file+0x75/0xb0
        [<c0216b59>] __fput+0xf9/0x210
        [<c0216c97>] fput+0x27/0x30
        [<c02122dc>] filp_close+0x4c/0x80
        [<c014510e>] put_files_struct+0x6e/0xd0
        [<c01451b7>] exit_files+0x47/0x60
        [<c0146a24>] do_exit+0x144/0x710
        [<c0147028>] do_group_exit+0x38/0xa0
        [<c0159abc>] get_signal_to_deliver+0x2ac/0x410
        [<c0102849>] do_notify_resume+0xb9/0x890
        [<c01032d2>] work_notifysig+0x13/0x21

 other info that might help us debug this:

 3 locks held by write-truncate-/3465:
  #0:  (jbd2_handle){+.+...}, at: [<c02e1f8f>] start_this_handle+0x38f/0x5c0
  #1:  (&ei->i_data_sem){++++..}, at: [<c02a57f6>] ext4_get_blocks+0xb6/0x450
  #2:  (&meta_group_info[i]->alloc_sem){++++..}, at: [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370

 stack backtrace:
 Pid: 3465, comm: write-truncate- Not tainted 2.6.32-rc7 #18
 Call Trace:
  [<c0524cb3>] ? printk+0x1d/0x22
  [<c017ac9a>] print_circular_bug+0xca/0xd0
  [<c017d361>] __lock_acquire+0x1091/0x1260
  [<c016bca2>] ? sched_clock_local+0xd2/0x170
  [<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0
  [<c017d5ea>] lock_acquire+0xba/0xd0
  [<c025e73b>] ? dquot_claim_space+0x3b/0x1b0
  [<c0527191>] down_read+0x51/0x90
  [<c025e73b>] ? dquot_claim_space+0x3b/0x1b0
  [<c025e73b>] dquot_claim_space+0x3b/0x1b0
  [<c02cb95f>] ext4_mb_mark_diskspace_used+0x36f/0x380
  [<c02d210a>] ext4_mb_new_blocks+0x34a/0x530
  [<c02c601d>] ? ext4_ext_find_extent+0x25d/0x280
  [<c02c83fb>] ext4_ext_get_blocks+0x122b/0x13c0
  [<c016bca2>] ? sched_clock_local+0xd2/0x170
  [<c016be60>] ? sched_clock_cpu+0x120/0x160
  [<c016beef>] ? cpu_clock+0x4f/0x60
  [<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0
  [<c052712c>] ? down_write+0x8c/0xa0
  [<c02a5966>] ext4_get_blocks+0x226/0x450
  [<c016be60>] ? sched_clock_cpu+0x120/0x160
  [<c016beef>] ? cpu_clock+0x4f/0x60
  [<c017908b>] ? trace_hardirqs_off+0xb/0x10
  [<c02a5ff3>] mpage_da_map_blocks+0xc3/0xaa0
  [<c01d69cc>] ? find_get_pages_tag+0x16c/0x180
  [<c01d6860>] ? find_get_pages_tag+0x0/0x180
  [<c02a73bd>] ? __mpage_da_writepage+0x16d/0x1a0
  [<c01dfc4e>] ? pagevec_lookup_tag+0x2e/0x40
  [<c01ddf1b>] ? write_cache_pages+0xdb/0x3d0
  [<c02a7250>] ? __mpage_da_writepage+0x0/0x1a0
  [<c02a6ed6>] ext4_da_writepages+0x506/0x790
  [<c016beef>] ? cpu_clock+0x4f/0x60
  [<c016bca2>] ? sched_clock_local+0xd2/0x170
  [<c016be60>] ? sched_clock_cpu+0x120/0x160
  [<c016be60>] ? sched_clock_cpu+0x120/0x160
  [<c02a69d0>] ? ext4_da_writepages+0x0/0x790
  [<c01de272>] do_writepages+0x22/0x50
  [<c01d766d>] __filemap_fdatawrite_range+0x6d/0x80
  [<c01d7b9b>] filemap_flush+0x2b/0x30
  [<c02a40ac>] ext4_alloc_da_blocks+0x5c/0x60
  [<c029e595>] ext4_release_file+0x75/0xb0
  [<c0216b59>] __fput+0xf9/0x210
  [<c0216c97>] fput+0x27/0x30
  [<c02122dc>] filp_close+0x4c/0x80
  [<c014510e>] put_files_struct+0x6e/0xd0
  [<c01451b7>] exit_files+0x47/0x60
  [<c0146a24>] do_exit+0x144/0x710
  [<c017b163>] ? lock_release_holdtime+0x33/0x210
  [<c0528137>] ? _spin_unlock_irq+0x27/0x30
  [<c0147028>] do_group_exit+0x38/0xa0
  [<c017babb>] ? trace_hardirqs_on+0xb/0x10
  [<c0159abc>] get_signal_to_deliver+0x2ac/0x410
  [<c0102849>] do_notify_resume+0xb9/0x890
  [<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0
  [<c017b163>] ? lock_release_holdtime+0x33/0x210
  [<c0165b50>] ? autoremove_wake_function+0x0/0x50
  [<c017ba54>] ? trace_hardirqs_on_caller+0x134/0x190
  [<c017babb>] ? trace_hardirqs_on+0xb/0x10
  [<c0300ba4>] ? security_file_permission+0x14/0x20
  [<c0215761>] ? vfs_write+0x131/0x190
  [<c0214f50>] ? do_sync_write+0x0/0x120
  [<c0103115>] ? sysenter_do_call+0x27/0x32
  [<c01032d2>] work_notifysig+0x13/0x21

CC: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-23 13:44:12 +01:00
Jan Kara
82fdfa928c quota: Fix 64-bit limits setting on 32-bit archs
Fix warnings:
fs/quota/quota_v2.c: In function ‘v2_read_file_info’:
fs/quota/quota_v2.c:123: warning: integer constant is too large for ‘long’ type
fs/quota/quota_v2.c:124: warning: integer constant is too large for ‘long’ type

Reported-by: Jerry Leo <jerryleo860202@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-23 13:44:12 +01:00
Eric Sandeen
96d2a495c2 ext3: Replace lock/unlock_super() with an explicit lock for resizing
Use a separate lock to protect s_groups_count and the other block
group descriptors which get changed via an on-line resize operation,
so we can stop overloading the use of lock_super().

Port of ext4 commit 32ed5058ce by
Theodore Ts'o <tytso@mit.edu>.

CC: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-23 13:44:12 +01:00
Eric Sandeen
b8a052d016 ext3: Replace lock/unlock_super() with an explicit lock for the orphan list
Use a separate lock to protect the orphan list, so we can stop
overloading the use of lock_super().

Port of ext4 commit 3b9d4ed266
by Theodore Ts'o <tytso@mit.edu>.

CC: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-23 13:44:11 +01:00
Eric Sandeen
4854a5f0cb ext3: ext3_mark_recovery_complete() doesn't need to use lock_super
The function ext3_mark_recovery_complete() is called from two call
paths: either (a) while mounting the filesystem, in which case there's
no danger of any other CPU calling write_super() until the mount is
completed, and (b) while remounting the filesystem read-write, in
which case the fs core has already locked the superblock.  This also
allows us to take out a very vile unlock_super()/lock_super() pair in
ext3_remount().

Port of ext4 commit a63c9eb2ce by
Theodore Ts'o <tytso@mit.edu>.

CC: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-23 13:44:11 +01:00
Eric Sandeen
ed505ee454 ext3: Remove outdated comment about lock_super()
ext3_fill_super() is no longer called by read_super(), and it is no
longer called with the superblock locked.  The
unlock_super()/lock_super() is no longer present, so this comment is
entirely superfluous.

Port of ext4 commit 32ed5058ce by
Theodore Ts'o <tytso@mit.edu>.

CC: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-23 13:43:50 +01:00
Dmitry Monakhov
dc52dd3a3a quota: Move duplicated code to separate functions
- for(..) { mark_dquot_dirty(); } -> mark_all_dquot_dirty()
- for(..) { dput(); }             -> dqput_all()

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-23 13:33:55 +01:00