linux-pinenote

Author	SHA1	Message	Date
Christoph Hellwig	fc0063c447	xfs: reduce ioend latency There is no reason to queue up ioends for processing in user context unless we actually need it. Just complete ioends that do not convert unwritten extents or need a size update from the end_io context. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-10-11 21:15:00 -05:00
Christoph Hellwig	c859cdd1da	xfs: defer AIO/DIO completions We really shouldn't complete AIO or DIO requests until we have finished the unwritten extent conversion and size update. This means fsync never has to pick up any ioends as all work has been completed when signalling I/O completion. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-10-11 21:15:00 -05:00
Christoph Hellwig	398d25ef23	xfs: remove dead ENODEV handling in xfs_destroy_ioend No driver returns ENODEV from it bio completion handler, not has this ever been documented. Remove the dead code dealing with it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-10-11 21:15:00 -05:00
Christoph Hellwig	c4e1c098ee	xfs: use the "delwri" terminology consistently And also remove the strange local lock and delwri list pointers in a few functions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-10-11 21:15:00 -05:00
Christoph Hellwig	c2b006c1da	xfs: let xfs_bwrite callers handle the xfs_buf_relse Remove the xfs_buf_relse from xfs_bwrite and let the caller handle it to mirror the delwri and read paths. Also remove the mount pointer passed to xfs_bwrite, which is superflous now that we have a mount pointer in the buftarg. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-10-11 21:15:00 -05:00
Christoph Hellwig	61551f1ee5	xfs: call xfs_buf_delwri_queue directly Unify the ways we add buffers to the delwri queue by always calling xfs_buf_delwri_queue directly. The xfs_bdwrite functions is removed and opencoded in its callers, and the two places setting XBF_DELWRI while a buffer is locked and expecting xfs_buf_unlock to pick it up are converted to call xfs_buf_delwri_queue directly, too. Also replace the XFS_BUF_UNDELAYWRITE macro with direct calls to xfs_buf_delwri_dequeue to make the explicit queuing/dequeuing more obvious. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-10-11 21:14:59 -05:00
Christoph Hellwig	5a8ee6bafd	xfs: move more delwri setup into xfs_buf_delwri_queue Do not transfer a reference held by the caller to the buffer on the list, or decrement it in xfs_buf_delwri_queue, but instead grab a new reference if needed, and let the caller drop its own reference. Also move setting of the XBF_DELWRI and XBF_ASYNC flags into xfs_buf_delwri_queue, and only do it if needed. Note that for now xfs_buf_unlock already has XBF_DELWRI, but that will change in the following patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-10-11 21:14:59 -05:00
Christoph Hellwig	527cfdf19d	xfs: remove the unlock argument to xfs_buf_delwri_queue We can just unlock the buffer in the caller, and the decrement of b_hold would also be needed in the !unlock, we just never hit that case currently given that the caller handles that case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-10-11 21:14:59 -05:00
Christoph Hellwig	375ec69d2e	xfs: remove delwri buffer handling from xfs_buf_iorequest We cannot ever reach xfs_buf_iorequest for a buffer with XBF_DELWRI set, given that all write handlers make sure that the buffer is remove from the delwri queue before, and we never do reads with the XBF_DELWRI flag set (which the code would not handle correctly anyway). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-10-11 21:14:59 -05:00
Dave Chinner	7271d243f9	xfs: don't serialise adjacent concurrent direct IO appending writes For append write workloads, extending the file requires a certain amount of exclusive locking to be done up front to ensure sanity in things like ensuring that we've zeroed any allocated regions between the old EOF and the start of the new IO. For single threads, this typically isn't a problem, and for large IOs we don't serialise enough for it to be a problem for two threads on really fast block devices. However for smaller IO and larger thread counts we have a problem. Take 4 concurrent sequential, single block sized and aligned IOs. After the first IO is submitted but before it completes, we end up with this state: IO 1 IO 2 IO 3 IO 4 +-------+-------+-------+-------+ ^ ^ \| \| \| \| \| \| \| \- ip->i_new_size \- ip->i_size And the IO is done without exclusive locking because offset <= ip->i_size. When we submit IO 2, we see offset > ip->i_size, and grab the IO lock exclusive, because there is a chance we need to do EOF zeroing. However, there is already an IO in progress that avoids the need for IO zeroing because offset <= ip->i_new_size. hence we could avoid holding the IO lock exlcusive for this. Hence after submission of the second IO, we'd end up this state: IO 1 IO 2 IO 3 IO 4 +-------+-------+-------+-------+ ^ ^ \| \| \| \| \| \| \| \- ip->i_new_size \- ip->i_size There is no need to grab the i_mutex of the IO lock in exclusive mode if we don't need to invalidate the page cache. Taking these locks on every direct IO effective serialises them as taking the IO lock in exclusive mode has to wait for all shared holders to drop the lock. That only happens when IO is complete, so effective it prevents dispatch of concurrent direct IO writes to the same inode. And so you can see that for the third concurrent IO, we'd avoid exclusive locking for the same reason we avoided the exclusive lock for the second IO. Fixing this is a bit more complex than that, because we need to hold a write-submission local value of ip->i_new_size to that clearing the value is only done if no other thread has updated it before our IO completes..... Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-10-11 21:14:59 -05:00
Dave Chinner	0c38a2512d	xfs: don't serialise direct IO reads on page cache checks There is no need to grab the i_mutex of the IO lock in exclusive mode if we don't need to invalidate the page cache. Taking these locks on every direct IO effective serialises them as taking the IO lock in exclusive mode has to wait for all shared holders to drop the lock. That only happens when IO is complete, so effective it prevents dispatch of concurrent direct IO reads to the same inode. Fix this by taking the IO lock shared to check the page cache state, and only then drop it and take the IO lock exclusively if there is work to be done. Hence for the normal direct IO case, no exclusive locking will occur. Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-by: Joern Engel <joern@logfs.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-10-11 21:14:59 -05:00
Sachin Prabhu	875cd04381	cifs: Display strictcache mount option in /proc/mounts Commit `d39454ffe4` adds a strictcache mount option. This patch allows the display of this mount option in /proc/mounts when listing shares mounted with the strictcache mount option. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>	2011-10-11 13:13:18 -05:00
J. Bruce Fields	b6d2f1ca3c	nfsd4: more robust ignoring of WANT bits in OPEN Mask out the WANT bits right at the start instead of on each use. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-10-11 12:15:15 -04:00
J. Bruce Fields	a084daf512	nfsd4: move name-length checks to xdr Again, these checks are better in the xdr code. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-10-11 12:15:01 -04:00
Christoph Hellwig	0030807c66	xfs: revert to using a kthread for AIL pushing Currently we have a few issues with the way the workqueue code is used to implement AIL pushing: - it accidentally uses the same workqueue as the syncer action, and thus can be prevented from running if there are enough sync actions active in the system. - it doesn't use the HIGHPRI flag to queue at the head of the queue of work items At this point I'm not confident enough in getting all the workqueue flags and tweaks right to provide a perfectly reliable execution context for AIL pushing, which is the most important piece in XFS to make forward progress when the log fills. Revert back to use a kthread per filesystem which fixes all the above issues at the cost of having a task struct and stack around for each mounted filesystem. In addition this also gives us much better ways to diagnose any issues involving hung AIL pushing and removes a small amount of code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Stefan Priebe <s.priebe@profihost.ag> Tested-by: Stefan Priebe <s.priebe@profihost.ag> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-10-11 11:02:49 -05:00
Christoph Hellwig	17b38471c3	xfs: force the log if we encounter pinned buffers in .iop_pushbuf We need to check for pinned buffers even in .iop_pushbuf given that inode items flush into the same buffers that may be pinned directly due operations on the unlinked inode list operating directly on buffers. To do this add a return value to .iop_pushbuf that tells the AIL push about this and use the existing log force mechanisms to unpin it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Stefan Priebe <s.priebe@profihost.ag> Tested-by: Stefan Priebe <s.priebe@profihost.ag> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-10-11 11:02:48 -05:00
Christoph Hellwig	bc6e588a89	xfs: do not update xa_last_pushed_lsn for locked items If an item was locked we should not update xa_last_pushed_lsn and thus skip it when restarting the AIL scan as we need to be able to lock and write it out as soon as possible. Otherwise heavy lock contention might starve AIL pushing too easily, especially given the larger backoff once we moved xa_last_pushed_lsn all the way to the target lsn. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Stefan Priebe <s.priebe@profihost.ag> Tested-by: Stefan Priebe <s.priebe@profihost.ag> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-10-11 11:02:48 -05:00
Chris Mason	f7f43cc841	Btrfs: make sure not to defrag extents past i_size The btrfs file defrag code will loop through the extents and force COW on them. But there is a concurrent truncate in the middle of the defrag, it might end up defragging the same range over and over again. The problem is that writepage won't go through and do anything on pages past i_size, so the cow won't happen, so the file will appear to still be fragmented. defrag will end up hitting the same extents again and again. In the worst case, the truncate can actually live lock with the defrag because the defrag keeps creating new ordered extents which the truncate code keeps waiting on. The fix here is to make defrag check for i_size inside the main loop, instead of just once before the looping starts. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-10-11 11:45:55 -04:00
J. Bruce Fields	04f9e664b2	nfsd4: move access/deny validity checks to xdr code I'd rather put more of these sorts of checks into standardized xdr decoders for the various types rather than have them cluttering up the core logic in nfs4proc.c and nfs4state.c. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-10-11 08:53:12 -04:00
J. Bruce Fields	c30e92df30	nfsd4: ignore WANT bits in open downgrade We don't use WANT bits yet--and sending them can probably trigger a BUG() further down. Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-10-10 18:05:20 -04:00
J. Bruce Fields	b31b30e5c7	nfsd4: cleanup state.h comments These comments are mostly out of date. Reported-by: Bryan Schumaker <bjschuma@netapp.com>	2011-10-10 18:04:46 -04:00
J. Bruce Fields	6409a5a65d	nfsd4: clean up downgrading code In response to some review comments, get rid of the somewhat obscure for-loop with bitops, and improve a comment. Reported-by: Steve Dickson <steved@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-10-10 18:04:45 -04:00
J. Bruce Fields	71c3bcd713	nfsd4: fix state lock usage in LOCKU In commit `5ec094c109` "nfsd4: extend state lock over seqid replay logic" I modified the exit logic of all the seqid-based procedures except nfsd4_locku(). Fix the oversight. The result of the bug was a double-unlock while handling the LOCKU procedure, and a warning like: [ 142.150014] WARNING: at kernel/mutex-debug.c:78 debug_mutex_unlock+0xda/0xe0() ... [ 142.152927] Pid: 742, comm: nfsd Not tainted 3.1.0-rc1-SLIM+ #9 [ 142.152927] Call Trace: [ 142.152927] [<ffffffff8105fa4f>] warn_slowpath_common+0x7f/0xc0 [ 142.152927] [<ffffffff8105faaa>] warn_slowpath_null+0x1a/0x20 [ 142.152927] [<ffffffff810960ca>] debug_mutex_unlock+0xda/0xe0 [ 142.152927] [<ffffffff813e4200>] __mutex_unlock_slowpath+0x80/0x140 [ 142.152927] [<ffffffff813e42ce>] mutex_unlock+0xe/0x10 [ 142.152927] [<ffffffffa03bd3f5>] nfs4_lock_state+0x35/0x40 [nfsd] [ 142.152927] [<ffffffffa03b0b71>] nfsd4_proc_compound+0x2a1/0x690 [nfsd] [ 142.152927] [<ffffffffa039f9fb>] nfsd_dispatch+0xeb/0x230 [nfsd] [ 142.152927] [<ffffffffa02b1055>] svc_process_common+0x345/0x690 [sunrpc] [ 142.152927] [<ffffffff81058d10>] ? try_to_wake_up+0x280/0x280 [ 142.152927] [<ffffffffa02b16e2>] svc_process+0x102/0x150 [sunrpc] [ 142.152927] [<ffffffffa039f0bd>] nfsd+0xbd/0x160 [nfsd] [ 142.152927] [<ffffffffa039f000>] ? 0xffffffffa039efff [ 142.152927] [<ffffffff8108230c>] kthread+0x8c/0xa0 [ 142.152927] [<ffffffff813e8694>] kernel_thread_helper+0x4/0x10 [ 142.152927] [<ffffffff81082280>] ? kthread_worker_fn+0x190/0x190 [ 142.152927] [<ffffffff813e8690>] ? gs_change+0x13/0x13 Reported-by: Bryan Schumaker <bjschuma@netapp.com> Tested-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-10-10 18:04:45 -04:00
Li Zefan	2a0f7f5769	Btrfs: fix recursive auto-defrag Follow those steps: # mount -o autodefrag /dev/sda7 /mnt # dd if=/dev/urandom of=/mnt/tmp bs=200K count=1 # sync # dd if=/dev/urandom of=/mnt/tmp bs=8K count=1 conv=notrunc and then it'll go into a loop: writeback -> defrag -> writeback ... It's because writeback writes [8K, 200K] and then writes [0, 8K]. I tried to make writeback know if the pages are dirtied by defrag, but the patch was a bit intrusive. Here I simply set writeback_index when we defrag a file. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-10-10 15:43:34 -04:00
Joe Perches	a40ecd7b3c	udf: Rename udf_warning to udf_warn Rename udf_warning to udf_warn for consistency with normal logging uses of pr_warn. Rename function udf_warning to _udf_warn. Remove __func__ from uses and move __func__ to a new udf_warn macro that calls _udf_warn. Add \n's to uses of udf_warn, remove \n from _udf_warn. Coalesce formats. Reviewed-by: NamJae Jeon <linkinjeon@gmail.com> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jan Kara <jack@suse.cz>	2011-10-10 19:29:01 +02:00
Joe Perches	8076c363da	udf: Rename udf_error to udf_err Rename udf_error to udf_err for consistency with normal logging uses of pr_err. Rename function udf_err to _udf_err. Remove __func__ from uses and move __func__ to a new udf_err macro that calls _udf_err. Some of the udf_error uses had \n terminations, some did not so standardize \n's to udf_err uses, remove \n from _udf_err function. Coalesce udf_err formats. One message prefixed with udf_read_super is now prefixed with udf_fill_super. Reviewed-by: NamJae Jeon <linkinjeon@gmail.com> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jan Kara <jack@suse.cz>	2011-10-10 19:29:01 +02:00
Joe Perches	7e273e3b41	udf: Promote some debugging messages to udf_error If there is a problem with a scratched disc or loader, it's valuable to know which error occurred. Convert some debug messages to udf_error, neaten those messages too. Add the calculated tag checksum and the read checksum to error message. Make udf_error a public function and move the logging prototypes together. Original-patch-by: NamJae Jeon <linkinjeon@gmail.com> Reviewed-by: NamJae Jeon <linkinjeon@gmail.com> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jan Kara <jack@suse.cz>	2011-10-10 19:26:24 +02:00
Tao Ma	40bfa16dac	ext3: Remove the obsolete broken EXT3_IOC32_WAIT_FOR_READONLY. There are no user of EXT3_IOC32_WAIT_FOR_READONLY and also it is broken. No one set the set_ro_timer, no one wake up us and our state is set to TASK_INTERRUPTIBLE not RUNNING. So remove it. Cc: Jan Kara <jack@suse.cz> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz>	2011-10-10 18:25:59 +02:00
Linus Torvalds	65112dccf8	Merge git://git.samba.org/sfrench/cifs-2.6 * git://git.samba.org/sfrench/cifs-2.6: [CIFS] Fix first time message on mount, ntlmv2 upgrade delayed to 3.2	2011-10-10 14:53:11 +12:00
Fabrice Jouhaud	d44651d0f9	ext4: fix ext4 so it works without CONFIG_PROC_FS This fixes a bug which was introduced in `dd68314ccf`. The problem came from the test of the return value of proc_mkdir which is always false without procfs, and this would initialization of ext4. Signed-off-by: Fabrice Jouhaud <yargil@free.fr> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-10-08 16:26:03 -04:00
Tao Ma	6ee3b21224	ext4: use le32_to_cpu for ext4_extent_idx.ei_block in ext4_ext_search_left() ext4_extent_idx.e_block is __le32, so use le32_to_cpu() in ext4_ext_search_left(). Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-10-08 16:08:34 -04:00
Tao Ma	7fd59c83b0	ext4: remove the obsolete/broken EXT4_IOC_WAIT_FOR_READONLY ioctl There are no users of the EXT4_IOC_WAIT_FOR_READONLY ioctl, and it is also broken. No one sets the set_ro_timer, no one wakes up us and our state is set to TASK_INTERRUPTIBLE not RUNNING. So remove it. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-10-08 15:56:35 -04:00
Tao Ma	df3ab17072	ext4: fix the comment describing ext4_ext_search_right() The comment describing what ext4_ext_search_right() does is incorrect. We return 0 in phys when logical is the 'largest' allocated block, not smallest. Fix a few other typos while we're at it. Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Tao Ma <boyu.mt@taobao.com>	2011-10-08 15:53:49 -04:00
Lukas Czerner	4113c4caa4	ext4: remove deprecated oldalloc For a long time now orlov is the default block allocator in the ext4. It performs better than the old one and no one seems to claim otherwise so we can safely drop it and make oldalloc and orlov mount option deprecated. This is a part of the effort to reduce number of ext4 options hence the test matrix. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-10-08 14:34:47 -04:00
Steve French	9d1e397b7b	[CIFS] Fix first time message on mount, ntlmv2 upgrade delayed to 3.2 Microsoft has a bug with ntlmv2 that requires use of ntlmssp, but we didn't get the required information on when/how to use ntlmssp to old (but once very popular) legacy servers (various NT4 fixpacks for example) until too late to merge for 3.1. Will upgrade to NTLMv2 in NTLMSSP in 3.2 Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com>	2011-10-07 20:17:56 -05:00
Tao Ma	dcf2d804ed	ext4: Free resources in some error path in ext4_fill_super Some of the error path in ext4_fill_super don't release the resouces properly. So this patch just try to release them in the right way. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-10-06 12:10:11 -04:00
Tao Ma	7aa0baeaba	ext4: Free resources in ext4_mb_init()'s error paths In commit `79a77c5ac`, we move ext4_mb_init_backend after the allocation of s_locality_group to avoid memory leak in error path, but there are still some other error paths in ext4_mb_init that need to do the same work. So this patch adds all the error patch for ext4_mb_init. And all the pointers are reset to NULL in case the caller may double free them. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-10-06 10:22:28 -04:00
Namjae Jeon	bc1123239a	udf: Add readpages support for udf. Use mpage_readpages() instead of multiple calls to udf_readpage() to reduce the CPU utilization and make performance higher. Signed-off-by: Namjae Jeon <linkinjeon@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2011-10-06 00:13:48 +02:00
H Hartley Sweeten	3ee77f2091	ext3/balloc.c: local functions should be static This quites the sparse noise: warning: symbol 'ext3_trim_all_free' was not declared. Should it be static? Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Jan Kara <jack@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Jan Kara <jack@suse.cz>	2011-10-05 01:29:22 +02:00
Boaz Harrosh	d866d875f6	ore/exofs: Change the type of the devices array (API change) In the pNFS obj-LD the device table at the layout level needs to point to a device_cache node, where it is possible and likely that many layouts will point to the same device-nodes. In Exofs we have a more orderly structure where we have a single array of devices that repeats twice for a round-robin view of the device table This patch moves to a model that can be used by the pNFS obj-LD where struct ore_components holds an array of ore_dev-pointers. (ore_dev is newly defined and contains a struct osd_dev *od member) Each pointer in the array of pointers will point to a bigger user-defined dev_struct. That can be accessed by use of the container_of macro. In Exofs an __alloc_dev_table() function allocates the ore_dev-pointers array as well as an exofs_dev array, in one allocation and does the addresses dance to set everything pointing correctly. It still keeps the double allocation trick for the inodes round-robin view of the table. The device table is always allocated dynamically, also for the single device case. So it is unconditionally freed at umount. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2011-10-04 12:13:59 +02:00
Linus Torvalds	7fd21be75d	Merge branch 'btrfs-3.0' of git://github.com/chrismason/linux * 'btrfs-3.0' of git://github.com/chrismason/linux: Btrfs: force a page fault if we have a shorty copy on a page boundary	2011-10-03 12:17:44 -07:00
Boaz Harrosh	eb507bc189	ore: Make ore_striping_info and ore_calc_stripe_info public The struct ore_striping_info will be used later in other structures. And ore_calc_stripe_info as well. Rename them make struct ore_striping_info public. ore_calc_stripe_info is still static, will be made public on first use. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2011-10-03 17:07:51 +02:00
Boaz Harrosh	8d2d83a835	exofs: Remove unused data_map member from exofs_sb_info The struct pnfs_osd_data_map data_map member of exofs_sb_info was never used after mount. In fact all it's members were duplicated by the ore_layout structure. So just remove the duplicated information. Also removed some stupid, but perfectly supported, restrictions on layout parameters. The case where num_devices is not divisible by mirror_count+1 is perfectly fine since the rotating device view will eventually use all the devices it can get. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com>	2011-10-03 17:07:51 +02:00
Boaz Harrosh	5bf696dad4	exofs: Rename struct ore_components comps => oc ore_components already has a comps member so this leads to things like comps->comps which is annoying. the name oc was already used in new code. So rename all old usage of ore_components comps => ore_components oc. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2011-10-03 17:07:50 +02:00
H Hartley Sweeten	de74b05ace	exofs/super.c: local functions should be static This quiets the following sparse noise: warning: symbol 'exofs_sync_fs' was not declared. Should it be static? warning: symbol 'exofs_free_sbi' was not declared. Should it be static? warning: symbol 'exofs_get_parent' was not declared. Should it be static? Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2011-10-03 17:07:29 +02:00
H Hartley Sweeten	1958c7c284	exofs/ore.c: local functions should be static This quiets the sparse noise: warning: symbol '_calc_trunk_info' was not declared. Should it be static? Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2011-10-03 17:06:47 +02:00
Wu Fengguang	b00949aa2d	writeback: per-bdi background threshold One thing puzzled me is that in JBOD case, the per-disk writeout performance is smaller than the corresponding single-disk case even when they have comparable bdi_thresh. Tracing shows find that in single disk case, bdi_writeback is always kept high while in JBOD case, it could drop low from time to time and correspondingly bdi_reclaimable could sometimes rush high. The fix is to watch bdi_reclaimable and kick background writeback as soon as it goes high. This resembles the global background threshold but in per-bdi manner. The trick is, as long as bdi_reclaimable does not go high, bdi_writeback naturally won't go low because bdi_reclaimable+bdi_writeback ~= bdi_thresh. With less fluctuated writeback pages, JBOD performance is observed to increase noticeably in various cases. vmstat:nr_written values before/after patch: 3.1.0-rc4-wo-underrun+ 3.1.0-rc4-bgthresh3+ ------------------------ ------------------------ 125596480 +25.9% 158179363 JBOD-10HDD-16G/ext4-100dd-1M-24p-16384M-20:10-X 61790815 +110.4% 130032231 JBOD-10HDD-16G/ext4-10dd-1M-24p-16384M-20:10-X 58853546 -0.1% 58823828 JBOD-10HDD-16G/ext4-1dd-1M-24p-16384M-20:10-X 110159811 +24.7% 137355377 JBOD-10HDD-16G/xfs-100dd-1M-24p-16384M-20:10-X 69544762 +10.8% 77080047 JBOD-10HDD-16G/xfs-10dd-1M-24p-16384M-20:10-X 50644862 +0.5% 50890006 JBOD-10HDD-16G/xfs-1dd-1M-24p-16384M-20:10-X 42677090 +28.0% 54643527 JBOD-10HDD-thresh=100M/ext4-100dd-1M-24p-16384M-100M:10-X 47491324 +13.3% 53785605 JBOD-10HDD-thresh=100M/ext4-10dd-1M-24p-16384M-100M:10-X 52548986 +0.9% 53001031 JBOD-10HDD-thresh=100M/ext4-1dd-1M-24p-16384M-100M:10-X 26783091 +36.8% 36650248 JBOD-10HDD-thresh=100M/xfs-100dd-1M-24p-16384M-100M:10-X 35526347 +14.0% 40492312 JBOD-10HDD-thresh=100M/xfs-10dd-1M-24p-16384M-100M:10-X 44670723 -1.1% `44177606` JBOD-10HDD-thresh=100M/xfs-1dd-1M-24p-16384M-100M:10-X 127996037 +22.4% 156719990 JBOD-10HDD-thresh=2G/ext4-100dd-1M-24p-16384M-2048M:10-X 57518856 +3.8% 59677625 JBOD-10HDD-thresh=2G/ext4-10dd-1M-24p-16384M-2048M:10-X 51919909 +12.2% 58269894 JBOD-10HDD-thresh=2G/ext4-1dd-1M-24p-16384M-2048M:10-X 86410514 +79.0% 154660433 JBOD-10HDD-thresh=2G/xfs-100dd-1M-24p-16384M-2048M:10-X 40132519 +38.6% 55617893 JBOD-10HDD-thresh=2G/xfs-10dd-1M-24p-16384M-2048M:10-X 48423248 +7.5% 52042927 JBOD-10HDD-thresh=2G/xfs-1dd-1M-24p-16384M-2048M:10-X 206041046 +44.1% 296846536 JBOD-10HDD-thresh=4G/xfs-100dd-1M-24p-16384M-4096M:10-X 72312903 -19.4% 58272885 JBOD-10HDD-thresh=4G/xfs-10dd-1M-24p-16384M-4096M:10-X 50635672 -0.5% 50384787 JBOD-10HDD-thresh=4G/xfs-1dd-1M-24p-16384M-4096M:10-X 68308534 +115.7% 147324758 JBOD-10HDD-thresh=800M/ext4-100dd-1M-24p-16384M-800M:10-X 57882933 +14.5% 66269621 JBOD-10HDD-thresh=800M/ext4-10dd-1M-24p-16384M-800M:10-X 52183472 +12.8% 58855181 JBOD-10HDD-thresh=800M/ext4-1dd-1M-24p-16384M-800M:10-X 53788956 +94.2% 104460352 JBOD-10HDD-thresh=800M/xfs-100dd-1M-24p-16384M-800M:10-X 44493342 +35.5% 60298210 JBOD-10HDD-thresh=800M/xfs-10dd-1M-24p-16384M-800M:10-X 42641209 +18.9% 50681038 JBOD-10HDD-thresh=800M/xfs-1dd-1M-24p-16384M-800M:10-X Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>	2011-10-03 21:08:58 +08:00
Wu Fengguang	af6a311384	writeback: add bg_threshold parameter to __bdi_update_bandwidth() No behavior change. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>	2011-10-03 21:08:56 +08:00
Arne Jansen	7a26285eea	btrfs: use readahead API for scrub Scrub uses a simple tree-enumeration to bring the relevant portions of the extent- and csum-tree into the page cache before starting the scrub-I/O. This is now replaced by using the new readahead-API. During readahead the scrub is being accounted as paused, so it won't hold off transaction commits. This change raises the average disk bandwith utilisation on my test volume from 70% to 90%. On another volume, the time for a test run went down from 89s to 43s. Changes v5: - reada1/2 are now of type struct reada_control * Signed-off-by: Arne Jansen <sensille@gmx.net>	2011-10-02 08:48:45 +02:00
Arne Jansen	4bb31e928d	btrfs: hooks for readahead This adds the hooks needed for readahead. In the readpage_end_io_hook, the extent state is checked for the EXTENT_READAHEAD flag. Only in this case the readahead hook is called, to keep the impact on non-ra as low as possible. Additionally, a hook for a failed IO is added, otherwise readahead would wait indefinitely for the extent to finish. Changes for v2: - eliminate race condition Signed-off-by: Arne Jansen <sensille@gmx.net>	2011-10-02 08:48:44 +02:00

... 24 25 26 27 28 ...

25,556 commits