Commit graph

71391 commits

Author SHA1 Message Date
Jens Axboe
51cf22495a aha1542: convert to use the data buffer accessors
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:20:59 +02:00
Jens Axboe
d274a9878b ide-scsi: sg chaining support
Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:20:52 +02:00
Jens Axboe
2f08fe5221 qlogicpti: sg chaining support
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:14:19 +02:00
Jens Axboe
8145bfe463 aic94xx: sg chaining support
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:14:18 +02:00
Jens Axboe
a044189137 qla1280: sg chaining support
Interesting hardware setup...

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:14:18 +02:00
Jens Axboe
b0f655d0ef scsi generic: sg chaining support
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:14:18 +02:00
Jens Axboe
852e034de7 scsi_debug: support sg chaining
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:14:18 +02:00
Jens Axboe
8726021626 libata: convert to using sg helpers
This converts libata to using the sg helpers for looking up sg
elements, instead of doing it manually.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:14:12 +02:00
Jens Axboe
a8474ce23a SCSI: support for allocating large scatterlists
This is what enables large commands. If we need to allocate an
sgtable that doesn't fit in a single page, allocate several
SCSI_MAX_SG_SEGMENTS sized tables and chain them together.

SCSI defaults to large chained sg tables, if the arch supports it.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:12:53 +02:00
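As a rough illustration of the allocation described in a8474ce23a above, the sketch below counts how many fixed-size tables a request needs when every table except the last gives up its final slot to hold the chain link. The table size of 128 and the function name `sg_tables_needed` are assumptions made for this example; they are not taken from the patch.

```c
#include <assert.h>

/* Illustrative table size; treat the value as an assumption. */
#define MAX_SG_SEGMENTS 128u

/*
 * Number of chained tables needed to hold nents data entries, assuming
 * every table except the last reserves its final slot for the link.
 */
static unsigned int sg_tables_needed(unsigned int nents)
{
    assert(nents > 0);
    if (nents <= MAX_SG_SEGMENTS)
        return 1;
    /*
     * k tables hold k * (MAX - 1) + 1 data entries, so k is
     * (nents - 1) / (MAX - 1), rounded up.
     */
    return (nents - 1 + (MAX_SG_SEGMENTS - 2)) / (MAX_SG_SEGMENTS - 1);
}
```

With these assumptions, 128 entries still fit in one table, while 129 entries need two because the first table loses a slot to the link.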
Jens Axboe
0cde8d9510 scsi: simplify scsi_free_sgtable()
Just pass in the command, no point in passing in the scatterlist
and scatterlist pool index separately.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:12:37 +02:00
saeed bishara
4c2f6d4c28 use sg helper function in DMA mapping documentation
Signed-off-by: Saeed Bishara <saeed.bishara@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:08:54 +02:00
Jens Axboe
563063a808 ll_rw_blk: temporarily enable max_segments tweaking
Expose this setting for now, so that users can play with enabling
large commands without defaulting it to on globally. This is a debug
patch, it will be dropped for the final versions.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:08:53 +02:00
Jens Axboe
70eb8040dc Add chained sg support to linux/scatterlist.h
The core of the patch - allow the last sg element in a scatterlist
table to point to the start of a new table. We overload the LSB of
the page pointer to indicate whether this is a valid sg entry, or
merely a link to the next list.

Includes a fix from Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
correcting the ifdef ARCH_HAS_SG_CHAIN guarding sg_last().

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:08:51 +02:00
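The pointer-tagging idea in 70eb8040dc above can be modelled in user space. The sketch below is only an approximation under stated assumptions: `struct sg_entry`, `sg_set_chain`, `sg_is_chain`, `sg_step` and `sg_total_length` are hypothetical names for this example, not the kernel's definitions. It relies on entries being aligned so that bit 0 of a pointer is always free to act as the "this is a link" marker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for a scatterlist entry. */
struct sg_entry {
    uintptr_t page_link;   /* data pointer, or next-table pointer with bit 0 set */
    unsigned int length;
};

#define SG_CHAIN_BIT ((uintptr_t)1)

/* Turn the slot `link` into a pointer to the first entry of `next_table`. */
static void sg_set_chain(struct sg_entry *link, struct sg_entry *next_table)
{
    /* Entries are aligned, so bit 0 of the address is free to use as a tag. */
    assert(((uintptr_t)next_table & SG_CHAIN_BIT) == 0);
    link->page_link = (uintptr_t)next_table | SG_CHAIN_BIT;
    link->length = 0;
}

static int sg_is_chain(const struct sg_entry *sg)
{
    return (sg->page_link & SG_CHAIN_BIT) != 0;
}

/* Step to the next data entry, following a link if the next slot is one. */
static struct sg_entry *sg_step(struct sg_entry *sg)
{
    sg++;
    if (sg_is_chain(sg))
        sg = (struct sg_entry *)(sg->page_link & ~SG_CHAIN_BIT);
    return sg;
}

/* Sum the lengths of `nents` data entries starting at `sg`. */
static unsigned long sg_total_length(struct sg_entry *sg, unsigned int nents)
{
    unsigned long total = 0;

    for (unsigned int i = 0; i < nents; i++) {
        total += sg->length;
        if (i + 1 < nents)      /* never step past the final data entry */
            sg = sg_step(sg);
    }
    return total;
}
```

Because callers only ever advance through a helper such as `sg_step`, a driver loop written against the helper does not need to know whether the list is one table or several chained ones.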
Jens Axboe
c6132da170 scsi: convert to using sg helpers
This converts the SCSI mid layer to using the sg helpers for looking up
sg elements, instead of doing it manually.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:08:49 +02:00
Jens Axboe
f565913ef8 block: convert to using sg helpers
Convert the main rq mapper (blk_rq_map_sg()) to the sg helper setup.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:07:11 +02:00
Jens Axboe
96b418c960 Add sg helpers for iterating over a scatterlist table
First step to being able to change the scatterlist setup without
having to modify drivers (a lot :-)

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:07:10 +02:00
Jens Axboe
ab83407e9e crypto: don't pollute the global namespace with sg_next()
It's a subsystem function, prefix it as such.

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:07:09 +02:00
Laurent Riffard
7e3da6c4b9 pktcdvd: don't rely on bio_init() preserving bio->bi_destructor
Signed-off-by: Laurent Riffard <laurent.riffard@free.fr>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:05:09 +02:00
Jens Axboe
761a15e7ac pktcdvd: don't rely on bio_init() preserving bio->bi_io_vec
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:05:08 +02:00
Adrian Bunk
bb879463b5 remove ide_get_error_location()
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:05:06 +02:00
Jens Axboe
fd5d806266 block: convert blkdev_issue_flush() to use empty barriers
Then we can get rid of ->issue_flush_fn() and all the driver private
implementations of that.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:05:02 +02:00
Jens Axboe
bf2de6f5a4 block: Initial support for data-less (or empty) barriers
This implements functionality to pass down or insert a barrier
in a queue, without having data attached to it. The ->prepare_flush_fn()
infrastructure from data barriers is reused to provide this
functionality.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:03:56 +02:00
Jens Axboe
c07e2b4129 block: factor out bio_check_eod()
End of device check is done twice in __generic_make_request() and it's
fully inlined each time.  Factor out bio_check_eod().

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:03:55 +02:00
Jens Axboe
a0cd128542 block: add end_queued_request() and end_dequeued_request() helpers
We can use this helper in the elevator core for BLKPREP_KILL, and it'll
also be useful for the empty barrier patch.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:03:53 +02:00
Jens Axboe
992c5ddaf1 bio: make freeing of ->bi_io_vec conditional in bio_free()
The empty barrier patches do not carry data, so they have no
iovec attached.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:03:52 +02:00
Jens Axboe
2b94de552e bio: use memset() in bio_init()
Use memset() to clear the bio, instead of doing each field manually.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:03:51 +02:00
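A minimal sketch of what clearing with memset() implies for callers, assuming a heavily trimmed, hypothetical `struct toy_bio` (the real struct bio has many more fields, and the non-zero default below is invented for the example). Once the initialiser wipes the whole structure, any field a caller wants to keep must be assigned again after the call, which is the situation the two pktcdvd commits above address.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, heavily trimmed stand-in for struct bio. */
struct toy_bio {
    void *io_vec;
    void (*destructor)(struct toy_bio *);
    unsigned int flags;
};

/* Clear every field in one go, then set the non-zero defaults. */
static void toy_bio_init(struct toy_bio *bio)
{
    assert(bio != NULL);
    memset(bio, 0, sizeof(*bio));
    bio->flags = 1u;    /* illustrative non-zero default */
}

static void toy_destructor(struct toy_bio *bio)
{
    (void)bio;          /* nothing to release in this toy model */
}

/* A caller that reuses a bio re-assigns its fields after init. */
static void toy_bio_reuse(struct toy_bio *bio, void *vec)
{
    toy_bio_init(bio);
    bio->io_vec = vec;
    bio->destructor = toy_destructor;
}
```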
Jens Axboe
4fa253f33c block: ll_rw_blk.c: cosmetics
Fix ?: construct, a typo, whitespace, and similar.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:03:49 +02:00
Rob Landley
8b6800fbce Add Documentation/block/00-INDEX
Signed-off-by: Rob Landley <rob@landley.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 10:11:28 +02:00
Jens Axboe
6866bef40d splice: fix double kunmap() in vmsplice copy path
The out label should not include the unmap, the only way to jump
there already has unmapped the source.

00002000
       f7c21a00 00000000 00000000 c0489036 00018e32 00000002 00000000
00001000
Call Trace:
 [<c0487dd9>] pipe_to_user+0xca/0xd3
 [<c0488233>] __splice_from_pipe+0x53/0x1bd
------------[ cut here ]------------
 [<c0454947>] filemap_fault+0x221/0x380
 [<c0487d0f>] pipe_to_user+0x0/0xd3
 [<c0489036>] sys_vmsplice+0x3b7/0x422
kernel BUG at mm/highmem.c:206!
 [<c045ec3f>] handle_mm_fault+0x4d5/0x8eb
 [<c041ed5b>] kmap_atomic+0x1c/0x20
 [<c045d33d>] unmap_vmas+0x3d1/0x584
 [<c045f717>] free_pgtables+0x90/0xa0
 [<c041d84b>] pgd_dtor+0x0/0x1
 [<c044d665>] audit_syscall_exit+0x2aa/0x2c6
 [<c0407817>] do_syscall_trace+0x124/0x169
 [<c0404df2>] syscall_call+0x7/0xb
 =======================
invalid opcode: 0000 [#1]
Code: 2d 00 d0 5b 00 25 00 00 e0 ff 29
c2 89 d0 c1 e8 0c 8b 14 85 a0 6c 7c c0 4a 85 d2 89 14 85 a0 6c 7c c0 74 07
31 c9 4a 75 15 eb 04 <0f> 0b eb fe 31 c9 81 3d 78 38 6d c0 78 38 6d c0 0f
95 c1 b0 01
EIP: [<c045bbc3>] kunmap_high+0x51/0x8e SS:ESP 0068:f5960df0
SMP
Modules linked in: netconsole autofs4 hidp nfs lockd nfs_acl rfcomm l2cap
bluetooth sunrpc ipv6 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core
ib_addr iscsi_tcp libiscsi scsi_transport_iscsi dm_mirror dm_multipath
dm_mod video output sbs battery ac parport_pc lp parport sg i2c_piix4
i2c_core floppy cfi_probe gen_probe scb2_flash mtd chipreg tg3 e1000 button
ide_cd serio_raw cdrom aic7xxx scsi_transport_spi sd_mod scsi_mod ext3 jbd
ehci_hcd ohci_hcd uhci_hcd
CPU:    3
EIP:    0060:[<c045bbc3>]    Not tainted VLI
EFLAGS: 00010246   (2.6.23 #1)
EIP is at kunmap_high+0x51/0x8e

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 10:01:29 +02:00
Alan D. Brunelle
23c76983e2 Some IO scheduler cleanup in Documentation/block
as-iosched.txt:
  o  Changed IO scheduler selection text to a reference to the
     switching-sched.txt file.

  o  Fixed typo: 'for up time...' -> 'for up to...'

  o  Added short description of the est_time file.

deadline-iosched.txt:
  o  Changed IO scheduler selection text to a reference to the
     switching-sched.txt file.

  o  Removed references to non-existent seek-cost and stream_unit.

  o  Fixed typo: 'write_starved' -> 'writes_starved'

switching-sched.txt:
  o  Added in boot-time argument to set the default IO scheduler. (From
     as-iosched.txt)

  o  Added in sysfs mount instructions. (From deadline-iosched.txt)

Signed-off-by: Alan D. Brunelle <Alan.Brunelle@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 09:59:55 +02:00
Rob Landley
26bbb29a2a Update Jens Axboe's email in Documentation/*
Jens Axboe's old email address bounces.

Signed-off-by: Rob Landley <rob@landley.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 09:59:55 +02:00
Jeff Garzik
87ad900164 drivers/block/cpqarray,cciss: kill unused var
The recent bio work and subsequent fixups created unused variables.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 09:59:55 +02:00
Arjan van de Ven
7344be053a bsg: mark struct file_operations const
struct file_operations is generally const (to avoid false sharing and to get
compile-time errors on accidental writes to this shared structure); bsg
recently added one of these without the const keyword. This patch marks it
const.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 09:59:54 +02:00
Tim Shimmin
150f29ef2e [XFS] no longer using io_vnode, as was remaining from 23 cherrypick
Because we cherrypicked SGI-Modid xfs-linux-melb:xfs-kern:29675a
and it depended on the sgi mod which removed io_vnode (which was
not cherrypicked in 23) it was hand modified.
This fixes things back up (to the original mod) now we have moved
on again.

Reviewed-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-10-16 16:20:12 +10:00
Tim Shimmin
479ba36bbb [XFS] Remove STATIC which was missing from prior manual merge
Removes STATIC on xfs_freeze function which was not manually
applied for SGI-Modid: xfs-linux-melb:xfs-kern:29504a.

Reviewed-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-10-16 15:32:57 +10:00
Tim Shimmin
cd514bdaa8 [XFS] Put back the QUEUE_ORDERED_NONE test in the barrier check.
Put back the QUEUE_ORDERED_NONE test. Taking it out caused us grief in SLES
because, IIRC, it allowed md/lvm to be treated as supporting barriers when,
in some configurations, they did not. This patch reverts what went in as
part of a change for SGI-PV 964544
(SGI-Modid: xfs-linux-melb:xfs-kern:28568a).

SGI-PV: 971783
SGI-Modid: xfs-linux-melb:xfs-kern:29882a

Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
2007-10-16 14:23:21 +10:00
Lachlan McIlroy
bebf963fec [XFS] Turn off XBF_ASYNC flag before re-reading superblock.
SGI-PV: 971603
SGI-Modid: xfs-linux-melb:xfs-kern:29871a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-10-16 14:22:39 +10:00
Lachlan McIlroy
e893bffd4c [XFS] avoid race in sync_inodes() that can fail to write out all dirty data
In xfs_fs_sync_super() treat a sync the same as a filesystem freeze. This
is needed to force the log to disk for inodes which are not marked dirty
in the Linux inode (the inodes are marked dirty on completion of the log
I/O) and so sync_inodes() will not flush them.

In xfs_fs_write_inode() a synchronous flush will not get an EAGAIN from
xfs_inode_flush() and if an asynchronous flush returns EAGAIN we should
pass it on to the caller. If we get an error while flushing the inode then
re-dirty it so we can try again later.

SGI-PV: 971670
SGI-Modid: xfs-linux-melb:xfs-kern:29860a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-10-16 14:22:28 +10:00
Lachlan McIlroy
c2cba57e83 [XFS] This fix prevents bulkstat from spinning in an infinite loop.
Here 'agino' increments through the inodes in an allocation group. At the
end of the innermost 'for' loop it will hold the value of the next inode
to look at (ie the first inode in the next cluster/chunk). Assigning
'lastino' to 'agino' resets it to the last inode in the last inode cluster
we just looked at. This causes us to look up the very same cluster and
examine all the inodes all over again, and again, and again...

We also want to set 'lastino' for the cases when we're not interested in
the inode so that the next call to bulkstat won't re-examine the same
uninteresting inodes.

SGI-PV: 971064
SGI-Modid: xfs-linux-melb:xfs-kern:29840a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-10-16 14:21:56 +10:00
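The looping behaviour described in c2cba57e83 above can be reproduced with a toy walk over fixed-size inode chunks. This is a simplified model, not the XFS code: the chunk size of 64, the function `walk_chunks` and its `reset_to_lastino` switch are assumptions for illustration. Only the variable names `agino` and `lastino` come from the commit message.

```c
#include <assert.h>

#define INODES_PER_CHUNK 64u    /* illustrative chunk size */

/*
 * Walk `total` inodes chunk by chunk and return the number of chunk
 * lookups performed, giving up after `max_lookups`.
 */
static unsigned int walk_chunks(unsigned int total, int reset_to_lastino,
                                unsigned int max_lookups)
{
    unsigned int agino = 0, lastino = 0, lookups = 0;

    assert(max_lookups > 0);
    while (agino < total && lookups < max_lookups) {
        /* "Look up" the chunk that contains agino. */
        unsigned int chunk = agino - agino % INODES_PER_CHUNK;

        lookups++;
        for (agino = chunk; agino < chunk + INODES_PER_CHUNK; agino++)
            lastino = agino;    /* remember the last inode examined */

        /* Here agino is the first inode of the next chunk. */
        if (reset_to_lastino)
            agino = lastino;    /* the bug: steps back into this chunk */
    }
    (void)lastino;
    return lookups;
}
```

Without the reset, four chunks of 64 inodes take four lookups. With it, the walk keeps finding the first chunk and only stops at the lookup cap.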
Christoph Hellwig
3e5daf05a0 [XFS] simplify xfs_create/mknod/symlink prototype
Simplify the prototype for xfs_create/xfs_mkdir/xfs_symlink by not passing
down a bhv_vattr_t that just hogs stack space. Instead pass down the mode
in a mode_t and in case of xfs_create the rdev as a scalar type as well.

SGI-PV: 968563
SGI-Modid: xfs-linux-melb:xfs-kern:29794a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-10-16 14:15:32 +10:00
Christoph Hellwig
c83bfab1fa [XFS] avoid xfs_getattr in XFS_IOC_FSGETXATTR ioctl
No need to call into xfs_getattr and put a big bhv_vattr_t on the stack
just to get a little information from the XFS inode.

Add a helper called xfs_ioc_fsgetxattr instead that deals with retrieving
the information in a clean way.

SGI-PV: 968563
SGI-Modid: xfs-linux-melb:xfs-kern:29780a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-10-16 12:21:48 +10:00
Vlad Apostolov
859d718279 [XFS] get_bulkall() could return incorrect inode state
In the following scenario xfs_bulkstat() returns incorrect stale inode
state:

1. File_A is created and its inode synced to disk.
2. File_A is unlinked and doesn't exist anymore.
3. Filesystem sync is invoked.
4. File_B is created. File_B happens to reclaim File_A's inode.
5. xfs_bulkstat() is called and detects File_B but reports the
   incorrect File_A inode state.

Explanation for the incorrect inode state is that inodes are not
immediately synced on file create for performance reasons. This leaves the
on-disk inode buffer uninitialized (or with old state from a previous
generation inode) and this is what xfs_bulkstat() would report.

The patch marks the on-disk inode buffer "dirty" on unlink. When the inode
is reclaimed (by a new file create), xfs_bulkstat() would filter this
inode by the "dirty" mark. Once the inode is flushed to disk, the on-disk
buffer "dirty" mark is automatically removed and a following
xfs_bulkstat() would return the correct inode state.

Marking the on-disk inode buffer "dirty" on unlink is achieved by setting
the on-disk di_nlink field to 0. Note that the in-core di_nlink has
already been set to 0 and a corresponding transaction logged by
xfs_droplink(). This is an exception from the rule that any on-disk inode
buffer changes have to be followed by a disk write (inode flush).
Synchronizing the in-core to on-disk di_nlink values in advance (before
the actual inode flush to disk) should be fine in this case because the
inode is already unlinked and it would never change its di_nlink again for
this inode generation.

SGI-PV: 970842
SGI-Modid: xfs-linux-melb:xfs-kern:29757a

Signed-off-by: Vlad Apostolov <vapo@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Mark Goodwin <markgw@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-10-16 12:21:15 +10:00
Christoph Hellwig
ba532a980b [XFS] Kill unused IOMAP_EOF flag
SGI-PV: 968563
SGI-Modid: xfs-linux-melb:xfs-kern:29705a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-10-16 12:20:54 +10:00
Vlad Apostolov
574342f4ad [XFS] fix when DMAPI mount option processing happens
Fix for a regression caused by a recent patch
that moved the DMAPI mount option processing inside xfs_parseargs(). The
DMAPI mount option used to be processed in the DMAPI module loaded before
xfs_parseargs() was invoked.

SGI-PV: 970451
SGI-Modid: xfs-linux-melb:xfs-kern:29683a

Signed-off-by: Vlad Apostolov <vapo@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-10-16 12:20:39 +10:00
Lachlan McIlroy
5903c4956f [XFS] ensure file size is logged on synchronous writes
Synchronous writes currently log inode changes before syncing pages to
disk. Since the file size is updated on I/O completion we won't be writing
out the updated file size and if we crash the file will have the wrong
size. This change moves the logging after the syncing of the pages to
ensure we log the correct file size.

SGI-PV: 970334
SGI-Modid: xfs-linux-melb:xfs-kern:29649a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-10-16 12:18:38 +10:00
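The ordering problem in 5903c4956f above comes down to which value the log captures. The following toy model assumes a hypothetical `struct toy_inode` whose size field only changes when the page sync completes; none of these names exist in XFS, and real logging is far more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy inode: the size only changes on I/O completion. */
struct toy_inode {
    unsigned long size;          /* updated when the write completes */
    unsigned long pending_size;  /* size the in-flight write will produce */
    unsigned long logged_size;   /* size recorded in the log */
};

/* Syncing pages completes the I/O, which updates the file size. */
static void toy_sync_pages(struct toy_inode *ip)
{
    assert(ip != NULL);
    ip->size = ip->pending_size;
}

/* Logging records whatever size the inode has right now. */
static void toy_log_inode(struct toy_inode *ip)
{
    assert(ip != NULL);
    ip->logged_size = ip->size;
}

/* Old order: log first, then sync, so the log holds the stale size. */
static unsigned long sync_write_log_first(struct toy_inode *ip)
{
    toy_log_inode(ip);
    toy_sync_pages(ip);
    return ip->logged_size;
}

/* New order: sync first, then log, so the log holds the final size. */
static unsigned long sync_write_sync_first(struct toy_inode *ip)
{
    toy_sync_pages(ip);
    toy_log_inode(ip);
    return ip->logged_size;
}
```

In this model a write that grows a file from 4096 to 8192 bytes logs 4096 with the old order and 8192 with the new one, which is the size that survives a crash.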
Christoph Hellwig
cc92e7ac8d [XFS] growlock should be a mutex
m_growlock only needs plain binary mutex semantics, so use a struct mutex
instead of a semaphore for it.

SGI-PV: 968563
SGI-Modid: xfs-linux-melb:xfs-kern:29512a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-10-16 12:18:09 +10:00
Christoph Hellwig
0adba5363c [XFS] replace some large xfs_log_priv.h macros by proper functions
... or in the case of XLOG_TIC_ADD_OPHDR remove a useless macro entirely.

SGI-PV: 968563
SGI-Modid: xfs-linux-melb:xfs-kern:29511a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-10-16 12:17:56 +10:00
Christoph Hellwig
b267ce9952 [XFS] kill struct bhv_vfs
Now that struct bhv_vfs doesn't have any members left we can kill it and
go directly from the super_block to the xfs_mount everywhere.

SGI-PV: 969608
SGI-Modid: xfs-linux-melb:xfs-kern:29509a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-10-16 12:17:27 +10:00
Christoph Hellwig
7439449670 [XFS] move syncing related members from struct bhv_vfs to struct xfs_mount
SGI-PV: 969608
SGI-Modid: xfs-linux-melb:xfs-kern:29508a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-10-16 12:16:35 +10:00
Christoph Hellwig
bd186aa901 [XFS] kill the vfs_flags member in struct bhv_vfs
All flags are added to xfs_mount's m_flag instead. Note that the 32bit
inode flag was duplicated in both of them, but only cleared in the mount
when it was not necessary due to the filesystem being small enough. Two
flags are still required here - one to indicate the mount option setting,
and one to indicate if it applies or not.

SGI-PV: 969608
SGI-Modid: xfs-linux-melb:xfs-kern:29507a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-10-16 11:45:57 +10:00