Commit graph

3,495 commits

Author SHA1 Message Date
Matthew Wilcox
36c14ed9ca NVMe: Use PRP2 for the nvme_identify ioctl
DMA the result straight to userspace instead of bounce-buffering in the
kernel.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:52 -04:00
Matthew Wilcox
53c9577e9c NVMe: Fix admin IRQ claim on real hardware
The admin IRQ is supposed to use the pin-based (or single message MSI)
interrupt.  Accomplish this by filling in entry[0]'s vector with the
INTx irq number.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:51 -04:00
Matthew Wilcox
821234603b NVMe: Rename 'cycle' to 'phase'
It's called the phase bit in the current draft

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:51 -04:00
Matthew Wilcox
1b23484bd0 NVMe: Implement per-CPU queues
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:51 -04:00
Matthew Wilcox
b3b06812e1 NVMe: Reduce set_queue_count arguments by one
sq_count and cq_count are always the same, so just call it 'count'.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:51 -04:00
Matthew Wilcox
3001082cac NVMe: Factor out queue_request_irq()
Two callers with an almost identical long string of arguments, and
introducing a third soon.  Time to factor out the commonalities.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:51 -04:00
Matthew Wilcox
b60503ba43 NVMe: New driver
This driver is for devices that follow the NVM Express standard

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:51 -04:00
Michael S. Tsirkin
5087a50e66 virtio-blk: use ida to allocate disk index
Based on a patch by Mark Wu <dwu@redhat.com>

Current index allocation in virtio-blk is based on a monotonically
increasing variable "index". This means we'll run out of numbers
after a while.  It also could cause confusion about the disk
name in the case of hot-plugging disks.
Change virtio-blk to use ida to allocate index, instead.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-11-02 11:41:02 +10:30
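
The difference between a monotonically increasing counter and an allocator that hands freed indices back out can be sketched in userspace C. This is a minimal illustration and not the kernel's ida API: `index_pool`, `index_alloc`, and `index_free` are hypothetical names, and the fixed 64-slot bitmap is an assumption made to keep the example small.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-size pool; the kernel's ida grows dynamically. */
#define POOL_SLOTS 64

struct index_pool {
	uint64_t used;	/* bit i set => index i is in use */
};

/* Return the lowest free index, or -1 if the pool is exhausted. */
static int index_alloc(struct index_pool *pool)
{
	for (int i = 0; i < POOL_SLOTS; i++) {
		uint64_t bit = UINT64_C(1) << i;

		if (!(pool->used & bit)) {
			pool->used |= bit;
			return i;
		}
	}
	return -1;
}

/* Release an index so that a later hot-plugged disk can reuse it. */
static void index_free(struct index_pool *pool, int index)
{
	assert(index >= 0 && index < POOL_SLOTS);
	pool->used &= ~(UINT64_C(1) << index);
}
```

Because the lowest free index is always chosen, a disk that is unplugged and plugged back in can get its old index again, which a plain counter would never allow.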
Paul Gortmaker
0c8d44f239 block: Fix files that are modules and hence need module.h
We want to remove the implicit everywhere presence of module.h
so fix up the people relying on that implicit presence in advance.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:13 -04:00
Paul Gortmaker
d5decd3b95 block: add export.h to files using EXPORT_SYMBOL/THIS_MODULE macros
These files were getting <linux/module.h> via an implicit include
path, but we want to crush those out of existence since they cost
time during compiles of processing thousands of lines of headers
for no reason.  Give them the lightweight header that just contains
the EXPORT_SYMBOL infrastructure.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:12 -04:00
Michael S. Tsirkin
a0eda62552 virtio-blk: use ida to allocate disk index
Based on a patch by Mark Wu <dwu@redhat.com>

Current index allocation in virtio-blk is based on a monotonically
increasing variable "index". This means we'll run out of numbers
after a while.  It also could cause confusion about the disk
name in the case of hot-plugging disks.
Change virtio-blk to use ida to allocate index, instead.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-31 08:05:36 +01:00
Linus Torvalds
97d2eb13a0 Merge branch 'for-linus' of git://ceph.newdream.net/git/ceph-client
* 'for-linus' of git://ceph.newdream.net/git/ceph-client:
  libceph: fix double-free of page vector
  ceph: fix 32-bit ino numbers
  libceph: force resend of osd requests if we skip an osdmap
  ceph: use kernel DNS resolver
  ceph: fix ceph_monc_init memory leak
  ceph: let the set_layout ioctl set single traits
  Revert "ceph: don't truncate dirty pages in invalidate work thread"
  ceph: replace leading spaces with tabs
  libceph: warn on msg allocation failures
  libceph: don't complain on msgpool alloc failures
  libceph: always preallocate mon connection
  libceph: create messenger with client
  ceph: document ioctls
  ceph: implement (optional) max read size
  ceph: rename rsize -> rasize
  ceph: make readpages fully async
2011-10-28 16:42:18 -07:00
David Vrabel
2d073846b8 block: xen-blkback: use API provided by xenbus module to map rings
The xenbus module provides xenbus_map_ring_valloc() and
xenbus_map_ring_vfree().  Use these to map the ring pages granted by
the frontend.

Acked-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-26 10:02:35 -04:00
Sage Weil
6ab00d465a libceph: create messenger with client
This simplifies the init/shutdown paths, and makes client->msgr available
during the rest of the setup process.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-10-25 16:10:15 -07:00
Linus Torvalds
59e5253417 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (59 commits)
  MAINTAINERS: linux-m32r is moderated for non-subscribers
  linux@lists.openrisc.net is moderated for non-subscribers
  Drop default from "DM365 codec select" choice
  parisc: Kconfig: cleanup Kernel page size default
  Kconfig: remove redundant CONFIG_ prefix on two symbols
  cris: remove arch/cris/arch-v32/lib/nand_init.S
  microblaze: add missing CONFIG_ prefixes
  h8300: drop puzzling Kconfig dependencies
  MAINTAINERS: microblaze-uclinux@itee.uq.edu.au is moderated for non-subscribers
  tty: drop superfluous dependency in Kconfig
  ARM: mxc: fix Kconfig typo 'i.MX51'
  Fix file references in Kconfig files
  aic7xxx: fix Kconfig references to READMEs
  Fix file references in drivers/ide/
  thinkpad_acpi: Fix printk typo 'bluestooth'
  bcmring: drop commented out line in Kconfig
  btmrvl_sdio: fix typo 'btmrvl_sdio_sd6888'
  doc: raw1394: Trivial typo fix
  CIFS: Don't free volume_info->UNC until we are entirely done with it.
  treewide: Correct spelling of successfully in comments
  ...
2011-10-25 12:11:02 +02:00
Linus Torvalds
31018acd4c Merge branches 'stable/bug.fixes-3.2' and 'stable/mmu.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/bug.fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/p2m/debugfs: Make type_name more obvious.
  xen/p2m/debugfs: Fix potential pointer exception.
  xen/enlighten: Fix compile warnings and set cx to known value.
  xen/xenbus: Remove the unnecessary check.
  xen/irq: If we fail during msi_capability_init return proper error code.
  xen/events: Don't check the info for NULL as it is already done.
  xen/events: BUG() when we can't allocate our event->irq array.

* 'stable/mmu.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: Fix selfballooning and ensure it doesn't go too far
  xen/gntdev: Fix sleep-inside-spinlock
  xen: modify kernel mappings corresponding to granted pages
  xen: add an "highmem" parameter to alloc_xenballooned_pages
  xen/p2m: Use SetPagePrivate and its friends for M2P overrides.
  xen/p2m: Make debug/xen/mmu/p2m visible again.
  Revert "xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set."
2011-10-25 09:17:47 +02:00
Jens Axboe
83157223de Merge branch 'for-linus' into for-3.2/core
2011-10-24 16:24:38 +02:00
Mike Miller
ab5dbebe33 cciss: add small delay when using PCI Power Management to reset for kdump
The P600 requires a small delay when changing states. Otherwise we may think
the board did not reset and we bail. This is for kdump only and is particular
to the P600.

Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-20 22:21:52 +02:00
Jens Axboe
b8d8bdfe31 Merge branch 'stable/for-jens-3.2' of git://oss.oracle.com/git/kwilk/xen into for-3.2/drivers
2011-10-20 15:10:59 +02:00
Jens Axboe
5c04b426f2 Merge branch 'v3.1-rc10' into for-3.2/core
Conflicts:
	block/blk-core.c
	include/linux/blkdev.h

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-19 14:30:42 +02:00
Konrad Rzeszutek Wilk
6927d92091 xen/blkback: Fix two races in the handling of barrier requests.
There are two windows of opportunity to cause a race when
processing a barrier request. This patch fixes this.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-17 14:28:57 -04:00
Christoph Hellwig
456be1484f loop: remove the incorrect write_begin/write_end shortcut
Currently the loop device tries to call directly into write_begin/write_end
instead of going through ->write if it can.  This is a fairly nasty shortcut
as write_begin and write_end are only callbacks for the generic write code
and expect to be called with filesystem specific locks held.

This code currently causes various issues for clustered filesystems as it
doesn't take the required cluster locks, and it also causes issues for XFS
as it doesn't properly lock against the swapext ioctl as called by the
defragmentation tools.  This in turn causes data corruption if
defragmentation hits a busy loop device in the wrong time window, as
reported by RH QA.

The reason why we have this shortcut is that it saves a data copy when
doing a transformation on the loop device, which is the technical term
for using cryptoloop (or an XOR transformation).  Given that cryptoloop
has been deprecated in favour of dm-crypt my opinion is that we should
simply drop this shortcut instead of finding complicated ways to
introduce a formal interface for this shortcut.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-17 12:57:20 +02:00
Konrad Rzeszutek Wilk
dda1852802 xen/blkback: Check for proper operation.
The patch titled: "xen/blkback: Fix the inhibition to map pages
when discarding sector ranges." had the right idea except that
it used the wrong comparison operator. It had == instead of !=.

This fixes the bug where all (except discard) operations would
have been ignored.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-14 12:29:55 -04:00
Lars Ellenberg
3cb7a2a90f drbd: get rid of drbd_bcast_ee, it is of no use anymore
This function was used to broadcast the (leading part of the)
bio payload in case we see a data integrity error.  It could be received
from userland with the drbdsetup events subcommand,
to have a peek into the payload that caused the checksum mismatch,
and guess from there what may have caused the mismatch,
mainly to guess whether it was modification of in-flight data,
or data corruption by broken hardware or software bugs.

Meanwhile we support bios that are larger than the maximum payload a
netlink datagram can carry.
And we have means to reliably detect modification of in-flight data by
calculating, and comparing, the checksum before and after sendmsg.
There is no need to carry this around anymore.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-10-14 16:48:08 +02:00
Lars Ellenberg
569083c08d drbd: fix drbd_delete_device: remove vnr from volumes; idr_remove(); synchronize_rcu(); before cleanup
Still missing: rcu_readlock() on the various call sites that
access/iterate over those idrs.

We don't need a specific write lock, as we only modify from
configuration context, which is already strictly serialized.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-10-14 16:48:07 +02:00
Lars Ellenberg
da4a75d2ef drbd: introduce a bio_set to allocate housekeeping bios from
Don't rely on availability of bios from the global fs_bio_set,
we should use our own bio_set for meta data IO.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-10-14 16:48:06 +02:00
Lars Ellenberg
9db4e77f8c drbd: use the newly introduced page pool for bitmap IO
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-10-14 16:48:05 +02:00
Lars Ellenberg
35abf59424 drbd: add page pool to be used for meta data IO
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-10-14 16:48:04 +02:00
Lars Ellenberg
3c13b680ce drbd: only wakeup if something changed in update_peer_seq
This commit got it wrong:
    drbd: Make the peer_seq updating code more obvious

    Make it more clear that update_peer_seq() is supposed to wake up the
    seq_wait queue whenever the sequence number changes.

We don't need to wake up every time we receive a sequence number
that is _different_ from our currently stored "newest" sequence number,
but only if we receive a sequence number _newer_ than what we already
have, when we actually change mdev->peer_seq.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-10-14 16:48:04 +02:00
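
The "wake up only when newer" rule can be sketched in userspace C. This is an illustration under stated assumptions and not the DRBD code: `peer_state`, `seq_newer`, and `update_peer_seq_sketch` are hypothetical names, a counter stands in for the wait queue, and the signed-difference comparison is an assumed way to tolerate sequence number wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-device state. */
struct peer_state {
	uint32_t peer_seq;	/* newest sequence number seen so far */
	unsigned int wakeups;	/* counts wake-ups, in place of a wait queue */
};

/*
 * Assumed comparison: "a is newer than b", tolerant of 32-bit wraparound.
 * Relies on the usual two's-complement conversion to int32_t.
 */
static bool seq_newer(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

/* Store seq and signal waiters only if seq is newer than what we have. */
static bool update_peer_seq_sketch(struct peer_state *st, uint32_t seq)
{
	assert(st);

	if (!seq_newer(seq, st->peer_seq))
		return false;	/* same or older: nothing changed, no wake-up */

	st->peer_seq = seq;
	st->wakeups++;
	return true;
}
```

A sequence number that is merely different but older leaves the stored value untouched, so waiters are not woken for it.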
Lars Ellenberg
2c4a48d097 drbd: remove unused define
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-10-14 16:48:02 +02:00
Philipp Reisner
81a5d60ecf drbd: Replaced the minor_table array by an idr
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-10-14 16:48:01 +02:00
Philipp Reisner
774b305518 drbd: Implemented new commands to create/delete connections/minors
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-10-14 16:48:00 +02:00
Philipp Reisner
80883197da drbd: Converted drbd_nl_(net_conf|disconnect)() from mdev to tconn
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-10-14 16:48:00 +02:00
Philipp Reisner
1aba4d7fcf drbd: Preparing the connector interface to operate on connections
Up to now it only operated on minor numbers. Now it can work also
on named connections.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-10-14 16:47:59 +02:00
Philipp Reisner
2f5cdd0b2c drbd: Converted the transfer log from mdev to tconn
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-10-14 16:47:58 +02:00
Philipp Reisner
49559d87fd drbd: Improved the dec_*() macros
Now those can be used with a struct drbd_conf * that has a name
other than 'mdev'.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-10-14 16:47:57 +02:00
Philipp Reisner
3f9cbe937e drbd: Removed the mdev parameter from the ..to_tags() and ...from_tags() functions
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-10-14 16:47:56 +02:00
Philipp Reisner
0e29d163f7 drbd: Reworked the unconfiguring and thread stopping code
* Moved CONFIG_PENDING and DEVICE_DYING from mdev to tconn.
* Renamed drbd_reconfig_start() and drbd_reconfig_done() to
  conn_reconfig_start() and conn_reconfig_done().

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-10-14 16:47:55 +02:00
Andreas Gruenbacher
c66342d949 drbd: Remove left-over function prototypes
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-10-14 16:47:55 +02:00
Andreas Gruenbacher
7201b972de drbd: Replace get_asender_cmd() with its implementation
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-10-14 16:47:54 +02:00
Andreas Gruenbacher
6e849ce88c drbd: Get rid of P_MAX_CMD
Instead of artificially enlarging the command decoding arrays to
P_MAX_CMD entries, check if an index is within the valid range using the
ARRAY_SIZE() macro.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-10-14 16:47:53 +02:00
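
A range check based on `ARRAY_SIZE()` can be sketched as follows. This is a minimal userspace example, not the DRBD decoding tables: the `cmd_handler` struct, the `handlers` table, its entries, and `lookup_cmd` are all hypothetical, and `ARRAY_SIZE` is defined locally because the kernel header is not available outside the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Local definition; the kernel provides its own ARRAY_SIZE(). */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical command table; names and sizes are illustrative only. */
struct cmd_handler {
	const char *name;
	size_t pkt_size;
};

static const struct cmd_handler handlers[] = {
	{ "data", 16 },
	{ "ack",   8 },
	{ "ping",  0 },
};

/* Return the handler for cmd, or NULL when cmd is outside the table. */
static const struct cmd_handler *lookup_cmd(unsigned int cmd)
{
	if (cmd >= ARRAY_SIZE(handlers))
		return NULL;

	/* The table in this sketch has no gaps, so every slot is populated. */
	assert(handlers[cmd].name);
	return &handlers[cmd];
}
```

The table is exactly as large as its initializer, and the bound follows automatically when an entry is added, so no separate maximum constant has to be kept in sync.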
Andreas Gruenbacher
1b3bb47d52 drbd: Remove redundant check
Opening a device only succeeds on a primary node, or when explicitly
setting the allow_oos module parameter to allow opening the device
read-only on a secondary node.  There is no other way that a request can
get into drbd_make_request(), so this code cannot trigger.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-10-14 16:47:52 +02:00
Andreas Gruenbacher
7be8da0798 drbd: Improve how conflicting writes are handled
The previous algorithm for dealing with overlapping concurrent writes
was generating unnecessary warnings for scenarios which could be
legitimate, and did not always handle partially overlapping requests
correctly.  Improve the algorithm as follows:

* While local or remote write requests are in progress, conflicting new
  local write requests will be delayed (commit 82172f7).

* When a conflict between a local and remote write request is detected,
  the node with the discard flag decides how to resolve the conflict: It
  will ask its peer to discard conflicting requests which are fully
  contained in the local request and retry requests which overlap only
  partially.  This involves a protocol change.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-10-14 16:47:51 +02:00
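
The distinction between "fully contained" and "partially overlapping" requests can be sketched as a classification over sector ranges. This is an illustration only and not the DRBD implementation: the `range` struct, the `conflict` enum, and `classify` are hypothetical names, ranges are assumed to be half-open, and the protocol exchange with the peer is left out.

```c
#include <assert.h>
#include <stdint.h>

enum conflict {
	NO_CONFLICT,	/* ranges do not overlap */
	PEER_DISCARD,	/* peer request lies fully inside the local request */
	PEER_RETRY,	/* ranges overlap only partially */
};

/* Half-open sector range [start, start + len). */
struct range {
	uint64_t start;
	uint64_t len;
};

/* Decide how a peer request relates to a conflicting local request. */
static enum conflict classify(struct range local, struct range peer)
{
	uint64_t local_end = local.start + local.len;
	uint64_t peer_end = peer.start + peer.len;

	assert(local.len > 0 && peer.len > 0);

	if (peer_end <= local.start || local_end <= peer.start)
		return NO_CONFLICT;
	if (peer.start >= local.start && peer_end <= local_end)
		return PEER_DISCARD;
	return PEER_RETRY;
}
```

In this sketch, a peer write covering sectors 102 to 105 against a local write covering 100 to 107 is classified as a discard, while a peer write covering 104 to 111 is classified as a retry.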
Andreas Gruenbacher
71b1c1eb9c drbd: Use ping-timeout when waiting for missing ack packets
When the node with the discard flag resolves write conflicts in
dual-primary mode, it may determine that its peer has sent ack packets
on the metadata socket which did not arrive, yet.  Wait for the next ack
with ping-timeout instead of a hard-coded 30 seconds.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-10-14 16:47:51 +02:00
Andreas Gruenbacher
8ccf218e9f drbd: Replace atomic_add_return with atomic_inc_return
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-10-14 16:47:50 +02:00
Andreas Gruenbacher
206d358941 drbd: Concurrent write detection fix
Commit 9b1e63e changed the concurrent write detection algorithm to only insert
peer requests into write_requests tree after determining that there is no
conflict.  With this change, new conflicting local requests could be added
while the algorithm runs, but this case was not handled correctly.  Instead of
making the algorithm deal with this case, switch back to adding peer requests
to the write_requests tree immediately: this improves fairness.

When a peer request is discarded, remove that request from the write_requests

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-10-14 16:47:49 +02:00
Andreas Gruenbacher
8050e6d005 drbd: Use container_of() instead of casting
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-10-14 16:47:48 +02:00
Lars Ellenberg
9676c76097 drbd: fix a wrong likely(), updated comments
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-10-14 16:47:47 +02:00
Lars Ellenberg
c9d963a46d drbd: silence some log messages on bitmap IO
Summary log messages meant for global bitmap IO
should not be printed for bitmap IO caused by
activity log transactions.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-10-14 16:47:47 +02:00
Lars Ellenberg
7ad651b522 drbd: new on-disk activity log transaction format
Use a new on-disk transaction format for the activity log, which allows
for multiple changes to the active set per transaction.

Using 4k transaction blocks, we can now get rid of the work-around code
to deal with devices not supporting 512 byte logical block size.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-10-14 16:47:46 +02:00