Commit graph

4,663 commits

Author SHA1 Message Date
Lars Ellenberg
06f10adbdb drbd: prepare for more than 32 bit flags
- struct drbd_conf { ... unsigned long flags; ... }
 + struct drbd_conf { ... unsigned long drbd_flags[N]; ... }

And introduce wrapper functions for test/set/clear bit operations
on this member.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-10-30 08:39:18 +01:00
Lars Ellenberg
44edfb0d78 drbd: wait for meta data IO completion even with failed disk, unless force-detached
The intention of force-detach is to be able to deal with a completely
unresponsive lower level IO stack, which does not even deliver error
completions anymore, but no completion at all.

In all other cases, we must still wait for the meta data IO completion.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-10-30 08:39:18 +01:00
Lars Ellenberg
8b45a5c8a1 drbd: a few more GFP_KERNEL -> GFP_NOIO
This has not yet been observed, but conceivably, when using GFP_KERNEL
allocations from drbd_md_sync(), drbd_flush_after_epoch() or
receive_SyncParam(), we could trigger additional IO to our own device,
or an other device in a criss-cross setup, and end up in a local
deadlock, or potentially a distributed deadlock in a criss-cross setup
involving the peer blocked in a similar way waiting for us to make
progress.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-10-30 08:39:18 +01:00
Lars Ellenberg
0b143d4382 drbd: fix potential deadlock during bitmap (re-)allocation
The former comment arguing that GFP_KERNEL was good enough was wrong: it
did not take resize into account at all, and assumed the only path
leading here was the normal attach on a still secondary device, so no
deadlock would be possible.

Both resize on a Primary, or attach on a diskless Primary,
could potentially deadlock.

drbd_bm_resize() is called while IO to the respective device is
suspended, so we must use GFP_NOIO to avoid potential deadlock.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-10-30 08:39:18 +01:00
Lars Ellenberg
7fb907c15f drbd: panic on delayed completion of aborted requests
"aborting" requests, or force-detaching the disk, is intended for
completely blocked/hung local backing devices which do no longer
complete requests at all, not even do error completions.  In this
situation, usually a hard-reset and failover is the only way out.

By "aborting", basically faking a local error-completion,
we allow for a more graceful swichover by cleanly migrating services.
Still the affected node has to be rebooted "soon".

By completing these requests, we allow the upper layers to re-use
the associated data pages.

If later the local backing device "recovers", and now DMAs some data
from disk into the original request pages, in the best case it will
just put random data into unused pages; but typically it will corrupt
meanwhile completely unrelated data, causing all sorts of damage.

Which means delayed successful completion,
especially for READ requests,
is a reason to panic().

We assume that a delayed *error* completion is OK,
though we still will complain noisily about it.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-10-30 08:39:17 +01:00
Philipp Reisner
dbd0820c6f drbd: Remove dead code
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-10-30 08:39:17 +01:00
Philipp Reisner
599377acb7 drbd: Avoid NetworkFailure state during disconnect
Disconnecting is a cluster wide state change. In case the peer node agrees
to the state transition, it sends back the fact on the meta-data connection
and closes both sockets.

In case the node node that initiated the state transfer sees the closing
action on the data-socket, before the P_STATE_CHG_REPLY packet, it was
going into one of the network failure states.

At least with the fencing option set to something else thatn "dont-care",
the unclean shutdown of the connection causes a short IO freeze or
a fence operation.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-10-30 08:39:17 +01:00
Philipp Reisner
c12a3d8c84 drbd: Fix a potential issue with the DISCARD_CONCURRENT flag
The DISCARD_CONCURRENT flag should be set on one node and cleared on the
other node.
As the code was before it was theoretical possible that a node accepts the
meta socket, but has to close it later on, and keeps the DISCARD_CONCURRENT
flag.
Correct this by moving the clear_bit(DISCARD_CONCURRENT) where the packet
gets sent.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-10-30 08:39:01 +01:00
Lars Ellenberg
02b91b5526 drbd: introduce stop-sector to online verify
We now can schedule only a specific range of sectors for online verify,
or interrupt a running verify without interrupting the connection.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-10-30 08:39:01 +01:00
Philipp Reisner
9f2247bb9b drbd: Protect accesses to the uuid set with a spinlock
There is at least the worker context, the receiver context, the context of
receiving netlink packts and processes reading a sysfs attribute that access
the uuids.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-10-30 08:39:01 +01:00
Dave Chinner
a1ecac3b06 loop: Make explicit loop device destruction lazy
xfstests has always had random failures of tests due to loop devices
failing to be torn down and hence leaving filesytems that cannot be
unmounted. This causes test runs to immediately stop.

Over the past 6 or 7 years we've added hacks like explicit unmount
-d commands for loop mounts, losetup -d after unmount -d fails, etc,
but still the problems persist.  Recently, the frequency of loop
related failures increased again to the point that xfstests 259 will
reliably fail with a stray loop device that was not torn down.

That is despite the fact the test is above as simple as it gets -
loop 5 or 6 times running mkfs.xfs with different paramters:

        lofile=$(losetup -f)
        losetup $lofile "$testfile"
        "$MKFS_XFS_PROG" -b size=512 $lofile >/dev/null || echo "mkfs failed!"
        sync
        losetup -d $lofile

And losteup -d $lofile is failing with EBUSY on 1-3 of these loops
every time the test is run.

Turns out that blkid is running simultaneously with losetup -d, and
so it sees an elevated reference count and returns EBUSY.  But why
is blkid running? It's obvious, isn't it? udev has decided to try
and find out what is on the block device as a result of a creation
notification. And it is racing with mkfs, so might still be scanning
the device when mkfs finishes and we try to tear it down.

So, make losetup -d force autoremove behaviour. That is, when the
last reference goes away, tear down the device. xfstests wants it
*gone*, not causing random teardown failures when we know that all
the operations the tests have specifically run on the device have
completed and are no longer referencing the loop device.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-10-30 08:37:31 +01:00
Selvan Mani
4453bc88f0 mtip32xx:Added appropriate timeout value for secure erase
Added appropriate timeout value for secure erase based on identify device data

Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-10-30 08:37:27 +01:00
Oliver Chick
1f999572f2 xen/blkback: Change xen_vbd's flush_support and discard_secure to have type unsigned int, rather than bool
Changing the type of bdev parameters to be unsigned int :1, rather than bool.
This is more consistent with the types of other features in the block drivers.

Signed-off-by: Oliver Chick <oliver.chick@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-10-30 08:37:20 +01:00
Akinobu Mita
b7010ede43 cciss: select CONFIG_CHECK_SIGNATURE
The patch cciss-use-check_signature.patch in -mm tree introduced
a build error:

drivers/built-in.o: In function `CISS_signature_present':
drivers/block/cciss.c:4270: undefined reference to `check_signature'

Add missing CONFIG_CHECK_SIGNATURE to fix this issue.

Reported-by: Fengguang Wu <wfg@linux.intel.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Fengguang Wu <wfg@linux.intel.com>
Cc: Mike Miller <mike.miller@hp.com>
Cc: Jens Axboe <axboe@kernel.dk>
Acked-by: "Stephen M. Cameron" <scameron@beardog.cce.hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-10-30 08:37:00 +01:00
Wei Yongjun
2541aa799f cciss: remove unneeded memset()
The memory return by kzalloc() or kmem_cache_zalloc() has already be set
to zero, so remove useless memset(0).

spatch with a semantic match is used to found this problem.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: Mike Miller <mike.miller@hp.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-10-30 08:36:58 +01:00
Wei Yongjun
654dbef214 xen/blkback: use kmem_cache_zalloc instead of kmem_cache_alloc/memset
Using kmem_cache_zalloc() instead of kmem_cache_alloc() and memset().

spatch with a semantic match is used to found this problem.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-10-30 08:36:27 +01:00
Herton Ronaldo Krzesinski
1a4ae43e4f floppy: remove dr, reuse drive on do_floppy_init
This is a small cleanup, that also may turn error handling of
unitialized disks more readable. We don't need a separate variable to
track allocated disks, remove dr and reuse drive variable instead.

Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-10-30 08:36:07 +01:00
Herton Ronaldo Krzesinski
8d3ab4ebfd floppy: use common function to check if floppies can be registered
The same checks to see if a drive can be or is registered are
repeated through the code, factor out the checks in a common function
and replace the repeated checks with it.

Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-10-30 08:34:25 +01:00
Herton Ronaldo Krzesinski
d60e7ec18c floppy: properly handle failure on add_disk loop
On floppy initialization, if something failed inside the loop we call
add_disk, there was no cleanup of previous iterations in the error
handling.

Cc: stable@vger.kernel.org
Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-10-30 08:34:25 +01:00
Herton Ronaldo Krzesinski
238ab78469 floppy: do put_disk on current dr if blk_init_queue fails
If blk_init_queue fails, we do not call put_disk on the current dr
(dr is decremented first in the error handling loop).

Cc: stable@vger.kernel.org
Reviewed-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-10-30 08:34:25 +01:00
Herton Ronaldo Krzesinski
b54e1f8889 floppy: don't call alloc_ordered_workqueue inside the alloc_disk loop
Since commit 070ad7e ("floppy: convert to delayed work and single-thread
wq"), we end up calling alloc_ordered_workqueue multiple times inside
the loop, which shouldn't be intended. Besides the leak, other side
effect in the current code is if blk_init_queue fails, we would end up
calling unregister_blkdev even if we didn't call yet register_blkdev.

Just moved the allocation of floppy_wq before the loop, and adjusted the
code accordingly.

Cc: stable@vger.kernel.org # 3.5+
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-10-30 08:34:24 +01:00
Konrad Rzeszutek Wilk
2911758f14 xen/blkback: Fix compile warning
drivers/block/xen-blkback/xenbus.c:260:5: warning: symbol 'xenvbd_sysfs_addif' was not declared. Should it be static?
drivers/block/xen-blkback/xenbus.c:284:6: warning: symbol 'xenvbd_sysfs_delif' was not declared. Should it be static?

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-10-30 08:32:43 +01:00
Alex Elder
069a4b5690 rbd: kill rbd_device->rbd_opts
The rbd_device structure has an embedded rbd_options structure.
Such a structure is needed to work with the generic ceph argument
parsing code, but there's no need to keep it around once argument
parsing is done.

Use a local variable to hold the rbd options used in parsing in
rbd_get_client(), and just transfer its content (it's just a
read_only flag) into the field in the rbd_mapping sub-structure
that requires that information.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
2012-10-26 17:18:08 -05:00
Alex Elder
e5cfeed281 rbd: simplify rbd_merge_bvec()
The aim of this patch is to make what's going on rbd_merge_bvec() a
bit more obvious than it was before.  This was an issue when a
recent btrfs bug led us to question whether the merge function was
working correctly.

Use "obj" rather than "chunk" to indicate the units whose boundaries
we care about we call (rados) "objects".

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
2012-10-26 17:18:08 -05:00
Alex Elder
d4b125e9eb rbd: increase maximum snapshot name length
Change RBD_MAX_SNAP_NAME_LEN to be based on NAME_MAX.  That is a
practical limit for the length of a snapshot name (based on the
presence of a directory using the name under /sys/bus/rbd to
represent the snapshot).

The /sys entry is created by prefixing it with "snap_"; define that
prefix symbolically, and take its length into account in defining
the snapshot name length limit.

Enforce the limit in rbd_add_parse_args().  Also delete a dout()
call in that function that was not meant to be committed.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-10-26 17:18:08 -05:00
Alex Elder
db2388b6ee rbd: verify rbd image order value
This adds a verification that an rbd image's object order is
within the upper and lower bounds supported by this implementation.

It must be at least 9 (SECTOR_SHIFT), because the Linux bio system
assumes that minimum granularity.

It also must be less than 32 (at the moment anyway) because there
exist spots in the code that store the size of a "segment" (object
backing an rbd image) in a signed int variable, which can be 32 bits
including the sign.  We should be able to relax this limit once
we've verified the code uses 64-bit types where needed.

Note that the CLI tool already limits the order to the range 12-25.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-10-26 17:18:08 -05:00
Alex Elder
4634246db8 rbd: consolidate rbd_do_op() calls
The two calls to rbd_do_op() from rbd_rq_fn() differ only in the
value passed for the snapshot id and the snapshot context.

For reads the snapshot always comes from the mapping, and for writes
the snapshot id is always CEPH_NOSNAP.

The snapshot context is always null for reads.  For writes, the
snapshot context always comes from the rbd header, but it is
acquired under protection of header semaphore and could change
thereafter, so we can't simply use what's available inside
rbd_do_op().

Eliminate the snapid parameter from rbd_do_op(), and set it
based on the I/O direction inside that function instead.  Always
pass the snapshot context acquired in the caller, but reset it
to a null pointer inside rbd_do_op() if the operation is a read.

As a result, there is no difference in the read and write calls
to rbd_do_op() made in rbd_rq_fn(), so just call it unconditionally.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-10-26 17:18:08 -05:00
Alex Elder
ff2e4bb5b3 rbd: drop rbd_do_op() opcode and flags
The only callers of rbd_do_op() are in rbd_rq_fn(), where call one
is used for writes and the other used for reads.  The request passed
to rbd_do_op() already encodes the I/O direction, and that
information can be used inside the function to set the opcode and
flags value (rather than passing them in as arguments).

So get rid of the opcode and flags arguments to rbd_do_op().

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-10-26 17:18:08 -05:00
Alex Elder
13f4042c05 rbd: kill rbd_req_{read,write}()
Both rbd_req_read() and rbd_req_write() are simple wrapper routines
for rbd_do_op(), and each is only called once.  Replace each wrapper
call with a direct call to rbd_do_op(), and get rid of the wrapper
functions.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-10-26 17:18:08 -05:00
Alex Elder
be466c1cc3 rbd: fix read-only option name
The name of the "read-only" mapping option was inadvertently changed
in this commit:

    f84344f3 rbd: separate mapping info in rbd_dev

Revert that hunk to return it to what it should be.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-10-26 17:18:08 -05:00
Alex Elder
a0ea3a40fd rbd: zero return code in rbd_dev_image_id()
When rbd_dev_probe() calls rbd_dev_image_id() it expects to get
a 0 return code if successful, but it is getting a positive value.

The reason is that rbd_dev_image_id() returns the value it gets from
rbd_req_sync_exec(), which returns the number of bytes read in as a
result of the request.  (This ultimately comes from
ceph_copy_from_page_vector() in rbd_req_sync_op()).

Force the return value to 0 when successful in rbd_dev_image_id().
Do the same in rbd_dev_v2_object_prefix().

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
2012-10-26 17:18:08 -05:00
Alex Elder
b213e0b1a6 rbd: fix bug in rbd_dev_id_put()
In rbd_dev_id_put(), there's a loop that's intended to determine
the maximum device id in use.  But it isn't doing that at all,
the effect of how it's written is to simply use the just-put id
number, which ignores whole purpose of this function.

Fix the bug.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-10-26 17:18:08 -05:00
Kees Cook
b8977285ec drivers/block: remove CONFIG_EXPERIMENTAL
This config item has not carried much meaning for a while now and is
almost always enabled by default. As agreed during the Linux kernel
summit, remove it.

CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: Asai Thambi S P <asamymuthupa@micron.com>
CC: Pete Zaitcev <zaitcev@redhat.com>
CC: Cong Wang <xiyou.wangcong@gmail.com>
CC: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-10-23 22:30:38 +02:00
Linus Torvalds
ce40be7a82 Merge branch 'for-3.7/core' of git://git.kernel.dk/linux-block
Pull block IO update from Jens Axboe:
 "Core block IO bits for 3.7.  Not a huge round this time, it contains:

   - First series from Kent cleaning up and generalizing bio allocation
     and freeing.

   - WRITE_SAME support from Martin.

   - Mikulas patches to prevent O_DIRECT crashes when someone changes
     the block size of a device.

   - Make bio_split() work on data-less bio's (like trim/discards).

   - A few other minor fixups."

Fixed up silent semantic mis-merge as per Mikulas Patocka and Andrew
Morton.  It is due to the VM no longer using a prio-tree (see commit
6b2dbba8b6: "mm: replace vma prio_tree with an interval tree").

So make set_blocksize() use mapping_mapped() instead of open-coding the
internal VM knowledge that has changed.

* 'for-3.7/core' of git://git.kernel.dk/linux-block: (26 commits)
  block: makes bio_split support bio without data
  scatterlist: refactor the sg_nents
  scatterlist: add sg_nents
  fs: fix include/percpu-rwsem.h export error
  percpu-rw-semaphore: fix documentation typos
  fs/block_dev.c:1644:5: sparse: symbol 'blkdev_mmap' was not declared
  blockdev: turn a rw semaphore into a percpu rw semaphore
  Fix a crash when block device is read and block size is changed at the same time
  block: fix request_queue->flags initialization
  block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue()
  block: ioctl to zero block ranges
  block: Make blkdev_issue_zeroout use WRITE SAME
  block: Implement support for WRITE SAME
  block: Consolidate command flag and queue limit checks for merges
  block: Clean up special command handling logic
  block/blk-tag.c: Remove useless kfree
  block: remove the duplicated setting for congestion_threshold
  block: reject invalid queue attribute values
  block: Add bio_clone_bioset(), bio_clone_kmalloc()
  block: Consolidate bio_alloc_bioset(), bio_kmalloc()
  ...
2012-10-11 09:04:23 +09:00
Alex Elder
35152979e6 rbd: activate v2 image support
Now that v2 images support is fully implemented, have
rbd_dev_v2_probe() return 0 to indicate a successful probe.

(Note that an image that implements layering will fail
the probe early because of the feature chekc.)

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-10-10 07:44:01 -07:00
Alex Elder
d889140c4a rbd: implement feature checks
Version 2 images have two sets of feature bit fields.  The first
indicates features possibly used by the image.  The second indicates
features that the client *must* support in order to use the image.

When an image (or snapshot) is first examined, we need to make sure
that the local implementation supports the image's required
features.  If not, fail the probe for the image.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-10-10 07:43:51 -07:00
Alex Elder
117973fb4c rbd: define rbd_dev_v2_refresh()
Define a new function rbd_dev_v2_refresh() to update/refresh the
snapshot context for a format version 2 rbd image.  This function
will update anything that is not fixed for the life of an rbd
image--at the moment this is mainly the snapshot context and (for
a base mapping) the size.

Update rbd_refresh_header() so it selects which function to use
based on the image format.

Rename __rbd_refresh_header() to be rbd_dev_v1_refresh()
to be consistent with the naming of its version 2 counterpart.
Similarly rename rbd_refresh_header() to be rbd_dev_refresh().

Unrelated--we use rbd_image_format_valid() here.  Delete the other
use of it, which was primarily put in place to ensure that function
was referenced at the time it was defined.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-10-10 07:43:39 -07:00
Alex Elder
9478554ae5 rbd: define rbd_update_mapping_size()
Encapsulate the code that handles updating the size of a mapping
after an rbd image has been refreshed.  This is done in anticipation
of the next patch, which will make this common code for format 1 and
2 images.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-10-10 07:43:28 -07:00
Linus Torvalds
7035cdf36d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph updates from Sage Weil:
 "The bulk of this pull is a series from Alex that refactors and cleans
  up the RBD code to lay the groundwork for supporting the new image
  format and evolving feature set.  There are also some cleanups in
  libceph, and for ceph there's fixed validation of file striping
  layouts and a bugfix in the code handling a shrinking MDS cluster."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (71 commits)
  ceph: avoid 32-bit page index overflow
  ceph: return EIO on invalid layout on GET_DATALOC ioctl
  rbd: BUG on invalid layout
  ceph: propagate layout error on osd request creation
  libceph: check for invalid mapping
  ceph: convert to use le32_add_cpu()
  ceph: Fix oops when handling mdsmap that decreases max_mds
  rbd: update remaining header fields for v2
  rbd: get snapshot name for a v2 image
  rbd: get the snapshot context for a v2 image
  rbd: get image features for a v2 image
  rbd: get the object prefix for a v2 rbd image
  rbd: add code to get the size of a v2 rbd image
  rbd: lay out header probe infrastructure
  rbd: encapsulate code that gets snapshot info
  rbd: add an rbd features field
  rbd: don't use index in __rbd_add_snap_dev()
  rbd: kill create_snap sysfs entry
  rbd: define rbd_dev_image_id()
  rbd: define some new format constants
  ...
2012-10-08 06:38:18 +09:00
Linus Torvalds
dc92b1f9ab Merge branch 'virtio-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull virtio changes from Rusty Russell:
 "New workflow: same git trees pulled by linux-next get sent straight to
  Linus.  Git is awkward at shuffling patches compared with quilt or mq,
  but that doesn't happen often once things get into my -next branch."

* 'virtio-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (24 commits)
  lguest: fix occasional crash in example launcher.
  virtio-blk: Disable callback in virtblk_done()
  virtio_mmio: Don't attempt to create empty virtqueues
  virtio_mmio: fix off by one error allocating queue
  drivers/virtio/virtio_pci.c: fix error return code
  virtio: don't crash when device is buggy
  virtio: remove CONFIG_VIRTIO_RING
  virtio: add help to CONFIG_VIRTIO option.
  virtio: support reserved vqs
  virtio: introduce an API to set affinity for a virtqueue
  virtio-ring: move queue_index to vring_virtqueue
  virtio_balloon: not EXPERIMENTAL any more.
  virtio-balloon: dependency fix
  virtio-blk: fix NULL checking in virtblk_alloc_req()
  virtio-blk: Add REQ_FLUSH and REQ_FUA support to bio path
  virtio-blk: Add bio-based IO path for virtio-blk
  virtio: console: fix error handling in init() function
  tools: Fix pthread flag for Makefile of trace-agent used by virtio-trace
  tools: Add guest trace agent as a user tool
  virtio/console: Allocate scatterlist according to the current pipe size
  ...
2012-10-07 21:04:56 +09:00
Linus Torvalds
f1c6872e49 Features:
* Allow a Linux guest to boot as initial domain and as normal guests
    on Xen on ARM (specifically ARMv7 with virtualized extensions).
    PV console, block and network frontend/backends are working.
 Bug-fixes:
  * Fix compile linux-next fallout.
  * Fix PVHVM bootup crashing.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJQbJELAAoJEFjIrFwIi8fJSI4H/32qrQKyF5IIkFKHTN9FYDC1
 OxEGc4y47DIQpGUd/PgZ/i6h9Iyhj+I6pb4lCevykwgd0j83noepluZlCIcJnTfL
 HVXNiRIQKqFhqKdjTANxVM4APup+7Lqrvqj6OZfUuoxaZ3tSTLhabJ/7UXf2+9xy
 g2RfZtbSdQ1sukQ/A2MeGQNT79rh7v7PrYQUYSrqytjSjSLPTqRf75HWQ+eapIAH
 X3aVz8Tn6nTixZWvZOK7rAaD4awsFxGP6E46oFekB02f4x9nWHJiCZiXwb35lORb
 tz9F9td99f6N4fPJ9LgcYTaCPwzVnceZKqE9hGfip4uT+0WrEqDxq8QmBqI5YtI=
 =gxJD
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.7-arm-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull ADM Xen support from Konrad Rzeszutek Wilk:

  Features:
   * Allow a Linux guest to boot as initial domain and as normal guests
     on Xen on ARM (specifically ARMv7 with virtualized extensions).  PV
     console, block and network frontend/backends are working.
  Bug-fixes:
   * Fix compile linux-next fallout.
   * Fix PVHVM bootup crashing.

  The Xen-unstable hypervisor (so will be 4.3 in a ~6 months), supports
  ARMv7 platforms.

  The goal in implementing this architecture is to exploit the hardware
  as much as possible.  That means use as little as possible of PV
  operations (so no PV MMU) - and use existing PV drivers for I/Os
  (network, block, console, etc).  This is similar to how PVHVM guests
  operate in X86 platform nowadays - except that on ARM there is no need
  for QEMU.  The end result is that we share a lot of the generic Xen
  drivers and infrastructure.

  Details on how to compile/boot/etc are available at this Wiki:

    http://wiki.xen.org/wiki/Xen_ARMv7_with_Virtualization_Extensions

  and this blog has links to a technical discussion/presentations on the
  overall architecture:

    http://blog.xen.org/index.php/2012/09/21/xensummit-sessions-new-pvh-virtualisation-mode-for-arm-cortex-a15arm-servers-and-x86/

* tag 'stable/for-linus-3.7-arm-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (21 commits)
  xen/xen_initial_domain: check that xen_start_info is initialized
  xen: mark xen_init_IRQ __init
  xen/Makefile: fix dom-y build
  arm: introduce a DTS for Xen unprivileged virtual machines
  MAINTAINERS: add myself as Xen ARM maintainer
  xen/arm: compile netback
  xen/arm: compile blkfront and blkback
  xen/arm: implement alloc/free_xenballooned_pages with alloc_pages/kfree
  xen/arm: receive Xen events on ARM
  xen/arm: initialize grant_table on ARM
  xen/arm: get privilege status
  xen/arm: introduce CONFIG_XEN on ARM
  xen: do not compile manage, balloon, pci, acpi, pcpu and cpu_hotplug on ARM
  xen/arm: Introduce xen_ulong_t for unsigned long
  xen/arm: Xen detection and shared_info page mapping
  docs: Xen ARM DT bindings
  xen/arm: empty implementation of grant_table arch specific functions
  xen/arm: sync_bitops
  xen/arm: page.h definitions
  xen/arm: hypercalls
  ...
2012-10-07 07:13:01 +09:00
Ed Cashin
322c9ec009 aoe: update aoe-internal version number to 50
Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:30 +09:00
Ed Cashin
1ac9e60262 aoe: remove unused code
Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:30 +09:00
Ed Cashin
08b6062351 aoe: make dynamic block minor numbers the default
Because udev use is so widespread, making the old static mapping the
default is too conservative, given the severe limitations it places on
usable AoE addresses.  Storage virtualization and larger shelves have made
the old limitations too confining.

These changes make the dynamic block device minor numbers the default,
removing the limitations on usable AoE addresses.

The static arrangement is still available with aoe_dyndevs=0, and the
aoe-stat tool from the userland aoetools package, the user space
counterpart to the aoe driver, recognizes the case where there is a
mismatch between the minor number in sysfs and the minor number in a
special device file.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:29 +09:00
Ed Cashin
7159e969d1 aoe: update and specify AoE address guards and error messages
In general, specific is better when it comes to messages about AoE usage
problems.  Also, explicit checks for the AoE broadcast addresses are
added.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:29 +09:00
Ed Cashin
4bcce1a355 aoe: retain static block device numbers for backwards compatibility
The old mapping between AoE target shelf and slot addresses and the block
device minor number is retained as a backwards-compatible feature, with a
new "aoe_dyndevs" module parameter available for enabling dynamic block
device minor numbers.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:29 +09:00
Ed Cashin
0c96621458 aoe: support more AoE addresses with dynamic block device minor numbers
The ATA over Ethernet protocol uses a major (shelf) and minor (slot)
address to identify a particular storage target.  These changes remove an
artificial limitation the aoe driver imposes on the use of AoE addresses.
For example, without these changes, the slot address has a maximum of 15,
but users commonly use slot numbers much greater than that.

The AoE shelf and slot address space is often used sparsely.  Instead of
using a static mapping between AoE addresses and the block device minor
number, the block device minor numbers are now allocated on demand.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:28 +09:00
Ed Cashin
fea05a26c3 aoe: update copyright year in touched files
Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:28 +09:00
Ed Cashin
7392fbe5ad aoe: update internal version number to 49
The internal version number of the aoe driver appears in a console message
when the driver loads and is usually obtained by the user with the
userland aoe-version tool, part of the aoetools.[1]

Although this patchset includes bugfixes backported from higher-numbered
versions published on the coraid.com website, it is a form of version 49.

1. http://aoetools.sourceforge.net/

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:27 +09:00
Ed Cashin
b21faa25c6 aoe: remove unused code and add cosmetic improvements
This change removes some unused code and attempts to increase code
consistency.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:27 +09:00