linux-pinenote

Author	SHA1	Message	Date
Philipp Reisner	6b75dced00	drbd: conn_khelper() for user mode callbacks for connections Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:32 +01:00
Philipp Reisner	047e95e259	drbd: Allow volumes to become primary only on one side Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:31 +01:00
Lars Ellenberg	40cbf085f5	drbd: fix conn_reconfig_start without conn_reconfig_done in drbd_adm_attach If drbd_adm_attach failed early, it left the CONFIG_PENDING bit on, blocking any further conn_reconfig_start on that connection. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:31 +01:00
Philipp Reisner	e4f78edee1	drbd: Separate connection state changes from minor dev state changes #2 New function got_conn_RqSReply() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:30 +01:00
Philipp Reisner	f19e4f8ba7	drbd: Converted got_Ping() and got_PingAck() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:30 +01:00
Philipp Reisner	a4fbda8eca	drbd: Allow packet handler functions that take a connection (meta connection) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:29 +01:00
Philipp Reisner	dfafcc8a7b	drbd: Separate connection state changes from minor dev state changes #1 Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:28 +01:00
Philipp Reisner	7204624c5e	drbd: Converted receive_protocol() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:28 +01:00
Philipp Reisner	d9ae84e790	drbd: Allow packet handler functions that take a connection That is necessary in case a connection does not have a volume 0 Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:27 +01:00
Philipp Reisner	8169e41b3e	drbd: Moved CONN_DRY_RUN to the per connection (tconn) flags Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:27 +01:00
Philipp Reisner	38fa9988fa	drbd: Do not modify the connection state with something else that conn_request_state() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:26 +01:00
Philipp Reisner	34f646bd57	drbd: Allow two diskless minors to be connected In the context of drbd-8.4 it no longer makes sense to dissalow that. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:25 +01:00
Philipp Reisner	2325eb661f	drbd: New minors have to intherit the connection state form their connection Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:25 +01:00
Philipp Reisner	082a3439a2	drbd: process_done_ee() has to handle unconfigured devices now Took the chance and converted tconn_process_done_ee() to use idr_for_each_entry() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:24 +01:00
Philipp Reisner	2de876efa6	drbd: Ignore packets for non existing volumes Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:23 +01:00
Lars Ellenberg	85f75dd763	drbd: introduce in-kernel "down" command This greatly simplifies deconfiguration of whole resources. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:23 +01:00
Lars Ellenberg	3c5e5f6afd	drbd: add forgotten spin_unlock somehow a "goto abort" was introduced with commit drbd: Extracted is_valid_transition() out of sanitize_state() which left drbd_req_state still holding the spin lock. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:22 +01:00
Lars Ellenberg	527f4b24e5	drbd: bail out if a config requrest is over-determined, and not matching We have resources resp. connections, volumes, and minor numbers. A config request may specifies all three of them. If it turns out that the minor belongs to a different connection, or a different volume number in the same connection, that configuration request is invalid. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:21 +01:00
Lars Ellenberg	38f19616d2	drbd: new-connection and new-minor succeed, if the object already exists Follow O_CREAT semantics when creating connection or minor device/volume objects. If we need O_CREAT\|O_EXCL semantics some time down the road, we can add NLM_F_EXCL to the netlink message flags. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:21 +01:00
Lars Ellenberg	cffec5b2fe	drbd: Allow a Diskless Secondary volume to be removed Even if the connection is still established. We should be able to reduce a volume from a replication group, without taking the whole group offline. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:20 +01:00
Lars Ellenberg	d0456c72df	drbd: simplify conn_all_vols_unconf, make it bool Get rid of a temporary variable and, funny bitand assignment. Just short circuit, returning false, once we encounter the first still configured volume. FIXME verify call sites for need of rcu_read_lock or stronger. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:19 +01:00
Lars Ellenberg	543cc10b4c	drbd: drbd_adm_get_status needs to show some more detail We want to see existing connection objects, even if they do not currently have volumes attached. Change the .dumpit variant of drbd_adm_get_status to iterate not over minor devices, but over connections + volumes. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:19 +01:00
Lars Ellenberg	8432b31457	drbd: allow holes in minor and volume id allocation s/idr_get_new/idr_get_new_above/ Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:17 +01:00
Lars Ellenberg	3b98c0c209	drbd: switch configuration interface from connector to genetlink Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:17 +01:00
Alex Elder	9e15b77d9a	rbd: get additional info in parent spec When a layered rbd image has a parent, that parent is identified only by its pool id, image id, and snapshot id. Images that have been mapped also record names for those three id's. Add code to look up these names for parent images so they match mapped images more closely. Skip doing this for an image if it already has its pool name defined (this will be the case for images mapped by the user). It is possible that an the name of a parent image can't be determined, even if the image id is valid. If this occurs it does not preclude correct operation, so don't treat this as an error. On the other hand, defined pools will always have both an id and a name. And any snapshot of an image identified as a parent for a clone image will exist, and will have a name (if not it indicates some other internal error). So treat failure to get these bits of information as errors. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:42 -05:00
Alex Elder	86b00e0da6	rbd: get parent spec for version 2 images Add support for getting the the information identifying the parent image for rbd images that have them. The child image holds a reference to its parent image specification structure. Create a new entry "parent" in /sys/bus/rbd/image/N/ to report the identifying information for the parent image, if any. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:42 -05:00
Alex Elder	a92ffdf8a9	rbd: allow null image name Format 2 parent images are partially identified by their image id, but it may not be possible to determine their image name. The name is not strictly needed for correct operation, so we won't be treating it as an error if we don't know it. Handle this case gracefully in rbd_name_show(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:42 -05:00
Alex Elder	2c0d0a10ea	rbd: allow null image name We will know the image id for format 2 parent images, but won't initially know its image name. Avoid making the query for an image id in rbd_dev_image_id() if it's already known. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:42 -05:00
Alex Elder	83a0626362	rbd: encapsulate last part of probe Group the activities that now take place after an rbd_dev_probe() call into a single function, and move the call to that function into rbd_dev_probe() itself. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:42 -05:00
Alex Elder	c53d589337	rbd: define rbd_dev_{create,destroy}() helpers Encapsulate the creation/initialization and destruction of rbd device structures. The rbd_client and the rbd_spec structures provided on creation hold references whose ownership is transferred to the new rbd_device structure. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:42 -05:00
Alex Elder	bd4ba6554d	rbd: consolidate rbd_dev init in rbd_add() Group the allocation and initialization of fields of the rbd device structure created in rbd_add(). Move the grouped code down later in the function, just prior to the call to rbd_dev_probe(). This is for the most part simple code movement. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:42 -05:00
Alex Elder	9d3997fdf4	rbd: don't pass rbd_dev to rbd_get_client() The only reason rbd_dev is passed to rbd_get_client() is so its rbd_client field can get assigned. Instead, just return the rbd_client pointer as a result and have the caller do the assignment. Change rbd_put_client() so it takes an rbd_client structure, so follows the more typical symmetry with rbd_get_client(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:42 -05:00
Alex Elder	859c31df9c	rbd: fill rbd_spec in rbd_add_parse_args() Pass the address of an rbd_spec structure to rbd_add_parse_args(). Use it to hold the information defining the rbd image to be mapped in an rbd_add() call. Use the result in the caller to initialize the rbd_dev->id field. This means rbd_dev is no longer needed in rbd_add_parse_args(), so get rid of it. Now that this transformation of rbd_add_parse_args() is complete, correct and expand on the its header documentation to reflect the new reality. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:42 -05:00
Alex Elder	8b8fb99c5c	rbd: add reference counting to rbd_spec With layered images we'll share rbd_spec structures, so add a reference count to it. It neatens up some code also. A silly get/put pair is added to the alloc routine just to avoid "defined but not used" warnings. It will go away soon. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:41 -05:00
Roger Pau Monne	0a8704a51f	xen/blkback: Persistent grant maps for xen blk drivers This patch implements persistent grants for the xen-blk{front,back} mechanism. The effect of this change is to reduce the number of unmap operations performed, since they cause a (costly) TLB shootdown. This allows the I/O performance to scale better when a large number of VMs are performing I/O. Previously, the blkfront driver was supplied a bvec[] from the request queue. This was granted to dom0; dom0 performed the I/O and wrote directly into the grant-mapped memory and unmapped it; blkfront then removed foreign access for that grant. The cost of unmapping scales badly with the number of CPUs in Dom0. An experiment showed that when Dom0 has 24 VCPUs, and guests are performing parallel I/O to a ramdisk, the IPIs from performing unmap's is a bottleneck at 5 guests (at which point 650,000 IOPS are being performed in total). If more than 5 guests are used, the performance declines. By 10 guests, only 400,000 IOPS are being performed. This patch improves performance by only unmapping when the connection between blkfront and back is broken. On startup blkfront notifies blkback that it is using persistent grants, and blkback will do the same. If blkback is not capable of persistent mapping, blkfront will still use the same grants, since it is compatible with the previous protocol, and simplifies the code complexity in blkfront. To perform a read, in persistent mode, blkfront uses a separate pool of pages that it maps to dom0. When a request comes in, blkfront transmutes the request so that blkback will write into one of these free pages. Blkback keeps note of which grefs it has already mapped. When a new ring request comes to blkback, it looks to see if it has already mapped that page. If so, it will not map it again. If the page hasn't been previously mapped, it is mapped now, and a record is kept of this mapping. Blkback proceeds as usual. When blkfront is notified that blkback has completed a request, it memcpy's from the shared memory, into the bvec supplied. A record that the {gref, page} tuple is mapped, and not inflight is kept. Writes are similar, except that the memcpy is peformed from the supplied bvecs, into the shared pages, before the request is put onto the ring. Blkback stores a mapping of grefs=>{page mapped to by gref} in a red-black tree. As the grefs are not known apriori, and provide no guarantees on their ordering, we have to perform a search through this tree to find the page, for every gref we receive. This operation takes O(log n) time in the worst case. In blkfront grants are stored using a single linked list. The maximum number of grants that blkback will persistenly map is currently set to RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST, to prevent a malicios guest from attempting a DoS, by supplying fresh grefs, causing the Dom0 kernel to map excessively. If a guest is using persistent grants and exceeds the maximum number of grants to map persistenly the newly passed grefs will be mapped and unmaped. Using this approach, we can have requests that mix persistent and non-persistent grants, and we need to handle them correctly. This allows us to set the maximum number of persistent grants to a lower value than RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST, although setting it will lead to unpredictable performance. In writing this patch, the question arrises as to if the additional cost of performing memcpys in the guest (to/from the pool of granted pages) outweigh the gains of not performing TLB shootdowns. The answer to that question is `no'. There appears to be very little, if any additional cost to the guest of using persistent grants. There is perhaps a small saving, from the reduced number of hypercalls performed in granting, and ending foreign access. Signed-off-by: Oliver Chick <oliver.chick@citrix.com> Signed-off-by: Roger Pau Monne <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [v1: Fixed up the misuse of bool as int]	2012-10-30 09:50:04 -04:00
Alex Elder	0d7dbfce9d	rbd: define image specification structure Group the fields that uniquely specify an rbd image into a new reference-counted rbd_spec structure. This structure will be used to describe the desired image when mapping an image, and when probing parent images in layered rbd devices. Replace the set of fields in the rbd device structure with a pointer to a dynamically allocated rbd_spec. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:30 -05:00
Alex Elder	dc79b113d6	rbd: have rbd_add_parse_args() return error Change the interface to rbd_add_parse_args() so it returns an error code rather than a pointer. Return the ceph_options result via a pointer whose address is passed as an argument. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:30 -05:00
Alex Elder	4e9afeba7a	rbd: pass and populate rbd_options structure Have the caller pass the address of an rbd_options structure to rbd_add_parse_args(), to be initialized with the information gleaned as a result of the parse. I know, this is another near-reversal of a recent change... Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:29 -05:00
Alex Elder	819d52bf72	rbd: remove snap_name arg from rbd_add_parse_args() The snapshot name returned by rbd_add_parse_args() just gets saved in the rbd_dev eventually. So just do that inside that function and do away with the snap_name argument, both in rbd_add_parse_args() and rbd_dev_set_mapping(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:29 -05:00
Alex Elder	f28e565a1b	rbd: remove options args from rbd_add_parse_args() They "options" argument to rbd_add_parse_args() (and it's partner options_size) is now only needed within the function, so there's no need to have the caller allocate and pass the options buffer. Just allocate the options buffer within the function using dup_token(). Also distinguish between failures due to failed memory allocation and failing because a required argument was missing. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:29 -05:00
Alex Elder	e5c3553404	rbd: get rid of snap_name_len The value returned in the "snap_name_len" argument to rbd_add_parse_args() is never actually used, so get rid of it. The snap_name_len recorded in rbd_dev_v2_snap_name() is not useful either, so get rid of that too. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:29 -05:00
Alex Elder	0ddebc0c6c	rbd: do all argument parsing in one place This patch makes rbd_add_parse_args() be the single place all argument parsing occurs for an image map request: - Move the ceph_parse_options() call into that function - Use local variables rather than parameters to hold the list of monitor addresses supplied - Rather than returning it, pass the snapshot name (and its length) back via parameters - Have the function return a ceph_options structure pointer Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:29 -05:00
Alex Elder	78cea76e05	rbd: move ceph_parse_options() call up Move option parsing out of rbd_get_client() and into its caller. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:29 -05:00
Alex Elder	daba5fdb4c	rbd: rename snap_exists field A Boolean field "snap_exists" in an rbd mapping is used to indicate whether a mapped snapshot has been removed from an image's snapshot context, to stop sending requests for that snapshot as soon as we know it's gone. Generalize the interpretation of this field so it applies to non-snapshot (i.e. "head") mappings. That is, define its value to be false until the mapping has been set, and then define it to be true for both snapshot mappings or head mappings. Rename the field "exists" to reflect the broader interpretation. The rbd_mapping structure is on its way out, so move the field back into the rbd_device structure. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:29 -05:00
Alex Elder	971f839a76	rbd: move snap info out of rbd_mapping struct Moving the snap_id and snap_name fields into the separate rbd_mapping structure was misguided. (And in time, perhaps we'll do away with that structure altogether...) Move these fields back into struct rbd_device. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:29 -05:00
Alex Elder	86992098e7	rbd: make pool_id a 64 bit value If a format 2 image has a parent, its pool id will be specified using a 64-bit value. Change the pool id we save for an image to match that. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:29 -05:00
Alex Elder	41f38c2b2f	rbd: remove snapshots on error in rbd_add() If rbd_dev_snaps_update() has ever been called for an rbd device structure there could be snapshot structures on its snaps list. In rbd_add(), this function is called but a subsequent error path neglected to clean up any of these snapshots. Add a call to rbd_remove_all_snaps() in the appropriate spot to remedy this. Change a couple of error labels to be a little clearer while there. Drop the leading underscores from the function name; there's nothing special about that function that they might signify. As suggested in review, the leading underscores in __rbd_remove_snap_dev() have been removed as well. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:28 -05:00
Alex Elder	f7760dad28	rbd: simplify rbd_rq_fn() When processing a request, rbd_rq_fn() makes clones of the bio's in the request's bio chain and submits the results to osd's to be satisfied. If a request bio straddles the boundary between objects backing the rbd image, it must be represented by two cloned bio's, one for the first part (at the end of one object) and one for the second (at the beginning of the next object). This has been handled by a function bio_chain_clone(), which includes an interface only a mother could love, and which has been found to have other problems. This patch defines two new fairly generic bio functions (one which replaces bio_chain_clone()) to help out the situation, and then revises rbd_rq_fn() to make use of them. First, bio_clone_range() clones a portion of a single bio, starting at a given offset within the bio and including only as many bytes as requested. As a convenience, a request to clone the entire bio is passed directly to bio_clone(). Second, bio_chain_clone_range() performs a similar function, producing a chain of cloned bio's covering a sub-range of the source chain. No bio_pair structures are used, and if successful the result will represent exactly the specified range. Using bio_chain_clone_range() makes bio_rq_fn() a little easier to understand, because it avoids the need to pass very much state information between consecutive calls. By avoiding the need to track a bio_pair structure, it also eliminates the problem described here: http://tracker.newdream.net/issues/2933 Note that a block request (and therefore the complete length of a bio chain processed in rbd_rq_fn()) is an unsigned int, while the result of rbd_segment_length() is u64. This change makes this range trunctation explicit, and trips a bug if the the segment boundary is too far off. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:28 -05:00
Lars Ellenberg	ccae7868b0	drbd: log request sector offset and size for IO errors Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:18 +01:00
Lars Ellenberg	a2a3c74f24	drbd: always write bitmap on detach If we detach due to local read-error (which sets a bit in the bitmap), stay Primary, and then re-attach (which re-reads the bitmap from disk), we potentially lost the "out-of-sync" (or, "bad block") information in the bitmap. Always (try to) write out the changed bitmap pages before going diskless. That way, we don't lose the bit for the bad block, the next resync will fetch it from the peer, and rewrite it locally, which may result in block reallocation in some lower layer (or the hardware), and thereby "heal" the bad blocks. If the bitmap writeout errors out as well, we will (again: try to) mark the "we need a full sync" bit in our super block, if it was a READ error; writes are covered by the activity log already. If that superblock does not make it to disk either, we are sorry. Maybe we just lost an entire disk or controller (or iSCSI connection), and there actually are no bad blocks at all, so we don't need to re-fetch from the peer, there is no "auto-healing" necessary. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:18 +01:00

... 34 35 36 37 38 ...

4,663 commits