If a synctarget lost connection while being WFSyncUUID,
due to "state sanitizing", the attempted state change to SyncTarget
looked like an "invalidate" to after_state_ch() later,
thus caused a full sync on next handshake (Bug #318).
drbd0: PingAck did not arrive in time.
drbd0: peer( Primary -> Unknown ) conn( WFSyncUUID -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
from : { cs:NetworkFailure ro:Secondary/Unknown ds:UpToDate/DUnknown r--- }
to : { cs:SyncTarget ro:Secondary/Unknown ds:Inconsistent/DUnknown r--- }
after sanizising, resulted in
state: { cs:NetworkFailure ro:Secondary/Unknown ds:Inconsistent/DUnknown r--- }
drbd0: disk( UpToDate -> Inconsistent )
Fix:
don't mask state transition errors in "sanitizing",
so the requested state change to SyncTarget fails,
instead of being implicitly "remaped" to invalidate.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
If we cannot satisfy a request (because our disk just broke),
we still need to drain the payload. Or we'll get a protocol error
when interpreting the payload as DRBD packet header.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
BUG trace would look like:
lc_find
drbd_rs_complete_io
got_OVResult
drbd_asender
Could be triggered by explicit, or IO-error policy based,
detach during online-verify.
We may only dereference mdev->resync, if we first get_ldev(), as the
disk may break any time, causing mdev->resync to disappear once all
ldev references have been returned.
Already in flight online-verify requests or replies may still come in,
which we then need to ignore.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Just in case we have some pending meta data changes to sync, do it
before we call our userland helper, as that may take some time,
or even cause a hard reboot.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
addendum to baa33ae4eaa4477b60af7c434c0ddd1d182c1ae7
The race:
drbd_md_sync()
if (!test_and_clear_bit(MD_DIRTY, &mdev->flags))
return;
==> RACE with drbd_md_mark_dirty() rearming the timer.
del_timer(&mdev->md_sync_timer);
Fixed by moving the del_timer before the test_and_clear_bit.
Additionally only rearm the timer in drbd_md_mark_dirty, if MD_DIRTY was
not already set, reduce the grace period from five to one second, and
add an ifdef'ed debuging aid to find code paths missing an explicit
drbd_md_sync, if any, as those are the only relevant ones for this race.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The actual race happened int the drbd_start_resync() function. Where
drbd_resync_finished() -> __drbd_set_state() set STOP_SYNC_TIMER and
armed the timer.
If the timer fired before execution reaches the mod_timer statement
at the end of drbd_start_resync() the latter would cause an
unexpected call to w_make_resync_request().
Removed the STOP_SYNC_TIMER bit, and base it on the connection state.
The STOP_SYNC_TIMER bit probably originates probably the time before
the state engine.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
If pacemaker (for example) decided to initialize minor devices not in
the exact sync-after dependency order, the configuration partially
failed with an error "The sync-after minor number is invalid". (Bugz. #322)
We can avoid that by implicitly creating unconfigured minor devices,
if others depend on them.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
If a drbd_nl_net_conf hits the small window between the state change
to C_STANDALONE and the corresponding cleanup in after_state_ch,
that cleanup would throw away stuff we now need again,
and later trigger BUG_ON()s.
Fixed by properly serializing the new config request with
any pending cleanup.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
When the complete device is marked as out of sync, we can disable
updates of the on disk AL. Currently AL updates are only disabled
if one uses the "invalidate-remote" command on an unconnected,
primary device, or when at attach time all bits in the bitmap are
set.
As of now, AL updated do not get disabled when a all bits becomes
set due to application writes to an unconnected DRBD device.
While this is a missing feature, it is not considered important,
and might get added later.
BTW, after initializing a "one legged" DRBD device
drbdadm create-md resX
drbdadm -- --force primary resX
AL updates also get disabled, until the first connect.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Now we have multiple BIOs per ee, packets with a 32 bit length field,
it gets time to use these goodies.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
If we intent to use the block_id member of an epoch entry,
we may not use the digest member.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
We now track the data rate of locally submitted resync related requests,
and can thus detect non-resync activity on the lower level device.
If the current sync rate is above c-min-rate, and the lower level device
appears to be busy, we throttle the resyncer.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
also canonicalize the return values of read_for_csum
and drbd_rs_begin_io to return -ESOMETHING, or 0 for success.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The current resync speed as displayed in /proc/drbd fluctuates a lot.
Using an array of rolling marks makes this calculation much more stable.
We used to have this (a long time ago with 0.7), but it got lost somehow.
If "stalled", do not discard the rest of the information, just add a
" (stalled)" tag to the progress line.
This patch also shortens a spinlock critical section somewhat, and
reduces the number of atomic operations in put_ldev.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The commit 288f422ec1
drbd: Track all IO requests on the TL, not writes only
moved a list_add_tail(req, ) into a region where req
may have just been freed due to conflict detection.
Fix this by adding a proper cleanup section for that code path.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
We get the following when building on ppc64 due to lack of include of
<asm/io.h>:
In file included from drivers/spi/spi_fsl_espi.c:25:0:
drivers/spi/spi_fsl_lib.h: In function 'mpc8xxx_spi_write_reg':
drivers/spi/spi_fsl_lib.h:88:2: error: implicit declaration of function 'out_be32'
drivers/spi/spi_fsl_lib.h: In function 'mpc8xxx_spi_read_reg':
drivers/spi/spi_fsl_lib.h:93:2: error: implicit declaration of function 'in_be32'
drivers/spi/spi_fsl_espi.c: In function 'fsl_espi_remove':
drivers/spi/spi_fsl_espi.c:571:2: error: implicit declaration of function 'iounmap'
drivers/spi/spi_fsl_espi.c: In function 'fsl_espi_probe':
drivers/spi/spi_fsl_espi.c:602:2: error: implicit declaration of function 'ioremap'
drivers/spi/spi_fsl_espi.c:602:24: warning: assignment makes pointer from integer without a cast
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Make sure the initial insertation of the catalog entry already contains
the device number by calling init_special_inode early and setting writing
out the dev field of the on-disk permission structure. The latter is
facilitated by sharing the almost identical hfsplus_set_perms helpers
between initial catalog entry creating and ->write_inode.
Unless we crashed just after mknod this bug was harmless as the inode
is marked dirty at the end of hfsplus_mknod, and hfsplus_write_inode
will update the catalog entry to contain the correct value.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
The rootflags field in hfsplus_inode_info only caches the immutable and
append-only flags in the VFS inode, so we can easily get rid of it.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
HFS implements hardlink by using indirect catalog entries that refer to a hidden
directly. The link target is cached in the dev field in the HFS+ specific
inode, which is also used for the device number for device files, and inside
for passing the nlink value of the indirect node from hfsplus_cat_write_inode
to a helper function. Now if we happen to write out the indirect node while
hfsplus_link is creating the catalog entry we'll get a link pointing to the
linkid of the current nlink value. This can easily be reproduced by a large
enough loop of local git-clone operations.
Stop abusing the dev field in the HFS+ inode for short term storage by
refactoring the way the permission structure in the catalog entry is
set up, and rename the dev field to linkid to avoid any confusion.
While we're at it also prevent creating hard links to special files, as
the HFS+ dev and linkid share the same space in the on-disk structure.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
hfs seems prone to bad things when it encounters on disk corruption. Many
values are read from disk, and used as lengths to memcpy, as an example.
This patch fixes up several of these problematic cases.
o sanity check the on-disk maximum key lengths on mount
(these are set to a defined value at mkfs time and shouldn't differ)
o check on-disk node keylens against the maximum key length for each tree
o fix hfs_btree_open so that going out via free_tree: doesn't wind
up in hfs_releasepage, which wants to follow the very pointer
we were trying to set up:
HFS_SB(sb)->cat_tree = hfs_btree_open()
.
failure gets to hfs_releasepage and tries to follow HFS_SB(sb)->cat_tree
Tested with the fsfuzzer; it survives more than it used to.
[hch: ported of commit cf05946250 from hfs]
[hch: added the fixes from 5581d018ed3493d226e7a4d645d9c8a5af6c36b]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
oops and fs corruption; the latter can happen even on valid fs in case of oom.
[hch: port of commit 3d10a15d69 from hfs]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
A particular fsfuzzer run caused an hfs file system to crash on mount. This
is due to a corrupted MDB extent record causing a miscalculation of
HFSPLUS_I(inode)->first_blocks for the extent tree. If the extent records
are zereod out, then it won't trigger the first_blocks special case and
instead falls through to the extent code, which we're in the middle
of initializing.
This patch catches the 0 size extent records, reports the corruption,
and fails the mount.
[hch: ported of commit 47f365eb57 from hfs]
Reported-by: Ramon de Carvalho Valle <rcvalle@linux.vnet.ibm.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
We may not free tl_hash when IO is suspended, since we can not wait
until ap_bio_cnt reaches zero.
We can do this after susp reched 0, since then tl_clear was called
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
After disconnect (most likely mdev->net_cnt == 0) and we are
still in an unstable state (!drbd_state_is_stable()). When we
get an IO request in drbd_get_max_buffers() (called from
__inc_ap_bio_cond(), called from inc_ap_bio()) we wake up
misc_wait. Misc_wait is also used in inc_ap_bio() to sleep
until the outcome of __inc_ap_bio_cond() changes. => Busy loop!
Solution: Have a dedicated wait queue for get_net_conf() and
put_net_conf().
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Make sure the state engine can deny two primaries to connect
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
When a fencing policy of "resource-and-stonith" is configured,
and DRBD looses connection to it's peer, we can delay the
creation of a new current-UUID until IO gets thawed.
That allows one to deploy fence-peer handlers that actually
commit suicide on the machine they get started.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Since we can not thaw the transfer log, the next logical step is
to allow reconnects while the fence-peer handler runs.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
State transitions in the space of non-allowed states used
to be very noisy. Reduce that, since that has little value
for the majority of the user base.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>