linux-uconsole/drivers/md
BingJing Chang ccae23ff45 md/raid5: fix data corruption of replacements after originals dropped
[ Upstream commit d63e2fc804 ]

During raid5 replacement, stripes can be marked with the R5_NeedReplace
flag. Data can be read from the being-replaced device and written to
the replacing spare without reading all other devices. (This is
'replace' mode: s.replacing = 1.) If a being-replaced device is dropped,
the replacement is interrupted and resumed in pure recovery mode.
However, stripes that existed before the interruption can no longer
read from the dropped device. This prints many WARN_ON messages, and it
results in data corruption because those stripes write problematic data
to the replacement device and update the progress.

# Erase disks (1MB + 2GB)
dd if=/dev/zero of=/dev/sda bs=1MB count=2049
dd if=/dev/zero of=/dev/sdb bs=1MB count=2049
dd if=/dev/zero of=/dev/sdc bs=1MB count=2049
dd if=/dev/zero of=/dev/sdd bs=1MB count=2049
mdadm -C /dev/md0 -amd -R -l5 -n3 -x0 /dev/sd[abc] -z 2097152
# Ensure array stores non-zero data
dd if=/root/data_4GB.iso of=/dev/md0 bs=1MB
# Start replacement
mdadm /dev/md0 -a /dev/sdd
mdadm /dev/md0 --replace /dev/sda

Then hot-unplug /dev/sda during recovery, wait for the recovery to
finish, and run a check:
echo check > /sys/block/md0/md/sync_action
cat /sys/block/md0/md/mismatch_cnt # it will be greater than 0.
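
The manual check above can be scripted. The following is only a minimal
sketch: the function names md_wait_idle and md_check_mismatch are
hypothetical helpers, not part of mdadm or the kernel, and the sketch
assumes the sysfs files sync_action and mismatch_cnt live in the
directory passed as the first argument (/sys/block/md0/md for the array
used here).

```shell
#!/bin/sh
# Hypothetical helpers (assumed names, not an existing tool).

# Block until the array reports no sync/recovery/check activity.
# $1: md sysfs directory, $2: poll interval in seconds (default 5)
md_wait_idle() {
    dir=$1
    interval=${2:-5}
    while [ "$(cat "$dir/sync_action")" != "idle" ]; do
        sleep "$interval"
    done
}

# Report mismatch_cnt; return non-zero if any mismatch was counted.
# $1: md sysfs directory
md_check_mismatch() {
    dir=$1
    count=$(cat "$dir/mismatch_cnt")
    if [ "$count" -gt 0 ]; then
        echo "mismatch_cnt=$count: parity mismatches found"
        return 1
    fi
    echo "mismatch_cnt=0: array is consistent"
    return 0
}

# Example use after /dev/sda has been hot-unplugged (run as root):
#   md_wait_idle /sys/block/md0/md
#   echo check > /sys/block/md0/md/sync_action
#   md_wait_idle /sys/block/md0/md
#   md_check_mismatch /sys/block/md0/md
```

Passing the sysfs directory as an argument keeps the helpers usable for
other arrays and lets them be exercised against plain files.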

Soon after you hot-unplug /dev/sda, you will see many WARN_ON
messages, and the replacement recovery will be interrupted shortly
afterwards. Once the recovery finishes, the array contains corrupted
data.

This is simply an unhandled case of replacement. In commit
<f94c0b6658> (md/raid5: fix interaction of 'replace' and 'recovery'.),
a NeedReplace device that is not UPTODATE is treated as an error, but
the commit only prints a WARN_ON and still marks these corrupted
stripes with R5_WantReplace (meaning they are ready for writes).

To fix this case, we can leverage the 'sync and replace' mode mentioned
in commit <9a3e1101b8> (md/raid5: detect and handle replacements during
recovery.). We can add logic to detect these stripes and use 'sync and
replace' mode for them.

Reported-by: Alex Chen <alexchen@synology.com>
Reviewed-by: Alex Wu <alexwu@synology.com>
Reviewed-by: Chung-Chiang Cheng <cccheng@synology.com>
Signed-off-by: BingJing Chang <bingjingc@synology.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-19 22:48:57 +02:00
bcache bcache: release dc->writeback_lock properly in bch_writeback_thread() 2018-09-09 20:04:36 +02:00
persistent-data dm btree: fix serious bug in btree_split_beneath() 2018-01-23 19:50:16 +01:00
bitmap.c md/bitmap: disable bitmap_resize for file-backed bitmaps. 2017-09-27 11:00:14 +02:00
bitmap.h md-cluster: Use a small window for resync 2015-10-12 01:32:05 -05:00
dm-bio-prison.c block: add a bi_error field to struct bio 2015-07-29 08:55:15 -06:00
dm-bio-prison.h dm bio prison: add dm_cell_promote_or_release() 2015-05-29 14:19:06 -04:00
dm-bio-record.h
dm-bufio.c dm bufio: don't take the lock in dm_bufio_shrink_count 2018-07-11 16:03:51 +02:00
dm-bufio.h
dm-builtin.c
dm-cache-block-types.h
dm-cache-metadata.c dm cache metadata: save in-core policy_hint_size to on-disk superblock 2018-09-09 20:04:33 +02:00
dm-cache-metadata.h dm cache: make sure every metadata function checks fail_io 2016-04-12 09:08:40 -07:00
dm-cache-policy-cleaner.c - Revert a dm-multipath change that caused a regression for unprivledged 2015-11-04 21:19:53 -08:00
dm-cache-policy-internal.h dm cache: age and write back cache entries even without active IO 2015-06-11 17:13:01 -04:00
dm-cache-policy-mq.c dm: convert ffs to __ffs 2015-10-31 19:06:01 -04:00
dm-cache-policy-smq.c dm: convert ffs to __ffs 2015-10-31 19:06:01 -04:00
dm-cache-policy.c
dm-cache-policy.h dm cache: age and write back cache entries even without active IO 2015-06-11 17:13:01 -04:00
dm-cache-target.c dm cache: fix corruption seen when using cache > 2TB 2017-03-12 06:37:26 +01:00
dm-crypt.c dm crypt: mark key as invalid until properly loaded 2017-01-06 11:16:15 +01:00
dm-delay.c dm delay: document that offsets are specified in sectors 2015-10-31 19:06:05 -04:00
dm-era-target.c dm era: save spacemap metadata root after the pre-commit 2017-05-20 14:27:00 +02:00
dm-exception-store.c - Revert a dm-multipath change that caused a regression for unprivledged 2015-11-04 21:19:53 -08:00
dm-exception-store.h dm snapshot: fix hung bios when copy error occurs 2016-03-03 15:07:14 -08:00
dm-flakey.c dm flakey: return -EINVAL on interval bounds error in flakey_ctr() 2017-01-06 11:16:15 +01:00
dm-io.c dm io: fix duplicate bio completion due to missing ref count 2018-03-11 16:19:47 +01:00
dm-ioctl.c dm ioctl: remove double parentheses 2018-04-08 11:51:57 +02:00
dm-kcopyd.c dm kcopyd: avoid softlockup in run_complete_job 2018-09-15 09:40:39 +02:00
dm-linear.c dm linear: remove redundant target name from error messages 2015-10-31 19:06:03 -04:00
dm-log-userspace-base.c dm: drop NULL test before kmem_cache_destroy() and mempool_destroy() 2015-10-31 19:06:00 -04:00
dm-log-userspace-transfer.c
dm-log-userspace-transfer.h
dm-log-writes.c dm log writes: fix bug with too large bios 2016-10-07 15:23:47 +02:00
dm-log.c
dm-mpath.c dm mpath: check if path's request_queue is dying in activate_path() 2016-10-28 03:01:28 -04:00
dm-mpath.h
dm-path-selector.c
dm-path-selector.h
dm-queue-length.c
dm-raid.c dm raid: fix round up of default region size 2015-10-02 12:02:31 -04:00
dm-raid1.c dm mirror: fix read error on recovery after default leg failure 2016-11-10 16:36:35 +01:00
dm-region-hash.c dm: convert ffs to __ffs 2015-10-31 19:06:01 -04:00
dm-round-robin.c
dm-service-time.c
dm-snap-persistent.c dm snapshot: fix hung bios when copy error occurs 2016-03-03 15:07:14 -08:00
dm-snap-transient.c dm snapshot: fix hung bios when copy error occurs 2016-03-03 15:07:14 -08:00
dm-snap.c dm snapshot: disallow the COW and origin devices from being identical 2016-04-12 09:08:39 -07:00
dm-stats.c dm stats: fix a leaked s->histogram_boundaries array 2017-03-12 06:37:26 +01:00
dm-stats.h dm stats: support precise timestamps 2015-06-17 12:40:40 -04:00
dm-stripe.c Merge tag 'dm-4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm 2015-09-02 16:35:26 -07:00
dm-switch.c dm switch: simplify conditional in alloc_region_table() 2015-10-31 19:06:06 -04:00
dm-sysfs.c
dm-table.c dm snapshot: disallow the COW and origin devices from being identical 2016-04-12 09:08:39 -07:00
dm-target.c
dm-thin-metadata.c dm thin metadata: THIN_MAX_CONCURRENT_LOCKS should be 6 2018-01-23 19:50:17 +01:00
dm-thin-metadata.h dm thin metadata: add dm_thin_remove_range() 2015-06-11 17:13:04 -04:00
dm-thin.c dm thin: handle running out of data space vs concurrent discard 2018-07-03 11:21:35 +02:00
dm-uevent.c
dm-uevent.h
dm-verity.c dm: refactor ioctl handling 2015-10-31 19:05:59 -04:00
dm-zero.c block: add a bi_error field to struct bio 2015-07-29 08:55:15 -06:00
dm.c dm: correctly handle chained bios in dec_pending() 2018-02-22 15:45:01 +01:00
dm.h block: kill merge_bvec_fn() completely 2015-08-13 12:31:57 -06:00
faulty.c block: add a bi_error field to struct bio 2015-07-29 08:55:15 -06:00
Kconfig dm raid: select the Kconfig option CONFIG_MD_RAID0 2017-05-25 14:30:07 +02:00
linear.c md/linear: shutup lockdep warnning 2017-10-21 17:09:05 +02:00
linear.h md linear: fix a race between linear_add() and linear_congested() 2017-03-12 06:37:30 +01:00
Makefile raid5: add basic stripe log 2015-10-24 17:16:19 +11:00
md-cluster.c md-cluster: fix potential lock issue in add_new_disk 2018-04-13 19:50:09 +02:00
md-cluster.h md-cluster: Fix adding of new disk with new reload code 2015-10-12 03:35:30 -05:00
md.c md: fix NULL dereference of mddev->pers in remove_and_add_spares() 2018-08-06 16:24:35 +02:00
md.h md/raid: only permit hot-add of compatible integrity profiles 2016-02-17 12:30:57 -08:00
multipath.c md: multipath: don't hardcopy bio in .make_request path 2016-04-12 09:08:57 -07:00
multipath.h
raid0.c md/raid0: apply base queue limits *before* disk_stack_limits 2015-10-02 17:23:44 +10:00
raid0.h block: kill merge_bvec_fn() completely 2015-08-13 12:31:57 -06:00
raid1.c md/raid1: fix NULL pointer dereference 2018-05-30 07:49:01 +02:00
raid1.h md-cluster: Use a small window for resync 2015-10-12 01:32:05 -05:00
raid5-cache.c raid5-cache: start raid5 readonly if journal is missing 2015-11-01 13:48:29 +11:00
raid5.c md/raid5: fix data corruption of replacements after originals dropped 2018-09-19 22:48:57 +02:00
raid5.h RAID5: revert e9e4c377e2 to fix a livelock 2016-04-12 09:08:57 -07:00
raid10.c md/raid10: fix that replacement cannot complete recovery after reassemble 2018-08-24 13:26:57 +02:00
raid10.h md/raid10: ensure device failure recorded before write request returns. 2015-08-31 19:43:45 +02:00