linux-uconsole/drivers/md
Mike Snitzer 052ef26b08 dm thin: handle running out of data space vs concurrent discard
commit a685557fbb upstream.

Discards issued to a DM thin device can complete to userspace (via
fstrim) _before_ the metadata changes associated with the discards is
reflected in the thinp superblock (e.g. free blocks).  As such, if a
user constructs a test that loops repeatedly over these steps, block
allocation can fail due to discards not having completed yet:
1) fill thin device via filesystem file
2) remove file
3) fstrim

From initial report, here:
https://www.redhat.com/archives/dm-devel/2018-April/msg00022.html

"The root cause of this issue is that dm-thin will first remove
mapping and increase corresponding blocks' reference count to prevent
them from being reused before DISCARD bios get processed by the
underlying layers. However. increasing blocks' reference count could
also increase the nr_allocated_this_transaction in struct sm_disk
which makes smd->old_ll.nr_allocated +
smd->nr_allocated_this_transaction bigger than smd->old_ll.nr_blocks.
In this case, alloc_data_block() will never commit metadata to reset
the begin pointer of struct sm_disk, because sm_disk_get_nr_free()
always return an underflow value."

While there is room for improvement to the space-map accounting that
thinp is making use of: the reality is this test is inherently racey and
will result in the previous iteration's fstrim's discard(s) completing
vs concurrent block allocation, via dd, in the next iteration of the
loop.

No amount of space map accounting improvements will be able to allow
user's to use a block before a discard of that block has completed.

So the best we can really do is allow DM thinp to gracefully handle such
aggressive use of all the pool's data by degrading the pool into
out-of-data-space (OODS) mode.  We _should_ get that behaviour already
(if space map accounting didn't falsely cause alloc_data_block() to
believe free space was available).. but short of that we handle the
current reality that dm_pool_alloc_data_block() can return -ENOSPC.

Reported-by: Dennis Yang <dennisyang@qnap.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03 11:21:35 +02:00
..
bcache bcache: quit dc->writeback_thread when BCACHE_DEV_DETACHING is set 2018-05-30 07:49:11 +02:00
persistent-data dm btree: fix serious bug in btree_split_beneath() 2018-01-23 19:50:16 +01:00
bitmap.c md/bitmap: disable bitmap_resize for file-backed bitmaps. 2017-09-27 11:00:14 +02:00
bitmap.h md-cluster: Use a small window for resync 2015-10-12 01:32:05 -05:00
dm-bio-prison.c block: add a bi_error field to struct bio 2015-07-29 08:55:15 -06:00
dm-bio-prison.h dm bio prison: add dm_cell_promote_or_release() 2015-05-29 14:19:06 -04:00
dm-bio-record.h
dm-bufio.c dm bufio: fix shrinker scans when (nr_to_scan < retain_target) 2018-01-17 09:35:24 +01:00
dm-bufio.h
dm-builtin.c
dm-cache-block-types.h
dm-cache-metadata.c dm cache metadata: fail operations if fail_io mode has been established 2017-05-25 14:30:08 +02:00
dm-cache-metadata.h dm cache: make sure every metadata function checks fail_io 2016-04-12 09:08:40 -07:00
dm-cache-policy-cleaner.c - Revert a dm-multipath change that caused a regression for unprivledged 2015-11-04 21:19:53 -08:00
dm-cache-policy-internal.h dm cache: age and write back cache entries even without active IO 2015-06-11 17:13:01 -04:00
dm-cache-policy-mq.c dm: convert ffs to __ffs 2015-10-31 19:06:01 -04:00
dm-cache-policy-smq.c dm: convert ffs to __ffs 2015-10-31 19:06:01 -04:00
dm-cache-policy.c
dm-cache-policy.h dm cache: age and write back cache entries even without active IO 2015-06-11 17:13:01 -04:00
dm-cache-target.c dm cache: fix corruption seen when using cache > 2TB 2017-03-12 06:37:26 +01:00
dm-crypt.c dm crypt: mark key as invalid until properly loaded 2017-01-06 11:16:15 +01:00
dm-delay.c dm delay: document that offsets are specified in sectors 2015-10-31 19:06:05 -04:00
dm-era-target.c dm era: save spacemap metadata root after the pre-commit 2017-05-20 14:27:00 +02:00
dm-exception-store.c - Revert a dm-multipath change that caused a regression for unprivledged 2015-11-04 21:19:53 -08:00
dm-exception-store.h dm snapshot: fix hung bios when copy error occurs 2016-03-03 15:07:14 -08:00
dm-flakey.c dm flakey: return -EINVAL on interval bounds error in flakey_ctr() 2017-01-06 11:16:15 +01:00
dm-io.c dm io: fix duplicate bio completion due to missing ref count 2018-03-11 16:19:47 +01:00
dm-ioctl.c dm ioctl: remove double parentheses 2018-04-08 11:51:57 +02:00
dm-kcopyd.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
dm-linear.c dm linear: remove redundant target name from error messages 2015-10-31 19:06:03 -04:00
dm-log-userspace-base.c dm: drop NULL test before kmem_cache_destroy() and mempool_destroy() 2015-10-31 19:06:00 -04:00
dm-log-userspace-transfer.c
dm-log-userspace-transfer.h
dm-log-writes.c dm log writes: fix bug with too large bios 2016-10-07 15:23:47 +02:00
dm-log.c
dm-mpath.c dm mpath: check if path's request_queue is dying in activate_path() 2016-10-28 03:01:28 -04:00
dm-mpath.h
dm-path-selector.c
dm-path-selector.h
dm-queue-length.c
dm-raid.c dm raid: fix round up of default region size 2015-10-02 12:02:31 -04:00
dm-raid1.c dm mirror: fix read error on recovery after default leg failure 2016-11-10 16:36:35 +01:00
dm-region-hash.c dm: convert ffs to __ffs 2015-10-31 19:06:01 -04:00
dm-round-robin.c
dm-service-time.c
dm-snap-persistent.c dm snapshot: fix hung bios when copy error occurs 2016-03-03 15:07:14 -08:00
dm-snap-transient.c dm snapshot: fix hung bios when copy error occurs 2016-03-03 15:07:14 -08:00
dm-snap.c dm snapshot: disallow the COW and origin devices from being identical 2016-04-12 09:08:39 -07:00
dm-stats.c dm stats: fix a leaked s->histogram_boundaries array 2017-03-12 06:37:26 +01:00
dm-stats.h dm stats: support precise timestamps 2015-06-17 12:40:40 -04:00
dm-stripe.c Merge tag 'dm-4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm 2015-09-02 16:35:26 -07:00
dm-switch.c dm switch: simplify conditional in alloc_region_table() 2015-10-31 19:06:06 -04:00
dm-sysfs.c
dm-table.c dm snapshot: disallow the COW and origin devices from being identical 2016-04-12 09:08:39 -07:00
dm-target.c
dm-thin-metadata.c dm thin metadata: THIN_MAX_CONCURRENT_LOCKS should be 6 2018-01-23 19:50:17 +01:00
dm-thin-metadata.h dm thin metadata: add dm_thin_remove_range() 2015-06-11 17:13:04 -04:00
dm-thin.c dm thin: handle running out of data space vs concurrent discard 2018-07-03 11:21:35 +02:00
dm-uevent.c
dm-uevent.h
dm-verity.c dm: refactor ioctl handling 2015-10-31 19:05:59 -04:00
dm-zero.c block: add a bi_error field to struct bio 2015-07-29 08:55:15 -06:00
dm.c dm: correctly handle chained bios in dec_pending() 2018-02-22 15:45:01 +01:00
dm.h block: kill merge_bvec_fn() completely 2015-08-13 12:31:57 -06:00
faulty.c block: add a bi_error field to struct bio 2015-07-29 08:55:15 -06:00
Kconfig dm raid: select the Kconfig option CONFIG_MD_RAID0 2017-05-25 14:30:07 +02:00
linear.c md/linear: shutup lockdep warnning 2017-10-21 17:09:05 +02:00
linear.h md linear: fix a race between linear_add() and linear_congested() 2017-03-12 06:37:30 +01:00
Makefile raid5: add basic stripe log 2015-10-24 17:16:19 +11:00
md-cluster.c md-cluster: fix potential lock issue in add_new_disk 2018-04-13 19:50:09 +02:00
md-cluster.h md-cluster: Fix adding of new disk with new reload code 2015-10-12 03:35:30 -05:00
md.c md: fix two problems with setting the "re-add" device state. 2018-07-03 11:21:31 +02:00
md.h md/raid: only permit hot-add of compatible integrity profiles 2016-02-17 12:30:57 -08:00
multipath.c md: multipath: don't hardcopy bio in .make_request path 2016-04-12 09:08:57 -07:00
multipath.h
raid0.c md/raid0: apply base queue limits *before* disk_stack_limits 2015-10-02 17:23:44 +10:00
raid0.h block: kill merge_bvec_fn() completely 2015-08-13 12:31:57 -06:00
raid1.c md/raid1: fix NULL pointer dereference 2018-05-30 07:49:01 +02:00
raid1.h md-cluster: Use a small window for resync 2015-10-12 01:32:05 -05:00
raid5-cache.c raid5-cache: start raid5 readonly if journal is missing 2015-11-01 13:48:29 +11:00
raid5.c md: raid5: avoid string overflow warning 2018-05-30 07:49:00 +02:00
raid5.h RAID5: revert e9e4c377e2 to fix a livelock 2016-04-12 09:08:57 -07:00
raid10.c md raid10: fix NULL deference in handle_write_completed() 2018-05-30 07:48:59 +02:00
raid10.h md/raid10: ensure device failure recorded before write request returns. 2015-08-31 19:43:45 +02:00