blk-mq, percpu_ref: start q->mq_usage_counter in atomic mode
blk-mq uses percpu_ref for its usage counter, which tracks the number of in-flight commands and is used to synchronously drain the queue on freeze. percpu_ref shutdown takes measurable wallclock time as it involves a sched RCU grace period, which means that draining a blk-mq queue also takes measurable wallclock time. One would think that this shouldn't matter, as queue shutdown should be a rare event which takes place asynchronously w.r.t. userland.

Unfortunately, SCSI probing involves synchronously setting up and then tearing down a lot of request_queues back-to-back for non-existent LUNs. This means that SCSI probing may take more than ten seconds when scsi-mq is used.

  [    0.949892] scsi host0: Virtio SCSI HBA
  [    1.007864] scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK 1.1. PQ: 0 ANSI: 5
  [    1.021299] scsi 0:0:1:0: Direct-Access QEMU QEMU HARDDISK 1.1. PQ: 0 ANSI: 5
  [    1.520356] tsc: Refined TSC clocksource calibration: 2491.910 MHz

  <stall>

  [   16.186549] sd 0:0:0:0: Attached scsi generic sg0 type 0
  [   16.190478] sd 0:0:1:0: Attached scsi generic sg1 type 0
  [   16.194099] osd: LOADED open-osd 0.2.1
  [   16.203202] sd 0:0:0:0: [sda] 31457280 512-byte logical blocks: (16.1 GB/15.0 GiB)
  [   16.208478] sd 0:0:0:0: [sda] Write Protect is off
  [   16.211439] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
  [   16.218771] sd 0:0:1:0: [sdb] 31457280 512-byte logical blocks: (16.1 GB/15.0 GiB)
  [   16.223264] sd 0:0:1:0: [sdb] Write Protect is off
  [   16.225682] sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

This is also the reason why request_queues start in bypass mode, which is ended on blk_register_queue(): shutting down a fully functional queue also involves an RCU grace period, and the queues for non-existent SCSI devices never reach registration.

blk-mq basically needs to do the same thing - start the queue in a degraded mode which is faster to shut down and then make it fully functional only after it reaches registration. percpu_ref recently grew facilities to force atomic operation until explicitly switched to percpu mode, which can be used for this purpose. This patch makes blk-mq initialize q->mq_usage_counter in atomic mode and switch it to percpu mode only once blk_register_queue() is reached.

Note that this issue was previously worked around by 0a30288da1 ("blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe") for v3.17. The temp fix was reverted in preparation for adding persistent atomic mode to percpu_ref by 9eca80461a ("Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe""). This patch and the prerequisite percpu_ref changes will be merged during the v3.18 devel cycle.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Christoph Hellwig <hch@infradead.org>
Link: http://lkml.kernel.org/g/20140919113815.GA10791@lst.de
Fixes: add703fda9 ("blk-mq: use percpu_ref for mq usage count")
Reviewed-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
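For context, the percpu_ref lifecycle this patch relies on looks roughly like the sketch below. This is a minimal illustration with made-up names (my_ref, my_release, and the my_* helpers are hypothetical), not code from the patch; the flag and the switch function are the ones added by the prerequisite percpu_ref changes mentioned above.

	#include <linux/percpu-refcount.h>

	static struct percpu_ref my_ref;	/* hypothetical example ref */

	/* called once the last reference has been dropped */
	static void my_release(struct percpu_ref *ref)
	{
		/* wake up whoever is waiting for the drain to finish */
	}

	static int my_init(void)
	{
		/*
		 * Start in atomic mode: cheap to kill, since no sched RCU
		 * grace period is needed to collapse percpu counters.
		 */
		return percpu_ref_init(&my_ref, my_release,
				       PERCPU_REF_INIT_ATOMIC, GFP_KERNEL);
	}

	static void my_mark_long_lived(void)
	{
		/* the object survived probing; switch to fast percpu mode */
		percpu_ref_switch_to_percpu(&my_ref);
	}

	static void my_shutdown(void)
	{
		/* drop the initial ref; my_release() runs once drained */
		percpu_ref_kill(&my_ref);
	}

The patch maps this pattern onto blk-mq: percpu_ref_init() with PERCPU_REF_INIT_ATOMIC at queue allocation, and the switch to percpu mode deferred to blk_register_queue().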
commit 17497acbdc
parent 1cae13e75b
4 changed files with 21 additions and 3 deletions
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -402,6 +402,12 @@ static void blk_mq_sysfs_init(struct request_queue *q)
 	}
 }
 
+/* see blk_register_queue() */
+void blk_mq_finish_init(struct request_queue *q)
+{
+	percpu_ref_switch_to_percpu(&q->mq_usage_counter);
+}
+
 int blk_mq_register_disk(struct gendisk *disk)
 {
 	struct device *dev = disk_to_dev(disk);

--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1795,8 +1795,12 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
 	if (!q)
 		goto err_hctxs;
 
+	/*
+	 * Init percpu_ref in atomic mode so that it's faster to shutdown.
+	 * See blk_register_queue() for details.
+	 */
 	if (percpu_ref_init(&q->mq_usage_counter, blk_mq_usage_counter_release,
-			    0, GFP_KERNEL))
+			    PERCPU_REF_INIT_ATOMIC, GFP_KERNEL))
 		goto err_map;
 
 	setup_timer(&q->timeout, blk_mq_rq_timer, (unsigned long) q);

--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -551,12 +551,19 @@ int blk_register_queue(struct gendisk *disk)
 		return -ENXIO;
 
 	/*
-	 * Initialization must be complete by now.  Finish the initial
-	 * bypass from queue allocation.
+	 * SCSI probing may synchronously create and destroy a lot of
+	 * request_queues for non-existent devices.  Shutting down a fully
+	 * functional queue takes measureable wallclock time as RCU grace
+	 * periods are involved.  To avoid excessive latency in these
+	 * cases, a request_queue starts out in a degraded mode which is
+	 * faster to shut down and is made fully functional here as
+	 * request_queues for non-existent devices never get registered.
 	 */
 	if (!blk_queue_init_done(q)) {
 		queue_flag_set_unlocked(QUEUE_FLAG_INIT_DONE, q);
 		blk_queue_bypass_end(q);
+		if (q->mq_ops)
+			blk_mq_finish_init(q);
 	}
 
 	ret = blk_trace_init_sysfs(dev);

--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -140,6 +140,7 @@ enum {
 };
 
 struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *);
+void blk_mq_finish_init(struct request_queue *q);
 int blk_mq_register_disk(struct gendisk *);
 void blk_mq_unregister_disk(struct gendisk *);
 
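To see why the ref's mode matters for teardown, this simplified sketch shows the freeze/drain path that q->mq_usage_counter backs. It is close to, but not verbatim, the blk-mq code of this era; blk_mq_usage_counter_release() and q->mq_freeze_wq are the names used in the tree at this point.

	/* release callback: last in-flight command finished after a kill */
	static void blk_mq_usage_counter_release(struct percpu_ref *ref)
	{
		struct request_queue *q =
			container_of(ref, struct request_queue, mq_usage_counter);

		wake_up_all(&q->mq_freeze_wq);
	}

	/* freeze: kill the ref, then wait for in-flight commands to drain */
	percpu_ref_kill(&q->mq_usage_counter);
	wait_event(q->mq_freeze_wq,
		   percpu_ref_is_zero(&q->mq_usage_counter));

When the ref is in percpu mode, the kill must let a sched RCU grace period elapse before the percpu counters can be collapsed and percpu_ref_is_zero() can report the drain complete; in atomic mode there is no such wait, which is what makes the never-registered queues cheap to tear down.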