linux-uconsole

Author	SHA1	Message	Date
Greg Kroah-Hartman	be3bb0daac	This is the 4.19.119 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl6pj8gACgkQONu9yGCS aT5mKxAAvzC4s4XHwDDckvvu57/sED2oEtp7MgmLuyK4Ih55GyyLGx9zg1A2z+rs wQSsVW+/WeurCj4CuVciakkCvgBeY494cbnghr2lohhJZ918/XnYmPODLJhlvtcV gZ4vxk5euNqpWGsmu+X+DRBG6QuU5GYf4ox39NZdtKm5I+kt5Lw44AHSNlFP0q3y drRFc49cqSxa4WkVRixJJOTQbSHARNWiayOG4uLb4zoZFvJOTDAp7+yX5LYD7lxY 3FsQLVMSp7c/whppeGySVX0oJF/12weR9OQJZVxxhlMNggmGREwDxayBaPYqA4pa 0OO83rO1aP9j2VK3HFiK4OwatKHcu0GvGV9I4rP3u8hWvJyUzTAfdcVUXHl6of12 6hXG7F3f0TVY/OP6J2WepcQG5IbkiiAY1J0wlqbqo5MvOqESJZ/J0pGuFD1qzQ8n zaMnj2zhJHkJEfyP7Dvjo4y72eM9tWnFxKfm/PtuHWGovpP15rrsuHcs343U92Z7 zQ/Ak10tA8FpSM7dXaTd98/3FkVdQbkImkEUOpWzPjiJFGyuk8j6/ZE9rCWtlNR1 HP9cLgKB/PF/a3+kwtgGAhAHBVIA8trhSm1jRqEU7ki9sBQnV/2iR5b7UJ8xn4uA dl9HlxpiDYvIjRhHfMh6GXIhdO2T8coFzxKRztjxrM0dbaQeVKQ= =31Y5 -----END PGP SIGNATURE----- Merge 4.19.119 into android-4.19 Changes in 4.19.119 ext4: fix extent_status fragmentation for plain files drm/msm: Use the correct dma_sync calls harder bpftool: Fix printing incorrect pointer in btf_dump_ptr crypto: mxs-dcp - make symbols 'sha1_null_hash' and 'sha256_null_hash' static vti4: removed duplicate log message. arm64: Add part number for Neoverse N1 arm64: errata: Hide CTR_EL0.DIC on systems affected by Neoverse-N1 #1542419 arm64: Fake the IminLine size on systems affected by Neoverse-N1 #1542419 arm64: compat: Workaround Neoverse-N1 #1542419 for compat user-space arm64: Silence clang warning on mismatched value/register sizes watchdog: reset last_hw_keepalive time at start scsi: lpfc: Fix kasan slab-out-of-bounds error in lpfc_unreg_login scsi: lpfc: Fix crash in target side cable pulls hitting WAIT_FOR_UNREG ceph: return ceph_mdsc_do_request() errors from __get_parent() ceph: don't skip updating wanted caps when cap is stale pwm: rcar: Fix late Runtime PM enablement scsi: iscsi: Report unbind session event when the target has been removed ASoC: Intel: atom: Take the drv->lock mutex before calling sst_send_slot_map() nvme: fix deadlock caused by ANA update wrong locking kernel/gcov/fs.c: gcov_seq_next() should increase position index selftests: kmod: fix handling test numbers above 9 ipc/util.c: sysvipc_find_ipc() should increase position index kconfig: qconf: Fix a few alignment issues s390/cio: avoid duplicated 'ADD' uevents loop: Better discard support for block devices Revert "powerpc/64: irq_work avoid interrupt when called with hardware irqs enabled" pwm: renesas-tpu: Fix late Runtime PM enablement pwm: bcm2835: Dynamically allocate base perf/core: Disable page faults when getting phys address ASoC: Intel: bytcr_rt5640: Add quirk for MPMAN MPWIN895CL tablet xhci: Ensure link state is U3 after setting USB_SS_PORT_LS_U3 drm/amd/display: Not doing optimize bandwidth if flip pending. tracing/selftests: Turn off timeout setting virtio-blk: improve virtqueue error to BLK_STS scsi: smartpqi: fix call trace in device discovery PCI/ASPM: Allow re-enabling Clock PM net: ipv6: add net argument to ip6_dst_lookup_flow net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup blktrace: Protect q->blk_trace with RCU blktrace: fix dereference after null check f2fs: fix to avoid memory leakage in f2fs_listxattr KVM: VMX: Zero out all general purpose registers after VM-Exit KVM: nVMX: Always sync GUEST_BNDCFGS when it comes from vmcs01 KVM: Introduce a new guest mapping API kvm: fix compilation on aarch64 kvm: fix compilation on s390 kvm: fix compile on s390 part 2 KVM: Properly check if "page" is valid in kvm_vcpu_unmap x86/kvm: Introduce kvm_(un)map_gfn() x86/kvm: Cache gfn to pfn translation x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed x86/KVM: Clean up host's steal time structure cxgb4: fix adapter crash due to wrong MC size cxgb4: fix large delays in PTP synchronization ipv6: fix restrict IPV6_ADDRFORM operation macsec: avoid to set wrong mtu macvlan: fix null dereference in macvlan_device_event() net: bcmgenet: correct per TX/RX ring statistics net: netrom: Fix potential nr_neigh refcnt leak in nr_add_node net: stmmac: dwmac-meson8b: Add missing boundary to RGMII TX clock array net/x25: Fix x25_neigh refcnt leak when receiving frame sched: etf: do not assume all sockets are full blown tcp: cache line align MAX_TCP_HEADER team: fix hang in team_mode_get() vrf: Fix IPv6 with qdisc and xfrm net: dsa: b53: Lookup VID in ARL searches when VLAN is enabled net: dsa: b53: Fix ARL register definitions net: dsa: b53: Rework ARL bin logic net: dsa: b53: b53_arl_rw_op() needs to select IVL or SVL xfrm: Always set XFRM_TRANSFORMED in xfrm{4,6}_output_finish vrf: Check skb for XFRM_TRANSFORMED flag mlxsw: Fix some IS_ERR() vs NULL bugs KEYS: Avoid false positive ENOMEM error on key read ALSA: hda: Remove ASUS ROG Zenith from the blacklist ALSA: usb-audio: Add static mapping table for ALC1220-VB-based mobos ALSA: usb-audio: Add connector notifier delegation iio: core: remove extra semi-colon from devm_iio_device_register() macro iio: st_sensors: rely on odr mask to know if odr can be set iio: adc: stm32-adc: fix sleep in atomic context iio: xilinx-xadc: Fix ADC-B powerdown iio: xilinx-xadc: Fix clearing interrupt when enabling trigger iio: xilinx-xadc: Fix sequencer configuration for aux channels in simultaneous mode iio: xilinx-xadc: Make sure not exceed maximum samplerate fs/namespace.c: fix mountpoint reference counter race USB: sisusbvga: Change port variable from signed to unsigned USB: Add USB_QUIRK_DELAY_CTRL_MSG and USB_QUIRK_DELAY_INIT for Corsair K70 RGB RAPIDFIRE USB: early: Handle AMD's spec-compliant identifiers, too USB: core: Fix free-while-in-use bug in the USB S-Glibrary USB: hub: Fix handling of connect changes during sleep vmalloc: fix remap_vmalloc_range() bounds checks mm/hugetlb: fix a addressing exception caused by huge_pte_offset mm/ksm: fix NULL pointer dereference when KSM zero page is enabled tools/vm: fix cross-compile build ALSA: usx2y: Fix potential NULL dereference ALSA: hda/realtek - Fix unexpected init_amp override ALSA: hda/realtek - Add new codec supported for ALC245 ALSA: usb-audio: Fix usb audio refcnt leak when getting spdif ALSA: usb-audio: Filter out unsupported sample rates on Focusrite devices tpm/tpm_tis: Free IRQ if probing fails tpm: ibmvtpm: retry on H_CLOSED in tpm_ibmvtpm_send() KVM: s390: Return last valid slot if approx index is out-of-bounds KVM: Check validity of resolved slot when searching memslots KVM: VMX: Enable machine check support for 32bit targets tty: hvc: fix buffer overflow during hvc_alloc(). tty: rocket, avoid OOB access usb-storage: Add unusual_devs entry for JMicron JMS566 audit: check the length of userspace generated audit records ASoC: dapm: fixup dapm kcontrol widget iwlwifi: pcie: actually release queue memory in TVQM iwlwifi: mvm: beacon statistics shouldn't go backwards ARM: imx: provide v7_cpu_resume() only on ARM_CPU_SUSPEND=y powerpc/setup_64: Set cache-line-size based on cache-block-size staging: comedi: dt2815: fix writing hi byte of analog output staging: comedi: Fix comedi_device refcnt leak in comedi_open vt: don't hardcode the mem allocation upper bound vt: don't use kmalloc() for the unicode screen buffer staging: vt6656: Don't set RCR_MULTICAST or RCR_BROADCAST by default. staging: vt6656: Fix calling conditions of vnt_set_bss_mode staging: vt6656: Fix drivers TBTT timing counter. staging: vt6656: Fix pairwise key entry save. staging: vt6656: Power save stop wake_up_count wrap around. cdc-acm: close race betrween suspend() and acm_softint cdc-acm: introduce a cool down UAS: no use logging any details in case of ENODEV UAS: fix deadlock in error handling and PM flushing work usb: dwc3: gadget: Fix request completion check usb: f_fs: Clear OS Extended descriptor counts to zero in ffs_data_reset() xhci: prevent bus suspend if a roothub port detected a over-current condition serial: sh-sci: Make sure status register SCxSR is read in correct sequence xfs: Fix deadlock between AGI and AGF with RENAME_WHITEOUT s390/mm: fix page table upgrade vs 2ndary address mode accesses Linux 4.19.119 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I4b16db8472367d135a4ff68d2863c634bf093ef5	2020-04-29 17:26:17 +02:00
Jan Kara	473d7f5ed7	blktrace: Protect q->blk_trace with RCU commit `c780e86dd4` upstream. KASAN is reporting that __blk_add_trace() has a use-after-free issue when accessing q->blk_trace. Indeed the switching of block tracing (and thus eventual freeing of q->blk_trace) is completely unsynchronized with the currently running tracing and thus it can happen that the blk_trace structure is being freed just while __blk_add_trace() works on it. Protect accesses to q->blk_trace by RCU during tracing and make sure we wait for the end of RCU grace period when shutting down tracing. Luckily that is rare enough event that we can afford that. Note that postponing the freeing of blk_trace to an RCU callback should better be avoided as it could have unexpected user visible side-effects as debugfs files would be still existing for a short while block tracing has been shut down. Link: https://bugzilla.kernel.org/show_bug.cgi?id=205711 CC: stable@vger.kernel.org Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Tested-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reported-by: Tristan Madani <tristmd@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> [bwh: Backported to 4.19: adjust context] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-04-29 16:31:17 +02:00
Will McVicker	7b8719ed1c	ANDROID: GKI: block: resolve ABI diff when CONFIG_BLK_DEV_BSG is unset If CONFIG_BLK_DEV_BSG is disabled on the vendor side, then struct request_queue will have an ABI diff. Remove the #ifdef in the struct so that vendors can choose whether or not to use the config. 'struct request_queue at blkdev.h:434:1' changed:¬ type size changed from 17600 to 17920 (in bits)¬ 2 data member insertions:¬ 'bsg_job_fn* request_queue::bsg_job_fn', at offset 13952 (in bits) at blkdev.h:650:1¬ 'bsg_class_device request_queue::bsg_dev', at offset 14016 (in bits) at blkdev.h:651:1¬ Signed-off-by: Will McVicker <willmcvicker@google.com> Bug: 152917514 Test: compile, verify ABI diff Change-Id: Iec2e3460b717b667fcc0409cbc4ec985cfe0f69c	2020-04-01 16:26:56 -07:00
Greg Kroah-Hartman	8cb4870403	This is the 4.19.98 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl4pSYMACgkQONu9yGCS aT7Rkg/8C/AXaTp+2HxRj3ZO56uzpMBMb5duBzdzxnEnvFp+DIM7xxRX+NFI5CSK 4rjnxMd2tPsFtqiWo/bBCUcHh9gu5HJKOMFRZGaRYAXvJ/8hgahgzkBE00JiAB6r mrk9Y/pwcKxMFsAHtu3xM0oENeefXOmavVTHc9N3DQLd3hNuyTrPztBMFaDg8djR pSwh1uE2G+Z2UOdi2kXmHiEIG6NViIqp+qFYI5CUIyeKfvOEsR5nSQ97LyNQ+dUX qshARQFuk78+Ax+GNPTQXiWdzN7+SH5aw5frFtdhAN90F+XrRDj4ZXw+EkX+/M2J NZU9P/v41ESG8RWxbAZ6osAUkQ4Dgq2BQpdyRxNNjTchXc0Kr4K6BCKuhY6cGxS7 0PXPV7MsuAHYIrIvzG2lqif9gmknA0UrGVKuYJIZxBaWlHD2mEkFby0W0HIcBwir yKKK3fkFjmsGKYzh+VZVoGySWDbs7qYASWXHOCz0QCLb0CT8/ePbyxLdjY7u5KyX wDaDHXG9nm6Nu68HD/9CRnUkiK8dnsODZ0k+sBZfEa+xvHPJCdv3gnrf4SwU7dj7 ZyhO9XkFzncOJDoxYxiXTfI+zbU1ZhaDw7fk2PFvAI6P1xRS3m6rp8pDWp8iw/MX 92Sz1YzS68+otHLi+OBGxzu10PwMDtu2nUvqn68SYq6Rp0mZnnE= =2O94 -----END PGP SIGNATURE----- Merge 4.19.98 into android-4.19 Changes in 4.19.98 ARM: dts: meson8: fix the size of the PMU registers clk: qcom: gcc-sdm845: Add missing flag to votable GDSCs dt-bindings: reset: meson8b: fix duplicate reset IDs ARM: dts: imx6q-dhcom: fix rtc compatible clk: Don't try to enable critical clocks if prepare failed ASoC: msm8916-wcd-digital: Reset RX interpolation path after use iio: buffer: align the size of scan bytes to size of the largest element USB: serial: simple: Add Motorola Solutions TETRA MTP3xxx and MTP85xx USB: serial: option: Add support for Quectel RM500Q USB: serial: opticon: fix control-message timeouts USB: serial: option: add support for Quectel RM500Q in QDL mode USB: serial: suppress driver bind attributes USB: serial: ch341: handle unbound port at reset_resume USB: serial: io_edgeport: handle unbound ports on URB completion USB: serial: io_edgeport: add missing active-port sanity check USB: serial: keyspan: handle unbound ports USB: serial: quatech2: handle unbound ports scsi: fnic: fix invalid stack access scsi: mptfusion: Fix double fetch bug in ioctl ASoC: msm8916-wcd-analog: Fix selected events for MIC BIAS External1 ASoC: msm8916-wcd-analog: Fix MIC BIAS Internal1 ARM: dts: imx6q-dhcom: Fix SGTL5000 VDDIO regulator connection ALSA: dice: fix fallback from protocol extension into limited functionality ALSA: seq: Fix racy access for queue timer in proc read ALSA: usb-audio: fix sync-ep altsetting sanity check arm64: dts: allwinner: a64: olinuxino: Fix SDIO supply regulator Fix built-in early-load Intel microcode alignment block: fix an integer overflow in logical block size ARM: dts: am571x-idk: Fix gpios property to have the correct gpio number LSM: generalize flag passing to security_capable ptrace: reintroduce usage of subjective credentials in ptrace_has_cap() usb: core: hub: Improved device recognition on remote wakeup x86/resctrl: Fix an imbalance in domain_remove_cpu() x86/CPU/AMD: Ensure clearing of SME/SEV features is maintained x86/efistub: Disable paging at mixed mode entry drm/i915: Add missing include file <linux/math64.h> x86/resctrl: Fix potential memory leak perf hists: Fix variable name's inconsistency in hists__for_each() macro perf report: Fix incorrectly added dimensions as switch perf data file mm/shmem.c: thp, shmem: fix conflict of above-47bit hint address and PMD alignment mm: memcg/slab: call flush_memcg_workqueue() only if memcg workqueue is valid btrfs: rework arguments of btrfs_unlink_subvol btrfs: fix invalid removal of root ref btrfs: do not delete mismatched root refs btrfs: fix memory leak in qgroup accounting mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio() ARM: dts: imx6qdl: Add Engicam i.Core 1.5 MX6 ARM: dts: imx6q-icore-mipi: Use 1.5 version of i.Core MX6DL ARM: dts: imx7: Fix Toradex Colibri iMX7S 256MB NAND flash support net: stmmac: 16KB buffer must be 16 byte aligned net: stmmac: Enable 16KB buffer size mm/huge_memory.c: make __thp_get_unmapped_area static mm/huge_memory.c: thp: fix conflict of above-47bit hint address and PMD alignment arm64: dts: agilex/stratix10: fix pmu interrupt numbers bpf: Fix incorrect verifier simulation of ARSH under ALU32 cfg80211: fix deadlocks in autodisconnect work cfg80211: fix memory leak in cfg80211_cqm_rssi_update cfg80211: fix page refcount issue in A-MSDU decap netfilter: fix a use-after-free in mtype_destroy() netfilter: arp_tables: init netns pointer in xt_tgdtor_param struct netfilter: nft_tunnel: fix null-attribute check netfilter: nf_tables: remove WARN and add NLA_STRING upper limits netfilter: nf_tables: store transaction list locally while requesting module netfilter: nf_tables: fix flowtable list del corruption NFC: pn533: fix bulk-message timeout batman-adv: Fix DAT candidate selection on little endian systems macvlan: use skb_reset_mac_header() in macvlan_queue_xmit() hv_netvsc: Fix memory leak when removing rndis device net: dsa: tag_qca: fix doubled Tx statistics net: hns: fix soft lockup when there is not enough memory net: usb: lan78xx: limit size of local TSO packets net/wan/fsl_ucc_hdlc: fix out of bounds write on array utdm_info ptp: free ptp device pin descriptors properly r8152: add missing endpoint sanity check tcp: fix marked lost packets not being retransmitted sh_eth: check sh_eth_cpu_data::dual_port when dumping registers mlxsw: spectrum: Wipe xstats.backlog of down ports mlxsw: spectrum_qdisc: Include MC TCs in Qdisc counters xen/blkfront: Adjust indentation in xlvbd_alloc_gendisk tcp: refine rule to allow EPOLLOUT generation under mem pressure irqchip: Place CONFIG_SIFIVE_PLIC into the menu cw1200: Fix a signedness bug in cw1200_load_firmware() arm64: dts: meson-gxl-s905x-khadas-vim: fix gpio-keys-polled node cfg80211: check for set_wiphy_params tick/sched: Annotate lockless access to last_jiffies_update arm64: dts: marvell: Fix CP110 NAND controller node multi-line comment alignment Revert "arm64: dts: juno: add dma-ranges property" mtd: devices: fix mchp23k256 read and write drm/nouveau/bar/nv50: check bar1 vmm return value drm/nouveau/bar/gf100: ensure BAR is mapped drm/nouveau/mmu: qualify vmm during dtor reiserfs: fix handling of -EOPNOTSUPP in reiserfs_for_each_xattr scsi: esas2r: unlock on error in esas2r_nvram_read_direct() scsi: qla4xxx: fix double free bug scsi: bnx2i: fix potential use after free scsi: target: core: Fix a pr_debug() argument scsi: qla2xxx: Fix qla2x00_request_irqs() for MSI scsi: qla2xxx: fix rports not being mark as lost in sync fabric scan scsi: core: scsi_trace: Use get_unaligned_be*() perf probe: Fix wrong address verification clk: sprd: Use IS_ERR() to validate the return value of syscon_regmap_lookup_by_phandle() regulator: ab8500: Remove SYSCLKREQ from enum ab8505_regulator_id hwmon: (pmbus/ibm-cffps) Switch LEDs to blocking brightness call Linux 4.19.98 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I74a43a9e60734aec6d24b10374ba97de89172eca	2020-01-23 08:36:16 +01:00
Mikulas Patocka	a7f79052d1	block: fix an integer overflow in logical block size commit `ad6bf88a6c` upstream. Logical block size has type unsigned short. That means that it can be at most 32768. However, there are architectures that can run with 64k pages (for example arm64) and on these architectures, it may be possible to create block devices with 64k block size. For exmaple (run this on an architecture with 64k pages): Mount will fail with this error because it tries to read the superblock using 2-sector access: device-mapper: writecache: I/O is not aligned, sector 2, size 1024, block size 65536 EXT4-fs (dm-0): unable to read superblock This patch changes the logical block size from unsigned short to unsigned int to avoid the overflow. Cc: stable@vger.kernel.org Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-01-23 08:21:29 +01:00
Satya Tangirala	b0a4fb22e5	BACKPORT: FROMLIST: block: Keyslot Manager for Inline Encryption Inline Encryption hardware allows software to specify an encryption context (an encryption key, crypto algorithm, data unit num, data unit size, etc.) along with a data transfer request to a storage device, and the inline encryption hardware will use that context to en/decrypt the data. The inline encryption hardware is part of the storage device, and it conceptually sits on the data path between system memory and the storage device. Inline Encryption hardware implementations often function around the concept of "keyslots". These implementations often have a limited number of "keyslots", each of which can hold an encryption context (we say that an encryption context can be "programmed" into a keyslot). Requests made to the storage device may have a keyslot associated with them, and the inline encryption hardware will en/decrypt the data in the requests using the encryption context programmed into that associated keyslot. As keyslots are limited, and programming keys may be expensive in many implementations, and multiple requests may use exactly the same encryption contexts, we introduce a Keyslot Manager to efficiently manage keyslots. The keyslot manager also functions as the interface that upper layers will use to program keys into inline encryption hardware. For more information on the Keyslot Manager, refer to documentation found in block/keyslot-manager.c and linux/keyslot-manager.h. Bug: 137270441 Test: tested as series; see I26aac0ac7845a9064f28bb1421eb2522828a6dec Change-Id: I9a2dc72d61d5a3c64af379a97dd46155b41193eb Signed-off-by: Satya Tangirala <satyat@google.com> Link: https://patchwork.kernel.org/patch/11214713/	2019-11-14 14:47:49 -08:00
Bart Van Assche	c58a650736	block, scsi: Change the preempt-only flag into a counter commit `cd84a62e00` upstream. The RQF_PREEMPT flag is used for three purposes: - In the SCSI core, for making sure that power management requests are executed even if a device is in the "quiesced" state. - For domain validation by SCSI drivers that use the parallel port. - In the IDE driver, for IDE preempt requests. Rename "preempt-only" into "pm-only" because the primary purpose of this mode is power management. Since the power management core may but does not have to resume a runtime suspended device before performing system-wide suspend and since a later patch will set "pm-only" mode as long as a block device is runtime suspended, make it possible to set "pm-only" mode from more than one context. Since with this change scsi_device_quiesce() is no longer idempotent, make that function return early if it is called for a quiesced queue. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Cc: Jianchao Wang <jianchao.w.wang@oracle.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-04 09:30:57 +02:00
Jens Axboe	01c5f85aeb	blk-cgroup: increase number of supported policies After merging the iolatency policy, we potentially now have 4 policies being registered, but only support 3. This causes one of them to fail loading. Takashi reports that BFQ no longer works for him, because it fails to load due to policy registration failure. Bump to 5 policies, and also add a warning for when we have exceeded the global amount. If we have to touch this again, we should switch to a dynamic scheme instead. Reported-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Tested-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-09-11 10:59:53 -06:00
Bart Van Assche	b1f4267cc5	block: Remove two superfluous #include directives Commit `12f5b93145` ("blk-mq: Remove generation seqeunce") removed the only seqcount_t and u64_stats_sync instances from <linux/blkdev.h> but did not remove the corresponding #include directives. Since these include directives are no longer needed, remove them. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Keith Busch <keith.busch@intel.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Jianchao Wang <jianchao.w.wang@oracle.com> Cc: Hannes Reinecke <hare@suse.com>, Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-08-09 09:12:26 -06:00
Greg Edwards	359f642700	block: move bio_integrity_{intervals,bytes} into blkdev.h This allows bio_integrity_bytes() to be called from drivers instead of open coding it. Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Edwards <gedwards@ddn.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-26 15:49:41 -06:00
Tejun Heo	3f289dcb4b	block: make bdev_ops->rw_page() take a REQ_OP instead of bool `c11f0c0b5b` ("block/mm: make bdev_ops->rw_page() take a bool for read/write") replaced @op with boolean @is_write, which limited the amount of information going into ->rw_page() and more importantly page_endio(), which removed the need to expose block internals to mm. Unfortunately, we want to track discards separately and @is_write isn't enough information. This patch updates bdev_ops->rw_page() to take REQ_OP instead but leaves page_endio() to take bool @is_write. This allows the block part of operations to have enough information while not leaking it to mm. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Mike Christie <mchristi@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-18 08:44:14 -06:00
Vladimir Zapolskiy	05814a1037	block: remove blkdev_entry_to_request() macro Remove blkdev_entry_to_request() macro, which remained unused through the observable history, also note that it repeats list_entry_rq() macro verbatim. Signed-off-by: Vladimir Zapolskiy <vz@mleia.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-13 08:12:26 -06:00
Josef Bacik	a79050434b	blk-rq-qos: refactor out common elements of blk-wbt blkcg-qos is going to do essentially what wbt does, only on a cgroup basis. Break out the common code that will be shared between blkcg-qos and wbt into blk-rq-qos.* so they can both utilize the same infrastructure. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:54 -06:00
Ming Lei	97889f9ac2	blk-mq: remove synchronize_rcu() from blk_mq_del_queue_tag_set() We have to remove synchronize_rcu() from blk_queue_cleanup(), otherwise long delay can be caused during lun probe. For removing it, we have to avoid to iterate the set->tag_list in IO path, eg, blk_mq_sched_restart(). This patch reverts 5b79413946d (Revert "blk-mq: don't handle TAG_SHARED in restart"). Given we have fixed enough IO hang issue, and there isn't any reason to restart all queues in one tags any more, see the following reasons: 1) blk-mq core can deal with shared-tags case well via blk_mq_get_driver_tag(), which can wake up queues waiting for driver tag. 2) SCSI is a bit special because it may return BLK_STS_RESOURCE if queue, target or host is ready, but SCSI built-in restart can cover all these well, see scsi_end_request(), queue will be rerun after any request initiated from this host/target is completed. In my test on scsi_debug(8 luns), this patch may improve IOPS by 20% ~ 30% when running I/O on these 8 luns concurrently. Fixes: `705cda97ee` ("blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list") Cc: Omar Sandoval <osandov@fb.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: linux-scsi@vger.kernel.org Reported-by: Andrew Jones <drjones@redhat.com> Tested-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:52 -06:00
Bart Van Assche	6a5ac98465	block: Make struct request_queue smaller for CONFIG_BLK_DEV_ZONED=n Exclude zoned block device members from struct request_queue for CONFIG_BLK_DEV_ZONED == n. Avoid breaking the build by only building the code that uses these struct request_queue members if CONFIG_BLK_DEV_ZONED != n. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Cc: Matias Bjorling <mb@lightnvm.io> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:52 -06:00
Bart Van Assche	7c8542b798	block: Inline blk_queue_nr_zones() Since the implementation of blk_queue_nr_zones() is trivial and since it only has a single caller, inline this function. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Cc: Matias Bjorling <mb@lightnvm.io> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:52 -06:00
Bart Van Assche	6b1d83d274	block: Remove bdev_nr_zones() Remove this function since it has no callers. This function was introduced in commit `6cc77e9cb0` ("block: introduce zoned block devices zone write locking"). Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Matias Bjorling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:52 -06:00
Keith Busch	15bfd21fbc	block: Fix transfer when chunk sectors exceeds max A device may have boundary restrictions where the number of sectors between boundaries exceeds its max transfer size. In this case, we need to cap the max size to the smaller of the two limits. Reported-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com> Tested-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com> Cc: <stable@vger.kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-06-26 12:18:27 -06:00
Christoph Hellwig	be7f99c536	block: remov blk_queue_invalidate_tags This function is entirely unused, so remove it and the tag_queue_busy member of struct request_queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-06-15 08:13:35 -06:00
Christoph Hellwig	da66126739	blk-mq: don't time out requests again that are in the timeout handler We can currently call the timeout handler again on a request that has already been handed over to the timeout handler. Prevent that with a new flag. Fixes: `12f5b931` ("blk-mq: Remove generation seqeunce") Reported-by: Andrew Randrianasulu <randrianasulu@gmail.com> Tested-by: Andrew Randrianasulu <randrianasulu@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-06-14 08:39:46 -06:00
Kent Overstreet	338aa96d56	block: convert bounce, q->bio_split to bioset_init()/mempool_init() Convert the core block functionality to embedded bio sets. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-30 15:33:32 -06:00
Jens Axboe	0b7576d8eb	block: move ->timeout request member After the recent timeout handling changes, we have two holes in the struct. Move the timeout near the deadline, killing both, and moving related members closer together. On my config on x86-64, this shrinks struct request from 312 to 304 bytes. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-29 08:59:21 -06:00
Christoph Hellwig	88b0cfad28	block: document the blk_eh_timer_return values Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-29 08:59:21 -06:00
Christoph Hellwig	f6e7d48a78	block: remove BLK_EH_HANDLED Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-29 08:59:21 -06:00
Christoph Hellwig	6600593cbd	block: rename BLK_EH_NOT_HANDLED to BLK_EH_DONE The BLK_EH_NOT_HANDLED implies nothing happen, but very often that is not what is happening - instead the driver already completed the command. Fix the symbolic name to reflect that a little better. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-29 08:59:21 -06:00
Keith Busch	12f5b93145	blk-mq: Remove generation seqeunce This patch simplifies the timeout handling by relying on the request reference counting to ensure the iterator is operating on an inflight and truly timed out request. Since the reference counting prevents the tag from being reallocated, the block layer no longer needs to prevent drivers from completing their requests while the timeout handler is operating on it: a driver completing a request is allowed to proceed to the next state without additional syncronization with the block layer. This also removes any need for generation sequence numbers since the request lifetime is prevented from being reallocated as a new sequence while timeout handling is operating on it. To enables this a refcount is added to struct request so that request users can be sure they're operating on the same request without it changing while they're processing it. The request's tag won't be released for reuse until both the timeout handler and the completion are done with it. Signed-off-by: Keith Busch <keith.busch@intel.com> [hch: slight cleanups, added back submission side hctx lock, use cmpxchg for completions] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-29 08:59:21 -06:00
Christoph Hellwig	ff005a0662	block: sanitize blk_get_request calling conventions Switch everyone to blk_get_request_flags, and then rename blk_get_request_flags to blk_get_request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-14 08:55:12 -06:00
Omar Sandoval	522a777566	block: consolidate struct request timestamp fields Currently, struct request has four timestamp fields: - A start time, set at get_request time, in jiffies, used for iostats - An I/O start time, set at start_request time, in ktime nanoseconds, used for blk-stats (i.e., wbt, kyber, hybrid polling) - Another start time and another I/O start time, used for cfq and bfq These can all be consolidated into one start time and one I/O start time, both in ktime nanoseconds, shaving off up to 16 bytes from struct request depending on the kernel config. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-09 08:33:09 -06:00
Omar Sandoval	84c7afcebe	block: use ktime_get_ns() instead of sched_clock() for cfq and bfq cfq and bfq have some internal fields that use sched_clock() which can trivially use ktime_get_ns() instead. Their timestamp fields in struct request can also use ktime_get_ns(), which resolves the 8 year old comment added by commit `28f4197e5d` ("block: disable preemption before using sched_clock()"). Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-09 08:33:06 -06:00
Omar Sandoval	544ccc8dc9	block: get rid of struct blk_issue_stat struct blk_issue_stat squashes three things into one u64: - The time the driver started working on a request - The original size of the request (for the io.low controller) - Flags for writeback throttling It turns out that on x86_64, we have a 4 byte hole in struct request which we can fill with the non-timestamp fields from blk_issue_stat, simplifying things quite a bit. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-09 08:33:05 -06:00
Linus Torvalds	3442097b76	SCSI fixes on 20180425 8 bug fixes, one spelling update and one tracepoint addition. The most serious is probably the mpt3sas write same fix because it means anyone using these controllers sees errors when modern filesystems try to issue discards. Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCWuDoQyYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishU0eAP0QvCH8 NF2L35OadCr7I1Nvcb8h/OKsVtF6IIpFDD/0DAEA/FwV9wxTknA2OoSWhFzxPfMY EkQR56i7DQAvX3Agrno= =jMLe -----END PGP SIGNATURE----- Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Eight bug fixes, one spelling update and one tracepoint addition. The most serious is probably the mptsas write same fix because it means anyone using these controllers sees errors when modern filesystems try to issue discards" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: target: fix crash with iscsi target and dvd scsi: sd_zbc: Avoid that resetting a zone fails sporadically scsi: sd: Defer spinning up drive while SANITIZE is in progress scsi: megaraid_sas: Do not log an error if FW successfully initializes. scsi: ufs: add trace event for ufs upiu scsi: core: remove reference to scsi_show_extd_sense() scsi: mptsas: Disable WRITE SAME scsi: fnic: fix spelling mistake in fnic stats "Abord" -> "Abort" scsi: scsi_debug: IMMED related delay adjustments scsi: iscsi: respond to netlink with unicast when appropriate	2018-04-25 21:13:40 -07:00
Bart Van Assche	ccce20fc79	scsi: sd_zbc: Avoid that resetting a zone fails sporadically Since SCSI scanning occurs asynchronously, since sd_revalidate_disk() is called from sd_probe_async() and since sd_revalidate_disk() calls sd_zbc_read_zones() it can happen that sd_zbc_read_zones() is called concurrently with blkdev_report_zones() and/or blkdev_reset_zones(). That can cause these functions to fail with -EIO because sd_zbc_read_zones() e.g. sets q->nr_zones to zero before restoring it to the actual value, even if no drive characteristics have changed. Avoid that this can happen by making the following changes: - Protect the code that updates zone information with blk_queue_enter() and blk_queue_exit(). - Modify sd_zbc_setup_seq_zones_bitmap() and sd_zbc_setup() such that these functions do not modify struct scsi_disk before all zone information has been obtained. Note: since commit `055f6e18e0` ("block: Make q_usage_counter also track legacy requests"; kernel v4.15) the request queue freezing mechanism also affects legacy request queues. Fixes: `89d9475610` ("sd: Implement support for ZBC devices") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: stable@vger.kernel.org # v4.16 Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-19 00:04:10 -04:00
Dave Chinner	0ce9144471	block: add blk_queue_fua() helper function So we can check FUA support status from the iomap direct IO code. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-04-18 08:24:29 -06:00
Bart Van Assche	233bde21aa	block: Move SECTOR_SIZE and SECTOR_SHIFT definitions into <linux/blkdev.h> It happens often while I'm preparing a patch for a block driver that I'm wondering: is a definition of SECTOR_SIZE and/or SECTOR_SHIFT available for this driver? Do I have to introduce definitions of these constants before I can use these constants? To avoid this confusion, move the existing definitions of SECTOR_SIZE and SECTOR_SHIFT into the <linux/blkdev.h> header file such that these become available for all block drivers. Make the SECTOR_SIZE definition in the uapi msdos_fs.h header file conditional to avoid that including that header file after <linux/blkdev.h> causes the compiler to complain about a SECTOR_SIZE redefinition. Note: the SECTOR_SIZE / SECTOR_SHIFT / SECTOR_BITS definitions have not been removed from uapi header files nor from NAND drivers in which these constants are used for another purpose than converting block layer offsets and sizes into a number of sectors. Cc: David S. Miller <davem@davemloft.net> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-17 14:45:23 -06:00
Bart Van Assche	8a0ac14b8d	block: Move the queue_flag_*() functions from a public into a private header file This patch helps to avoid that new code gets introduced in block drivers that manipulates queue flags without holding the queue lock when that lock should be held. Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Ming Lei <ming.lei@redhat.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-08 14:13:48 -07:00
Bart Van Assche	1db2008b79	block: Complain if queue_flag_(set\|clear)_unlocked() is abused Since it is not safe to use queue_flag_(set\|clear)_unlocked() without holding the queue lock after the sysfs entries for a queue have been created, complain if this happens. Cc: Mike Snitzer <snitzer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Ming Lei <ming.lei@redhat.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-08 14:13:48 -07:00
Bart Van Assche	8814ce8a0f	block: Introduce blk_queue_flag_{set,clear,test_and_{set,clear}}() Introduce functions that modify the queue flags and that protect these modifications with the request queue lock. Except for moving one wake_up_all() call from inside to outside a critical section, this patch does not change any functionality. Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Ming Lei <ming.lei@redhat.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-08 14:13:48 -07:00
Bart Van Assche	66f91322f3	block: Reorder the queue flag manipulation function definitions Move the definition of queue_flag_clear_unlocked() up and move the definition of queue_in_flight() down such that all queue flag manipulation function definitions become contiguous. This patch does not change any functionality. Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Ming Lei <ming.lei@redhat.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-03-08 14:13:48 -07:00
Bart Van Assche	5ee0524ba1	block: Add 'lock' as third argument to blk_alloc_queue_node() This patch does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Ulf Hansson <ulf.hansson@linaro.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-02-28 12:23:35 -07:00
Minwoo Im	096392e071	block: fix a typo in comment of BLK_MQ_POLL_STATS_BKTS Update comment typo _consisitent_ to _consistent_ from following commit. commit `0206319fdf` ("blk-mq: Fix poll_stat for new size-based bucketing.") Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-02-15 08:27:06 -07:00
Linus Torvalds	0a4b6e2f80	Merge branch 'for-4.16/block' of git://git.kernel.dk/linux-block Pull block updates from Jens Axboe: "This is the main pull request for block IO related changes for the 4.16 kernel. Nothing major in this pull request, but a good amount of improvements and fixes all over the map. This contains: - BFQ improvements, fixes, and cleanups from Angelo, Chiara, and Paolo. - Support for SMR zones for deadline and mq-deadline from Damien and Christoph. - Set of fixes for bcache by way of Michael Lyle, including fixes from himself, Kent, Rui, Tang, and Coly. - Series from Matias for lightnvm with fixes from Hans Holmberg, Javier, and Matias. Mostly centered around pblk, and the removing rrpc 1.2 in preparation for supporting 2.0. - A couple of NVMe pull requests from Christoph. Nothing major in here, just fixes and cleanups, and support for command tracing from Johannes. - Support for blk-throttle for tracking reads and writes separately. From Joseph Qi. A few cleanups/fixes also for blk-throttle from Weiping. - Series from Mike Snitzer that enables dm to register its queue more logically, something that's alwways been problematic on dm since it's a stacked device. - Series from Ming cleaning up some of the bio accessor use, in preparation for supporting multipage bvecs. - Various fixes from Ming closing up holes around queue mapping and quiescing. - BSD partition fix from Richard Narron, fixing a problem where we can't mount newer (10/11) FreeBSD partitions. - Series from Tejun reworking blk-mq timeout handling. The previous scheme relied on atomic bits, but it had races where we would think a request had timed out if it to reused at the wrong time. - null_blk now supports faking timeouts, to enable us to better exercise and test that functionality separately. From me. - Kill the separate atomic poll bit in the request struct. After this, we don't use the atomic bits on blk-mq anymore at all. From me. - sgl_alloc/free helpers from Bart. - Heavily contended tag case scalability improvement from me. - Various little fixes and cleanups from Arnd, Bart, Corentin, Douglas, Eryu, Goldwyn, and myself" * 'for-4.16/block' of git://git.kernel.dk/linux-block: (186 commits) block: remove smart1,2.h nvme: add tracepoint for nvme_complete_rq nvme: add tracepoint for nvme_setup_cmd nvme-pci: introduce RECONNECTING state to mark initializing procedure nvme-rdma: remove redundant boolean for inline_data nvme: don't free uuid pointer before printing it nvme-pci: Suspend queues after deleting them bsg: use pr_debug instead of hand crafted macros blk-mq-debugfs: don't allow write on attributes with seq_operations set nvme-pci: Fix queue double allocations block: Set BIO_TRACE_COMPLETION on new bio during split blk-throttle: use queue_is_rq_based block: Remove kblockd_schedule_delayed_work{,_on}() blk-mq: Avoid that blk_mq_delay_run_hw_queue() introduces unintended delays blk-mq: Rename blk_mq_request_direct_issue() into blk_mq_request_issue_directly() lib/scatterlist: Fix chaining support in sgl_alloc_order() blk-throttle: track read and write request individually block: add bdev_read_only() checks to common helpers block: fail op_is_write() requests to read-only partitions blk-throttle: export io_serviced_recursive, io_service_bytes_recursive ...	2018-01-29 11:51:49 -08:00
Bart Van Assche	f5ced52aaa	block: Remove kblockd_schedule_delayed_work{,_on}() The previous patch removed all users of these two functions. Hence also remove the functions themselves. Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-01-19 12:52:03 -07:00
Jens Axboe	7c3fb70f03	block: rearrange a few request fields for better cache layout Move completion related items (like the call single data) near the end of the struct, instead of mixing them in with the initial queueing related fields. Move queuelist below the bio structures. Then we have all queueing related bits in the first cache line. This yields a 1.5-2% increase in IOPS for a null_blk test, both for sync and for high thread count access. Sync test goes form 975K to 992K, 32-thread case from 20.8M to 21.2M IOPS. Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-01-10 11:47:58 -07:00
Jens Axboe	e14575b3d4	block: convert REQ_ATOM_COMPLETE to stealing rq->__deadline bit We only have one atomic flag left. Instead of using an entire unsigned long for that, steal the bottom bit of the deadline field that we already reserved. Remove ->atomic_flags, since it's now unused. Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-01-10 11:47:53 -07:00
Jens Axboe	0a72e7f449	block: add accessors for setting/querying request deadline We reduce the resolution of request expiry, but since we're already using jiffies for this where resolution depends on the kernel configuration and since the timeout resolution is coarse anyway, that should be fine. Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-01-10 11:47:47 -07:00
Jens Axboe	76a86f9d02	block: remove REQ_ATOM_POLL_SLEPT We don't need this to be an atomic flag, it can be a regular flag. We either end up on the same CPU for the polling, in which case the state is sane, or we did the sleep which would imply the needed barrier to ensure we see the right state. Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-01-10 11:47:43 -07:00
Tejun Heo	634f9e4631	blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq After the recent updates to use generation number and state based synchronization, blk-mq no longer depends on REQ_ATOM_COMPLETE except to avoid firing the same timeout multiple times. Remove all REQ_ATOM_COMPLETE usages and use a new rq_flags flag RQF_MQ_TIMEOUT_EXPIRED to avoid firing the same timeout multiple times. This removes atomic bitops from hot paths too. v2: Removed blk_clear_rq_complete() from blk_mq_rq_timed_out(). v3: Added RQF_MQ_TIMEOUT_EXPIRED flag. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "jianchao.wang" <jianchao.w.wang@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-01-09 09:31:15 -07:00
Tejun Heo	1d9bd5161b	blk-mq: replace timeout synchronization with a RCU and generation based scheme Currently, blk-mq timeout path synchronizes against the usual issue/completion path using a complex scheme involving atomic bitflags, REQ_ATOM_*, memory barriers and subtle memory coherence rules. Unfortunately, it contains quite a few holes. There's a complex dancing around REQ_ATOM_STARTED and REQ_ATOM_COMPLETE between issue/completion and timeout paths; however, they don't have a synchronization point across request recycle instances and it isn't clear what the barriers add. blk_mq_check_expired() can easily read STARTED from N-2'th iteration, deadline from N-1'th, blk_mark_rq_complete() against Nth instance. In fact, it's pretty easy to make blk_mq_check_expired() terminate a later instance of a request. If we induce 5 sec delay before time_after_eq() test in blk_mq_check_expired(), shorten the timeout to 2s, and issue back-to-back large IOs, blk-mq starts timing out requests spuriously pretty quickly. Nothing actually timed out. It just made the call on a recycle instance of a request and then terminated a later instance long after the original instance finished. The scenario isn't theoretical either. This patch replaces the broken synchronization mechanism with a RCU and generation number based one. 1. Each request has a u64 generation + state value, which can be updated only by the request owner. Whenever a request becomes in-flight, the generation number gets bumped up too. This provides the basis for the timeout path to distinguish different recycle instances of the request. Also, marking a request in-flight and setting its deadline are protected with a seqcount so that the timeout path can fetch both values coherently. 2. The timeout path fetches the generation, state and deadline. If the verdict is timeout, it records the generation into a dedicated request abortion field and does RCU wait. 3. The completion path is also protected by RCU (from the previous patch) and checks whether the current generation number and state match the abortion field. If so, it skips completion. 4. The timeout path, after RCU wait, scans requests again and terminates the ones whose generation and state still match the ones requested for abortion. By now, the timeout path knows that either the generation number and state changed if it lost the race or the completion will yield to it and can safely timeout the request. While it's more lines of code, it's conceptually simpler, doesn't depend on direct use of subtle memory ordering or coherence, and hopefully doesn't terminate the wrong instance. While this change makes REQ_ATOM_COMPLETE synchronization unnecessary between issue/complete and timeout paths, REQ_ATOM_COMPLETE isn't removed yet as it's still used in other places. Future patches will move all state tracking to the new mechanism and remove all bitops in the hot paths. Note that this patch adds a comment explaining a race condition in BLK_EH_RESET_TIMER path. The race has always been there and this patch doesn't change it. It's just documenting the existing race. v2: - Fixed BLK_EH_RESET_TIMER handling as pointed out by Jianchao. - s/request->gstate_seqc/request->gstate_seq/ as suggested by Peter. - READ_ONCE() added in blk_mq_rq_update_state() as suggested by Peter. v3: - Fixed possible extended seqcount / u64_stats_sync read looping spotted by Peter. - MQ_RQ_IDLE was incorrectly being set in complete_request instead of free_request. Fixed. v4: - Rebased on top of hctx_lock() refactoring patch. - Added comment explaining the use of hctx_lock() in completion path. v5: - Added comments requested by Bart. - Note the addition of BLK_EH_RESET_TIMER race condition in the commit message. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "jianchao.wang" <jianchao.w.wang@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Bart Van Assche <Bart.VanAssche@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-01-09 09:31:15 -07:00
Christoph Hellwig	6cc77e9cb0	block: introduce zoned block devices zone write locking Components relying only on the request_queue structure for accessing block devices (e.g. I/O schedulers) have a limited knowledged of the device characteristics. In particular, the device capacity cannot be easily discovered, which for a zoned block device also result in the inability to easily know the number of zones of the device (the zone size is indicated by the chunk_sectors field of the queue limits). Introduce the nr_zones field to the request_queue structure to simplify access to this information. Also, add the bitmap seq_zone_bitmap which indicates which zones of the device are sequential zones (write preferred or write required) and the bitmap seq_zones_wlock which indicates if a zone is write locked, that is, if a write request targeting a zone was dispatched to the device. These fields are initialized by the low level block device driver (sd.c for ZBC/ZAC disks). They are not initialized by stacking drivers (device mappers) handling zoned block devices (e.g. dm-linear). Using this, I/O schedulers can introduce zone write locking to control request dispatching to a zoned block device and avoid write request reordering by limiting to at most a single write request per zone outside of the scheduler at any time. Based on previous patches from Damien Le Moal. Signed-off-by: Christoph Hellwig <hch@lst.de> [Damien] * Fixed comments and identation in blkdev.h * Changed helper functions * Fixed this commit message Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-01-05 09:22:17 -07:00
Jens Axboe	4ccafe0320	block: unalign call_single_data in struct request A previous change blindly added massive alignment to the call_single_data structure in struct request. This ballooned it in size from 296 to 320 bytes on my setup, for no valid reason at all. Use the unaligned struct __call_single_data variant instead. Fixes: `966a967116` ("smp: Avoid using two cache lines for struct call_single_data") Cc: stable@vger.kernel.org # v4.14 Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-12-20 13:16:33 -07:00

1 2 3 4 5 ...

741 commits