Now we have the basic infrastructure, add the first error class so
we can build up the infrastructure in a meaningful way. Add the
metadata async write IO error class and sysfs entry, and introduce a
default configuration that matches the existing "retry forever"
behavior for async write metadata buffers.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Pull libata updates from Tejun Heo:
"Trivial changes except for special case timeout bumping.
I have two more libata branches which depend on SCSI and dmaengine
tree respectively. I'll send pull requests for them once the
prerequisite trees are pulled in"
* 'for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
libata-scsi: use %*ph to dump small buffers
treewide: Fix typos in libata.xml
libata-core: Allow longer timeout for drive spinup from PUIS
libata: Fixup awkward whitespace in warning by removing line continuation.
We need to be able to change the way XFS behaviours in error
conditions depending on the type of underlying storage. This is
necessary for handling non-traditional block devices with extended
error cases, such as thin provisioned devices that can return ENOSPC
as an IO error.
Introduce the basic sysfs infrastructure needed to define and
configure error behaviours. This is done to be generic enough to
extend to configuring behaviour in other error conditions, such as
ENOMEM, which also has different desired behaviours according to
machine configuration.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reports have surfaced of a lockdep splat complaining about an
irq-safe -> irq-unsafe locking order in the xfs_buf_bio_end_io() bio
completion handler. This only occurs when I/O errors are present
because bp->b_lock is only acquired in this context to protect
setting an error on the buffer. The problem is that this lock can be
acquired with the (request_queue) q->queue_lock held. See
scsi_end_request() or ata_qc_schedule_eh(), for example.
Replace the locked test/set of b_io_error with a cmpxchg() call.
This eliminates the need for the lock and thus the lock ordering
problem goes away.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Commit 0b89e9aa28 (cpuidle: delay enabling interrupts until all
coupled CPUs leave idle) rightfully fixed a regression by letting
the coupled idle state framework to handle local interrupt enabling
when the CPU is exiting an idle state.
The current code checks if the idle state is coupled and, if so, it
will let the coupled code to enable interrupts. This way, it can
decrement the ready-count before handling the interrupt. This
mechanism prevents the other CPUs from waiting for a CPU which is
handling interrupts.
But the check is done against the state index returned by the back
end driver's ->enter functions which could be different from the
initial index passed as parameter to the cpuidle_enter_state()
function.
entered_state = target_state->enter(dev, drv, index);
[ ... ]
if (!cpuidle_state_is_coupled(drv, entered_state))
local_irq_enable();
[ ... ]
If the 'index' is referring to a coupled idle state but the
'entered_state' is *not* coupled, then the interrupts are enabled
again. All CPUs blocked on the sync barrier may busy loop longer
if the CPU has interrupts to handle before decrementing the
ready-count. That's consuming more energy than saving.
Fixes: 0b89e9aa28 (cpuidle: delay enabling interrupts until all coupled CPUs leave idle)
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: 3.15+ <stable@vger.kernel.org> # 3.15+
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cut down on noise for mainstream users of the API and people doing build
testing by dropping the deprecated flag from regulator_can_change_voltage()
as it triggers even on the EXPORT_SYMBOL_GPL() which affects all builds
rather than just the remaining drivers with calls to it (for which fixes
are currently pending).
The function remains deprecated and is expected to be removed entirely
in v4.8.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJXO1DsAAoJECTWi3JdVIfQFSkH/A1YGISAUiYkASQVtWQM3SeI
CmS5Cdv1NKVrSmZ+eNcOgjYfBCYNTm5iqW2jh+oholXCCvwNMq9tCavDkE4YR1B+
Wa+QB0bvaRtAKRg564QF7PexCDv/lQVt/7BItusaza9FXKIgi1vmt4GCojJzxOcS
NnJab7+8BSG8Xjngw5JaOTcvo26u3bQLBzAUzTaEXXDqbirlvUTKBzSBSDWWSYDN
W09QLX3UBRyC+TGjPl8lzSyJ+6MlCAaV5qRgmNLJip2X5MVP4I8okN1U4S9YywBm
8epi8oD9zEr/tWcbec8nNJSgWdAlxZzEJi+JWyoEcYk0hDVASfgrnBkWvNJeH0Q=
=yAs2
-----END PGP SIGNATURE-----
Merge tag 'regulator-fix-can-change-voltage' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"Fix build warnings from regulator_can_change_voltage()
Cut down on noise for mainstream users of the API and people
doing build testing by dropping the deprecated flag from
regulator_can_change_voltage() as it triggers even on the
EXPORT_SYMBOL_GPL() which affects all builds rather than just
the remaining drivers with calls to it (for which fixes are
currently pending).
The function remains deprecated and is expected to be removed
entirely in v4.8"
* tag 'regulator-fix-can-change-voltage' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: Silence build warnings from regulator_can_change_voltage()
Core infrastructural changes:
- Support for natively single-ended GPIO driver stages. This
means that if the hardware has registers to configure open
drain or open source configuration, we use that rather than
(as we did before) try to emulate it by switching the line
to an input to get high impedance. This is also documented
throughly in Documentation/gpio/driver.txt for those of you
who did not understand one word of what I just wrote.
- Start to do away with the unnecessarily complex and
unitelligible ARCH_REQUIRE_GPIOLIB and
ARCH_WANT_OPTIONAL_GPIOLIB, another evolutional artifact from
the time when the GPIO subsystem was unmaintained. Archs can
now just select GPIOLIB and be done with it, cleanups to
arches will trickle in for the next kernel. Some minor archs
ACKed the changes immediately so these are included in this
pull request.
- Advancing the use of the data pointer inside the GPIO device
for storing driver data by switching the PowerPC, Super-H
Unicore and a few other subarches or subsystem drivers in
ALSA SoC, Input, serial, SSB, staging etc to use it.
- The initialization now reads the input/output state of the
GPIO lines, so that each GPIO descriptor knows - if this
callback is implemented - whether the line is input or
output. This also reflects nicely in userspace "lsgpio".
- It is now possible to name GPIO producer names, line names,
from the device tree. (Platform data has been supported for
a while.) I bet we will get a similar mechanism for ACPI
one of those days. This makes is possible to get sensible
producer names for e.g. GPIO rails in "lsgpio" in userspace.
New drivers:
- New driver for the Loongson1.
- The XLP driver now supports Broadcom Vulcan ARM64.
- The IT87 driver now supports IT8620 and IT8628.
- The PCA953X driver now supports Galileo Gen2.
Driver improvements:
- MCP23S08 was switched to use the gpiolib irqchip helpers and
now also suppors level-triggered interrupts.
- 74x164 and RCAR now supports the .set_multiple() callback
- AMDPT was converted to use generic GPIO.
- TC3589x, TPS65218, SX150X, F7188X, MENZ127, VX855, WM831X, WM8994
support the new single ended callback for open drain
and in some cases open source.
- Implement the .get_direction() callback for a few more drivers
like PL061, Xgene.
Cleanups:
- Paul Gortmaker combed through the drivers and de-modularized
those who are not really modules.
- Move the GPIO poweroff DT bindings to the power subdir where
they belong.
- Rename gpio-generic.c to gpio-mmio.c, which is much more to the
point. That's what it is handling, nothing more, nothing less.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXOuJ5AAoJEEEQszewGV1zNXsQAII5wtkP69WRJ3goYBKg1dZN
DkuLqZyVI4hCgRhptzUW10gDLHKKOCVubfetTJHSpyG/dWDJXPCyH6FHF+pW6lMX
y+em8kAvWctKpaosy4EM7O55/IohW0/fNCTOfzfrUNivjydFuA2XwPUiPqC7111O
DeKlC/t+W1JEvZTiKMi83pKq+9wqhiHmD0qxRHhV57S+MT8e7mdlSKOp7uUkKPkg
LPlerXosnmeFjL2emuSnKl/tq8pOyruU6uaIGG/uwpbo2W86Dok9GY2GWkQ4pANT
pDtprc4aJ/Clf6Q0CoKwQbmAozqTDeJo+Und9tRs2KuZRly2bWOcyVE0lyK+Y4s0
544LcKw2q6cB9ARZ6JExEVRJejPISGKMqo9TaHkyNSIJoiiatKYvNS4WVeFtTgbI
W+1WfM1svPymNRqVPO1PMLV+3m9dalDH2WjtaFF21uCAQ/G0AuPEHjEDbbx0HIpb
qrvWmYzZ97Rm/LdYROFRO53nEdCp2jh6c3n4/2kGYM8H0suvGxXZsB1g4i+Dm+B+
qKVTS282azlDuH9ohXeXizeb6atK6s8TC3Rmew97SmXDO00cUQzEQO/ZquRLHY9r
n83afQ4OL2Z9yruAxAk7pCshVSyheOsHuFPuZ7bwPW31VMdoWNRkhnaTUXMjGfYg
3y39IHrCKWNMCCVM1iNl
=z4d6
-----END PGP SIGNATURE-----
Merge tag 'gpio-v4.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO updates from Linus Walleij:
"This is the bulk of GPIO changes for kernel cycle v4.7:
Core infrastructural changes:
- Support for natively single-ended GPIO driver stages.
This means that if the hardware has registers to configure open
drain or open source configuration, we use that rather than (as we
did before) try to emulate it by switching the line to an input to
get high impedance.
This is also documented throughly in Documentation/gpio/driver.txt
for those of you who did not understand one word of what I just
wrote.
- Start to do away with the unnecessarily complex and unitelligible
ARCH_REQUIRE_GPIOLIB and ARCH_WANT_OPTIONAL_GPIOLIB, another
evolutional artifact from the time when the GPIO subsystem was
unmaintained.
Archs can now just select GPIOLIB and be done with it, cleanups to
arches will trickle in for the next kernel. Some minor archs ACKed
the changes immediately so these are included in this pull request.
- Advancing the use of the data pointer inside the GPIO device for
storing driver data by switching the PowerPC, Super-H Unicore and
a few other subarches or subsystem drivers in ALSA SoC, Input,
serial, SSB, staging etc to use it.
- The initialization now reads the input/output state of the GPIO
lines, so that each GPIO descriptor knows - if this callback is
implemented - whether the line is input or output. This also
reflects nicely in userspace "lsgpio".
- It is now possible to name GPIO producer names, line names, from
the device tree. (Platform data has been supported for a while).
I bet we will get a similar mechanism for ACPI one of those days.
This makes is possible to get sensible producer names for e.g.
GPIO rails in "lsgpio" in userspace.
New drivers:
- New driver for the Loongson1.
- The XLP driver now supports Broadcom Vulcan ARM64.
- The IT87 driver now supports IT8620 and IT8628.
- The PCA953X driver now supports Galileo Gen2.
Driver improvements:
- MCP23S08 was switched to use the gpiolib irqchip helpers and now
also suppors level-triggered interrupts.
- 74x164 and RCAR now supports the .set_multiple() callback
- AMDPT was converted to use generic GPIO.
- TC3589x, TPS65218, SX150X, F7188X, MENZ127, VX855, WM831X, WM8994
support the new single ended callback for open drain and in some
cases open source.
- Implement the .get_direction() callback for a few more drivers like
PL061, Xgene.
Cleanups:
- Paul Gortmaker combed through the drivers and de-modularized those
who are not really modules.
- Move the GPIO poweroff DT bindings to the power subdir where they
belong.
- Rename gpio-generic.c to gpio-mmio.c, which is much more to the
point. That's what it is handling, nothing more, nothing less"
* tag 'gpio-v4.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (126 commits)
MIPS: do away with ARCH_[WANT_OPTIONAL|REQUIRE]_GPIOLIB
gpio: zevio: make it explicitly non-modular
gpio: timberdale: make it explicitly non-modular
gpio: stmpe: make it explicitly non-modular
gpio: sodaville: make it explicitly non-modular
pinctrl: sh-pfc: Let gpio_chip.to_irq() return zero on error
gpio: dwapb: Add ACPI device ID for DWAPB GPIO controller on X-Gene platforms
gpio: dt-bindings: add wd,mbl-gpio bindings
gpio: of: make it possible to name GPIO lines
gpio: make gpiod_to_irq() return negative for NO_IRQ
gpio: xgene: implement .get_direction()
gpio: xgene: Enable ACPI support for X-Gene GFC GPIO driver
gpio: tegra: Implement gpio_get_direction callback
gpio: set up initial state from .get_direction()
gpio: rename gpio-generic.c into gpio-mmio.c
gpio: generic: fix GPIO_GENERIC_PLATFORM is set to module case
gpio: dwapb: add gpio-signaled acpi event support
gpio: dwapb: convert device node to fwnode
gpio: dwapb: remove name from dwapb_port_property
gpio/qoriq: select IRQ_DOMAIN
...
Pull HID updates from Jiri Kosina:
"No biggies this time:
- micro-optimization of implement() in HID core parses, from Dmitry
Torokhov
- thingm driver cleanups from Heiner Kallweit
- fine-graining detection of distance and tilt axes in wacom driver
from Jason Gerecke
- New hid-asus driver, currently supporting X205TA and VivoBook
E200HA, from Yusuke Fujimaki"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: wacom: Add fuzz factor to distance and tilt axes
HID: usbhid: quirks for Corsair RGB keyboard & mice (K70R, K95RGB, M65RGB, K70RGB, K65RGB)
HID: thingm: remove not needed error message
HID: thingm: set new flag LED_HW_PLUGGABLE
HID: thingm: factor out duplicated code to thingm_init_led
HID: simplify implement() a bit
HID: asus: add support for VivoBook E200HA
HID: hidraw: silence an uninitialized variable warning
HID: roccat: silence an uninitialized variable warning
HID: Asus X205TA keyboard driver
HID: hidraw: switch to using memdup_user
None of the cpufreq governors currently in the tree will ever fail
an invocation of the ->governor() callback with the event argument
equal to CPUFREQ_GOV_STOP (unless invoked with incorrect arguments
which doesn't matter anyway) and it is rather difficult to imagine
a valid reason for such a failure.
Accordingly, rearrange the code in the core to make it clear that
this call never fails.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
None of the cpufreq governors currently in the tree will ever fail
an invocation of the ->governor() callback with the event argument
equal to CPUFREQ_GOV_POLICY_EXIT (unless invoked with incorrect
arguments which doesn't matter anyway) and it wouldn't really
make sense to fail it, because the caller won't be able to handle
that failure in a meaningful way.
Accordingly, rearrange the code in the core to make it clear that
this call never fails.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
One of the if () statements in intel_pstate_set_policy() causes
another if () to be evaluated if the condition is true and it
doesn't do anything else, so merge the two if () statements into
one.
No functional changes.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Pull livepatching updates from Jiri Kosina:
- remove of our own implementation of architecture-specific relocation
code and leveraging existing code in the module loader to perform
arch-dependent work, from Jessica Yu.
The relevant patches have been acked by Rusty (for module.c) and
Heiko (for s390).
- live patching support for ppc64le, which is a joint work of Michael
Ellerman and Torsten Duwe. This is coming from topic branch that is
share between livepatching.git and ppc tree.
- addition of livepatching documentation from Petr Mladek
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: make object/func-walking helpers more robust
livepatch: Add some basic livepatch documentation
powerpc/livepatch: Add live patching support on ppc64le
powerpc/livepatch: Add livepatch stack to struct thread_info
powerpc/livepatch: Add livepatch header
livepatch: Allow architectures to specify an alternate ftrace location
ftrace: Make ftrace_location_range() global
livepatch: robustify klp_register_patch() API error checking
Documentation: livepatch: outline Elf format and requirements for patch modules
livepatch: reuse module loader code to write relocations
module: s390: keep mod_arch_specific for livepatch modules
module: preserve Elf information for livepatch modules
Elf: add livepatch-specific Elf constants
Pull networking updates from David Miller:
"Highlights:
1) Support SPI based w5100 devices, from Akinobu Mita.
2) Partial Segmentation Offload, from Alexander Duyck.
3) Add GMAC4 support to stmmac driver, from Alexandre TORGUE.
4) Allow cls_flower stats offload, from Amir Vadai.
5) Implement bpf blinding, from Daniel Borkmann.
6) Optimize _ASYNC_ bit twiddling on sockets, unless the socket is
actually using FASYNC these atomics are superfluous. From Eric
Dumazet.
7) Run TCP more preemptibly, also from Eric Dumazet.
8) Support LED blinking, EEPROM dumps, and rxvlan offloading in mlx5e
driver, from Gal Pressman.
9) Allow creating ppp devices via rtnetlink, from Guillaume Nault.
10) Improve BPF usage documentation, from Jesper Dangaard Brouer.
11) Support tunneling offloads in qed, from Manish Chopra.
12) aRFS offloading in mlx5e, from Maor Gottlieb.
13) Add RFS and RPS support to SCTP protocol, from Marcelo Ricardo
Leitner.
14) Add MSG_EOR support to TCP, this allows controlling packet
coalescing on application record boundaries for more accurate
socket timestamp sampling. From Martin KaFai Lau.
15) Fix alignment of 64-bit netlink attributes across the board, from
Nicolas Dichtel.
16) Per-vlan stats in bridging, from Nikolay Aleksandrov.
17) Several conversions of drivers to ethtool ksettings, from Philippe
Reynes.
18) Checksum neutral ILA in ipv6, from Tom Herbert.
19) Factorize all of the various marvell dsa drivers into one, from
Vivien Didelot
20) Add VF support to qed driver, from Yuval Mintz"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1649 commits)
Revert "phy dp83867: Fix compilation with CONFIG_OF_MDIO=m"
Revert "phy dp83867: Make rgmii parameters optional"
r8169: default to 64-bit DMA on recent PCIe chips
phy dp83867: Make rgmii parameters optional
phy dp83867: Fix compilation with CONFIG_OF_MDIO=m
bpf: arm64: remove callee-save registers use for tmp registers
asix: Fix offset calculation in asix_rx_fixup() causing slow transmissions
switchdev: pass pointer to fib_info instead of copy
net_sched: close another race condition in tcf_mirred_release()
tipc: fix nametable publication field in nl compat
drivers: net: Don't print unpopulated net_device name
qed: add support for dcbx.
ravb: Add missing free_irq() calls to ravb_close()
qed: Remove a stray tab
net: ethernet: fec-mpc52xx: use phy_ethtool_{get|set}_link_ksettings
net: ethernet: fec-mpc52xx: use phydev from struct net_device
bpf, doc: fix typo on bpf_asm descriptions
stmmac: hardware TX COE doesn't work when force_thresh_dma_mode is set
net: ethernet: fs-enet: use phy_ethtool_{get|set}_link_ksettings
net: ethernet: fs-enet: use phydev from struct net_device
...
The btrfs_{set,remove}xattr inode operations check for a read-only root
(btrfs_root_readonly) before calling into generic_{set,remove}xattr. If
this check is moved into __btrfs_setxattr, we can get rid of
btrfs_{set,remove}xattr.
This patch applies to mainline, I would like to keep it together with
the other xattr cleanups if possible, though. Could you please review?
Thanks,
Andreas
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Ubifs internally uses special inodes for storing xattrs. Those inodes
had NULL {get,set,remove}xattr inode operations before this change, so
xattr operations on them would fail. The super block's s_xattr field
would also apply to those special inodes. However, the inodes are not
visible outside of ubifs, and so no xattr operations will ever be
carried out on them anyway.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
bio_inc_remaining() and the block core's new async
__blkdev_issue_discard() interface
- make DM multipath's fast code-paths lockless, using lockless_deference,
to significantly improve large NUMA performance when using blk-mq. The
m->lock spinlock contention was a serious bottleneck.
- a few other small code cleanups and Documentation fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJXNdGVAAoJEMUj8QotnQNaYYgH/Rf2am46A78kcR5b9nN2I+Tb
+MkqQyf8mXUzNHOu3v93CVugT+tBZuJcpHPJgCSc/1GXtgsjHLvbkO2Mc+Ioe45S
PlUA3HdRzxHSJ365SdYvT+bY+QQlGiySelSBrJHlikXC88kz3wqyQ146BT1Rw/w+
t0mi1liNJtZHsuH+3uO9uxe5+H7476lB84i79Kz0x8Ygv5+urgaSvDBRO5EH/hkJ
LN2WJWHDQLT4MtHKCuiMiLpu/1HGvISN2QrMPsFjC1d1DbbZvRWAxYDwGaP/C277
IflPo7sA/nds5T2vqb0fRTPuxBnzXdFMMvf+VQX7pjCnxlhfaxBkvNtnFpxW+oA=
=iCyS
-----END PGP SIGNATURE-----
Merge tag 'dm-4.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- based on Jens' 'for-4.7/core' to have DM thinp's discard support use
bio_inc_remaining() and the block core's new async __blkdev_issue_discard()
interface
- make DM multipath's fast code-paths lockless, using lockless_deference,
to significantly improve large NUMA performance when using blk-mq.
The m->lock spinlock contention was a serious bottleneck.
- a few other small code cleanups and Documentation fixes
* tag 'dm-4.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm thin: unroll issue_discard() to create longer discard bio chains
dm thin: use __blkdev_issue_discard for async discard support
dm thin: remove __bio_inc_remaining() and switch to using bio_inc_remaining()
dm raid: make sure no feature flags are set in metadata
dm ioctl: drop use of __GFP_REPEAT in copy_params()'s __vmalloc() call
dm stats: fix spelling mistake in Documentation
dm cache: update cache-policies.txt now that mq is an alias for smq
dm mpath: eliminate use of spinlock in IO fast-paths
dm mpath: move trigger_event member to the end of 'struct multipath'
dm mpath: use atomic_t for counting members of 'struct multipath'
dm mpath: switch to using bitops for state flags
dm thin: Remove return statement from void function
dm: remove unused mapped_device argument from free_tio()
Pull block driver updates from Jens Axboe:
"On top of the core pull request, this is the drivers pull request for
this merge window. This contains:
- Switch drivers to the new write back cache API, and kill off the
flush flags. From me.
- Kill the discard support for the STEC pci-e flash driver. It's
trivially broken, and apparently unmaintained, so it's safer to
just remove it. From Jeff Moyer.
- A set of lightnvm updates from the usual suspects (Matias/Javier,
and Simon), and fixes from Arnd, Jeff Mahoney, Sagi, and Wenwei
Tao.
- A set of updates for NVMe:
- Turn the controller state management into a proper state
machine. From Christoph.
- Shuffling of code in preparation for NVMe-over-fabrics, also
from Christoph.
- Cleanup of the command prep part from Ming Lin.
- Rewrite of the discard support from Ming Lin.
- Deadlock fix for namespace removal from Ming Lin.
- Use the now exported blk-mq tag helper for IO termination.
From Sagi.
- Various little fixes from Christoph, Guilherme, Keith, Ming
Lin, Wang Sheng-Hui.
- Convert mtip32xx to use the now exported blk-mq tag iter function,
from Keith"
* 'for-4.7/drivers' of git://git.kernel.dk/linux-block: (74 commits)
lightnvm: reserved space calculation incorrect
lightnvm: rename nr_pages to nr_ppas on nvm_rq
lightnvm: add is_cached entry to struct ppa_addr
lightnvm: expose gennvm_mark_blk to targets
lightnvm: remove mgt targets on mgt removal
lightnvm: pass dma address to hardware rather than pointer
lightnvm: do not assume sequential lun alloc.
nvme/lightnvm: Log using the ctrl named device
lightnvm: rename dma helper functions
lightnvm: enable metadata to be sent to device
lightnvm: do not free unused metadata on rrpc
lightnvm: fix out of bound ppa lun id on bb tbl
lightnvm: refactor set_bb_tbl for accepting ppa list
lightnvm: move responsibility for bad blk mgmt to target
lightnvm: make nvm_set_rqd_ppalist() aware of vblks
lightnvm: remove struct factory_blks
lightnvm: refactor device ops->get_bb_tbl()
lightnvm: introduce nvm_for_each_lun_ppa() macro
lightnvm: refactor dev->online_target to global nvm_targets
lightnvm: rename nvm_targets to nvm_tgt_type
...
Pull core block layer updates from Jens Axboe:
"This is the core block IO changes for this merge window. Nothing
earth shattering in here, it's mostly just fixes. In detail:
- Fix for a long standing issue where wrong ordering in blk-mq caused
order_to_size() to spew a warning. From Bart.
- Async discard support from Christoph. Basically just splitting our
sync interface into a submit + wait part.
- Add a cleaner interface for flagging whether a device has a write
back cache or not. We've previously overloaded blk_queue_flush()
with this, but let's make it more explicit. Drivers cleaned up and
updated in the drivers pull request. From me.
- Fix for a double check for whether IO accounting is enabled or not.
From Michael Callahan.
- Fix for the async discard from Mike Snitzer, reinstating the early
EOPNOTSUPP return if the device doesn't support discards.
- Also from Mike, export bio_inc_remaining() so dm can drop it's
private copy of it.
- From Ming Lin, add support for passing in an offset for request
payloads.
- Tag function export from Sagi, which will be used in NVMe in the
drivers pull.
- Two blktrace related fixes from Shaohua.
- Propagate NOMERGE flag when making a request from a bio, also from
Shaohua.
- An optimization to not parse cgroup paths in blk-throttle, if we
don't need to. From Shaohua"
* 'for-4.7/core' of git://git.kernel.dk/linux-block:
blk-mq: fix undefined behaviour in order_to_size()
blk-throttle: don't parse cgroup path if trace isn't enabled
blktrace: add missed mask name
blktrace: delete garbage for message trace
block: make bio_inc_remaining() interface accessible again
block: reinstate early return of -EOPNOTSUPP from blkdev_issue_discard
block: Minor blk_account_io_start usage cleanup
block: add __blkdev_issue_discard
block: remove struct bio_batch
block: copy NOMERGE flag from bio to request
block: add ability to flag write back caching on a device
blk-mq: Export tagset iter function
block: add offset in blk_add_request_payload()
writeback: Fix performance regression in wb_over_bg_thresh()
This document attempts to codify the intent around kernel self-protection
along with discussion of both existing and desired technologies, with
attention given to the rationale behind them, and the expectations of
their usage.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
[jc: applied fixes suggested by Randy]
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Pull vfs cleanups from Al Viro:
"More cleanups from Christoph"
* 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
nfsd: use RWF_SYNC
fs: add RWF_DSYNC aand RWF_SYNC
ceph: use generic_write_sync
fs: simplify the generic_write_sync prototype
fs: add IOCB_SYNC and IOCB_DSYNC
direct-io: remove the offset argument to dio_complete
direct-io: eliminate the offset argument to ->direct_IO
xfs: eliminate the pos variable in xfs_file_dio_aio_write
filemap: remove the pos argument to generic_file_direct_write
filemap: remove pos variables in generic_file_read_iter
Pull 'struct path' constification update from Al Viro:
"'struct path' is passed by reference to a bunch of Linux security
methods; in theory, there's nothing to stop them from modifying the
damn thing and LSM community being what it is, sooner or later some
enterprising soul is going to decide that it's a good idea.
Let's remove the temptation and constify all of those..."
* 'work.const-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
constify ima_d_path()
constify security_sb_pivotroot()
constify security_path_chroot()
constify security_path_{link,rename}
apparmor: remove useless checks for NULL ->mnt
constify security_path_{mkdir,mknod,symlink}
constify security_path_{unlink,rmdir}
apparmor: constify common_perm_...()
apparmor: constify aa_path_link()
apparmor: new helper - common_path_perm()
constify chmod_common/security_path_chmod
constify security_sb_mount()
constify chown_common/security_path_chown
tomoyo: constify assorted struct path *
apparmor_path_truncate(): path->mnt is never NULL
constify vfs_truncate()
constify security_path_truncate()
[apparmor] constify struct path * in a bunch of helpers
Pull cifs xattr updates from Al Viro:
"This is the remaining parts of the xattr work - the cifs bits"
* 'for-cifs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
cifs: Switch to generic xattr handlers
cifs: Fix removexattr for os2.* xattrs
cifs: Check for equality with ACL_TYPE_ACCESS and ACL_TYPE_DEFAULT
cifs: Fix xattr name checks
Pull UDF fixes from Jan Kara:
"A fix for UDF crash on corrupted media and one UDF header fixup"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Export superblock magic to userspace
udf: Prevent stack overflow on corrupted filesystem mount
This section of code initially looks redundant, but is required. This
improves the comment to explain more clearly why the reset is needed.
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It can return NULL if layoutgets are blocked currently. Fix it to return
-EAGAIN in that case, so we can properly handle it in pnfs_update_layout.
Also, clean up and simplify the error handling -- eliminate "status" and
just use "lseg".
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
There are several problems in the way a stateid is selected for a
LAYOUTGET operation:
We pick a stateid to use in the RPC prepare op, but that makes
it difficult to serialize LAYOUTGETs that use the open stateid. That
serialization is done in pnfs_update_layout, which occurs well before
the rpc_prepare operation.
Between those two events, the i_lock is dropped and reacquired.
pnfs_update_layout can find that the list has lsegs in it and not do any
serialization, but then later pnfs_choose_layoutget_stateid ends up
choosing the open stateid.
This patch changes the client to select the stateid to use in the
LAYOUTGET earlier, when we're searching for a usable layout segment.
This way we can do it all while holding the i_lock the first time, and
ensure that we serialize any LAYOUTGET call that uses a non-layout
stateid.
This also means a rework of how LAYOUTGET replies are handled, as we
must now get the latest stateid if we want to retransmit in response
to a retryable error.
Most of those errors boil down to the fact that the layout state has
changed in some fashion. Thus, what we really want to do is to re-search
for a layout when it fails with a retryable error, so that we can avoid
reissuing the RPC at all if possible.
While the LAYOUTGET RPC is async, the initiating thread always waits for
it to complete, so it's effectively synchronous anyway. Currently, when
we need to retry a LAYOUTGET because of an error, we drive that retry
via the rpc state machine.
This means that once the call has been submitted, it runs until it
completes. So, we must move the error handling for this RPC out of the
rpc_call_done operation and into the caller.
In order to handle errors like NFS4ERR_DELAY properly, we must also
pass a pointer to the sliding timeout, which is now moved to the stack
in pnfs_update_layout.
The complicating errors are -NFS4ERR_RECALLCONFLICT and
-NFS4ERR_LAYOUTTRYLATER, as those involve a timeout after which we give
up and return NULL back to the caller. So, there is some special
handling for those errors to ensure that the layers driving the retries
can handle that appropriately.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
If we get back something like NFS4ERR_OLD_STATEID, that will be
translated into -EAGAIN, and the do/while loop in send_layoutget
will drive the call again.
This is not quite what we want, I think. An error like that is a
sign that something has changed. That something could have been a
concurrent LAYOUTGET that would give us a usable lseg.
Lift the retry logic into pnfs_update_layout instead. That allows
us to redo the layout search, and may spare us from having to issue
an RPC.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Currently, the code will clear the fail bit if we get back a fatal
error. I don't think that's correct -- we want to clear that bit
if we do not get a fatal error.
Fixes: 0bcbf039f6 (nfs: handle request add failure properly)
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Setting just the NFS_LAYOUT_RETURN_REQUESTED flag doesn't do anything,
unless there are lsegs that are also being marked for return. At the
point where that happens this flag is also set, so these set_bit calls
don't do anything useful.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
LAYOUTRETURN is "special" in that servers and clients are expected to
work with old stateids. When the client sends a LAYOUTRETURN with an old
stateid in it then the server is expected to only tear down layout
segments that were present when that seqid was current. Ensure that the
client handles its accounting accordingly.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
When we want to selectively do a LAYOUTRETURN, we need to specify a
stateid that represents most recent layout acquisition that is to be
returned.
When we mark a layout stateid to be returned, we update the return
sequence number in the layout header with that value, if it's newer
than the existing one. Then, when we go to do a LAYOUTRETURN on
layout header put, we overwrite the seqid in the stateid with the
saved one, and then zero it out.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
In later patches, we're going to teach the client to be more selective
about how it returns layouts. This means keeping a record of what the
stateid's seqid was at the time that the server handed out a layout
segment.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Otherwise, we'll end up returning layouts that we've just received if
the client issues a new LAYOUTGET prior to the LAYOUTRETURN.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
If we are initializing reads or writes and can not connect to a DS, then
check whether or not IO is allowed through the MDS. If it is allowed,
reset to the MDS. Else, fail the layout segment and force a retry
of a new layout segment.
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Whenever we check to see if we have the needed number of DSes for the
action, we may also have to check to see whether IO is allowed to go to
the MDS or not.
[jlayton: fix merge conflict due to lack of localio patches here]
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
This patch fixes a problem whereby the pNFS client falls back to doing
reads and writes through the metadata server even when the layout flag
FF_FLAGS_NO_IO_THRU_MDS is set.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
No need to make them a priority any more, or to make them succeed.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
When we're using a delegation to represent our open state, we should
ensure that we use the stateid that was used to create that delegation.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
In order to more easily distinguish what kind of stateid we are dealing
with, introduce a type that can be used to label the stateid structure.
The label will be useful both for debugging, but also when dealing with
operations like SETATTR, READ and WRITE that can take several different
types of stateid as arguments.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Clean up.
After "xprtrdma: Remove ro_unmap() from all registration modes",
there are no longer any sites that take rpcrdma_ia::qplock for read.
The one site that takes it for write is always single-threaded. It
is safe to remove it.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
In a cluster failover scenario, it is desirable for the client to
attempt to reconnect quickly, as an alternate NFS server is already
waiting to take over for the down server. The client can't see that
a server IP address has moved to a new server until the existing
connection is gone.
For fabrics and devices where it is meaningful, set a definite upper
bound on the amount of time before it is determined that a
connection is no longer valid. This allows the RPC client to detect
connection loss in a timely matter, then perform a fresh resolution
of the server GUID in case it has changed (cluster failover).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Clean up: The ro_unmap method is no longer used.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>