Currently we have two proc interfaces to set oom_score_adj. The legacy
/proc/<pid>/oom_adj and /proc/<pid>/oom_score_adj which both have their
specific handlers. Big part of the logic is duplicated so extract the
common code into __set_oom_adj helper. Legacy knob still expects some
details slightly different so make sure those are handled same way - e.g.
the legacy mode ignores oom_score_adj_min and it warns about the usage.
This patch shouldn't introduce any functional changes.
Link: http://lkml.kernel.org/r/1466426628-15074-4-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg has pointed out that can simplify both oom_adj_{read,write} and
oom_score_adj_{read,write} even further and drop the sighand lock. The
main purpose of the lock was to protect p->signal from going away but this
will not happen since ea6d290ca3 ("signals: make task_struct->signal
immutable/refcountable").
The other role of the lock was to synchronize different writers,
especially those with CAP_SYS_RESOURCE. Introduce a mutex for this
purpose. Later patches will need this lock anyway.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/1466426628-15074-3-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Series "Handle oom bypass more gracefully", V5
The following 10 patches should put some order to very rare cases of mm
shared between processes and make the paths which bypass the oom killer
oom reapable and therefore much more reliable finally. Even though mm
shared outside of thread group is rare (either vforked tasks for a short
period, use_mm by kernel threads or exotic thread model of
clone(CLONE_VM) without CLONE_SIGHAND) it is better to cover them. Not
only it makes the current oom killer logic quite hard to follow and
reason about it can lead to weird corner cases. E.g. it is possible to
select an oom victim which shares the mm with unkillable process or
bypass the oom killer even when other processes sharing the mm are still
alive and other weird cases.
Patch 1 drops bogus task_lock and mm check from oom_{score_}adj_write.
This can be considered a bug fix with a low impact as nobody has noticed
for years.
Patch 2 drops sighand lock because it is not needed anymore as pointed
by Oleg.
Patch 3 is a clean up of oom_score_adj handling and a preparatory work
for later patches.
Patch 4 enforces oom_adj_score to be consistent between processes
sharing the mm to behave consistently with the regular thread groups.
This can be considered a user visible behavior change because one thread
group updating oom_score_adj will affect others which share the same mm
via clone(CLONE_VM). I argue that this should be acceptable because we
already have the same behavior for threads in the same thread group and
sharing the mm without signal struct is just a different model of
threading. This is probably the most controversial part of the series,
I would like to find some consensus here. There were some suggestions
to hook some counter/oom_score_adj into the mm_struct but I feel that
this is not necessary right now and we can rely on proc handler +
oom_kill_process to DTRT. I can be convinced otherwise but I strongly
think that whatever we do the userspace has to have a way to see the
current oom priority as consistently as possible.
Patch 5 makes sure that no vforked task is selected if it is sharing the
mm with oom unkillable task.
Patch 6 ensures that all user tasks sharing the mm are killed which in
turn makes sure that all oom victims are oom reapable.
Patch 7 guarantees that task_will_free_mem will always imply reapable
bypass of the oom killer.
Patch 8 is new in this version and it addresses an issue pointed out by
0-day OOM report where an oom victim was reaped several times.
Patch 9 puts an upper bound on how many times oom_reaper tries to reap a
task and hides it from the oom killer to move on when no progress can be
made. This will give an upper bound to how long an oom_reapable task
can block the oom killer from selecting another victim if the oom_reaper
is not able to reap the victim.
Patch 10 tries to plug the (hopefully) last hole when we can still lock
up when the oom victim is shared with oom unkillable tasks (kthreads and
global init). We just try to be best effort in that case and rather
fallback to kill something else than risk a lockup.
This patch (of 10):
Both oom_adj_write and oom_score_adj_write are using task_lock, check for
task->mm and fail if it is NULL. This is not needed because the
oom_score_adj is per signal struct so we do not need mm at all. The code
has been introduced by 3d5992d2ac ("oom: add per-mm oom disable count")
but we do not do per-mm oom disable since c9f01245b6 ("oom: remove
oom_disable_count").
The task->mm check is even not correct because the current thread might
have exited but the thread group might be still alive - e.g. thread group
leader would lead that echo $VAL > /proc/pid/oom_score_adj would always
fail with EINVAL while /proc/pid/task/$other_tid/oom_score_adj would
succeed. This is unexpected at best.
Remove the lock along with the check to fix the unexpected behavior and
also because there is not real need for the lock in the first place.
Link: http://lkml.kernel.org/r/1466426628-15074-2-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This time we have bit of largish changes:
New drivers:
- Xilinx zynqmp dma engine driver.
- Marvell xor2 driver.
Updates:
- dmatest sg support.
- updates and enhancements to Xilinx drivers, adding of cyclic mode.
- clock handling fixes across drivers.
- removal of OOM messages on kzalloc across subsystem.
- interleaved transfers support in omap driver.
- runtime pm support in qcom bam dma.
- tasklet kill freeup across drivers.
- irq cleanup on remove across drivers.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXmZj8AAoJEHwUBw8lI4NHANYP/0e7LkiopiBdxKyYNwViyrfL
XGsB3Fcd8MvYnojzlRUNUi5dt86YfXM5JixMWg8e2WeDSK9AXBEpRBHlEJXA7FNn
/BXy8FZxW4YkXOBSOL+GbDgC48CRiQrmoHSsYE1e9qosJTTpDJlTMd0I3EmdAK53
wjNKBYGv5ORNMXdYXJe/6uUqbN0QT7Qr7a9+Q1qgwhF1wd8pKjVXvDD6Qj2NeM8L
OCySngBECDWTCEYpuNXHbtp/s8QpWteGwCTyQYTkuNsFYM2H3nQCXi9ObMOGR9IN
44FpLgeLeIgW03F/1DmVvMWDYgCgow+b1usqHRWC7x33K/ArqzZzAsPyKePqOBU5
B9zzAla+/QKi73mKauqgHl/Siokr9FZdFpvTVWf2ssm/k3b3GJMO9tPPJ2ocyvZ6
lwlHrTMOV9n2tzeBkkadgLPWO6yDlcYlDVjj1P36DzC88GhjEfFOlq3tqcEJWxmR
adDFX2yRXdtpA6XSiI9l7jxUHxBWZwOTPJ6h1gznk/wVVd0TjyzZteX2gPc8lkcL
Aedhyx0zgGl5bE4+eBsNKLiOrUj468j7Cb87Hhe4YygDw/2T2Ff5RDGxQ9iRrkCb
YRPP21453VS01GuF2T1vzziJ/tGl8IwIon1EOSsTXuImH1sm7Or3W+Cyrke9AZgo
0M8kfHJ2EfcRnwHE8N2H
=tYp0
-----END PGP SIGNATURE-----
Merge tag 'dmaengine-4.8-rc1' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine updates from Vinod Koul:
"This time we have bit of largish changes: two new drivers, bunch of
updates and cleanups to existing set. Nothing super exciting though.
New drivers:
- Xilinx zynqmp dma engine driver
- Marvell xor2 driver
Updates:
- dmatest sg support
- updates and enhancements to Xilinx drivers, adding of cyclic mode
- clock handling fixes across drivers
- removal of OOM messages on kzalloc across subsystem
- interleaved transfers support in omap driver
- runtime pm support in qcom bam dma
- tasklet kill freeup across drivers
- irq cleanup on remove across drivers"
* tag 'dmaengine-4.8-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (94 commits)
dmaengine: k3dma: add missing clk_disable_unprepare() on error in k3_dma_probe()
dmaengine: zynqmp_dma: add missing MODULE_LICENSE
dmaengine: qcom_hidma: use for_each_matching_node() macro
dmaengine: zynqmp_dma: Fix static checker warning
dmaengine: omap-dma: Support for interleaved transfer
dmaengine: ioat: statify symbol
dmaengine: pxa_dma: implement device_synchronize
dmaengine: imx-sdma: remove assignment never used
dmaengine: imx-sdma: remove dummy assignment
dmaengine: cppi: remove unused and bogus check
dmaengine: qcom_hidma_lli: kill the tasklets upon exit
dmaengine: pxa_dma: remove owner assignment
dmaengine: fsl_raid: remove owner assignment
dmaengine: coh901318: remove owner assignment
dmaengine: qcom_hidma: kill the tasklets upon exit
dmaengine: txx9dmac: explicitly freeup irq
dmaengine: sirf-dma: kill the tasklets upon exit
dmaengine: s3c24xx: kill the tasklets upon exit
dmaengine: s3c24xx: explicitly freeup irq
dmaengine: pl330: explicitly freeup irq
...
Add missing of_node_put() in the Qualcomm driver and update MAINTAINERS to make
sure all hwspinlock related files have a maintainer listed.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXmQZdAAoJEAsfOT8Nma3FY2sP/R/sCOX33Zhu3HsvJikSwNKl
rypmM9lkR8wNMKmbdz59ocH/GsRaha1eWA+5u2w3TEaFRBVRj7wbWgrofpURlZpl
2zt7yQP6lZoXKucKA8i2u6ghFoy7GkNnCeR8ZZaxX0LkX/Nd6A2hKGbtnTGfg05Q
cY7z3XCbaa5RUfr0M4RU7kKtSqgQ/914ORzyDWEUeilZJxiYpnE5vHW+h/UfiEYC
SmUUcexNbXhPVEmwFoK67Ap/97BXqNuINewb959iQ6/a+WbOVGQUL+Rzsc0KBHnW
6T3T/Abv/OKrm8Vf9IaVx6esN1xSGOaOShgCBWXdgjIGGb9O2Zz8ViksuvIUlDtH
crH1QDq06ZCWYZbO6n1hNSpTpSiHp8zU4kufQERF5ZI2dfQyDfgn+8MVXTm6OTy8
zF46GXcLA6glVxtwjnM75WdZVtQ+Vi7jrKw9+6TeIo6ckM5TXUDAHEIccmZ+F6xi
dzhpBb9s+5YaPhLd/OxvJ+D0oYvkFOS91bO1j7gVn5P6yTQBtLtIPc9+gW20gBbQ
xFlgSY2HYTHFmOa7qbWx0L5ft5n9fCQgsUHS/D240FQRLCMxzwEJaZATNDcttqS8
U54f3eKjEjeF9alG2Ad6VZzDnVd4WErr/KUYS6DsJSMZNaSlvPztCQIvzGdnSFNG
vjn4Z7umA0SXJuFoyuaC
=Qhb2
-----END PGP SIGNATURE-----
Merge tag 'hwlock-v4.8' of git://github.com/andersson/remoteproc
Pull hwspinlock updates from Bjorn Andersson:
"Add missing of_node_put() in the Qualcomm driver and update
MAINTAINERS to make sure all hwspinlock related files have a
maintainer listed"
* tag 'hwlock-v4.8' of git://github.com/andersson/remoteproc:
MAINTAINERS: Update hwspinlock paths
hwspinlock: qcom_hwspinlock: add missing of_node_put after calling of_parse_phandle
Introduce remoteproc driver for controlling the modem/DSP Hexagon CPU
found in a multitude of Qualcomm platform. Also cleans up a race
condition/potential leak during registration of remoteprocs and includes
devicetree bindings in the MAINTAINERS entry.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXmQY5AAoJEAsfOT8Nma3FzGMP/jq1S/DLm2vRIw547Ev04LQ0
4suaKkovLMhGp2xsGsz50gxpV1YBo4MTI3DYXBP/kr4U3aCHIw+1mXwiJ1pWsv30
4MWVheD5hLeyK0JMFiEPwmI22zlX81YA0WPOimu82WUAyqQ86Zcix1KPVfHbq5SP
80aiX64BZu0PXUFAhhBUaW4qy/2nj4gcvN6Ied8LCwDTmCIBSE2LsjxaxKvxLIgA
OMWpMGfz6jIleC4kqFDs2qL4rhQh/5cRm+QOntlehOsJfrp7QN1yaJ0WahsQWRjk
AQjweQn7zgbzj5VPcpqCx9K4NhOa1Y/vZM/i3EPlYUKwV3cJnk5BgsyB1zyXd2s5
+Ul42y8pPTeBgwKisT6BhWLQE13uYViuM4N+rqCX1sH4c5wTzScCDz5tC300Plj9
omXiJmLRnAJ6soQGEnoYl6S7+e873hqOuHFvqCfkxFWVOlJ5blODOFAHXC/8Gwch
cKG2P0Hpp8KZNljgBhIvU7AHRWOfeojZvezx5epb902rZKLXrvgKOOkbAwklr9wu
lAKdbS/lunPIA5Lbom4db7zOAPa8QWNHpHGsCc4MDMY6hcoGA5FZALIdEdqimnro
rrG52apH2EKyWPpsLtJ0vglpmFUz1euabQCGkR1roYvDjXxN8SpC08mcX92r4XOM
cFxetFZDcF4f98PG96AY
=XlWX
-----END PGP SIGNATURE-----
Merge tag 'rproc-v4.8' of git://github.com/andersson/remoteproc
Pull remoteproc updates from Bjorn Andersson:
"Introduce remoteproc driver for controlling the modem/DSP Hexagon CPU
found in a multitude of Qualcomm platform.
Also cleans up a race condition/potential leak during registration of
remoteprocs and includes devicetree bindings in the MAINTAINERS entry"
* tag 'rproc-v4.8' of git://github.com/andersson/remoteproc:
remoteproc: qcom: hexagon: Clean up mpss validation
remoteproc: qcom: remove redundant dev_err call in q6v5_init_mem()
remoteproc: qcom: Driver for the self-authenticating Hexagon v5
dt-binding: remoteproc: Introduce Hexagon loader binding
remoteproc: Fix potential race condition in rproc_add
MAINTAINERS: Add file patterns for remoteproc device tree bindings
This prevents a double-fetch from user space that can lead to to an
undersized allocation and heap overflow.
Fixes: 54dbc15172 ("vfs: hoist the btrfs deduplication ioctl to the vfs")
Signed-off-by: Scott Bauer <sbauer@plzdonthack.me>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull HID updates from Jiri Kosina:
- new hid-alps driver for ALPS Touchpad-Stick device, from Masaki Ota
- much improved and generalized HID led handling, and merge of
specialized hid-thingm driver into this generic hid-led one, from
Heiner Kallweit
- i2c-hid power management improvements from Fu Zhonghui and Guohua
Zhong
- uhid initialization race fix from Roderick Colenbrander
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (21 commits)
HID: add usb device id for Apple Magic Keyboard
HID: hid-led: fix Delcom support on big endian systems
HID: hid-led: add support for Greynut Luxafor
HID: hid-led: add support for Delcom Visual Signal Indicator G2
HID: hid-led: remove report id from struct hidled_config
HID: alps: a few cleanups
HID: remove ThingM blink(1) driver
HID: hid-led: add support for ThingM blink(1)
HID: hid-led: add support for reading from LED devices
HID: hid-led: add support for devices with multiple independent LEDs
HID: i2c-hid: set power sleep before shutdown
HID: alps: match alps devices in core
HID: thingm: simplify debug output code
HID: alps: pass correct sizes to hid_hw_raw_request()
HID: alps: struct u1_dev *priv is internal to the driver
HID: add Alps I2C HID Touchpad-Stick support
HID: led: fix config
usb: misc: remove outdated USB LED driver
HID: migrate USB LED driver from usb misc to hid
HID: i2c_hid: enable i2c-hid devices to suspend/resume asynchronously
...
Pull quota update from Jan Kara:
"time64 support for quota"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: use time64_t internally
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJXmhmRAAoJEPL5WVaVDYGjOIcH/ixqzAvUqiquF+5EWVvaBtgU
xZvJ+ryaqifQKfK1GeC7394O7lb8zDA/Z3VvERPwmsHpwqpsoFE+V/cmH8gKc4/A
yLuhEG24GFFwSpAidHNBKR3SaIYvnfDIpq05JId95Yq4AGNzaMapYhMY9kjGzvyY
/SMN7v50nl62aMo9mikequjxRHqUPamr1cP2aRjUZ7RqD4uruDwIup75hS+uS/gb
DWnAeqnVj0A+5EhKVkFpf5H8KP1L/ob4oghrnxL4lLyr+tuh67LVFXV3Fv4xL5Z+
Bnbqqh8Y70GfpSBYrfDMZlZ1cgauXZe+sDd+Q09uEfCA8ctFE6wkXCKzVGrRTuY=
=DN2X
-----END PGP SIGNATURE-----
Merge tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random
Pull random driver fix from Ted Ts'o:
"Fix a boot failure on systems with non-contiguous NUMA id's"
* tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: use for_each_online_node() to iterate over NUMA nodes
Pull vfs updates from Al Viro:
"Assorted cleanups and fixes.
Probably the most interesting part long-term is ->d_init() - that will
have a bunch of followups in (at least) ceph and lustre, but we'll
need to sort the barrier-related rules before it can get used for
really non-trivial stuff.
Another fun thing is the merge of ->d_iput() callers (dentry_iput()
and dentry_unlink_inode()) and a bunch of ->d_compare() ones (all
except the one in __d_lookup_lru())"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
fs/dcache.c: avoid soft-lockup in dput()
vfs: new d_init method
vfs: Update lookup_dcache() comment
bdev: get rid of ->bd_inodes
Remove last traces of ->sync_page
new helper: d_same_name()
dentry_cmp(): use lockless_dereference() instead of smp_read_barrier_depends()
vfs: clean up documentation
vfs: document ->d_real()
vfs: merge .d_select_inode() into .d_real()
unify dentry_iput() and dentry_unlink_inode()
binfmt_misc: ->s_root is not going anywhere
drop redundant ->owner initializations
ufs: get rid of redundant checks
orangefs: constify inode_operations
missed comment updates from ->direct_IO() prototype change
file_inode(f)->i_mapping is f->f_mapping
trim fsnotify hooks a bit
9p: new helper - v9fs_parent_fid()
debugfs: ->d_parent is never NULL or negative
...
LTP madvise05 was generating mm splat
| [ARCLinux]# /sd/ltp/testcases/bin/madvise05
| BUG: Bad page map in process madvise05 pte:80e08211 pmd:9f7d4000
| page:9fdcfc90 count:1 mapcount:-1 mapping: (null) index:0x0 flags: 0x404(referenced|reserved)
| page dumped because: bad pte
| addr:200b8000 vm_flags:00000070 anon_vma: (null) mapping: (null) index:1005c
| file: (null) fault: (null) mmap: (null) readpage: (null)
| CPU: 2 PID: 6707 Comm: madvise05
And for newer kernels, the system was rendered unusable afterwards.
The problem was mprotect->pte_modify() clearing PTE_SPECIAL (which is
set to identify the special zero page wired to the pte).
When pte was finally unmapped, special casing for zero page was not
done, and instead it was treated as a "normal" page, tripping on the
map counts etc.
This fixes ARC STAR 9001053308
Cc: <stable@vger.kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
This changes the vfs dentry hashing to mix in the parent pointer at the
_beginning_ of the hash, rather than at the end.
That actually improves both the hash and the code generation, because we
can move more of the computation to the "static" part of the dcache
setup, and do less at lookup runtime.
It turns out that a lot of other hash users also really wanted to mix in
a base pointer as a 'salt' for the hash, and so the slightly extended
interface ends up working well for other cases too.
Users that want a string hash that is purely about the string pass in a
'salt' pointer of NULL.
* merge branch 'salted-string-hash':
fs/dcache.c: Save one 32-bit multiply in dcache lookup
vfs: make the string hashes salt the hash
A LAYOUTCOMMIT then subsequent GETATTR may both return the same attributes,
and in that case NFS_INO_INVALID_ATTR is never set on the second pass
through nfs_update_inode(). The existing check to skip the clearing of
NFS_INO_INVALID_ATTR if a LAYOUTCOMMIT is outstanding does not help in this
case (see commit 10b7e9ad44: "pNFS: Don't mark the inode as revalidated
if a LAYOUTCOMMIT is outstanding"). We know that if a LAYOUTCOMMIT is
outstanding then attributes will need upating, so always set
NFS_INO_INVALID_ATTR.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
On the tear-down path, the dead CPU callback for the timers was
misplaced within the 'cpuhp_state' enumeration. There is a hidden
dependency between the timers and block multiqueue. The timers
callback must happen before the block multiqueue callback otherwise a
RCU stall occurs.
Move the timers callback to the proper place in the state machine.
Reported-and-tested-by: Jon Hunter <jonathanh@nvidia.com>
Reported-by: kbuild test robot <lkp@intel.com>
Fixes: 24f73b9971 ("timers/core: Convert to hotplug state machine")
Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: John Stultz <john.stultz@linaro.org>
Cc: rt@linutronix.de
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1469610498-25914-1-git-send-email-rcochran@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The md device might not have personality (for example, ddf raid array). The
issue is introduced by 8430e7e0af9a15(md: disconnect device from personality
before trying to remove it)
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Fix format and type mismatches in a couple debug prints in the
Broadcom PDC driver. Use %pad for dma_addr_t and %pa for
resource_size_t.
Signed-off-by: Rob Rice <rob.rice@broadcom.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Some of the checks didn't handle frev 2 tables properly.
amdgpu doesn't support any tables pre-frev 2, so drop
the checks.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
The genksyms helper in the kernel cannot parse a type definition
like "typeof(((type *)0)->keyfld)" that is used in the DEFINE_RB_FUNCS
helper, causing the following EXPORT_SYMBOL() statement to be ignored
when computing the crcs, and triggering a warning about this:
WARNING: "ceph_monc_do_statfs" [fs/ceph/ceph.ko] has no CRC
To work around the problem, we can rewrite the type to reference
an undefined 'extern' symbol instead of a NULL pointer. This is
evidently ok for genksyms, and it no longer complains about the
line when calling it with 'genksyms -w'.
I've looked briefly into extending genksyms instead, but it seems
really hard to do. Jan Beulich introduced basic support for 'typeof'
a while ago in dc53324060 ("genksyms: fix typeof() handling"),
but that is not sufficient for the expression we have here.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: fcd00b68bb ("libceph: DEFINE_RB_FUNCS macro")
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Fix to return error code -EINVAL from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: 3c31760e76 ('drm/arm: mali-dp: Set crtc.port to the port
instead of the endpoint')
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Acked-by: Brian Starkey <brian.starkey@arm.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1469672066-13401-1-git-send-email-weiyj.lk@gmail.com
Convert asciidoc-formatted docs to rst in accordance with Jonathan's and
Jani's effort to use sphinx for kernel-doc rendering in 4.8.
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/4c1b29986fa77772156b1af0c965d3799e43a47b.1467628307.git.lukas@wunner.de
It turns out that if the guest does a H_CEDE while the CPU is in
a transactional state, and the H_CEDE does a nap, and the nap
loses the architected state of the CPU (which is is allowed to do),
then we lose the checkpointed state of the virtual CPU. In addition,
the transactional-memory state recorded in the MSR gets reset back
to non-transactional, and when we try to return to the guest, we take
a TM bad thing type of program interrupt because we are trying to
transition from non-transactional to transactional with a hrfid
instruction, which is not permitted.
The result of the program interrupt occurring at that point is that
the host CPU will hang in an infinite loop with interrupts disabled.
Thus this is a denial of service vulnerability in the host which can
be triggered by any guest (and depending on the guest kernel, it can
potentially triggered by unprivileged userspace in the guest).
This vulnerability has been assigned the ID CVE-2016-5412.
To fix this, we save the TM state before napping and restore it
on exit from the nap, when handling a H_CEDE in real mode. The
case where H_CEDE exits to host virtual mode is already OK (as are
other hcalls which exit to host virtual mode) because the exit
path saves the TM state.
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This moves the transactional memory state save and restore sequences
out of the guest entry/exit paths into separate procedures. This is
so that these sequences can be used in going into and out of nap
in a subsequent patch.
The only code changes here are (a) saving and restore LR on the
stack, since these new procedures get called with a bl instruction,
(b) explicitly saving r1 into the PACA instead of assuming that
HSTATE_HOST_R1(r13) is already set, and (c) removing an unnecessary
and redundant setting of MSR[TM] that should have been removed by
commit 9d4d0bdd9e0a ("KVM: PPC: Book3S HV: Add transactional memory
support", 2013-09-24) but wasn't.
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
We accidentally take the "port->lock" twice in a row. This old code
was supposed to be deleted.
Fixes: e58e241c17 ('sparc: serial: Clean up the locking for -rt')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Smatch complains that these tests are off by one, which is true but not
life threatening.
arch/sparc/kernel/irq_32.c:169 irq_link()
error: buffer overflow 'irq_map' 384 <= 384
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So far, the cache of the ahash requests was updated from the 'complete'
operation. This complete operation is called from mv_cesa_tdma_process
before the cleanup operation, which means that the content of req->src
can be read and copied when it is still mapped. This commit fixes the
issue by moving this cache update from mv_cesa_ahash_complete to
mv_cesa_ahash_req_cleanup, so the copy is done once the sglist is
unmapped.
Fixes: 1bf6682cb3 ("crypto: marvell - Add a complete operation for..")
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The flag CRYPTO_TFM_REQ_MAY_BACKLOG is optional and can be set from the
user to put requests into the backlog queue when the main cryptographic
queue is full. Before calling mv_cesa_tdma_chain we must check the value
of the return status to be sure that the current request has been
correctly queued or added to the backlog.
Fixes: 85030c5168 ("crypto: marvell - Add support for chaining...")
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
So far in mv_cesa_ablkcipher_dma_req_init, if an error is thrown while
the tdma chain is built there is a memory leak. This issue exists
because the chain is assigned later at the end of the function, so the
cleanup function is called with the wrong version of the chain.
Fixes: db509a4533 ("crypto: marvell/cesa - add TDMA support")
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The Broadcom PDC mailbox driver is a mailbox controller that
manages data transfers to and from one or more offload engines.
Signed-off-by: Rob Rice <rob.rice@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Add the device tree binding documentation for the PDC hardware
in Broadcom iProc SoCs.
Signed-off-by: Rob Rice <rob.rice@broadcom.com>
Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Reviewed-by: Anup Patel <anup.patel@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
During following a symbolic link we received err_buf from SMB2_open().
While the validity of SMB2 error response is checked previously
in smb2_check_message() a symbolic link payload is not checked at all.
Fix it by adding such checks.
Cc: Dan Carpenter <dan.carpenter@oracle.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
if, when mounting //HOST/share/sub/dir/foo we can query /sub/dir/foo but
not any of the path components above:
- store the /sub/dir/foo prefix in the cifs super_block info
- in the superblock, set root dentry to the subpath dentry (instead of
the share root)
- set a flag in the superblock to remember it
- use prefixpath when building path from a dentry
fixes bso#8950
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
This fixes a crash on s390 with fake NUMA enabled.
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Fixes: 1e7f583af6 ("random: make /dev/urandom scalable for silly userspace programs")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Some of our "for_each_xyz()" macro constructs make gcc unhappy about
lack of braces around if-statements inside or outside the loop, because
the loop construct itself has a "if-then-else" statement inside of it.
The resulting warnings look something like this:
drivers/gpu/drm/i915/i915_debugfs.c: In function ‘i915_dump_lrc’:
drivers/gpu/drm/i915/i915_debugfs.c:2103:6: warning: suggest explicit braces to avoid ambiguous ‘else’ [-Wparentheses]
if (ctx != dev_priv->kernel_context)
^
even if the code itself is fine.
Since the warning is fairly easy to avoid by adding a braces around the
if-statement near the for_each_xyz() construct, do so, rather than
disabling the otherwise potentially useful warning.
(The if-then-else statements used in the "for_each_xyz()" constructs are
designed to be inherently safe even with no braces, but in this case
it's quite understandable that gcc isn't really able to tell that).
This finally leaves the standard "allmodconfig" build with just a
handful of remaining warnings, so new and valid warnings hopefully will
stand out.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently IS_ENABLED() produces an expression surrounded by parentheses,
which allows this code to compile, generating eg:
else if (1 || 0)
hpte_init_native();
However a change to the macro in the kbuild tree will break this in
future by removing the parentheses.
Fixes: 7353644fa9 ("powerpc/mm: Fix build break when PPC_NATIVE=n")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Newer versions of gcc warn about the use of __builtin_return_address()
with a non-zero argument when "-Wall" is specified:
kernel/trace/trace_irqsoff.c: In function ‘stop_critical_timings’:
kernel/trace/trace_irqsoff.c:433:86: warning: calling ‘__builtin_return_address’ with a nonzero argument is unsafe [-Wframe-address]
stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
[ .. repeats a few times for other similar cases .. ]
It is true that a non-zero argument is somewhat dangerous, and we do not
actually have very many uses of that in the kernel - but the ftrace code
does use it, and as Stephen Rostedt says:
"We are well aware of the danger of using __builtin_return_address() of
> 0. In fact that's part of the reason for having the "thunk" code in
x86 (See arch/x86/entry/thunk_{64,32}.S). [..] it adds extra frames
when tracking irqs off sections, to prevent __builtin_return_address()
from accessing bad areas. In fact the thunk_32.S states: 'Trampoline to
trace irqs off. (otherwise CALLER_ADDR1 might crash)'."
For now, __builtin_return_address() with a non-zero argument is the best
we can do, and the warning is not helpful and can end up making people
miss other warnings for real problems.
So disable the frame-address warning on compilers that need it.
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ceph_llseek does not correctly return NXIO errors because the 'out' path
always returns 'offset'.
Fixes: 06222e491e ("fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek")
Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Ceph creates multiple caches with the SLAB_RECLAIMABLE flag set, so
that it can satisfy its internal needs. Inspecting the code shows that
most of the caches are indeed reclaimable since they are directly
related to the generic inode/dentry shrinkers. However, one of the
cache used to satisfy struct file is not reclaimable since its
entries are freed only when the last reference to the file is
dropped. If a heavily loaded node opens a lot of files it can
introduce non-trivial discrepancies between memory shown as reclaimable
and what is actually reclaimed when drop_caches is used.
Fix this by removing the reclaimable flag for the file's cache.
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Add a 'wake' flag to ceph_cap_flush struct, which indicates if there
is someone waiting for it to finish. When getting flush ack message,
we check the 'wake' flag in corresponding ceph_cap_flush struct to
decide if we should wake up waiters. One corner case is that the
acked cap flush has 'wake' flags is set, but it is not the first one
on the flushing list. We do not wake up waiters in this case, set
'wake' flags of preceding ceph_cap_flush struct instead
Signed-off-by: Yan, Zheng <zyan@redhat.com>
This patch devide __ceph_flush_snaps() into two stags. In the first
stage, __ceph_flush_snaps() assign snapcaps flush TIDs and add them
to cap flush lists. __ceph_flush_snaps() keeps holding the
i_ceph_lock in this stagge. So inode's auth cap can not change. In
the second stage, __ceph_flush_snaps() send flushsnap cap messages.
i_ceph_lock is unlocked before sending each cap message. If auth cap
changes in the middle, __ceph_flush_snaps() just stops. This is OK
because kick_flushing_inode_caps() will re-send flushsnap cap messages
to inode's new auth MDS.
Signed-off-by: Yan, Zheng <zyan@redhat.com>