oom_score_adj is shared for the thread groups (via struct signal) but this
is not sufficient to cover processes sharing mm (CLONE_VM without
CLONE_SIGHAND) and so we can easily end up in a situation when some
processes update their oom_score_adj and confuse the oom killer. In the
worst case some of those processes might hide from the oom killer
altogether via OOM_SCORE_ADJ_MIN while others are eligible. OOM killer
would then pick up those eligible but won't be allowed to kill others
sharing the same mm so the mm wouldn't release the mm and so the memory.
It would be ideal to have the oom_score_adj per mm_struct because that is
the natural entity OOM killer considers. But this will not work because
some programs are doing
vfork()
set_oom_adj()
exec()
We can achieve the same though. oom_score_adj write handler can set the
oom_score_adj for all processes sharing the same mm if the task is not in
the middle of vfork. As a result all the processes will share the same
oom_score_adj. The current implementation is rather pessimistic and
checks all the existing processes by default if there is more than 1
holder of the mm but we do not have any reliable way to check for external
users yet.
Link: http://lkml.kernel.org/r/1466426628-15074-5-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently we have two proc interfaces to set oom_score_adj. The legacy
/proc/<pid>/oom_adj and /proc/<pid>/oom_score_adj which both have their
specific handlers. Big part of the logic is duplicated so extract the
common code into __set_oom_adj helper. Legacy knob still expects some
details slightly different so make sure those are handled same way - e.g.
the legacy mode ignores oom_score_adj_min and it warns about the usage.
This patch shouldn't introduce any functional changes.
Link: http://lkml.kernel.org/r/1466426628-15074-4-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg has pointed out that can simplify both oom_adj_{read,write} and
oom_score_adj_{read,write} even further and drop the sighand lock. The
main purpose of the lock was to protect p->signal from going away but this
will not happen since ea6d290ca3 ("signals: make task_struct->signal
immutable/refcountable").
The other role of the lock was to synchronize different writers,
especially those with CAP_SYS_RESOURCE. Introduce a mutex for this
purpose. Later patches will need this lock anyway.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/1466426628-15074-3-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Series "Handle oom bypass more gracefully", V5
The following 10 patches should put some order to very rare cases of mm
shared between processes and make the paths which bypass the oom killer
oom reapable and therefore much more reliable finally. Even though mm
shared outside of thread group is rare (either vforked tasks for a short
period, use_mm by kernel threads or exotic thread model of
clone(CLONE_VM) without CLONE_SIGHAND) it is better to cover them. Not
only it makes the current oom killer logic quite hard to follow and
reason about it can lead to weird corner cases. E.g. it is possible to
select an oom victim which shares the mm with unkillable process or
bypass the oom killer even when other processes sharing the mm are still
alive and other weird cases.
Patch 1 drops bogus task_lock and mm check from oom_{score_}adj_write.
This can be considered a bug fix with a low impact as nobody has noticed
for years.
Patch 2 drops sighand lock because it is not needed anymore as pointed
by Oleg.
Patch 3 is a clean up of oom_score_adj handling and a preparatory work
for later patches.
Patch 4 enforces oom_adj_score to be consistent between processes
sharing the mm to behave consistently with the regular thread groups.
This can be considered a user visible behavior change because one thread
group updating oom_score_adj will affect others which share the same mm
via clone(CLONE_VM). I argue that this should be acceptable because we
already have the same behavior for threads in the same thread group and
sharing the mm without signal struct is just a different model of
threading. This is probably the most controversial part of the series,
I would like to find some consensus here. There were some suggestions
to hook some counter/oom_score_adj into the mm_struct but I feel that
this is not necessary right now and we can rely on proc handler +
oom_kill_process to DTRT. I can be convinced otherwise but I strongly
think that whatever we do the userspace has to have a way to see the
current oom priority as consistently as possible.
Patch 5 makes sure that no vforked task is selected if it is sharing the
mm with oom unkillable task.
Patch 6 ensures that all user tasks sharing the mm are killed which in
turn makes sure that all oom victims are oom reapable.
Patch 7 guarantees that task_will_free_mem will always imply reapable
bypass of the oom killer.
Patch 8 is new in this version and it addresses an issue pointed out by
0-day OOM report where an oom victim was reaped several times.
Patch 9 puts an upper bound on how many times oom_reaper tries to reap a
task and hides it from the oom killer to move on when no progress can be
made. This will give an upper bound to how long an oom_reapable task
can block the oom killer from selecting another victim if the oom_reaper
is not able to reap the victim.
Patch 10 tries to plug the (hopefully) last hole when we can still lock
up when the oom victim is shared with oom unkillable tasks (kthreads and
global init). We just try to be best effort in that case and rather
fallback to kill something else than risk a lockup.
This patch (of 10):
Both oom_adj_write and oom_score_adj_write are using task_lock, check for
task->mm and fail if it is NULL. This is not needed because the
oom_score_adj is per signal struct so we do not need mm at all. The code
has been introduced by 3d5992d2ac ("oom: add per-mm oom disable count")
but we do not do per-mm oom disable since c9f01245b6 ("oom: remove
oom_disable_count").
The task->mm check is even not correct because the current thread might
have exited but the thread group might be still alive - e.g. thread group
leader would lead that echo $VAL > /proc/pid/oom_score_adj would always
fail with EINVAL while /proc/pid/task/$other_tid/oom_score_adj would
succeed. This is unexpected at best.
Remove the lock along with the check to fix the unexpected behavior and
also because there is not real need for the lock in the first place.
Link: http://lkml.kernel.org/r/1466426628-15074-2-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This time we have bit of largish changes:
New drivers:
- Xilinx zynqmp dma engine driver.
- Marvell xor2 driver.
Updates:
- dmatest sg support.
- updates and enhancements to Xilinx drivers, adding of cyclic mode.
- clock handling fixes across drivers.
- removal of OOM messages on kzalloc across subsystem.
- interleaved transfers support in omap driver.
- runtime pm support in qcom bam dma.
- tasklet kill freeup across drivers.
- irq cleanup on remove across drivers.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXmZj8AAoJEHwUBw8lI4NHANYP/0e7LkiopiBdxKyYNwViyrfL
XGsB3Fcd8MvYnojzlRUNUi5dt86YfXM5JixMWg8e2WeDSK9AXBEpRBHlEJXA7FNn
/BXy8FZxW4YkXOBSOL+GbDgC48CRiQrmoHSsYE1e9qosJTTpDJlTMd0I3EmdAK53
wjNKBYGv5ORNMXdYXJe/6uUqbN0QT7Qr7a9+Q1qgwhF1wd8pKjVXvDD6Qj2NeM8L
OCySngBECDWTCEYpuNXHbtp/s8QpWteGwCTyQYTkuNsFYM2H3nQCXi9ObMOGR9IN
44FpLgeLeIgW03F/1DmVvMWDYgCgow+b1usqHRWC7x33K/ArqzZzAsPyKePqOBU5
B9zzAla+/QKi73mKauqgHl/Siokr9FZdFpvTVWf2ssm/k3b3GJMO9tPPJ2ocyvZ6
lwlHrTMOV9n2tzeBkkadgLPWO6yDlcYlDVjj1P36DzC88GhjEfFOlq3tqcEJWxmR
adDFX2yRXdtpA6XSiI9l7jxUHxBWZwOTPJ6h1gznk/wVVd0TjyzZteX2gPc8lkcL
Aedhyx0zgGl5bE4+eBsNKLiOrUj468j7Cb87Hhe4YygDw/2T2Ff5RDGxQ9iRrkCb
YRPP21453VS01GuF2T1vzziJ/tGl8IwIon1EOSsTXuImH1sm7Or3W+Cyrke9AZgo
0M8kfHJ2EfcRnwHE8N2H
=tYp0
-----END PGP SIGNATURE-----
Merge tag 'dmaengine-4.8-rc1' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine updates from Vinod Koul:
"This time we have bit of largish changes: two new drivers, bunch of
updates and cleanups to existing set. Nothing super exciting though.
New drivers:
- Xilinx zynqmp dma engine driver
- Marvell xor2 driver
Updates:
- dmatest sg support
- updates and enhancements to Xilinx drivers, adding of cyclic mode
- clock handling fixes across drivers
- removal of OOM messages on kzalloc across subsystem
- interleaved transfers support in omap driver
- runtime pm support in qcom bam dma
- tasklet kill freeup across drivers
- irq cleanup on remove across drivers"
* tag 'dmaengine-4.8-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (94 commits)
dmaengine: k3dma: add missing clk_disable_unprepare() on error in k3_dma_probe()
dmaengine: zynqmp_dma: add missing MODULE_LICENSE
dmaengine: qcom_hidma: use for_each_matching_node() macro
dmaengine: zynqmp_dma: Fix static checker warning
dmaengine: omap-dma: Support for interleaved transfer
dmaengine: ioat: statify symbol
dmaengine: pxa_dma: implement device_synchronize
dmaengine: imx-sdma: remove assignment never used
dmaengine: imx-sdma: remove dummy assignment
dmaengine: cppi: remove unused and bogus check
dmaengine: qcom_hidma_lli: kill the tasklets upon exit
dmaengine: pxa_dma: remove owner assignment
dmaengine: fsl_raid: remove owner assignment
dmaengine: coh901318: remove owner assignment
dmaengine: qcom_hidma: kill the tasklets upon exit
dmaengine: txx9dmac: explicitly freeup irq
dmaengine: sirf-dma: kill the tasklets upon exit
dmaengine: s3c24xx: kill the tasklets upon exit
dmaengine: s3c24xx: explicitly freeup irq
dmaengine: pl330: explicitly freeup irq
...
Add missing of_node_put() in the Qualcomm driver and update MAINTAINERS to make
sure all hwspinlock related files have a maintainer listed.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXmQZdAAoJEAsfOT8Nma3FY2sP/R/sCOX33Zhu3HsvJikSwNKl
rypmM9lkR8wNMKmbdz59ocH/GsRaha1eWA+5u2w3TEaFRBVRj7wbWgrofpURlZpl
2zt7yQP6lZoXKucKA8i2u6ghFoy7GkNnCeR8ZZaxX0LkX/Nd6A2hKGbtnTGfg05Q
cY7z3XCbaa5RUfr0M4RU7kKtSqgQ/914ORzyDWEUeilZJxiYpnE5vHW+h/UfiEYC
SmUUcexNbXhPVEmwFoK67Ap/97BXqNuINewb959iQ6/a+WbOVGQUL+Rzsc0KBHnW
6T3T/Abv/OKrm8Vf9IaVx6esN1xSGOaOShgCBWXdgjIGGb9O2Zz8ViksuvIUlDtH
crH1QDq06ZCWYZbO6n1hNSpTpSiHp8zU4kufQERF5ZI2dfQyDfgn+8MVXTm6OTy8
zF46GXcLA6glVxtwjnM75WdZVtQ+Vi7jrKw9+6TeIo6ckM5TXUDAHEIccmZ+F6xi
dzhpBb9s+5YaPhLd/OxvJ+D0oYvkFOS91bO1j7gVn5P6yTQBtLtIPc9+gW20gBbQ
xFlgSY2HYTHFmOa7qbWx0L5ft5n9fCQgsUHS/D240FQRLCMxzwEJaZATNDcttqS8
U54f3eKjEjeF9alG2Ad6VZzDnVd4WErr/KUYS6DsJSMZNaSlvPztCQIvzGdnSFNG
vjn4Z7umA0SXJuFoyuaC
=Qhb2
-----END PGP SIGNATURE-----
Merge tag 'hwlock-v4.8' of git://github.com/andersson/remoteproc
Pull hwspinlock updates from Bjorn Andersson:
"Add missing of_node_put() in the Qualcomm driver and update
MAINTAINERS to make sure all hwspinlock related files have a
maintainer listed"
* tag 'hwlock-v4.8' of git://github.com/andersson/remoteproc:
MAINTAINERS: Update hwspinlock paths
hwspinlock: qcom_hwspinlock: add missing of_node_put after calling of_parse_phandle
Introduce remoteproc driver for controlling the modem/DSP Hexagon CPU
found in a multitude of Qualcomm platform. Also cleans up a race
condition/potential leak during registration of remoteprocs and includes
devicetree bindings in the MAINTAINERS entry.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXmQY5AAoJEAsfOT8Nma3FzGMP/jq1S/DLm2vRIw547Ev04LQ0
4suaKkovLMhGp2xsGsz50gxpV1YBo4MTI3DYXBP/kr4U3aCHIw+1mXwiJ1pWsv30
4MWVheD5hLeyK0JMFiEPwmI22zlX81YA0WPOimu82WUAyqQ86Zcix1KPVfHbq5SP
80aiX64BZu0PXUFAhhBUaW4qy/2nj4gcvN6Ied8LCwDTmCIBSE2LsjxaxKvxLIgA
OMWpMGfz6jIleC4kqFDs2qL4rhQh/5cRm+QOntlehOsJfrp7QN1yaJ0WahsQWRjk
AQjweQn7zgbzj5VPcpqCx9K4NhOa1Y/vZM/i3EPlYUKwV3cJnk5BgsyB1zyXd2s5
+Ul42y8pPTeBgwKisT6BhWLQE13uYViuM4N+rqCX1sH4c5wTzScCDz5tC300Plj9
omXiJmLRnAJ6soQGEnoYl6S7+e873hqOuHFvqCfkxFWVOlJ5blODOFAHXC/8Gwch
cKG2P0Hpp8KZNljgBhIvU7AHRWOfeojZvezx5epb902rZKLXrvgKOOkbAwklr9wu
lAKdbS/lunPIA5Lbom4db7zOAPa8QWNHpHGsCc4MDMY6hcoGA5FZALIdEdqimnro
rrG52apH2EKyWPpsLtJ0vglpmFUz1euabQCGkR1roYvDjXxN8SpC08mcX92r4XOM
cFxetFZDcF4f98PG96AY
=XlWX
-----END PGP SIGNATURE-----
Merge tag 'rproc-v4.8' of git://github.com/andersson/remoteproc
Pull remoteproc updates from Bjorn Andersson:
"Introduce remoteproc driver for controlling the modem/DSP Hexagon CPU
found in a multitude of Qualcomm platform.
Also cleans up a race condition/potential leak during registration of
remoteprocs and includes devicetree bindings in the MAINTAINERS entry"
* tag 'rproc-v4.8' of git://github.com/andersson/remoteproc:
remoteproc: qcom: hexagon: Clean up mpss validation
remoteproc: qcom: remove redundant dev_err call in q6v5_init_mem()
remoteproc: qcom: Driver for the self-authenticating Hexagon v5
dt-binding: remoteproc: Introduce Hexagon loader binding
remoteproc: Fix potential race condition in rproc_add
MAINTAINERS: Add file patterns for remoteproc device tree bindings
This prevents a double-fetch from user space that can lead to to an
undersized allocation and heap overflow.
Fixes: 54dbc15172 ("vfs: hoist the btrfs deduplication ioctl to the vfs")
Signed-off-by: Scott Bauer <sbauer@plzdonthack.me>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add Skylake-X and Broadwell-X IDs for out-of-band (OBB) control of
P-States.
For these processors, if MSR_MISC_PWR_MGMT BIT(8) == 1, then the
Intel P-State driver should exit as OS can't control P-States.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw : Subject/changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In dev_pm_opp_set_rate(), _find_opp_table() is called 4 times: once by
_get_opp_clk(), once by dev_pm_opp_set_rate() itself, and twice by
dev_pm_opp_find_freq_ceil(). If there are several opp_tables in the
system, three times of opp table finding is a big waste. This patch
reduced the call of _find_opp_table() to twice.
Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The following functions test whether their argument is NULL
and then return immediately.
* dev_pm_arm_wake_irq
* dev_pm_disarm_wake_irq
* wakeup_source_unregister
Thus the test around the calls is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
[ rjw: Minor whitespace adjustments ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
That is the default used when no events is specified in tools, separate
it so that simpler tools that need no evlist can use it directly.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-67mwuthscwroz88x9pswcqyv@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull HID updates from Jiri Kosina:
- new hid-alps driver for ALPS Touchpad-Stick device, from Masaki Ota
- much improved and generalized HID led handling, and merge of
specialized hid-thingm driver into this generic hid-led one, from
Heiner Kallweit
- i2c-hid power management improvements from Fu Zhonghui and Guohua
Zhong
- uhid initialization race fix from Roderick Colenbrander
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (21 commits)
HID: add usb device id for Apple Magic Keyboard
HID: hid-led: fix Delcom support on big endian systems
HID: hid-led: add support for Greynut Luxafor
HID: hid-led: add support for Delcom Visual Signal Indicator G2
HID: hid-led: remove report id from struct hidled_config
HID: alps: a few cleanups
HID: remove ThingM blink(1) driver
HID: hid-led: add support for ThingM blink(1)
HID: hid-led: add support for reading from LED devices
HID: hid-led: add support for devices with multiple independent LEDs
HID: i2c-hid: set power sleep before shutdown
HID: alps: match alps devices in core
HID: thingm: simplify debug output code
HID: alps: pass correct sizes to hid_hw_raw_request()
HID: alps: struct u1_dev *priv is internal to the driver
HID: add Alps I2C HID Touchpad-Stick support
HID: led: fix config
usb: misc: remove outdated USB LED driver
HID: migrate USB LED driver from usb misc to hid
HID: i2c_hid: enable i2c-hid devices to suspend/resume asynchronously
...
Pull quota update from Jan Kara:
"time64 support for quota"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: use time64_t internally
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJXmhmRAAoJEPL5WVaVDYGjOIcH/ixqzAvUqiquF+5EWVvaBtgU
xZvJ+ryaqifQKfK1GeC7394O7lb8zDA/Z3VvERPwmsHpwqpsoFE+V/cmH8gKc4/A
yLuhEG24GFFwSpAidHNBKR3SaIYvnfDIpq05JId95Yq4AGNzaMapYhMY9kjGzvyY
/SMN7v50nl62aMo9mikequjxRHqUPamr1cP2aRjUZ7RqD4uruDwIup75hS+uS/gb
DWnAeqnVj0A+5EhKVkFpf5H8KP1L/ob4oghrnxL4lLyr+tuh67LVFXV3Fv4xL5Z+
Bnbqqh8Y70GfpSBYrfDMZlZ1cgauXZe+sDd+Q09uEfCA8ctFE6wkXCKzVGrRTuY=
=DN2X
-----END PGP SIGNATURE-----
Merge tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random
Pull random driver fix from Ted Ts'o:
"Fix a boot failure on systems with non-contiguous NUMA id's"
* tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: use for_each_online_node() to iterate over NUMA nodes
Pull vfs updates from Al Viro:
"Assorted cleanups and fixes.
Probably the most interesting part long-term is ->d_init() - that will
have a bunch of followups in (at least) ceph and lustre, but we'll
need to sort the barrier-related rules before it can get used for
really non-trivial stuff.
Another fun thing is the merge of ->d_iput() callers (dentry_iput()
and dentry_unlink_inode()) and a bunch of ->d_compare() ones (all
except the one in __d_lookup_lru())"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
fs/dcache.c: avoid soft-lockup in dput()
vfs: new d_init method
vfs: Update lookup_dcache() comment
bdev: get rid of ->bd_inodes
Remove last traces of ->sync_page
new helper: d_same_name()
dentry_cmp(): use lockless_dereference() instead of smp_read_barrier_depends()
vfs: clean up documentation
vfs: document ->d_real()
vfs: merge .d_select_inode() into .d_real()
unify dentry_iput() and dentry_unlink_inode()
binfmt_misc: ->s_root is not going anywhere
drop redundant ->owner initializations
ufs: get rid of redundant checks
orangefs: constify inode_operations
missed comment updates from ->direct_IO() prototype change
file_inode(f)->i_mapping is f->f_mapping
trim fsnotify hooks a bit
9p: new helper - v9fs_parent_fid()
debugfs: ->d_parent is never NULL or negative
...
LTP madvise05 was generating mm splat
| [ARCLinux]# /sd/ltp/testcases/bin/madvise05
| BUG: Bad page map in process madvise05 pte:80e08211 pmd:9f7d4000
| page:9fdcfc90 count:1 mapcount:-1 mapping: (null) index:0x0 flags: 0x404(referenced|reserved)
| page dumped because: bad pte
| addr:200b8000 vm_flags:00000070 anon_vma: (null) mapping: (null) index:1005c
| file: (null) fault: (null) mmap: (null) readpage: (null)
| CPU: 2 PID: 6707 Comm: madvise05
And for newer kernels, the system was rendered unusable afterwards.
The problem was mprotect->pte_modify() clearing PTE_SPECIAL (which is
set to identify the special zero page wired to the pte).
When pte was finally unmapped, special casing for zero page was not
done, and instead it was treated as a "normal" page, tripping on the
map counts etc.
This fixes ARC STAR 9001053308
Cc: <stable@vger.kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
This changes the vfs dentry hashing to mix in the parent pointer at the
_beginning_ of the hash, rather than at the end.
That actually improves both the hash and the code generation, because we
can move more of the computation to the "static" part of the dcache
setup, and do less at lookup runtime.
It turns out that a lot of other hash users also really wanted to mix in
a base pointer as a 'salt' for the hash, and so the slightly extended
interface ends up working well for other cases too.
Users that want a string hash that is purely about the string pass in a
'salt' pointer of NULL.
* merge branch 'salted-string-hash':
fs/dcache.c: Save one 32-bit multiply in dcache lookup
vfs: make the string hashes salt the hash
A LAYOUTCOMMIT then subsequent GETATTR may both return the same attributes,
and in that case NFS_INO_INVALID_ATTR is never set on the second pass
through nfs_update_inode(). The existing check to skip the clearing of
NFS_INO_INVALID_ATTR if a LAYOUTCOMMIT is outstanding does not help in this
case (see commit 10b7e9ad44: "pNFS: Don't mark the inode as revalidated
if a LAYOUTCOMMIT is outstanding"). We know that if a LAYOUTCOMMIT is
outstanding then attributes will need upating, so always set
NFS_INO_INVALID_ATTR.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
On the tear-down path, the dead CPU callback for the timers was
misplaced within the 'cpuhp_state' enumeration. There is a hidden
dependency between the timers and block multiqueue. The timers
callback must happen before the block multiqueue callback otherwise a
RCU stall occurs.
Move the timers callback to the proper place in the state machine.
Reported-and-tested-by: Jon Hunter <jonathanh@nvidia.com>
Reported-by: kbuild test robot <lkp@intel.com>
Fixes: 24f73b9971 ("timers/core: Convert to hotplug state machine")
Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: John Stultz <john.stultz@linaro.org>
Cc: rt@linutronix.de
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1469610498-25914-1-git-send-email-rcochran@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The md device might not have personality (for example, ddf raid array). The
issue is introduced by 8430e7e0af9a15(md: disconnect device from personality
before trying to remove it)
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Fix format and type mismatches in a couple debug prints in the
Broadcom PDC driver. Use %pad for dma_addr_t and %pa for
resource_size_t.
Signed-off-by: Rob Rice <rob.rice@broadcom.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Some of the checks didn't handle frev 2 tables properly.
amdgpu doesn't support any tables pre-frev 2, so drop
the checks.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
The genksyms helper in the kernel cannot parse a type definition
like "typeof(((type *)0)->keyfld)" that is used in the DEFINE_RB_FUNCS
helper, causing the following EXPORT_SYMBOL() statement to be ignored
when computing the crcs, and triggering a warning about this:
WARNING: "ceph_monc_do_statfs" [fs/ceph/ceph.ko] has no CRC
To work around the problem, we can rewrite the type to reference
an undefined 'extern' symbol instead of a NULL pointer. This is
evidently ok for genksyms, and it no longer complains about the
line when calling it with 'genksyms -w'.
I've looked briefly into extending genksyms instead, but it seems
really hard to do. Jan Beulich introduced basic support for 'typeof'
a while ago in dc53324060 ("genksyms: fix typeof() handling"),
but that is not sufficient for the expression we have here.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: fcd00b68bb ("libceph: DEFINE_RB_FUNCS macro")
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Fix to return error code -EINVAL from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: 3c31760e76 ('drm/arm: mali-dp: Set crtc.port to the port
instead of the endpoint')
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Acked-by: Brian Starkey <brian.starkey@arm.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1469672066-13401-1-git-send-email-weiyj.lk@gmail.com
AT_VECTOR_SIZE_ARCH should be defined with the maximum number of
NEW_AUX_ENT entries that ARCH_DLINFO can contain, but it wasn't defined
for MIPS at all even though ARCH_DLINFO will contain one NEW_AUX_ENT for
the VDSO address.
This shouldn't be a problem as AT_VECTOR_SIZE_BASE includes space for
AT_BASE_PLATFORM which MIPS doesn't use, but lets define it now and add
the comment above ARCH_DLINFO as found in several other architectures to
remind future modifiers of ARCH_DLINFO to keep AT_VECTOR_SIZE_ARCH up to
date.
Fixes: ebb5e78cc6 ("MIPS: Initial implementation of a VDSO")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/13823/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
At boot time, do a better job of resetting the USB host controller
to make the frequency "eye" diagram more compliant with the USB
standard while making the controller more reliable.
Signed-off-by: Steven J. Hill <steven.hill@cavium.com>
Acked-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13831/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Set the DMA mask such that all descriptors stay in the
lower 4GB of memory.
Signed-off-by: Steven J. Hill <steven.hill@cavium.com>
Acked-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13830/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Get rid of unnecessary forced interrupt mappings for
the USB host controller on OCTEON II.
Signed-off-by: Steven J. Hill <steven.hill@cavium.com>
Acked-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13824/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Currently the debugfs interface to print the segment configuration
refuses to print the physical address of mapped segments. However if the
EU bit is set these become unmapped at error level (when
CP0_Status.ERL=1), so the physical address is still relevant.
Update the logic to print the physical address of mapped segments when
the EU bit is set, while still hiding the Cache Coherency Attribute
(since EU overrides that to uncached when ERL=1 too).
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13833/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Convert asciidoc-formatted docs to rst in accordance with Jonathan's and
Jani's effort to use sphinx for kernel-doc rendering in 4.8.
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/4c1b29986fa77772156b1af0c965d3799e43a47b.1467628307.git.lukas@wunner.de
V1F indicates that the time accuracy may have been compromised because
of a voltage drop (possibly only temporary) below VLOW1, which stops the
temperature compensation. When the time is set, the accuracy is
restored, so V1F should be cleared in order to indicate this and to be
able to detect the next temperature compensation loss. This is the same
principle as for V2F, which is cleared when the time is set to indicate
that the time is no longer invalid and to be able to detect the next
data loss.
Signed-off-by: Benoît Thébaudeau <benoit@wsystem.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
According to the application manual of the RX8900, the RESET bit must be
set to 1 to prevent a timer update while setting the time. This also
resets the subsecond counter. The application manual of the RV-8803 does
not mention such a requirement, and it says that the 100th Seconds
register is cleared when writing to the Seconds register, but using the
RESET bit for the RV-8803 too should not be an issue and is probably
safer.
This change also ensures that the RESET bit is initialized properly in
all cases. Indeed, all the registers must be initialized if the voltage
has been lower than VLOW2 (triggering V2F), but not low enough to
trigger a POR.
Signed-off-by: Benoît Thébaudeau <benoit@wsystem.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
The I²C NACK issue of the RV-8803 may occur after any I²C START
condition, depending on the timings. Consequently, the workaround must
be applied for all the I²C transfers.
This commit abstracts the I²C transfer code into register access
functions. This avoids duplicating the I²C workaround everywhere. This
also avoids the duplication of the code handling the return value of
i2c_smbus_read_i2c_block_data(). Error messages are issued in case of
definitive register access failures (if the workaround fails). This
change also makes the I²C transfer return value checks consistent.
Signed-off-by: Benoît Thébaudeau <benoit@wsystem.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
The Weekday register is encoded as 2^tm_wday, with tm_wday in 0..6, so
using tm_wday = ffs(reg) to fill tm_wday from the register value is
wrong because this gives the expected value + 1. This could be fixed as
tm_wday = ffs(reg) - 1, but tm_wday = ilog2(reg) works as well and is
more direct.
Signed-off-by: Benoît Thébaudeau <benoit@wsystem.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
The RTC core always calls rtc_valid_tm() after ->read_time() in case of
success (in __rtc_read_time()), so do not call it twice.
Signed-off-by: Benoît Thébaudeau <benoit@wsystem.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
This driver supports the Epson RX8900, but this was not indicated in
Kconfig.
Signed-off-by: Benoît Thébaudeau <benoit@wsystem.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
It turns out that if the guest does a H_CEDE while the CPU is in
a transactional state, and the H_CEDE does a nap, and the nap
loses the architected state of the CPU (which is is allowed to do),
then we lose the checkpointed state of the virtual CPU. In addition,
the transactional-memory state recorded in the MSR gets reset back
to non-transactional, and when we try to return to the guest, we take
a TM bad thing type of program interrupt because we are trying to
transition from non-transactional to transactional with a hrfid
instruction, which is not permitted.
The result of the program interrupt occurring at that point is that
the host CPU will hang in an infinite loop with interrupts disabled.
Thus this is a denial of service vulnerability in the host which can
be triggered by any guest (and depending on the guest kernel, it can
potentially triggered by unprivileged userspace in the guest).
This vulnerability has been assigned the ID CVE-2016-5412.
To fix this, we save the TM state before napping and restore it
on exit from the nap, when handling a H_CEDE in real mode. The
case where H_CEDE exits to host virtual mode is already OK (as are
other hcalls which exit to host virtual mode) because the exit
path saves the TM state.
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This moves the transactional memory state save and restore sequences
out of the guest entry/exit paths into separate procedures. This is
so that these sequences can be used in going into and out of nap
in a subsequent patch.
The only code changes here are (a) saving and restore LR on the
stack, since these new procedures get called with a bl instruction,
(b) explicitly saving r1 into the PACA instead of assuming that
HSTATE_HOST_R1(r13) is already set, and (c) removing an unnecessary
and redundant setting of MSR[TM] that should have been removed by
commit 9d4d0bdd9e0a ("KVM: PPC: Book3S HV: Add transactional memory
support", 2013-09-24) but wasn't.
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
We accidentally take the "port->lock" twice in a row. This old code
was supposed to be deleted.
Fixes: e58e241c17 ('sparc: serial: Clean up the locking for -rt')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Smatch complains that these tests are off by one, which is true but not
life threatening.
arch/sparc/kernel/irq_32.c:169 irq_link()
error: buffer overflow 'irq_map' 384 <= 384
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>