Pull perf fixes from Ingo Molnar:
"An ABI documentation fix, and a mixed-PMU perf-info-corruption fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Document the new transaction sample type
perf: Disable all pmus on unthrottling and rescheduling
We have a bit more changes than usual in ASoC here, as it was slipped
from the previous update. There are one minr ASoC PCM code fix and
ASoC dmaengine fix, in addition of a collection of small ASoC driver
fixes. The rest are a couple of HD-audio stable fixups, and a
long-standing fix for the paused stream handling.
So, all commits look not scary (and hopefully won't give you
disastrous holiday season).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJSstz0AAoJEGwxgFQ9KSmkc7UP/A5n/mgQlsyyjax0bJSEK0gx
0GpqnCo4mQFypDnPpb6MN3Xa8GDS+MbUFUmZYOpuB3aAoUm0quiBpB9u7tfjr/BO
lMS42dhvDQSe+EGJDoBo6TAp8yb2uU7x70GqnzijWvISqn5rMzgFRWuUPKnXEuaE
On++cPZhfy0f8sWmh2FdBXDeKNa/32n2RU3PcO8vdAG3BRz8eO05cVxGV8hywHHa
BGMwsVUUq+9jT4sF5a8pbqC9J1+EmJk1xkNc3I+i7KDw0MrwUsyt/i/MbcNBcNgH
FyG2VejPHkcpPs/wSZNRZe0nHjAyCYsYMFjmS0V7HqZYdKWUTzBhP63QRrUC5Dgw
lEDpApKIk+6gFgBY7tyqD3MEG5K6YSs7h0jElm32ei649U15IMr1nXB4FYM3zYyY
PBzFpTcO+lpIj/kV7GJeIdCROLUKaTheHEd4TpbNt2lLWwFs05B1b/QhMF50zw3X
7ad+VxZx+3PbvQrUU2IsBth23I9x4a9HwSScTq9xZdMzWT3TQ7O4B/Ii15rI4QTK
oQ3tofuqx3IJFoRcrXMZwF8RfKYopseO208so1v9qHt+5UeYxp8xokpa8QHzY3Yw
UjsaymLdp+uCeEYE2reaE6d4xFPXiEr2vPgBDR7ZAZLVKrlDl5+AND1m6E0FVG2v
FHlJjFh58wngEbSfQbbk
=OXEI
-----END PGP SIGNATURE-----
Merge tag 'sound-3.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"We have a bit more changes than usual in ASoC here, as it was slipped
from the previous update. There are one minr ASoC PCM code fix and
ASoC dmaengine fix, in addition of a collection of small ASoC driver
fixes. The rest are a couple of HD-audio stable fixups, and a
long-standing fix for the paused stream handling.
So, all commits look not scary (and hopefully won't give you
disastrous holiday season)"
* tag 'sound-3.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Add Dell headset detection quirk for one more laptop model
ASoC: wm8904: fix DSP mode B configuration
ASoC: wm_adsp: Add small delay while polling DSP RAM start
ALSA: Add SNDRV_PCM_STATE_PAUSED case in wait_for_avail function
ASoC: kirkwood: Fix the CPU DAI rates
ASoC: wm5110: Correct HPOUT3 DAPM route typo
ALSA: hda - Add Dell headset detection quirk for three laptop models
ALSA: hda - Add enable_msi=0 workaround for four HP machines
ASoC: don't leak on error in snd_dmaengine_pcm_register
ASoC: fsl: imx-wm8962: Don't update bias_level in machine driver
ASoC: tegra: fix uninitialized variables in set_fmt
ASoC: wm8962: Enable SYSCLK provisonally before fetching generated DSPCLK_DIV
ASoC: sam9x5_wm8731: change to work in DSP A mode
ASoC: atmel_ssc_dai: add dai trigger ops
ASoC: soc-pcm: Use valid condition for snd_soc_dai_digital_mute() in hw_free()
Let the user know when the number of submission queues are being
ignored.
Signed-off-by: Matias Bjorling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Simplify the initialization logic of the three block-layers.
- The queue initialization is split into two parts. This allows reuse of
code when initializing the sq-, bio- and mq-based layers.
- Set submit_queues default value to 0 and always set it at init time.
- Simplify the init error code paths.
Signed-off-by: Matias Bjorling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
For NUMA systems, initializing the blk-mq layer and using per node hctx.
We initialize submit queues to 1, while blk-mq nr_hw_queues is
initialized to the number of NUMA nodes.
This makes the null_init_hctx function overwrite memory outside of what
it allocated. In my case it lead to writing garbage into struct
request_queue's mq_map.
Signed-off-by: Matias Bjorling <m@bjorling.me>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark functions skd_skmsg_state_to_str() and skd_skreq_state_to_str() as
static in skd_main.c because they are not used outside this file.
This eliminates the following warnings in skd_main.c:
drivers/block/skd_main.c:5272:13: warning: no previous prototype for ‘skd_skmsg_state_to_str’ [-Wmissing-prototypes]
drivers/block/skd_main.c:5284:13: warning: no previous prototype for ‘skd_skreq_state_to_str’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit 97bc386fc1 "ARC: Add guard macro to uapi/asm/unistd.h"
inhibited multiple inclusion of ARCH unistd.h. This however hosed the system
since Generic syscall table generator relies on it being included twice,
and in lack-of an empty table was emitted by C preprocessor.
Fix that by allowing one exception to rule for the special case (just
like Xtensa)
Suggested-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
The fixes here are all driver specific ones, none of which particularly
stand out but all of which are useful to users of those drivers.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.15 (GNU/Linux)
iQIcBAABAgAGBQJSstRwAAoJELSic+t+oim9zPcP/3bbMKIGKrRhnkGrYWrJQqBS
NVj327bouHo8TwvW/PDrApY9QOXMxv9Z5GMeazJJ4Y6rOZ/zYVOpO1kU6vwGZ+so
GMtMGFzV5+aF0Rvud+YAKFFV6uj/hkP08YKt9tDFAXBia/Ff8MVdvCW1xt1a6wO4
En+tmZKenhELKjtbnhpfCZClcFLYZxmtYR94Tr3NvzaIDMG0cdR5hKF8V93Wr66v
mCyBG8AblRoWklKNoB2UXJup9/xDR1yggaCq8ObSRbNfs8Zxso6fN7LrqjWuNz7w
Yvh/UiRNqWjcrmihaqRvt0KayEXF5ZSvTUB0U7InYmeZZArITMIJ2Z/BXzi1SazG
/kCg9qkNigeUYMpJPDOm71SKjvzSB1mL+Eol/5wC/GxmdFMEriKf5ob7D3Kq1zR+
c83bZZqK+663OXTdmASTB5wDAyDqwHBFVdOQlY4s8L11nP/bHDgeiRAcni4wdfOW
xovgyPY+PMGFJBfqqvoXShNtDsp2gMDhWYDrH9vUvwUQAhdikPbfEU9KhlOfEpiv
/HMD3r0VKbsstBhE9CE5dnmketQNHPDBkBCzhoFmKOQPKhR9lBwNkLiLQEYfvWaB
omg0vPUubTb+EN6lechRoPYI/s6+mZ+nBaj3skI3MUcqfsegL7QLl6O4RNlCOFD9
S83sp2o6MK1qHspj1MP4
=pOaJ
-----END PGP SIGNATURE-----
Merge tag 'asoc-v3.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v3.13
The fixes here are all driver specific ones, none of which particularly
stand out but all of which are useful to users of those drivers.
The r8a7790.dtsi file has four sdhi nodes which the first two have the wrong
resource size for their register block. This causes the sh_modbile_sdhi driver
to fail to communicate with card at-all.
Change sdhi{0,1} node size from 0x100 to 0x200 to correct these nodes
as per Kuninori Morimoto's response to the original patch where all four
nodes where changed. sdhi{2,3} are the correct size.
This bug has been present since sdhi resources were added to the r8a7790 by
8c9b1aa418 ("ARM: shmobile: r8a7790: add MMCIF and SDHI DT
templates") in v3.11-rc2.
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Tested-by: William Towle <william.towle@codethink.co.uk>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
4dcfa60071
(ARM: DMA-API: better handing of DMA masks for coherent allocations)
exchanged DMA mask check method.
Below warning will appear without this patch
asoc-simple-card asoc-simple-card.0: \
Coherent DMA mask 0xffffffffffffffff is larger than dma_addr_t allows
asoc-simple-card asoc-simple-card.0: \
Driver did not use or check the return value from dma_set_coherent_mask()?
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
When an output is disabled, its DPMS mode is usually set to off. Instead
of only disabling the panel (if one is attached), turn the output off
entirely to save more power.
HDMI doesn't have any panels attached, so it previously didn't save any
power at all. With this commit, however, the complete HDMI interface
will be turned off, therefore allowing an attached monitor to go into a
standby mode.
Signed-off-by: Thierry Reding <treding@nvidia.com>
The DRM core doesn't track enable and disable state of encoders and/or
connectors, so calls to the output's .enable() and .disable() are not
guaranteed to be balanced. Track the enable state internally so that
calls to regulator and clock frameworks remain balanced.
Signed-off-by: Thierry Reding <treding@nvidia.com>
These buffer object operations are never used outside of the GEM
implementation so there is no use in exporting them.
Signed-off-by: Thierry Reding <treding@nvidia.com>
The ARCH_MULTIPLATFORM dependency was introduced back when Tegra didn't
support multiplatform yet as a means to allow the driver to be easily
compile-tested along with other DRM drivers. In the meantime, the new
COMPILE_TEST Kconfig option has been introduced for exactly that
purpose, so use that instead to clarify the intention.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Tegra124 has 192 syncpoints whereas its predecessors had 32 syncpoints.
This required changes to the hardware register layout.
Signed-off-by: Arto Merilainen <amerilainen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Treat both negative and zero return values from clk_round_rate() as
errors. This is needed since subsequent patches will convert
clk_round_rate()'s return value to be an unsigned type, rather than a
signed type, since some clock sources can generate rates higher than
(2^31)-1 Hz.
Eventually, when calling clk_round_rate(), only a return value of zero
will be considered a error. All other values will be considered valid
rates. The comparison against values less than 0 is kept to preserve
the correct behavior in the meantime.
Signed-off-by: Paul Walmsley <pwalmsley@nvidia.com>
Cc: Mikko Perttunen <mperttunen@nvidia.com>
Cc: Arto Merilainen <amerilainen@nvidia.com>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Terje Bergström <tbergstrom@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
When debugfs support isn't enabled, gcc complains about some variables
being unused. To avoid further #ifdefery, move debugfs specific setup
code into static functions and use IS_ENABLED(CONFIG_DEBUG_FS) to have
the compiler, rather than the preprocessor, discard them when unused.
The advantage of doing it this way is that all the code will be
compile-tested whether or not debugfs support is enabled.
Signed-off-by: Thierry Reding <treding@nvidia.com>
The ARCH_MULTIPLATFORM dependency was introduced back when Tegra didn't
support multiplatform yet as a means to allow the driver to be easily
compile-tested along with other DRM drivers. In the meantime, the new
COMPILE_TEST Kconfig option has been introduced for exactly that
purpose, so use that instead to clarify the intention.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Include the linux/host1x.h and dev.h headers so that function prototypes
are visible to keep sparse from suggesting that their implementations be
made static.
Signed-off-by: Thierry Reding <treding@nvidia.com>
An earlier patch added a subset of the required HW specific header files
but didn't actually include the right ones when compiling for host1x02.
Signed-off-by: Thierry Reding <treding@nvidia.com>
This patch allows FILEIO to update hw_max_sectors based on the current
max_bytes_per_io. This is required because vfs_[writev,readv]() can accept
a maximum of 2048 iovecs per call, so the enforced hw_max_sectors really
needs to be calculated based on block_size.
This addresses a >= v3.5 bug where block_size=512 was rejecting > 1M
sized I/O requests, because FD_MAX_SECTORS was hardcoded to 2048 for
the block_size=4096 case.
(v2: Use max_bytes_per_io instead of ->update_hw_max_sectors)
Reported-by: Henrik Goldman <hg@x-formation.com>
Cc: <stable@vger.kernel.org> #3.5+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch moves INIT_WORK setup for cq_desc->cq_[rx,tx]_work into
isert_create_device_ib_res(), instead of being done each callback
invocation in isert_cq_[rx,tx]_callback().
This also fixes a 'INFO: trying to register non-static key' warning
when cancel_work_sync() is called before INIT_WORK has setup the
struct work_struct.
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
When shutting down a target there is a race condition between
iscsit_del_np() and __iscsi_target_login_thread().
The latter sets the thread pointer to NULL, and the former
tries to issue kthread_stop() on that pointer without any
synchronization.
This patch moves the np->np_thread NULL assignment into
iscsit_del_np(), after kthread_stop() has completed. It also
removes the signal_pending() + np_state check, and only
exits when kthread_should_stop() is true.
Reported-by: Hannes Reinecke <hare@suse.de>
Cc: <stable@vger.kernel.org> #3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Commit 22ceeee16e ("pwm-backlight: Add
power supply support") added a mandatory power supply for the PWM
backlight. Add a fixed 5V regulator to board code with a consumer supply
entry for the backlight device.
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
(cherry picked from commit ad11cb9a5cf96346f1240995c672cdbb5501785c)
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Merge patches from Andrew Morton:
"23 fixes and a MAINTAINERS update"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (24 commits)
mm/hugetlb: check for pte NULL pointer in __page_check_address()
fix build with make 3.80
mm/mempolicy: fix !vma in new_vma_page()
MAINTAINERS: add Davidlohr as GPT maintainer
mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully
mm/compaction: respect ignore_skip_hint in update_pageblock_skip
mm/mempolicy: correct putback method for isolate pages if failed
mm: add missing dependency in Kconfig
sh: always link in helper functions extracted from libgcc
mm: page_alloc: exclude unreclaimable allocations from zone fairness policy
mm: numa: defer TLB flush for THP migration as long as possible
mm: numa: guarantee that tlb_flush_pending updates are visible before page table updates
mm: fix TLB flush race between migration, and change_protection_range
mm: numa: avoid unnecessary disruption of NUMA hinting during migration
mm: numa: clear numa hinting information on mprotect
sched: numa: skip inaccessible VMAs
mm: numa: avoid unnecessary work on the failure path
mm: numa: ensure anon_vma is locked to prevent parallel THP splits
mm: numa: do not clear PTE for pte_numa update
mm: numa: do not clear PMD during PTE update scan
...
In __page_check_address(), if address's pud is not present,
huge_pte_offset() will return NULL, we should check the return value.
Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: qiuxishi <qiuxishi@huawei.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
According to Documentation/Changes, make 3.80 is still being supported
for building the kernel, hence make files must not make (unconditional)
use of features introduced only in newer versions.
Commit 1bf49dd4be ("./Makefile: export initial ramdisk compression
config option") however introduced "else ifeq" constructs which make
3.80 doesn't understand. Replace the logic there with more conventional
(in the kernel build infrastructure) list constructs (except that the
list here is intentionally limited to exactly one element).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: P J P <ppandit@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a new entry for the GPT standard. Any future changes will now be
routed through linux-efi.
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After a successful hugetlb page migration by soft offline, the source
page will either be freed into hugepage_freelists or buddy(over-commit
page). If page is in buddy, page_hstate(page) will be NULL. It will
hit a NULL pointer dereference in dequeue_hwpoisoned_huge_page().
BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
IP: [<ffffffff81163761>] dequeue_hwpoisoned_huge_page+0x131/0x1d0
PGD c23762067 PUD c24be2067 PMD 0
Oops: 0000 [#1] SMP
So check PageHuge(page) after call migrate_pages() successfully.
Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Tested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
update_pageblock_skip() only fits to compaction which tries to isolate
by pageblock unit. If isolate_migratepages_range() is called by CMA, it
try to isolate regardless of pageblock unit and it don't reference
get_pageblock_skip() by ignore_skip_hint. We should also respect it on
update_pageblock_skip() to prevent from setting the wrong information.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: <stable@vger.kernel.org> [3.7+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
queue_pages_range() isolates hugetlbfs pages and putback_lru_pages()
can't handle these. We should change it to putback_movable_pages().
Naoya said that it is worth going into stable, because it can break
in-use hugepage list.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: <stable@vger.kernel.org> [3.12.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eliminate the following (rand)config warning by adding missing PROC_FS
dependency:
warning: (HWPOISON_INJECT && MEM_SOFT_DIRTY) selects PROC_PAGE_MONITOR which has unmet direct dependencies (PROC_FS && MMU)
Signed-off-by: Sima Baymani <sima.baymani@gmail.com>
Suggested-by: David Rientjes <rientjes@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
E.g. landisk_defconfig, which has CONFIG_NTFS_FS=m:
ERROR: "__ashrdi3" [fs/ntfs/ntfs.ko] undefined!
For "lib-y", if no symbols in a compilation unit are referenced by other
units, the compilation unit will not be included in vmlinux. This
breaks modules that do reference those symbols.
Use "obj-y" instead to fix this.
http://kisskb.ellerman.id.au/kisskb/buildresult/8838077/
This doesn't fix all cases. There are others, e.g. udivsi3.
This is also not limited to sh, many architectures handle this in the
same way.
A simple solution is to unconditionally include all helper functions.
A more complex solution is to make the choice of "lib-y" or "obj-y" depend
on CONFIG_MODULES:
obj-$(CONFIG_MODULES) += ...
lib-y($CONFIG_MODULES) += ...
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Tested-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
Reviewed-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Hansen noted a regression in a microbenchmark that loops around
open() and close() on an 8-node NUMA machine and bisected it down to
commit 81c0a2bb51 ("mm: page_alloc: fair zone allocator policy").
That change forces the slab allocations of the file descriptor to spread
out to all 8 nodes, causing remote references in the page allocator and
slab.
The round-robin policy is only there to provide fairness among memory
allocations that are reclaimed involuntarily based on pressure in each
zone. It does not make sense to apply it to unreclaimable kernel
allocations that are freed manually, in this case instantly after the
allocation, and incur the remote reference costs twice for no reason.
Only round-robin allocations that are usually freed through page reclaim
or slab shrinking.
Bisected by Dave Hansen.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
THP migration can fail for a variety of reasons. Avoid flushing the TLB
to deal with THP migration races until the copy is ready to start.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
According to documentation on barriers, stores issued before a LOCK can
complete after the lock implying that it's possible tlb_flush_pending
can be visible after a page table update. As per revised documentation,
this patch adds a smp_mb__before_spinlock to guarantee the correct
ordering.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are a few subtle races, between change_protection_range (used by
mprotect and change_prot_numa) on one side, and NUMA page migration and
compaction on the other side.
The basic race is that there is a time window between when the PTE gets
made non-present (PROT_NONE or NUMA), and the TLB is flushed.
During that time, a CPU may continue writing to the page.
This is fine most of the time, however compaction or the NUMA migration
code may come in, and migrate the page away.
When that happens, the CPU may continue writing, through the cached
translation, to what is no longer the current memory location of the
process.
This only affects x86, which has a somewhat optimistic pte_accessible.
All other architectures appear to be safe, and will either always flush,
or flush whenever there is a valid mapping, even with no permissions
(SPARC).
The basic race looks like this:
CPU A CPU B CPU C
load TLB entry
make entry PTE/PMD_NUMA
fault on entry
read/write old page
start migrating page
change PTE/PMD to new page
read/write old page [*]
flush TLB
reload TLB from new entry
read/write new page
lose data
[*] the old page may belong to a new user at this point!
The obvious fix is to flush remote TLB entries, by making sure that
pte_accessible aware of the fact that PROT_NONE and PROT_NUMA memory may
still be accessible if there is a TLB flush pending for the mm.
This should fix both NUMA migration and compaction.
[mgorman@suse.de: fix build]
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
do_huge_pmd_numa_page() handles the case where there is parallel THP
migration. However, by the time it is checked the NUMA hinting
information has already been disrupted. This patch adds an earlier
check with some helpers.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On a protection change it is no longer clear if the page should be still
accessible. This patch clears the NUMA hinting fault bits on a
protection change.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Inaccessible VMA should not be trapping NUMA hint faults. Skip them.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>