* pm-cpufreq-fixes:
cpufreq: intel_pstate: Fix processing for turbo activation ratio
Revert "cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC"
The batadv_neigh_node was specific to a batadv_hardif_neigh_node and held
an implicit reference to it. But this reference was never stored in form of
a pointer in the batadv_neigh_node itself. Instead
batadv_neigh_node_release depends on a consistent state of
hard_iface->neigh_list and that batadv_hardif_neigh_get always returns the
batadv_hardif_neigh_node object which it has a reference for. But
batadv_hardif_neigh_get cannot guarantee that because it is working only
with rcu_read_lock on this list. It can therefore happen that a neigh_addr
is in this list twice or that batadv_hardif_neigh_get cannot find the
batadv_hardif_neigh_node for an neigh_addr due to some other list
operations taking place at the same time.
Instead add a batadv_hardif_neigh_node pointer directly in
batadv_neigh_node which will be used for the reference counter decremented
on release of batadv_neigh_node.
Fixes: cef63419f7 ("batman-adv: add list of unique single hop neighbors per hard-interface")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
The batadv_tt_local_entry was specific to a batadv_softif_vlan and held an
implicit reference to it. But this reference was never stored in form of a
pointer in the tt_local_entry itself. Instead batadv_tt_local_remove,
batadv_tt_local_table_free and batadv_tt_local_purge_pending_clients depend
on a consistent state of bat_priv->softif_vlan_list and that
batadv_softif_vlan_get always returns the batadv_softif_vlan object which
it has a reference for. But batadv_softif_vlan_get cannot guarantee that
because it is working only with rcu_read_lock on this list. It can
therefore happen that an vid is in this list twice or that
batadv_softif_vlan_get cannot find the batadv_softif_vlan for an vid due to
some other list operations taking place at the same time.
Instead add a batadv_softif_vlan pointer directly in batadv_tt_local_entry
which will be used for the reference counter decremented on release of
batadv_tt_local_entry.
Fixes: 35df3b298f ("batman-adv: fix TT VLAN inconsistency on VLAN re-add")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Antonio Quartulli <a@unstable.cc>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
At the moment there is no explicit reactivation of an hard-interface
upon NETDEV_UP event. In case of B.A.T.M.A.N. IV the interface is
reactivated as soon as the next OGM is scheduled for sending, but this
mechanism does not work with B.A.T.M.A.N. V. The latter does not rely
on the same scheduling mechanism as its predecessor and for this reason
the hard-interface remains deactivated forever after being brought down
once.
This patch fixes the reactivation mechanism by adding a new routing API
which explicitly allows each algorithm to perform any needed operation
upon interface re-activation.
Such API is optional and is implemented by B.A.T.M.A.N. V only and it
just takes care of setting the iface status to ACTIVE
Signed-off-by: Antonio Quartulli <a@unstable.cc>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Now that DAT is VLAN aware, it must use the VID when
computing the DHT address of the candidate nodes where
an entry is going to be stored/retrieved.
Fixes: be1db4f661 ("batman-adv: make the Distributed ARP Table vlan aware")
Signed-off-by: Antonio Quartulli <a@unstable.cc>
[sven@narfation.org: fix conflicts with current version]
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Phoenix Audio MT202pcs (1de7:0114) and MT202exe (1de7:0013) need the
same workaround as TMX320 for avoiding the firmware bug. It fixes the
frequent error about the sample rate inquiries and the slow device
probe as consequence.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=117321
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The enable time for buck regulators was not configured but actually is
essential: consumers, like usb3503, doing hard reset (regulator off/on)
should wait for the regulator to settle.
Configure the enable time according to datasheet.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
The maximum supported voltage for ldo_io# is 3.3V, but on cold
boot the selector comes up at 0x1f, which maps to 3.8V.
This causes _regulator_get_voltage() to fail with -EINVAL which
causes regulator registration to fail when constrains are used:
[ 1.467788] vcc-touchscreen: failed to get the current voltage(-22)
[ 1.474209] axp20x-regulator axp20x-regulator: Failed to register ldo_io1
[ 1.483363] axp20x-regulator: probe of axp20x-regulator failed with error -22
This commits makes the axp20x regulator driver accept the 0x1f register
value, fixing this.
The datasheet does not guarantee reliable operation above 3.3V, so on
boards where this regulator is used the regulator-max-microvolt setting
must be 3.3V or less.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Now that set_thread_area() is supposed to give deterministic behavior
when it modifies in-use segments, test it.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f2bc11af1ee1a0f815ed910840cbdba06b640a20.1461698311.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The current behavior of set_thread_area() when it modifies a segment that is
currently loaded is a bit confused.
If CS [1] or SS is modified, the change will take effect on return
to userspace because CS and SS are fundamentally always reloaded on
return to userspace.
Similarly, on 32-bit kernels, if DS, ES, FS, or (depending on
configuration) GS refers to a modified segment, the change will take
effect immediately on return to user mode because the entry code
reloads these registers.
If set_thread_area() modifies DS, ES [2], FS, or GS on 64-bit kernels or
GS on 32-bit lazy-GS [3] kernels, however, the segment registers
will be left alone until something (most likely a context switch)
causes them to be reloaded. This means that behavior visible to
user space is inconsistent.
If set_thread_area() is implicitly called via CLONE_SETTLS, then all
segment registers will be reloaded before the thread starts because
CLONE_SETTLS happens before the initial context switch into the
newly created thread.
Empirically, glibc requires the immediate reload on CLONE_SETTLS --
32-bit glibc on my system does *not* manually reload GS when
creating a new thread.
Before enabling FSGSBASE, we need to figure out what the behavior
will be, as FSGSBASE requires that we reconsider our behavior when,
e.g., GS and GSBASE are out of sync in user mode. Given that we
must preserve the existing behavior of CLONE_SETTLS, it makes sense
to me that we simply extend similar behavior to all invocations
of set_thread_area().
This patch explicitly updates any segment register referring to a
segment that is targetted by set_thread_area(). If set_thread_area()
deletes the segment, then the segment register will be nulled out.
[1] This can't actually happen since 0e58af4e1d ("x86/tls:
Disallow unusual TLS segments") but, if it did, this is how it
would behave.
[2] I strongly doubt that any existing non-malicious program loads a
TLS segment into DS or ES on a 64-bit kernel because the context
switch code was badly broken until recently, but that's not an
excuse to leave the current code alone.
[3] One way or another, that config option should to go away. Yuck!
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/27d119b0d396e9b82009e40dff8333a249038225.1461698311.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Unlike ds and es, these are base addresses, not selectors. Rename
them so their meaning is more obvious.
On x86_32, the field is still called fs. Fixing that could make sense
as a future cleanup.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/69a18a51c4cba0ce29a241e570fc618ad721d908.1461698311.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As far as I know, the optimization doesn't work on any modern distro
because modern distros use high addresses for ASLR. Remove it.
The ptrace code was either wrong or very strange, but the behavior
with this patch should be essentially identical to the behavior
without this patch unless user code goes out of its way to mislead
ptrace.
On newer CPUs, once the FSGSBASE instructions are enabled, we won't
want to use the optimized variant anyway.
This isn't actually much of a performance regression, it has no effect
on normal dynamically linked programs, and it's a considerably
simplification. It also removes some nasty special cases from code
that is already way too full of special cases for comfort.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/dd1599b08866961dba9d2458faa6bbd7fba471d7.1461698311.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On AMD CPUs, a failed load_gs_base currently may not clear the FS
base. Fix it.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1a6c4d3a8a4e7be79ba448b42685e0321d50c14c.1461698311.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On AMD CPUs, a failed loadsegment currently may not clear the FS
base. Fix it.
While we're at it, prevent loadsegment(gs, xyz) from even compiling
on 64-bit kernels. It shouldn't be used.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a084c1b93b7b1408b58d3fd0b5d6e47da8e7d7cf.1461698311.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
asm/alternative.h isn't directly useful from assembly, but it
shouldn't break the build.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e5b693fcef99fe6e80341c9e97a002fb23871e91.1461698311.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
alternative.h pulls in ptrace.h, which means that alternatives can't
be used in anything referenced from ptrace.h, which is a mess.
Break the dependency by pulling text patching helpers into their own
header.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/99b93b13f2c9eb671f5c98bba4c2cbdc061293a2.1461698311.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Else we get 'BUG: spinlock bad magic on CPU#' on resize when
spin lock debugging is enabled.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The default configuration of a pin is often with a value in the
pull-up/down field at chip reset. So, even if the internal logic of the
controller prevents writing a configuration with pull-up and pull-down at
the same time, we must ensure explicitly this condition before writing the
register.
This was leading to a pull-down condition not taken into account for
instance.
Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Fixes: 776180848b ("pinctrl: introduce driver for Atmel PIO4 controller")
Cc: stable@vger.kernel.org #v4.4 and later
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
So the following commit:
884f4f66ff ("efi: Remove global 'memmap' EFI memory map")
... triggered the following build warning on x86 64-bit allyesconfig:
drivers/xen/efi.c:290:47: warning: missing braces around initializer [-Wmissing-braces]
It's this initialization in drivers/xen/efi.c:
static const struct efi efi_xen __initconst = {
...
.memmap = NULL, /* Not used under Xen. */
...
which was forgotten about, as .memmap now is an embedded struct:
struct efi_memory_map memmap;
We can remove this initialization - it's an EFI core internal data structure plus
it's not used in the Xen driver anyway.
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: ard.biesheuvel@linaro.org
Cc: bp@alien8.de
Cc: linux-tip-commits@vger.kernel.org
Cc: tony.luck@intel.com
Link: http://lkml.kernel.org/r/20160429083128.GA4925@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Relocation handling performs bounds checking on the resulting calculated
addresses. The existing code uses output_len (VO size plus relocs size) as
the max address. This is not right since the max_addr check should stop at
the end of VO and exclude bss, brk, etc, which follows. The valid range
should be VO [_text, __bss_start] in the loaded physical address space.
This patch adds an export for __bss_start in voffset.h and uses it to
set the correct limit for max_addr.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[ Rewrote the changelog. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1461888548-32439-7-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since 'run_size' is now calculated in misc.c, the old script and associated
argument passing is no longer needed. This patch removes them, and renames
'run_size' to the more descriptive 'kernel_total_size'.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rewrote the changelog, renamed 'run_size' to 'kernel_total_size' ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1461888548-32439-6-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently, the "run_size" variable holds the total kernel size
(size of code plus brk and bss) and is calculated via the shell script
arch/x86/tools/calc_run_size.sh. It gets the file offset and mem size
of the .bss and .brk sections from the vmlinux, and adds them as follows:
run_size = $(( $offsetA + $sizeA + $sizeB ))
However, this is not correct (it is too large). To illustrate, here's
a walk-through of the script's calculation, compared to the correct way
to find it.
First, offsetA is found as the starting address of the first .bss or
.brk section seen in the ELF file. The sizeA and sizeB values are the
respective section sizes.
[bhe@x1 linux]$ objdump -h vmlinux
vmlinux: file format elf64-x86-64
Sections:
Idx Name Size VMA LMA File off Algn
27 .bss 00170000 ffffffff81ec8000 0000000001ec8000 012c8000 2**12
ALLOC
28 .brk 00027000 ffffffff82038000 0000000002038000 012c8000 2**0
ALLOC
Here, offsetA is 0x012c8000, with sizeA at 0x00170000 and sizeB at
0x00027000. The resulting run_size is 0x145f000:
0x012c8000 + 0x00170000 + 0x00027000 = 0x145f000
However, if we instead examine the ELF LOAD program headers, we see a
different picture.
[bhe@x1 linux]$ readelf -l vmlinux
Elf file type is EXEC (Executable file)
Entry point 0x1000000
There are 5 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000200000 0xffffffff81000000 0x0000000001000000
0x0000000000b5e000 0x0000000000b5e000 R E 200000
LOAD 0x0000000000e00000 0xffffffff81c00000 0x0000000001c00000
0x0000000000145000 0x0000000000145000 RW 200000
LOAD 0x0000000001000000 0x0000000000000000 0x0000000001d45000
0x0000000000018158 0x0000000000018158 RW 200000
LOAD 0x000000000115e000 0xffffffff81d5e000 0x0000000001d5e000
0x000000000016a000 0x0000000000301000 RWE 200000
NOTE 0x000000000099bcac 0xffffffff8179bcac 0x000000000179bcac
0x00000000000001bc 0x00000000000001bc 4
Section to Segment mapping:
Segment Sections...
00 .text .notes __ex_table .rodata __bug_table .pci_fixup .tracedata
__ksymtab __ksymtab_gpl __ksymtab_strings __init_rodata __param
__modver
01 .data .vvar
02 .data..percpu
03 .init.text .init.data .x86_cpu_dev.init .parainstructions
.altinstructions .altinstr_replacement .iommu_table .apicdrivers
.exit.text .smp_locks .bss .brk
04 .notes
As mentioned, run_size needs to be the size of the running kernel
including .bss and .brk. We can see from the Section/Segment mapping
above that .bss and .brk are included in segment 03 (which corresponds
to the final LOAD program header). To find the run_size, we calculate
the end of the LOAD segment from its PhysAddr start (0x0000000001d5e000)
and its MemSiz (0x0000000000301000), minus the physical load address of
the kernel (the first LOAD segment's PhysAddr: 0x0000000001000000). The
resulting run_size is 0x105f000:
0x0000000001d5e000 + 0x0000000000301000 - 0x0000000001000000 = 0x105f000
So, from this we can see that the existing run_size calculation is
0x400000 too high. And, as it turns out, the correct run_size is
actually equal to VO_end - VO_text, which is certainly easier to calculate.
_end: 0xffffffff8205f000
_text:0xffffffff81000000
0xffffffff8205f000 - 0xffffffff81000000 = 0x105f000
As a result, run_size is a simple constant, so we don't need to pass it
around; we already have voffset.h for such things. We can share voffset.h
between misc.c and header.S instead of getting run_size in other ways.
This patch moves voffset.h creation code to boot/compressed/Makefile,
and switches misc.c to use the VO_end - VO_text calculation for run_size.
Dependence before:
boot/header.S ==> boot/voffset.h ==> vmlinux
boot/header.S ==> compressed/vmlinux ==> compressed/misc.c
Dependence after:
boot/header.S ==> compressed/vmlinux ==> compressed/misc.c ==> boot/voffset.h ==> vmlinux
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rewrote the changelog. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: lasse.collin@tukaani.org
Fixes: e6023367d7 ("x86, kaslr: Prevent .bss from overlaping initrd")
Link: http://lkml.kernel.org/r/1461888548-32439-5-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently z_extract_offset is calculated in boot/compressed/mkpiggy.c.
This doesn't work well because mkpiggy.c doesn't know the details of the
decompressor in use. As a result, it can only make an estimation, which
has risks:
- output + output_len (VO) could be much bigger than input + input_len
(ZO). In this case, the decompressed kernel plus relocs could overwrite
the decompression code while it is running.
- The head code of ZO could be bigger than z_extract_offset. In this case
an overwrite could happen when the head code is running to move ZO to
the end of buffer. Though currently the size of the head code is very
small it's still a potential risk. Since there is no rule to limit the
size of the head code of ZO, it runs the risk of suddenly becoming a
(hard to find) bug.
Instead, this moves the z_extract_offset calculation into header.S, and
makes adjustments to be sure that the above two cases can never happen,
and further corrects the comments describing the calculations.
Since we have (in the previous patch) made ZO always be located against
the end of decompression buffer, z_extract_offset is only used here to
calculate an appropriate buffer size (INIT_SIZE), and is not longer used
elsewhere. As such, it can be removed from voffset.h.
Additionally clean up #if/#else #define to improve readability.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rewrote the changelog and comments. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1461888548-32439-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This change makes later calculations about where the kernel is located
easier to reason about. To better understand this change, we must first
clarify what 'VO' and 'ZO' are. These values were introduced in commits
by hpa:
77d1a49995 ("x86, boot: make symbols from the main vmlinux available")
37ba7ab5e3 ("x86, boot: make kernel_alignment adjustable; new bzImage fields")
Specifically:
All names prefixed with 'VO_':
- relate to the uncompressed kernel image
- the size of the VO image is: VO__end-VO__text ("VO_INIT_SIZE" define)
All names prefixed with 'ZO_':
- relate to the bootable compressed kernel image (boot/compressed/vmlinux),
which is composed of the following memory areas:
- head text
- compressed kernel (VO image and relocs table)
- decompressor code
- the size of the ZO image is: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE" define, though see below)
The 'INIT_SIZE' value is used to find the larger of the two image sizes:
#define ZO_INIT_SIZE (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
#define VO_INIT_SIZE (VO__end - VO__text)
#if ZO_INIT_SIZE > VO_INIT_SIZE
# define INIT_SIZE ZO_INIT_SIZE
#else
# define INIT_SIZE VO_INIT_SIZE
#endif
The current code uses extract_offset to decide where to position the
copied ZO (i.e. ZO starts at extract_offset). (This is why ZO_INIT_SIZE
currently includes the extract_offset.)
Why does z_extract_offset exist? It's needed because we are trying to minimize
the amount of RAM used for the whole act of creating an uncompressed, executable,
properly relocation-linked kernel image in system memory. We do this so that
kernels can be booted on even very small systems.
To achieve the goal of minimal memory consumption we have implemented an in-place
decompression strategy: instead of cleanly separating the VO and ZO images and
also allocating some memory for the decompression code's runtime needs, we instead
create this elaborate layout of memory buffers where the output (decompressed)
stream, as it progresses, overlaps with and destroys the input (compressed)
stream. This can only be done safely if the ZO image is placed to the end of the
VO range, plus a certain amount of safety distance to make sure that when the last
bytes of the VO range are decompressed, the compressed stream pointer is safely
beyond the end of the VO range.
z_extract_offset is calculated in arch/x86/boot/compressed/mkpiggy.c during
the build process, at a point when we know the exact compressed and
uncompressed size of the kernel images and can calculate this safe minimum
offset value. (Note that the mkpiggy.c calculation is not perfect, because
we don't know the decompressor used at that stage, so the z_extract_offset
calculation is necessarily imprecise and is mostly based on gzip internals -
we'll improve that in the next patch.)
When INIT_SIZE is bigger than VO_INIT_SIZE (uncommon but possible),
the copied ZO occupies the memory from extract_offset to the end of
decompression buffer. It overlaps with the soon-to-be-uncompressed kernel
like this:
|-----compressed kernel image------|
V V
0 extract_offset +INIT_SIZE
|-----------|---------------|-------------------------|--------|
| | | |
VO__text startup_32 of ZO VO__end ZO__end
^ ^
|-------uncompressed kernel image---------|
When INIT_SIZE is equal to VO_INIT_SIZE (likely) there's still space
left from end of ZO to the end of decompressing buffer, like below.
|-compressed kernel image-|
V V
0 extract_offset +INIT_SIZE
|-----------|---------------|-------------------------|--------|
| | | |
VO__text startup_32 of ZO ZO__end VO__end
^ ^
|------------uncompressed kernel image-------------|
To simplify calculations and avoid special cases, it is cleaner to
always place the compressed kernel image in memory so that ZO__end
is at the end of the decompression buffer, instead of placing t at
the start of extract_offset as is currently done.
This patch adds BP_init_size (which is the INIT_SIZE as passed in from
the boot_params) into asm-offsets.c to make it visible to the assembly
code.
Then when moving the ZO, it calculates the starting position of
the copied ZO (via BP_init_size and the ZO run size) so that the VO__end
will be at the end of the decompression buffer. To make the position
calculation safe, the end of ZO is page aligned (and a comment is added
to the existing VO alignment for good measure).
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[ Rewrote changelog and comments. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1461888548-32439-3-git-send-email-keescook@chromium.org
[ Rewrote the changelog some more. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When processing the relocation table, the offset used to calculate the
relocation is an 'int'. This is sufficient for calculating the physical
address of the relocs entry on 32-bit systems and on 64-bit systems when
the relocation is under 2G.
To handle relocations above 2G (seen in situations like kexec, netboot, etc),
this offset needs to be calculated using a 'long' to avoid wrapping and
miscalculating the relocation.
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rewrote the changelog. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1461888548-32439-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A few fixes for 4.6.
- revert amdgpu PX commit that was previously reverted on the radeon side
- cleaned up version of the NI+ MC update display fix for radeon
- TTM kref fix
* 'drm-fixes-4.6' of git://people.freedesktop.org/~agd5f/linux:
drm/amdgpu: disable vm interrupts with vm_fault_stop=2
drm/amdgpu: print a message if ATPX dGPU power control is missing
Revert "drm/amdgpu: disable runtime pm on PX laptops without dGPU power control"
drm/radeon: fix vertical bars appear on monitor (v2)
drm/ttm: fix kref count mess in ttm_bo_move_to_lru_tail
three misc vmwgfx fixes
* 'drm-vmwgfx-fixes' of git://people.freedesktop.org/~syeh/repos_linux:
drm/vmwgfx: Fix order of operation
drm/vmwgfx: use vmw_cmd_dx_cid_check for query commands.
drm/vmwgfx: Enable SVGA_3D_CMD_DX_SET_PREDICATION
Pull x86 fixes from Ingo Molnar:
"Two boot crash fixes and an IRQ handling crash fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Handle zero vector gracefully in clear_vector_irq()
Revert "x86/mm/32: Set NX in __supported_pte_mask before enabling paging"
xen/qspinlock: Don't kick CPU if IRQ is not initialized
Pull perf fixes from Ingo Molnar:
"x86 PMU driver fixes plus a core code race fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix incorrect lbr_sel_mask value
perf/x86/intel/pt: Don't die on VMXON
perf/core: Fix perf_event_open() vs. execve() race
perf/x86/amd: Set the size of event map array to PERF_COUNT_HW_MAX
perf/core: Make sysctl_perf_cpu_time_max_percent conform to documentation
perf/x86/intel/rapl: Add missing Haswell model
perf/x86/intel: Add model number for Skylake Server to perf
Pull EFI fix from Ingo Molnar:
"This fixes a bug in the efivars code"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: Fix out-of-bounds read in variable_matches()
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXIhgfAAoJEAhfPr2O5OEVlY4P/Rw71pT4fJ5MJdwrg11V7Kor
ev3QxqjKQbeAi2oQEooIaLIlGtvHiGdKApo/jT+VjpvHvdT1y1YDTck0pLYGKTEz
61dGWWGe3S6WKLXI+jDww7r/MscmdqzYheEGx+qtwB1nvpni6e3szxrIwhKyup70
wTmh+LO80VhzHOORnYs9E4gUWIlYYOBxtnb1TDeYKzZquly7Mls32gQ+3Uixk4pt
AFsilvsq8iUU/0LAyxtkPClmmf8ZWoKgLSgAhFBOHZx5TR6Kwa/YwLE+WH6kd4fS
CQuyD2rvxKwix4PocYjtZJB2YEVGeUU/Ux6VMsKkDrh5aG/V0F3dcqQwCr3iSoTU
51ieaBh9wFdesT/FnWCznOtVINr4v23wRuOyAHEHd6HrVxXxkLo8R1ADMynwr6HU
YQMS1Su3icoQcLsdwlYxpQwaJaYvUV4LzDycE5G8weXg9hb2Tfv60svQc1zbXSWc
Urvw8c7k23vHNLky0h1yYadkgE9K0b567/Am78FXTnFJR1saMpdt9kS7vxqpSJXC
hoTw+MAaJrmM9jNk+Nmic6ps5xDKAiptjMxZ6YfCzrSK7IHjyYSvay3Lnd9M0zwm
CKDEaXf9YhDEGEVpaSOIuluo6tcSFumlRY4FFoRysI4n/A779X7X7Ittd6vCrO8Z
ymL6dhrTgFrnCLEWV9cB
=REYX
-----END PGP SIGNATURE-----
Merge tag 'media/v4.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"Some regression fixes:
- videobuf2 core: avoid the risk of going past buffer on multi-planes
and fix rw mode
- fix support for 4K formats at V4L2 core
- fix a trouble at davinci_fpe, caused by a bad patch
- usbvision: revert a patch with a partial fixup. The fixup patch
was merged already, and this one has some issues"
* tag 'media/v4.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] vb2-memops: Fix over allocation of frame vectors
[media] media: vb2: Fix regression on poll() for RW mode
[media] v4l2-dv-timings.h: fix polarity for 4k formats
[media] davinci_vpfe: Revert "staging: media: davinci_vpfe: remove,unnecessary ret variable"
[media] usbvision: revert commit 588afcc1
[media] videobuf2-v4l2: Verify planes array in buffer dequeueing
[media] videobuf2-core: Check user space planes array in dqbuf
Usually we get a big collection of fixes for ASoC once during rc.
And this is it.
At this time, most of fixes are about Intel Skylake ASoC driver, which
is a new and still on-going development. Along with it, a slight
large LOC is seen in legacy HD-audio driver, but it's merely a code
move to the upper layer.
Other than that, the rest are small or trivial fixes to various
drivers, in addition to an ASoC dapm debugfs code fix.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJXIgKpAAoJEGwxgFQ9KSmkNYQP/1Fzw/1Of+6ONNSO0mhTrFsZ
KcYkKBlWTXLgzV5KTmcxhtx66QP8Z1S6yUNVxeeW8sWlW+F8CRdpGMhWaXcxpiRl
uhuOO6qWIa96U0U8huFN3hvcUDjdEUJTS/cyMK2FD253qIWkZlZifSgPJRxeWKg2
EBfIICz14UKWusk5Frqb/mD3QGmkh9P4wd/Z2y+4p+TNrpCFKGOcHo8LKI4JscXo
bhQduiQ0LsGWFfNfdzd9KX6G5XAE+pu7hc9VYDE1X89Ih/pMzdJGqwy9dyDBokh6
ucE+y3E715r/CS19vZ6l9p/4s+b5gUVvXBAk2MjFo/3/HOfoMqCz/ECi3PTfuc7c
CcmJq8A2KifOgGc7/yi4apajn5THUpQmQuQAharlipB1Y/vWZxKUo4eX7RGjJnaq
/peXevea96ilpajankQ1Et8LwsvklxzGAQ5J1ONtR7x8pqiOJ7mVL5Cp0GU3Cg1J
BHIKWPm+Dv5piMtke0/7JSYIbxL5zzgmltqfYY44wnAzEvAcehv4u7zDE/BQdPo8
l8oLxgTsyAaBj+4YCUtlSWQjCyBifK8kD1OsRqak+jqYg6rl/uug7Cau7qAcpOyG
IWSnQbLyP3YfcLNagtqyqru35Q4w1F1NRpZFwe2XLNbsG6hM5Ea/y0QKbCUS/ncn
IkQPkrdx3+h/3PGNLo+P
=ju49
-----END PGP SIGNATURE-----
Merge tag 'sound-4.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Usually we get a big collection of fixes for ASoC once during rc. And
this is it.
At this time, most of fixes are about Intel Skylake ASoC driver, which
is a new and still on-going development. Along with it, a slight
large LOC is seen in legacy HD-audio driver, but it's merely a code
move to the upper layer.
Other than that, the rest are small or trivial fixes to various
drivers, in addition to an ASoC dapm debugfs code fix"
* tag 'sound-4.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (24 commits)
ALSA: hda - Update BCLK also at hotplug for i915 HSW/BDW
ALSA: hda - Add dock support for ThinkPad X260
ASoC: wm5102: Free compressed IRQ in CODEC remove
ASoC: arizona: Free speaker thermal IRQs in CODEC remove
ASoC: Intel: Skylake: Fix ibs/obs calc for non-integral sampling rates
ASoC: Intel: sst: fix a loop timeout in sst_hsw_stream_reset()
ASoC: Intel: Skylake: Fix to turn OFF codec power when entering S3
ASoC: hdac_hdmi: Fix codec power state in S3 during playback
ASoC: hdac_hdmi: Fix to use dev_pm ops instead soc pm
ASoC: wm8962: Correct typo when setting DSPCLK rate
ASoC: nau8825: Fix jack detection across suspend
ASoC: Intel: Skylake: Fix DSP resource de-allocation
ASoC: Intel: Skylake: Fix for unloading module only when it is loaded
ASoC: Intel: Skylake: Fix kbuild dependency
ASoC: dapm: Make sure we have a card when displaying component widgets
ASoC: rt5640: Correct the digital interface data select
ASoC: Intel: Skylake: remove call to pci_dev_put
ASoC: Intel: Skylake: Call i915 exit last
ASoC: Intel: Skylake: Unmap the address last
ASoC: Intel: Skylake: Freeup properly on skl_dsp_free
...
Do not bail out from depot_save_stack() if the stack trace has zero hash.
Initially depot_save_stack() silently dropped stack traces with zero
hashes, however there's actually no point in reserving this zero value.
Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The change fixes improper check for a returned error value by
class_create() function, which on error returns ERR_PTR() value, thus the
original check always results in a dead code on error path.
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
get_hwpoison_page() must recheck relation between head and tail pages.
n-horiguchi said: without this recheck, the race causes kernel to pin an
irrelevant page, and finally makes kernel crash for refcount mismatch.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dlm_deref_lockres_done_handler() should return zero if the message is
successfully handled.
Fixes: 60d663cb52 ("ocfs2/dlm: add DEREF_DONE message").
Signed-off-by: xuejiufei <xuejiufei@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current ID is going away soon... update email address
Signed-off-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kcov causes the compiler to add a call to __sanitizer_cov_trace_pc() in
every basic block. Ftrace patches in a call to _mcount() to each
function it has annotated.
Letting these mechanisms annotate each other is a bad thing. Break the
loop by adding 'notrace' to __sanitizer_cov_trace_pc() so that ftrace
won't try to patch this code.
This patch lets arm64 with KCOV and STACK_TRACER boot.
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When kswapd goes to sleep it checks if the node is balanced and at first
it sleeps only for HZ/10 time, then rechecks if the node is still
balanced and nobody has woken it during the initial sleep. Only then it
goes fully sleep until an allocation slowpath wakes it up again.
For higher-order allocations, waking up kcompactd is done only before
the full sleep. This turns out to be an issue in case another
high-order allocation fails during the initial sleep. It will wake
kswapd up, however kswapd considers the zone balanced from the order-0
perspective, and will just quickly try to sleep again. So if there's a
longer stream of high-order allocations hitting the slowpath and waking
up kswapd, it might never actually wake up kcompactd, which may be
considered a regression from kswapd-based compaction. In the worst
case, it might be that a single allocation that cannot direct
reclaim/compact itself is waking kswapd in the retry loop and preventing
kcompactd from being woken up and unblocking it.
This patch makes sure kcompactd is woken up in such situations by simply
moving the wakeup before the short initial sleep. More efficient
solution would be to wake kcompactd immediately instead of kswapd if the
node is already order-0 balanced, but in that case we should also move
reset_isolation_suitable() call to kcompactd so it's not adding to the
allocator's latency. Since it's late in the 4.6 cycle, let's go with
the simpler change for now.
Fixes: accf62422b ("mm, kswapd: replace kswapd compaction with waking up kcompactd")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Set current email address to replace obsolete email addresses.
Signed-off-by: Frank Rowand <frank.rowand@sonymobile.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, migration code increses num_poisoned_pages on *failed*
migration page as well as successfully migrated one at the trial of
memory-failure. It will make the stat wrong. As well, it marks the
page as PG_HWPoison even if the migration trial failed. It would mean
we cannot recover the corrupted page using memory-failure facility.
This patches fixes it.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kyeongdon reported below error which is BUG_ON(!PageSwapCache(page)) in
page_swap_info. The reason is that page_endio in rw_page unlocks the
page if read I/O is completed so we need to hold a PG_lock again to
check PageSwapCache. Otherwise, the page can be removed from swapcache.
Kernel BUG at c00f9040 [verbose debug info unavailable]
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 4 PID: 13446 Comm: RenderThread Tainted: G W 3.10.84-g9f14aec-dirty #73
task: c3b73200 ti: dd192000 task.ti: dd192000
PC is at page_swap_info+0x10/0x2c
LR is at swap_slot_free_notify+0x18/0x6c
pc : [<c00f9040>] lr : [<c00f5560>] psr: 400f0113
sp : dd193d78 ip : c2deb1e4 fp : da015180
r10: 00000000 r9 : 000200da r8 : c120fe08
r7 : 00000000 r6 : 00000000 r5 : c249a6c0 r4 : = c249a6c0
r3 : 00000000 r2 : 40080009 r1 : 200f0113 r0 : = c249a6c0
..<snip> ..
Call Trace:
page_swap_info+0x10/0x2c
swap_slot_free_notify+0x18/0x6c
swap_readpage+0x90/0x11c
read_swap_cache_async+0x134/0x1ac
swapin_readahead+0x70/0xb0
handle_pte_fault+0x320/0x6fc
handle_mm_fault+0xc0/0xf0
do_page_fault+0x11c/0x36c
do_DataAbort+0x34/0x118
Fixes: 3f2b1a04f4 ("zram: revive swap_slot_free_notify")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Tested-by: Kyeongdon Kim <kyeongdon.kim@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have been reclaimed highmem zone if buffer_heads is over limit but
commit 6b4f7799c6 ("mm: vmscan: invoke slab shrinkers from
shrink_zone()") changed the behavior so it doesn't reclaim highmem zone
although buffer_heads is over the limit. This patch restores the logic.
Fixes: 6b4f7799c6 ("mm: vmscan: invoke slab shrinkers from shrink_zone()")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In gather_pte_stats() a THP pmd is cast into a pte, which is wrong
because the layouts may differ depending on the architecture. On s390
this will lead to inaccurate numa_maps accounting in /proc because of
misguided pte_present() and pte_dirty() checks on the fake pte.
On other architectures pte_present() and pte_dirty() may work by chance,
but there may be an issue with direct-access (dax) mappings w/o
underlying struct pages when HAVE_PTE_SPECIAL is set and THP is
available. In vm_normal_page() the fake pte will be checked with
pte_special() and because there is no "special" bit in a pmd, this will
always return false and the VM_PFNMAP | VM_MIXEDMAP checking will be
skipped. On dax mappings w/o struct pages, an invalid struct page
pointer would then be returned that can crash the kernel.
This patch fixes the numa_maps THP handling by introducing new "_pmd"
variants of the can_gather_numa_stats() and vm_normal_page() functions.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> [4.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Khugepaged detects own VMAs by checking vm_file and vm_ops but this way
it cannot distinguish private /dev/zero mappings from other special
mappings like /dev/hpet which has no vm_ops and popultes PTEs in mmap.
This fixes false-positive VM_BUG_ON and prevents installing THP where
they are not expected.
Link: http://lkml.kernel.org/r/CACT4Y+ZmuZMV5CjSFOeXviwQdABAgT7T+StKfTqan9YDtgEi5g@mail.gmail.com
Fixes: 78f11a2557 ("mm: thp: fix /dev/zero MAP_PRIVATE and vm_flags cleanups")
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>