Toshi explains:
"No, the default values need to be set to the fallback types,
i.e. minimal supported mode. For WC and WT, UC is the fallback type.
When PAT is disabled, pat_init() does update the tables below to
enable WT per the default BIOS setup. However, when PAT is enabled,
but CPU has PAT -errata, WT falls back to UC per the default values."
Revert: ca1fec58bc 'x86/mm/pat: Adjust default caching mode translation tables'
Requested-by: Toshi Kani <toshi.kani@hp.com>
Cc: Jan Beulich <jbeulich@suse.de>
Link: http://lkml.kernel.org/r/1437577776.3214.252.camel@hp.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This change reverts most of commit 53e9accf0f 'Do not use R9 in
SYSCALL32'. I don't yet understand how, but code in that commit
sometimes fails to preserve EBP.
See https://bugzilla.kernel.org/show_bug.cgi?id=101061
"Problems while executing 32-bit code on AMD64"
Reported-and-tested-by: Krzysztof A. Sobiecki <sobkas@gmail.com>
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Will Drewry <wad@chromium.org>
Cc: Kees Cook <keescook@chromium.org>
CC: x86@kernel.org
Link: http://lkml.kernel.org/r/1437740203-11552-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
- prelim hw support dropped for skl after Damien fixed an ABI issue around
planes
- legacy modesetting is done using atomic infrastructure now (Maarten)!
- more gen9 workarounds (Arun&Nick)
- MOCS programming (cache control for better performance) for skl/bxt
- vlv/chv dpll improvements (Ville)
- PSR fixes from Rodrigo
- fbc improvements from Paulo
- plumb requests into execlist submit functions (Mika)
- opregion code cleanup from Jani
- resource streamer support from Abdiel for mesa
- final fixes for 12bpc hdmi + enabling support from Ville
drm-intel-next-2015-07-03:
- dsi improvements (Gaurav)
- bxt ddi dpll hw state readout (Imre)
- chv dvfs support and overall wm improvements for both vlv and chv (Ville)
- ppgtt polish from Mika and Michel
- cdclk support for bxt (Bob Pauwe)
- make frontbuffer tracking more precise
- OLR removal (John Harrison)
- per-ctx WA batch buffer support (Arun Siluvery)
- remvoe KMS Kconfig option (Chris)
- more hpd handling refactoring from Jani
- use atomic states throughout modeset code and integrate with atomic plane
update (Maarten)
drm-intel-next-2015-06-19:
- refactoring hpd irq handlers (Jani)
- polish skl dpll code a bit (Damien)
- dynamic cdclk adjustement (Ville & Mika)
- fix up 12bpc hdmi and enable it for real again (Ville)
- extend hsw cmd parser to be useful for atomic configuration (Franscico Jerez)
- even more atomic conversion and rolling state handling out across modeset code
from Maarten & Ander
- fix DRRS idleness detection (Ramalingam)
- clean up dsp address alignment handling (Ville)
- some fbc cleanup patches from Paulo
- prevent hard-hangs when trying to reset the gpu on skl (Mika)
* tag 'drm-intel-next-2015-07-17' of git://anongit.freedesktop.org/drm-intel: (386 commits)
drm/i915: Update DRIVER_DATE to 20150717
drm/i915/skl: Drop the preliminary_hw_support flag
drm/i915/skl: Don't expose the top most plane on gen9 display
drm/i915: Fix divide by zero on watermark update
drm/i915: Invert fastboot check
drm/i915: Clarify logic for initial modeset
drm/i915: Unconditionally check gmch pfit state
drm/i915: always disable irqs in intel_pipe_update_start
drm/i915: Remove use of runtime pm in atomic commit functions
drm/i915: Call plane update functions directly from intel_atomic_commit.
drm/i915: Use full atomic modeset.
drm/i915/gen9: Add WaSetDisablePixMaskCammingAndRhwoInCommonSliceChicken
drm/i915/gen9: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround
drm/i915/gen9: Add WaDisableCtxRestoreArbitration workaround
drm/i915: Enable WA batch buffers for Gen9
drm/i915/gen9: Implement WaDisableKillLogic for gen 9
drm/i915: Use expcitly fixed type in compat32 structs
drm/i915: Fix noatomic crtc disabling, v2.
drm/i915: fill in more mode members
drm/i915: Added BXT check in HAS_CORE_RING_FREQ macro
...
When we scan a PCI bus, we read PCI-PCI bridge window registers with
pci_read_bridge_bases() so we can validate the resource hierarchy. Most
architectures call pci_read_bridge_bases() from pcibios_fixup_bus(), but
PCI-PCI bridges are not arch-specific, so this doesn't need to be in
arch-specific code.
Call pci_read_bridge_bases() directly from the PCI core instead of from
arch code.
For alpha and mips, we now call pci_read_bridge_bases() always; previously
we only called it if PCI_PROBE_ONLY was set.
[bhelgaas: changelog]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Ralf Baechle <ralf@linux-mips.org>
CC: James E.J. Bottomley <jejb@parisc-linux.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Bjorn Helgaas <bhelgaas@google.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: David Howells <dhowells@redhat.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Tony Luck <tony.luck@intel.com>
CC: David S. Miller <davem@davemloft.net>
CC: Ingo Molnar <mingo@redhat.com>
CC: Guenter Roeck <linux@roeck-us.net>
CC: Michal Simek <monstr@monstr.eu>
CC: Chris Zankel <chris@zankel.net>
Conflicts:
net/bridge/br_mdb.c
br_mdb.c conflict was a function call being removed to fix a bug in
'net' but whose signature was changed in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
We can disable CD unconditionally when there is no assigned device.
KVM now forces guest PAT to all-writeback in that case, so it makes
sense to also force CR0.CD=0.
When there are assigned devices, emulate cache-disabled operation
through the page tables. This behavior is consistent with VMX
microcode, where CD/NW are not touched by vmentry/vmexit. However,
keep this dependent on the quirk because OVMF enables the caches
too late.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Allow a nested hypervisor to single step its guests.
Signed-off-by: Mihai Donțu <mihai.dontu@gmail.com>
[Fix overlong line. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sending of notification is done by exiting vcpu to user space
if KVM_REQ_HV_CRASH is enabled for vcpu. At exit to user space
the kvm_run structure contains system_event with type
KVM_SYSTEM_EVENT_CRASH to notify about guest crash occurred.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Peter Hornyack <peterhornyack@google.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch introduce Hyper-V related source code file - hyperv.c and
per vm and per vcpu hyperv context structures.
All Hyper-V MSR's and hypercall code moved into hyperv.c.
All Hyper-V kvm/vcpu fields moved into appropriate hyperv context
structures. Copyrights and authors information copied from x86.c
to hyperv.c.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Peter Hornyack <peterhornyack@google.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
According to Intel SDM several checks must be applied for memory operands
of VMX instructions.
Long mode: #GP(0) or #SS(0) depending on the segment must be thrown
if the memory address is in a non-canonical form.
Protected mode, checks in chronological order:
- The segment type must be checked with access type (read or write) taken
into account.
For write access: #GP(0) must be generated if the destination operand
is located in a read-only data segment or any code segment.
For read access: #GP(0) must be generated if if the source operand is
located in an execute-only code segment.
- Usability of the segment must be checked. #GP(0) or #SS(0) depending on the
segment must be thrown if the segment is unusable.
- Limit check. #GP(0) or #SS(0) depending on the segment must be
thrown if the memory operand effective address is outside the segment
limit.
Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Make them clearly architecture-dependent; the capability is valid for
all architectures, but the argument is not.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
OVMF depends on WB to boot fast, because it only clears caches after
it has set up MTRRs---which is too late.
Let's do writeback if CR0.CD is set to make it happy, similar to what
SVM is already doing.
Signed-off-by: Xiao Guangrong <guangrong.xiao@intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The logic of the disabled_quirks field usually results in a double
negation. Wrap it in a simple function that checks the bit and
negates it.
Based on a patch from Xiao Guangrong.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kvm_mtrr_get_guest_memory_type never returns -1 which is implied
in the current code since if @type = -1 (means no MTRR contains the
range), iter.partial_map must be true
Simplify the code to indicate this fact
Signed-off-by: Xiao Guangrong <guangrong.xiao@intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently code uses default memory type if MTRR is fully disabled,
fix it by using UC instead.
Signed-off-by: Xiao Guangrong <guangrong.xiao@intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This commit renames rcu_lockdep_assert() to RCU_LOCKDEP_WARN() for
consistency with the WARN() series of macros. This also requires
inverting the sense of the conditional, which this commit also does.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Use accessor for_each_pci_msi_entry() to access MSI device list, so we
could easily move msi_list from struct pci_dev into struct device
later.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: xen-devel@lists.xenproject.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Stuart Yoder <stuart.yoder@freescale.com>
Cc: Yijing Wang <wangyijing@huawei.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/1436428847-8886-6-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
__ioremap_caller() calls region_is_ram() to walk through the
iomem_resource table to check if a target range is in RAM, which was
added to improve the lookup performance over page_is_ram() (commit
906e36c5c7 "x86: use optimized ioresource lookup in ioremap
function"). page_is_ram() was no longer used when this change was
added, though.
__ioremap_caller() then calls walk_system_ram_range(), which had
replaced page_is_ram() to improve the lookup performance (commit
c81c8a1eee "x86, ioremap: Speed up check for RAM pages").
Since both checks walk through the same iomem_resource table for
the same purpose, there is no need to call both functions.
Aside of that walk_system_ram_range() is the only useful check at the
moment because region_is_ram() always returns -1 due to an
implementation bug. That bug in region_is_ram() cannot be fixed
without breaking existing ioremap callers, which rely on the subtle
difference of walk_system_ram_range() versus non page aligned ranges.
Once these offending callers are fixed we can use region_is_ram() and
remove walk_system_ram_range().
[ tglx: Massaged changelog ]
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1437088996-28511-3-git-send-email-toshi.kani@hp.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
__ioremap_check_ram() has a WARN_ONCE() which is emitted when the
given pfn range is not RAM. The warning is bogus in two aspects:
- it never triggers since walk_system_ram_range() only calls
__ioremap_check_ram() for RAM ranges.
- the warning message is wrong as it says: "ioremap on RAM' after it
established that the pfn range is not RAM.
Move the WARN_ONCE() to __ioremap_caller(), and update the message to
include the address range so we get an actual warning when something
tries to ioremap system RAM.
[ tglx: Massaged changelog ]
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1437088996-28511-2-git-send-email-toshi.kani@hp.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
As per:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383
GCC only allows -mpreferred-stack-boundary=3 on x86_64 if -mno-sse is set.
That means that cc-option will not detect -mpreferred-stack-boundary=3
support, because we test for it before setting -mno-sse.
Fix it by reordering the Makefile bits.
Compile-tested only. This should help avoid code generation
issues such as the one that was worked around in:
b96fecbfa8 ("x86/fpu: Fix boot crash in the early FPU code")
I'm a bit concerned that we could still have problems on older
GCC versions given that our asm code does not respect GCC's idea
of the ABI-required stack alignment.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f5297c192969adfa0d28b84cf8a22d59573db26d.1436126872.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The kernel does not support the MCA bus anymroe, so mark sys_desc_table
as obsolete: remove any reference from the code together with the remaining
of MCA logic.
bloat-o-meter output:
add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-55 (-55)
function old new delta
i386_start_kernel 128 119 -9
setup_arch 1421 1375 -46
Signed-off-by: Paolo Pisati <p.pisati@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437409430-8491-1-git-send-email-p.pisati@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We currently have no safe way of currently defining architecture
agnostic IOMMU ioremap_*() variants. The trend is for folks to
*assume* that ioremap_nocache() should be the default everywhere
and then add this mapping on each architectures -- this is not
correct today for a variety of reasons.
We have two options:
1) Sit and wait for every architecture in Linux to get a
an ioremap_*() variant defined before including it upstream.
2) Gather consensus on a safe architecture agnostic ioremap_*()
default.
Approach 1) introduces development latencies, and since 2) will
take time and work on clarifying semantics the only remaining
sensible thing to do to avoid issues is returning NULL on
ioremap_*() variants.
In order for this to work we must have all architectures declare
their own ioremap_*() variants as defined. This will take some
work, do this for ioremp_uc() to set the example as its only
currently implemented on x86. Document all this.
We only provide implementation support for ioremap_uc() as the
other ioremap_*() variants are well defined all over the kernel
for other architectures already.
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arnd@arndb.de
Cc: benh@kernel.crashing.org
Cc: bp@suse.de
Cc: dan.j.williams@intel.com
Cc: geert@linux-m68k.org
Cc: hch@lst.de
Cc: hmh@hmh.eng.br
Cc: jgross@suse.com
Cc: linux-mm@kvack.org
Cc: luto@amacapital.net
Cc: mpe@ellerman.id.au
Cc: mst@redhat.com
Cc: ralf@linux-mips.org
Cc: ross.zwisler@linux.intel.com
Cc: stefan.bader@canonical.com
Cc: tj@kernel.org
Cc: tomi.valkeinen@ti.com
Cc: toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1436488096-3165-1-git-send-email-mcgrof@do-not-panic.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
VM86 is entirely broken if ptrace, syscall auditing, or
NOHZ_FULL is in use. The code is a big undocumented mess, it's
a real PITA to test, and it looks like a big chunk of vm86_32.c
is dead code. It also plays awful games with the entry asm.
No one should be using it anyway. Use DOSBOX or KVM instead.
Let's accelerate its slow death. Remove it from EXPERT and
default it to n. Distros should not enable it. In the unlikely
event that some user needs it, they can easily re-enable it.
While we're at it, rename it to CONFIG_X86_LEGACY_VM86 so that 'make
oldconfig' users will be prompted again. I left CONFIG_VM86 as
an alias to avoid a treewide replacement of the names. We can
clean that up once the current asm and vm86 code churn settles
down.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/d29c6cc442d32d4df58849d2f8c89fb39ff88d61.1436542295.git.luto@kernel.org
[ Refined it some more. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On x86_64, there's no socketcall syscall; instead all of the
socket calls are real syscalls. For 32-bit programs, we're
stuck offering the socketcall syscall, but it would be nice to
expose the direct calls as well. This will enable seccomp to
filter socket calls (for new userspace only, but that's fine for
some applications) and it will provide a tiny performance boost.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexander Larsson <alexl@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Cosimo Cecchi <cosimo@endlessm.com>
Cc: Dan Nicholson <nicholson@endlessm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
Cc: libc-alpha <libc-alpha@sourceware.org>
Link: http://lkml.kernel.org/r/cb5138299d37d5800e2d135b01a7667fa6115854.1436912629.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Move the userspace accesses down into the common function in
preparation for the next set of patches. Also change to copying
the fields explicitly instead of assuming a fixed order in
pt_regs and the kernel data structures.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437354550-25858-4-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There is no legitimate reason for usermode to modify the 'orig_ax'
field on entry to vm86 mode, so copy it from the 32-bit regs.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437354550-25858-3-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There is no need to save FS and non-lazy GS outside the 32-bit
regs. Lazy GS still needs to be saved because it wasn't saved
on syscall entry. Save it in the gs slot of regs32, which is
present but unused.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437354550-25858-2-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Make WT really mean WT (rather than UC).
I can't see why commit 9cd25aac1f ("x86/mm/pat: Emulate PAT when
it is disabled") didn't make this to match its changes to
pat_init().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/55ACC3660200007800092E62@mail.emea.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Complete the set of dependent features that need disabling at
once: XSAVEC, AVX-512 and all currently known to the kernel
extensions to it, as well as MPX need to be disabled too.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/55ACC40D0200007800092E6C@mail.emea.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It has never had any effect. Remove it for comprehensibility.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/c91fa38507760d9e54a4b8737fa6409bde896b33.1437418322.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
MPX setups private anonymous mapping, but uses vma->vm_ops too.
This can confuse core VM, as it relies on vm->vm_ops to
distinguish file VMAs from anonymous.
As result we will get SIGBUS, because handle_pte_fault() thinks
it's file VMA without vm_ops->fault and it doesn't know how to
handle the situation properly.
Let's fix that by not setting ->vm_ops.
We don't really need ->vm_ops here: MPX VMA can be detected with
VM_MPX flag. And vma_merge() will not merge MPX VMA with non-MPX
VMA, because ->vm_flags won't match.
The only thing left is name of VMA. I'm not sure if it's part of
ABI, or we can just drop it. The patch keep it by providing
arch_vma_name() on x86.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org> # Fixes: 6b7339f4 (mm: avoid setting up anonymous pages into file mapping)
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@sr71.net
Link: http://lkml.kernel.org/r/20150720212958.305CC3E9@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
MSR_IA32_ENERGY_PERF_BIAS is lost after suspend/resume:
x86_energy_perf_policy -r before
cpu0: 0x0000000000000006
cpu1: 0x0000000000000006
cpu2: 0x0000000000000006
cpu3: 0x0000000000000006
cpu4: 0x0000000000000006
cpu5: 0x0000000000000006
cpu6: 0x0000000000000006
cpu7: 0x0000000000000006
after
cpu0: 0x0000000000000000
cpu1: 0x0000000000000006
cpu2: 0x0000000000000006
cpu3: 0x0000000000000006
cpu4: 0x0000000000000006
cpu5: 0x0000000000000006
cpu6: 0x0000000000000006
cpu7: 0x0000000000000006
Resulting in inconsistent energy policy settings across CPUs.
This register is set via init_intel() at bootup. During resume,
the secondary CPUs are brought online again and init_intel() is
callled which re-initializes the register. The boot CPU however
never reinitializes the register.
Add a syscore callback to reinitialize the register for the boot CPU.
Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437428878-4105-1-git-send-email-labbott@fedoraproject.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
flush_tlb_info->flush_start/end are both normal virtual
addresses. When calculating 'nr_pages' (only used for the
tracepoint), I neglected to put parenthesis in.
Thanks to David Koufaty for pointing this out.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@sr71.net
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20150720230153.9E834081@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Allow eBPF programs attached to TC qdiscs call skb_vlan_push/pop via
helper functions. These functions may change skb->data/hlen which are
cached by some JITs to improve performance of ld_abs/ld_ind instructions.
Therefore JITs need to recognize bpf_skb_vlan_push/pop() calls,
re-compute header len and re-cache skb->data/hlen back into cpu registers.
Note, skb->data/hlen are not directly accessible from the programs,
so any changes to skb->data done either by these helpers or by other
TC actions are safe.
eBPF JIT supported by three architectures:
- arm64 JIT is using bpf_load_pointer() without caching, so it's ok as-is.
- x64 JIT re-caches skb->data/hlen unconditionally after vlan_push/pop calls
(experiments showed that conditional re-caching is slower).
- s390 JIT falls back to interpreter for now when bpf_skb_vlan_push() is present
in the program (re-caching is tbd).
These helpers allow more scalable handling of vlan from the programs.
Instead of creating thousands of vlan netdevs on top of eth0 and attaching
TC+ingress+bpf to all of them, the program can be attached to eth0 directly
and manipulate vlans as necessary.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Always we use type unsigned long to format the ip address, since the
value of ip address is never the negative.
This patch uses type unsigned long, instead of long, to format the ip
address. The code is more clearly to be viewed by using type unsigned
long, although it is correct by using either unsigned long or long.
Link: http://lkml.kernel.org/r/1436694744-16747-1-git-send-email-mhuang@redhat.com
Cc: Minfei Huang <mnfhuang@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Minfei Huang <mnfhuang@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The __ref / __refdata annotations used to be needed because of
referencing functions / variables annotated __cpuinit /
__cpuinitdata.
But those annotations vanished during the development of v3.11.
Therefore most of the __ref / __refdata annotations are not needed
anymore. As they may hide legitimate sections mismatches, we
better get rid of them.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437409973-8927-1-git-send-email-minipli@googlemail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 fixes from Ingo Molnar:
"Two families of fixes:
- Fix an FPU context related boot crash on newer x86 hardware with
larger context sizes than what most people test. To fix this
without ugly kludges or extensive reverts we had to touch core task
allocator, to allow x86 to determine the task size dynamically, at
boot time.
I've tested it on a number of x86 platforms, and I cross-built it
to a handful of architectures:
(warns) (warns)
testing x86-64: -git: pass ( 0), -tip: pass ( 0)
testing x86-32: -git: pass ( 0), -tip: pass ( 0)
testing arm: -git: pass ( 1359), -tip: pass ( 1359)
testing cris: -git: pass ( 1031), -tip: pass ( 1031)
testing m32r: -git: pass ( 1135), -tip: pass ( 1135)
testing m68k: -git: pass ( 1471), -tip: pass ( 1471)
testing mips: -git: pass ( 1162), -tip: pass ( 1162)
testing mn10300: -git: pass ( 1058), -tip: pass ( 1058)
testing parisc: -git: pass ( 1846), -tip: pass ( 1846)
testing sparc: -git: pass ( 1185), -tip: pass ( 1185)
... so I hope the cross-arch impact 'none', as intended.
(by Dave Hansen)
- Fix various NMI handling related bugs unearthed by the big asm code
rewrite and generally make the NMI code more robust and more
maintainable while at it. These changes are a bit late in the
cycle, I hope they are still acceptable.
(by Andy Lutomirski)"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86
x86/fpu, sched: Dynamically allocate 'struct fpu'
x86/entry/64, x86/nmi/64: Add CONFIG_DEBUG_ENTRY NMI testing code
x86/nmi/64: Make the "NMI executing" variable more consistent
x86/nmi/64: Minor asm simplification
x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection
x86/nmi/64: Reorder nested NMI checks
x86/nmi/64: Improve nested NMI comments
x86/nmi/64: Switch stacks on userspace NMI entry
x86/nmi/64: Remove asm code that saves CR2
x86/nmi: Enable nested do_nmi() handling for 64-bit kernels
Pull perf fixes from Ingo Molnar:
"Mostly tooling fixes, plus a static key fix fixing /sys/devices/cpu/rdpmc"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf tools: Really allow to specify custom CC, AR or LD
perf auxtrace: Fix misplaced check for HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT
perf hists browser: Take the --comm, --dsos, etc filters into account
perf symbols: Store if there is a filter in place
x86, perf: Fix static_key bug in load_mm_cr4()
tools: Copy lib/hweight.c from the kernel sources
perf tools: Fix the detached tarball wrt rbtree copy
perf thread_map: Fix the sizeof() calculation for map entries
tools lib: Improve clean target
perf stat: Fix shadow declaration of close
perf tools: Fix lockup using 32-bit compat vdso
Pull irq fixes from Ingo Molnar:
"Misc irq fixes:
- two driver fixes
- a Xen regression fix
- a nested irq thread crash fix"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gicv3-its: Fix mapping of LPIs to collections
genirq: Prevent resend to interrupts marked IRQ_NESTED_THREAD
genirq: Revert sparse irq locking around __cpu_up() and move it to x86 for now
gpio/davinci: Fix race in installing chained irq handler
Don't burden architectures without dynamic task_struct sizing
with the overhead of dynamic sizing.
Also optimize the x86 code a bit by caching task_struct_size.
Acked-and-Tested-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437128892-9831-3-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The FPU rewrite removed the dynamic allocations of 'struct fpu'.
But, this potentially wastes massive amounts of memory (2k per
task on systems that do not have AVX-512 for instance).
Instead of having a separate slab, this patch just appends the
space that we need to the 'task_struct' which we dynamically
allocate already. This saves from doing an extra slab
allocation at fork().
The only real downside here is that we have to stick everything
and the end of the task_struct. But, I think the
BUILD_BUG_ON()s I stuck in there should keep that from being too
fragile.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437128892-9831-2-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit 2ae416b142 ("mm: new mm hook framework") introduced an empty
header file (mm-arch-hooks.h) for every architecture, even those which
doesn't need to define mm hooks.
As suggested by Geert Uytterhoeven, this could be cleaned through the use
of a generic header file included via each per architecture
asm/include/Kbuild file.
The PowerPC architecture is not impacted here since this architecture has
to defined the arch_remap MM hook.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus noticed that the early return check was missing
_TIF_USER_RETURN_NOTIFY. If the only work flag was
_TIF_USER_RETURN_NOTIFY, we'd skip user return notifiers. Fix
it. (This is the only missing bit.)
This fixes double faults on a KVM host. It's the same issue as
last time, except that this time it's very easy to trigger.
Apparently no one uses -next as a KVM host.
( I'm still not quite sure what it is that KVM does that blows up
so badly if we miss a user return notifier. My best guess is that KVM
lets KERNEL_GS_BASE (i.e. the user's gs base) be negative and fixes
it up in a user return notifier. If we actually end up in user mode
with a negative gs base, we blow up pretty badly. )
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: c5c46f59e4 ("x86/entry: Add new, comprehensible entry and exit handlers written in C")
Link: http://lkml.kernel.org/r/3f801104d24ee7a6bb1446408d9950777aa63277.1436995419.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>