Having ftrace_write() return -EPERM on failure, as that's what the callers
return, then we can clean up the code a bit. That is, instead of:
if (ftrace_write(...))
return -EPERM;
return 0;
or
if (ftrace_write(...)) {
ret = -EPERM;
goto_out;
}
We can instead have:
return ftrace_write(...);
or
ret = ftrace_write(...);
if (ret)
goto out;
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The dump_trace() function in dumpstack_64.c is hard to follow.
The test for exception stack is processed differently than the
test for irq stack, and the normal stack is outside completely.
By restructuring this code to have all the stacks determined by
a single function that returns an enum of the following:
STACK_IS_NORMAL
STACK_IS_EXCEPTION
STACK_IS_IRQ
STACK_IS_UNKNOWN
and has the logic of each within a switch statement.
This should make the code much easier to read and understand.
Link: http://lkml.kernel.org/r/20110806012354.684598995@goodmis.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20140206144322.086050042@goodmis.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
x86_64 uses a per_cpu variable kernel_stack to always point to
the thread stack of current. This is where the thread_info is stored
and is accessed from this location even when the irq or exception stack
is in use. This removes the complexity of having to maintain the
thread info on the stack when interrupts are running and having to
copy the preempt_count and other fields to the interrupt stack.
x86_32 uses the old method of copying the thread_info from the thread
stack to the exception stack just before executing the exception.
Having the two different requires #ifdefs and also the x86_32 way
is a bit of a pain to maintain. By converting x86_32 to the same
method of x86_64, we can remove #ifdefs, clean up the x86_32 code
a little, and remove the overhead of the copy.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20110806012354.263834829@goodmis.org
Link: http://lkml.kernel.org/r/20140206144321.852942014@goodmis.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The i386 thread_info contains a previous_esp field that is used
to daisy chain the different stacks for dump_stack()
(ie. irq, softirq, thread stacks).
The goal is to eventual make i386 handling of thread_info the same
as x86_64, which means that the thread_info will not be in the stack
but as a per_cpu variable. We will no longer depend on thread_info
being able to daisy chain different stacks as it will only exist
in one location (the thread stack).
By moving previous_esp to the end of thread_info and referencing
it as an offset instead of using a thread_info field, this becomes
a stepping stone to moving the thread_info.
The offset to get to the previous stack is rather ugly in this
patch, but this is only temporary and the prev_esp will be changed
in the next commit. This commit is more for sanity checks of the
change.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Robert Richter <rric@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20110806012353.891757693@goodmis.org
Link: http://lkml.kernel.org/r/20140206144321.608754481@goodmis.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
According to a git log -p, GET_THREAD_INFO_WITH_ESP() has only been defined
and never been used. Get rid of it.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20140206144321.409045251@goodmis.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Nothing references the supervisor_stack in the thread_info field,
and it does not exist in x86_64. To make the two more the same,
it is being removed.
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20110806012353.546183789@goodmis.org
Link: http://lkml.kernel.org/r/20140206144321.203619611@goodmis.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Building on commit 0ac09f9f8c ("x86, trace: Fix CR2 corruption when
tracing page faults") this patch addresses another few issues:
- Now that read_cr2() is lifted into trace_do_page_fault(), we should
pass the address to trace_page_fault_entries() to avoid it
re-reading a potentially changed cr2.
- Put both trace_do_page_fault() and trace_page_fault_entries() under
CONFIG_TRACING.
- Mark both fault entry functions {,trace_}do_page_fault() as notrace
to avoid getting __mcount or other function entry trace callbacks
before we've observed CR2.
- Mark __do_page_fault() as noinline to guarantee the function tracer
does get to see the fault.
Cc: <jolsa@redhat.com>
Cc: <vincent.weaver@maine.edu>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140306145300.GO9987@twins.programming.kicks-ass.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The preadv64/pwrite64 have been implemented for the x32 ABI, in order
to allow passing 64 bit arguments from user space without splitting
them into two 32 bit parameters, like it would be necessary for usual
compat tasks.
Howevert these two system calls are only being used for the x32 ABI,
so add __ARCH_WANT_COMPAT defines for these two compat syscalls and
make these two only visible for x86.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Only CF9_COND is appropriate for inclusion in the default chain, not
CF9; the latter will poke that register unconditionally, whereas
CF9_COND will at least look for PCI configuration method #1 or #2
first (a weak check, but better than nothing.)
CF9 should be used for explicit system configuration (command line or
DMI) only.
Cc: Aubrey Li <aubrey.li@intel.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Link: http://lkml.kernel.org/r/53130A46.1010801@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Reboot is the last service linux OS provides to the end user. We are
supposed to make this function more robust than today. This patch adds
all of the known reboot methods into the default attempt list. The
machines requiring reboot=efi or reboot=p or reboot=bios get a chance
to reboot automatically now.
If there is a new reboot method emerged, we are supposed to add it to
the default list as well, instead of adding the endless dmidecode entry.
If one method required is in the default list in this patch but the
machine reboot still hangs, that means some methods ahead of the
required method cause the system hangs, then reboot the machine by
passing reboot= arguments and submit the reboot dmidecode table quirk.
We are supposed to remove the reboot dmidecode table from the kernel,
but to be safe, we keep it. This patch prevents us from adding more.
If you happened to have a machine listed in the reboot dmidecode
table and this patch makes reboot work on your machine, please submit
a patch to remove the quirk.
The default reboot order with this patch is now:
ACPI > KBD > ACPI > KBD > EFI > CF9_COND > BIOS
Because BIOS and TRIPLE are mutually exclusive (either will either
work or hang the machine) that method is not included.
[ hpa: as with any changes to the reboot order, this patch will have
to be monitored carefully for regressions. ]
Signed-off-by: Aubrey Li <aubrey.li@intel.com>
Acked-by: Matthew Garrett <mjg59@srcf.ucam.org>
Link: http://lkml.kernel.org/r/53130A46.1010801@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Compiling last minute changes without setting the proper config
options is not really clever.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The kbuild test robot reported the following errors, introduced with
commit 54b52d8726 ("x86/efi: Build our own EFI services pointer
table"),
arch/x86/boot/compressed/head_32.o: In function `efi32_config':
>> (.data+0x58): undefined reference to `efi_call_phys'
arch/x86/boot/compressed/head_64.o: In function `efi64_config':
>> (.data+0x90): undefined reference to `efi_call6'
Wrap the efi*_config structures in #ifdef CONFIG_EFI_STUB so that we
don't make references to EFI functions if they're not compiled in.
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
The kbuild test robot reported the following errors that were introduced
with commit 993c30a04e ("x86, tools: Consolidate #ifdef code"),
arch/x86/boot/tools/build.c: In function 'update_pecoff_setup_and_reloc':
>> arch/x86/boot/tools/build.c:252:1: error: parameter name omitted
static inline void update_pecoff_setup_and_reloc(unsigned int) {}
^
arch/x86/boot/tools/build.c: In function 'update_pecoff_text':
>> arch/x86/boot/tools/build.c:253:1: error: parameter name omitted
static inline void update_pecoff_text(unsigned int, unsigned int) {}
^
>> arch/x86/boot/tools/build.c:253:1: error: parameter name omitted
arch/x86/boot/tools/build.c: In function 'main':
>> arch/x86/boot/tools/build.c:372:2: warning: implicit declaration of function 'efi_stub_entry_update' [-Wimplicit-function-declaration]
efi_stub_entry_update();
^
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
The trace_do_page_fault function trigger tracepoint
and then handles the actual page fault.
This could lead to error if the tracepoint caused page
fault. The original cr2 value gets lost and the original
page fault handler kills current process with SIGSEGV.
This happens if you record page faults with callchain
data, the user part of it will cause tracepoint handler
to page fault:
# perf record -g -e exceptions:page_fault_user ls
Fixing this by saving the original cr2 value
and using it after tracepoint handler is done.
v2: Moving the cr2 read before exception_enter, because
it could trigger tracepoint as well.
Reported-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1402211701380.6395@vincent-weaver-1.um.maine.edu
Link: http://lkml.kernel.org/r/20140228160526.GD1133@krava.brq.redhat.com
causes a crash during boot - Borislav Petkov
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJTFmV4AAoJEC84WcCNIz1VeSgP/1gykrBiH3Vr4H4c32la/arZ
ktXRAT+RHdiebEXopt6+A1Pv6iyRYz3fBB1cKKb7q8fhDmVVmefGVkyO4qg4NbgL
Ic1dP6uPgiFidcYML9/c6+UIovbwDD7f5wwJEjXxoZKg7b7P0TIykd8z8YPQ/6A6
fIY5Z2L9A8eVt4k1m6Dg2PUJ2B+XKOLa5BbL7gXB6u9avgAVAoLM9oQStss+V9Pv
JQu+BZxeEwuxgi0LOxk1sFWXaAoRtgVDNH6nPK93CRvO2H83voWFf+OcW9hQWsF5
tLam6aMkvjuM5Dv1IA69PBAWrE/jxcMvjEdJyrGMWRKWtH/iTN4ez4EoFiO38o4R
IgzXh9L5xbP3o+g3rntIu4h3/5yde9TRM18mER0lLTdNFZ8QzmLt4L0TptntRoBv
bTXffILACq0uQU6T10P+EwseT472HphMeswaWVDkRxkN2hTCipFqQX0ekb0qbw3O
yQeRyz1/t7DeA77iAGG96SfeziMdr44d6u6zDQPrTAJV+H1ZZ3XNIlJDi01CAg3b
PDT6nHb/9V0tPZQebntZnczVRP+5EBdn5RbJaAMldEwVgjdjlFNY5slsF01KeCSF
6Cx6UyAI9XKuZMoayC3QDHKKWi+BTOwbKcVAdtZ1c9AMLt9pyxsDfdoOAV4zBiQa
PsVs9t/l7TZoL1MQHlEJ
=YIab
-----END PGP SIGNATURE-----
Merge tag 'efi-urgent' into x86/urgent
* Disable the new EFI 1:1 virtual mapping for SGI UV because using it
causes a crash during boot - Borislav Petkov
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Alex reported hitting the following BUG after the EFI 1:1 virtual
mapping work was merged,
kernel BUG at arch/x86/mm/init_64.c:351!
invalid opcode: 0000 [#1] SMP
Call Trace:
[<ffffffff818aa71d>] init_extra_mapping_uc+0x13/0x15
[<ffffffff818a5e20>] uv_system_init+0x22b/0x124b
[<ffffffff8108b886>] ? clockevents_register_device+0x138/0x13d
[<ffffffff81028dbb>] ? setup_APIC_timer+0xc5/0xc7
[<ffffffff8108b620>] ? clockevent_delta2ns+0xb/0xd
[<ffffffff818a3a92>] ? setup_boot_APIC_clock+0x4a8/0x4b7
[<ffffffff8153d955>] ? printk+0x72/0x74
[<ffffffff818a1757>] native_smp_prepare_cpus+0x389/0x3d6
[<ffffffff818957bc>] kernel_init_freeable+0xb7/0x1fb
[<ffffffff81535530>] ? rest_init+0x74/0x74
[<ffffffff81535539>] kernel_init+0x9/0xff
[<ffffffff81541dfc>] ret_from_fork+0x7c/0xb0
[<ffffffff81535530>] ? rest_init+0x74/0x74
Getting this thing to work with the new mapping scheme would need more
work, so automatically switch to the old memmap layout for SGI UV.
Acked-by: Russ Anderson <rja@sgi.com>
Cc: Alex Thorlton <athorlton@sgi.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Reported-by: fengguang.wu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linuxdrivers <devel@linuxdriverproject.org>
Cc: x86 <x86@kernel.org>
Commit 1aec16967 (x86: Hyperv: Cleanup the irq mess) removed the
ability to build the hyperv stuff as a module. Bring it back.
Reported-by: fengguang.wu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linuxdrivers <devel@linuxdriverproject.org>
Cc: x86 <x86@kernel.org>
Some firmware appears to enable interrupts during boot service calls,
even if we've explicitly disabled them prior to the call. This is
actually allowed per the UEFI spec because boottime services expect to
be called with interrupts enabled.
So that's fine, we just need to ensure that we disable them again in
efi_enter32() before switching to a 64-bit GDT, otherwise an interrupt
may fire causing a 32-bit IRQ handler to run after we've left
compatibility mode.
Despite efi_enter32() being called both for boottime and runtime
services, this really only affects boottime because the runtime services
callchain is executed with interrupts disabled. See efi_thunk().
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Some EFI firmware makes use of the FPU during boottime services and
clearing X86_CR4_OSFXSR by overwriting %cr4 causes the firmware to
crash.
Add the PAE bit explicitly instead of trashing the existing contents,
leaving the rest of the bits as the firmware set them.
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Add the Kconfig option and bump the kernel header version so that boot
loaders can check whether the handover code is available if they want.
The xloadflags field in the bzImage header is also updated to reflect
that the kernel supports both entry points by setting both of
XLF_EFI_HANDOVER_32 and XLF_EFI_HANDOVER_64 when CONFIG_EFI_MIXED=y.
XLF_CAN_BE_LOADED_ABOVE_4G is disabled so that the kernel text is
guaranteed to be addressable with 32-bits.
Note that no boot loaders should be using the bits set in xloadflags to
decide which entry point to jump to. The entire scheme is based on the
concept that 32-bit bootloaders always jump to ->handover_offset and
64-bit loaders always jump to ->handover_offset + 512. We set both bits
merely to inform the boot loader that it's safe to use the native
handover offset even if the machine type in the PE/COFF header claims
otherwise.
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Setup the runtime services based on whether we're booting in EFI native
mode or not. For non-native mode we need to thunk from 64-bit into
32-bit mode before invoking the EFI runtime services.
Using the runtime services after SetVirtualAddressMap() is slightly more
complicated because we need to ensure that all the addresses we pass to
the firmware are below the 4GB boundary so that they can be addressed
with 32-bit pointers, see efi_setup_page_tables().
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
The EFI handover code only works if the "bitness" of the firmware and
the kernel match, i.e. 64-bit firmware and 64-bit kernel - it is not
possible to mix the two. This goes against the tradition that a 32-bit
kernel can be loaded on a 64-bit BIOS platform without having to do
anything special in the boot loader. Linux distributions, for one thing,
regularly run only 32-bit kernels on their live media.
Despite having only one 'handover_offset' field in the kernel header,
EFI boot loaders use two separate entry points to enter the kernel based
on the architecture the boot loader was compiled for,
(1) 32-bit loader: handover_offset
(2) 64-bit loader: handover_offset + 512
Since we already have two entry points, we can leverage them to infer
the bitness of the firmware we're running on, without requiring any boot
loader modifications, by making (1) and (2) valid entry points for both
CONFIG_X86_32 and CONFIG_X86_64 kernels.
To be clear, a 32-bit boot loader will always use (1) and a 64-bit boot
loader will always use (2). It's just that, if a single kernel image
supports (1) and (2) that image can be used with both 32-bit and 64-bit
boot loaders, and hence both 32-bit and 64-bit EFI.
(1) and (2) must be 512 bytes apart at all times, but that is already
part of the boot ABI and we could never change that delta without
breaking existing boot loaders anyhow.
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Implement the transition code to go from IA32e mode to protected mode in
the EFI boot stub. This is required to use 32-bit EFI services from a
64-bit kernel.
Since EFI boot stub is executed in an identity-mapped region, there's
not much we need to do before invoking the 32-bit EFI boot services.
However, we do reload the firmware's global descriptor table
(efi32_boot_gdt) in case things like timer events are still running in
the firmware.
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
It's not possible to dereference the EFI System table directly when
booting a 64-bit kernel on a 32-bit EFI firmware because the size of
pointers don't match.
In preparation for supporting the above use case, build a list of
function pointers on boot so that callers don't have to worry about
converting pointer sizes through multiple levels of indirection.
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
The traditional approach of using machine-specific types such as
'unsigned long' does not allow the kernel to interact with firmware
running in a different CPU mode, e.g. 64-bit kernel with 32-bit EFI.
Add distinct EFI structure definitions for both 32-bit and 64-bit so
that we can use them in the 32-bit and 64-bit code paths.
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Both efi_free_boot_services() and efi_enter_virtual_mode() are invoked
from init/main.c, but only if the EFI runtime services are available.
This is not the case for non-native boots, e.g. where a 64-bit kernel is
booted with 32-bit EFI firmware.
Delete the dead code.
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Now that we have EFI-specific page tables we need to lookup the pgd when
dumping those page tables, rather than assuming that swapper_pgdir is
the current pgdir.
Remove the double underscore prefix, which is usually reserved for
static functions.
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Instead of littering main() with #ifdef CONFIG_EFI_STUB, move the logic
into separate functions that do nothing if the config option isn't set.
This makes main() much easier to read.
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
handover_offset is now filled out by build.c. Don't set a default value
as it will be overwritten anyway.
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
The vmbus/hyperv interrupt handling is another complete trainwreck and
probably the worst of all currently in tree.
If CONFIG_HYPERV=y then the interrupt delivery to the vmbus happens
via the direct HYPERVISOR_CALLBACK_VECTOR. So far so good, but:
The driver requests first a normal device interrupt. The only reason
to do so is to increment the interrupt stats of that device
interrupt. For no reason it also installs a private flow handler.
We have proper accounting mechanisms for direct vectors, but of
course it's too much effort to add that 5 lines of code.
Aside of that the alloc_intr_gate() is not protected against
reallocation which makes module reload impossible.
Solution to the problem is simple to rip out the whole mess and
implement it correctly.
First of all move all that code to arch/x86/kernel/cpu/mshyperv.c and
merily install the HYPERVISOR_CALLBACK_VECTOR with proper reallocation
protection and use the proper direct vector accounting mechanism.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linuxdrivers <devel@linuxdriverproject.org>
Cc: x86 <x86@kernel.org>
Link: http://lkml.kernel.org/r/20140223212739.028307673@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
HyperV abuses a device interrupt to account for the
HYPERVISOR_CALLBACK_VECTOR.
Provide proper accounting as we have for the other vectors as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: x86 <x86@kernel.org>
Link: http://lkml.kernel.org/r/20140223212738.681855582@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Let the core do the irq_desc resolution.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Xen <xen-devel@lists.xenproject.org>
Cc: x86 <x86@kernel.org>
Link: http://lkml.kernel.org/r/20140223212737.869264085@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
... into a kexec flavor for better code readability and simplicity. The
original one was getting ugly with ifdeffery.
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Currently, running SetVirtualAddressMap() and passing the physical
address of the virtual map array was working only by a lucky coincidence
because the memory was present in the EFI page table too. Until Toshi
went and booted this on a big HP box - the krealloc() manner of resizing
the memmap we're doing did allocate from such physical addresses which
were not mapped anymore and boom:
http://lkml.kernel.org/r/1386806463.1791.295.camel@misato.fc.hp.com
One way to take care of that issue is to reimplement the krealloc thing
but with pages. We start with contiguous pages of order 1, i.e. 2 pages,
and when we deplete that memory (shouldn't happen all that often but you
know firmware) we realloc the next power-of-two pages.
Having the pages, it is much more handy and easy to map them into the
EFI page table with the already existing mapping code which we're using
for building the virtual mappings.
Thanks to Toshi Kani and Matt for the great debugging help.
Reported-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
We will use it in efi so expose it.
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
This is very useful for debugging issues with the recently added
pagetable switching code for EFI virtual mode.
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
With reusing the ->trampoline_pgd page table for mapping EFI regions in
order to use them after having switched to EFI virtual mode, it is very
useful to be able to dump aforementioned page table in dmesg. This adds
that functionality through the walk_pgd_level() interface which can be
called from somewhere else.
The original functionality of dumping to debugfs remains untouched.
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Coalesce formats and remove spaces before tabs.
Move __initdata after the variable declaration.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
For now we only ensure about 5kb free space for avoiding our board
refusing boot. But the comment lies that we retain 50% space.
Signed-off-by: Madper Xie <cxie@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
It makes more sense to set the feature flag in the success path of the
detection function than it does to rely on the caller doing it. Apart
from it being more logical to group the code and data together, it sets
a much better example for new EFI architectures.
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
As we grow support for more EFI architectures they're going to want the
ability to query which EFI features are available on the running system.
Instead of storing this information in an architecture-specific place,
stick it in the global 'struct efi', which is already the central
location for EFI state.
While we're at it, let's change the return value of efi_enabled() to be
bool and replace all references to 'facility' with 'feature', which is
the usual word used to describe the attributes of the running system.
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
commit 0061d53daf introduced a mechanism to execute a global clock
update for a vm. We can apply this periodically in order to propagate
host NTP corrections. Also, if all vcpus of a vm are pinned, then
without an additional trigger, no guest NTP corrections can propagate
either, as the current trigger is only vcpu cpu migration.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>