This patch removes the redundant sysfs cacheinfo code by reusing
the newly introduced generic cacheinfo infrastructure through the
commit
246246cbde ("drivers: base: support cpu cache information
interface to userspace via sysfs")
The private pointer provided by the cacheinfo is used to implement
the AMD L3 cache-specific attributes.
Note that with v4.0-rc1, commit
513e3d2d11 ("cpumask: always use nr_cpu_ids in formatting and parsing
functions")
in particular changes from long format to shorter one for all cpumasks
sysfs entries. As the consequence of the same, even the shared_cpu_map
in the cacheinfo sysfs was also changed.
This patch neither alters any existing sysfs entries nor their
formating, however since the generic cacheinfo has switched to use the
device attributes instead of the traditional raw kobjects, a directory
named "power" along with its standard attributes are added similar to
any other device.
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Link: http://lkml.kernel.org/r/1425470416-20691-1-git-send-email-sudeep.holla@arm.com
[ Add a check for uninitialized this_cpu_ci for the cpu_has_topoext case too
in __cache_amd_cpumap_setup() ]
Signed-off-by: Borislav Petkov <bp@suse.de>
By the nature of the TEST operation, it is often possible to test
a narrower part of the operand:
"testl $3, mem" -> "testb $3, mem",
"testq $3, %rcx" -> "testb $3, %cl"
This results in shorter instructions, because the TEST instruction
has no sign-entending byte-immediate forms unlike other ALU ops.
Note that this change does not create any LCP (Length-Changing Prefix)
stalls, which happen when adding a 0x66 prefix, which happens when
16-bit immediates are used, which changes such TEST instructions:
[test_opcode] [modrm] [imm32]
to:
[0x66] [test_opcode] [modrm] [imm16]
where [imm16] has a *different length* now: 2 bytes instead of 4.
This confuses the decoder and slows down execution.
REX prefixes were carefully designed to almost never hit this case:
adding REX prefix does not change instruction length except MOVABS
and MOV [addr],RAX instruction.
This patch does not add instructions which would use a 0x66 prefix,
code changes in assembly are:
-48 f7 07 01 00 00 00 testq $0x1,(%rdi)
+f6 07 01 testb $0x1,(%rdi)
-48 f7 c1 01 00 00 00 test $0x1,%rcx
+f6 c1 01 test $0x1,%cl
-48 f7 c1 02 00 00 00 test $0x2,%rcx
+f6 c1 02 test $0x2,%cl
-41 f7 c2 01 00 00 00 test $0x1,%r10d
+41 f6 c2 01 test $0x1,%r10b
-48 f7 c1 04 00 00 00 test $0x4,%rcx
+f6 c1 04 test $0x4,%cl
-48 f7 c1 08 00 00 00 test $0x8,%rcx
+f6 c1 08 test $0x8,%cl
Linus further notes:
"There are no stalls from using 8-bit instruction forms.
Now, changing from 64-bit or 32-bit 'test' instructions to 8-bit ones
*could* cause problems if it ends up having forwarding issues, so that
instead of just forwarding the result, you end up having to wait for
it to be stable in the L1 cache (or possibly the register file). The
forwarding from the store buffer is simplest and most reliable if the
read is done at the exact same address and the exact same size as the
write that gets forwarded.
But that's true only if:
(a) the write was very recent and is still in the write queue. I'm
not sure that's the case here anyway.
(b) on at least most Intel microarchitectures, you have to test a
different byte than the lowest one (so forwarding a 64-bit write
to a 8-bit read ends up working fine, as long as the 8-bit read
is of the low 8 bits of the written data).
A very similar issue *might* show up for registers too, not just
memory writes, if you use 'testb' with a high-byte register (where
instead of forwarding the value from the original producer it needs to
go through the register file and then shifted). But it's mainly a
problem for store buffers.
But afaik, the way Denys changed the test instructions, neither of the
above issues should be true.
The real problem for store buffer forwarding tends to be "write 8
bits, read 32 bits". That can be really surprisingly expensive,
because the read ends up having to wait until the write has hit the
cacheline, and we might talk tens of cycles of latency here. But
"write 32 bits, read the low 8 bits" *should* be fast on pretty much
all x86 chips, afaik."
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1425675332-31576-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
I broke 32-bit kernels. The implementation of sp0 was correct
as far as I can tell, but sp0 was much weirder on x86_32 than I
realized. It has the following issues:
- Init's sp0 is inconsistent with everything else's: non-init tasks
are offset by 8 bytes. (I have no idea why, and the comment is unhelpful.)
- vm86 does crazy things to sp0.
Fix it up by replacing this_cpu_sp0() with
current_top_of_stack() and using a new percpu variable to track
the top of the stack on x86_32.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 75182b1632 ("x86/asm/entry: Switch all C consumers of kernel_stack to this_cpu_sp0()")
Link: http://lkml.kernel.org/r/d09dbe270883433776e0cbee3c7079433349e96d.1425692936.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The change:
75182b1632 ("x86/asm/entry: Switch all C consumers of kernel_stack to this_cpu_sp0()")
had the unintended side effect of changing the return value of
current_thread_info() during part of the context switch process.
Change it back.
This has no effect as far as I can tell -- it's just for
consistency.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/9fcaa47dd8487db59eed7a3911b6ae409476763e.1425692936.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since we have a native 8250 driver carrying the Intel MID serial devices the
specific support is not needed anymore. This patch removes it for Intel MID.
Note that the console device name is changed from ttyMFDx to ttySx.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This has nothing to do with the init thread or the initial
anything. It's just the CPU's TSS.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a0bd5e26b32a2e1f08ff99017d0997118fbb2485.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The INIT_TSS is unnecessary. Just define the initial TSS where
'cpu_tss' is defined.
While we're at it, merge the 32-bit and 64-bit definitions. The
only syntactic change is that 32-bit kernels were computing sp0
as long, but now they compute it as unsigned long.
Verified by objdump: the contents and relocations of
.data..percpu..shared_aligned are unchanged on 32-bit and 64-bit
kernels.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/8fc39fa3f6c5d635e93afbdd1a0fe0678a6d7913.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It has nothing to do with init -- there's only one TSS per cpu.
Other names considered include:
- current_tss: Confusing because we never switch the tss.
- singleton_tss: Too long.
This patch was generated with 's/init_tss/cpu_tss/g'. Followup
patches will fix INIT_TSS and INIT_TSS_IST by hand.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/da29fb2a793e4f649d93ce2d1ed320ebe8516262.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The ia32 sysenter code loaded the top of the kernel stack into
rsp by loading kernel_stack and then adjusting it. It can be
simplified to just read sp0 directly.
This requires the addition of a new asm-offsets entry for sp0.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/88ff9006163d296a0665338585c36d9bfb85235d.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This will make modifying the semantics of kernel_stack easier.
The change to ist_begin_non_atomic() is necessary because sp0 no
longer points to the same THREAD_SIZE-aligned region as RSP;
it's one byte too high for that. At Denys' suggestion, rather
than offsetting it, just check explicitly that we're in the
correct range ending at sp0. This has the added benefit that we
no longer assume that the thread stack is aligned to
THREAD_SIZE.
Suggested-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/ef8254ad414cbb8034c9a56396eeb24f5dd5b0de.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We currently store references to the top of the kernel stack in
multiple places: kernel_stack (with an offset) and
init_tss.x86_tss.sp0 (no offset). The latter is defined by
hardware and is a clean canonical way to find the top of the
stack. Add an accessor so we can start using it.
This needs minor paravirt tweaks. On native, sp0 defines the
top of the kernel stack and is therefore always correct. On Xen
and lguest, the hypervisor tracks the top of the stack, but we
want to start reading sp0 in the kernel. Fixing this is simple:
just update our local copy of sp0 as well as the hypervisor's
copy on task switches.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/8d675581859712bee09a055ed8f785d80dac1eca.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
'ret_from_fork' checks TIF_IA32 to determine whether 'pt_regs' and
the related state make sense for 'ret_from_sys_call'. This is
entirely the wrong check. TS_COMPAT would make a little more
sense, but there's really no point in keeping this optimization
at all.
This fixes a return to the wrong user CS if we came from int
0x80 in a 64-bit task.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/4710be56d76ef994ddf59087aad98c000fbab9a4.1424989793.git.luto@amacapital.net
[ Backported from tip:x86/asm. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As early_trap_init() doesn't use IST, replace
set_intr_gate_ist() and set_system_intr_gate_ist() with their
standard counterparts.
set_intr_gate() requires a trace_debug symbol which we don't
have and won't use. This patch separates set_intr_gate() into two
parts, and uses base version in early_trap_init().
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: <dave.hansen@linux.intel.com>
Cc: <lizefan@huawei.com>
Cc: <masami.hiramatsu.pt@hitachi.com>
Cc: <oleg@redhat.com>
Cc: <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1425010789-13714-1-git-send-email-wangnan0@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
'ret_from_fork' checks TIF_IA32 to determine whether 'pt_regs' and
the related state make sense for 'ret_from_sys_call'. This is
entirely the wrong check. TS_COMPAT would make a little more
sense, but there's really no point in keeping this optimization
at all.
This fixes a return to the wrong user CS if we came from int
0x80 in a 64-bit task.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/4710be56d76ef994ddf59087aad98c000fbab9a4.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Avoid redundant load of %r11 (it is already loaded a few
instructions before).
Also simplify %rsp restoration, instead of two steps:
add $0x80, %rsp
mov 0x18(%rsp), %rsp
we can do a simplified single step to restore user-space RSP:
mov 0x98(%rsp), %rsp
and get the same result.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
[ Clarified the changelog. ]
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1aef69b346a6db0d99cdfb0f5ba83e8c985e27d7.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Constants such as SS+8 or SS+8-RIP are mysterious.
In most cases, SS+8 is just meant to be SIZEOF_PTREGS,
SS+8-RIP is RIP's offset in the iret frame.
This patch changes some of these constants to be less
mysterious.
No code changes (verified with objdump).
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1d20491384773bd606e23a382fac23ddb49b5178.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch does a lot of cleanup in comments and formatting,
but it does not change any code:
- Rename 'save_paranoid' to 'paranoid_entry': this makes naming
similar to its "non-paranoid" sibling, 'error_entry',
and to its counterpart, 'paranoid_exit'.
- Use the same CFI annotation atop 'paranoid_entry' and 'error_entry'.
- Fix irregular indentation of assembler operands.
- Add/fix comments on top of 'paranoid_entry' and 'error_entry'.
- Remove stale comment about "oldrax".
- Make comments about "no swapgs" flag in ebx more prominent.
- Deindent wrongly indented top-level comment atop 'paranoid_exit'.
- Indent wrongly deindented comment inside 'error_entry'.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/4640f9fcd5ea46eb299b1cd6d3f5da3167d2f78d.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
For some odd reason, these two functions are at the very top of
the file. "save_paranoid"'s caller is approximately in the middle
of it, move it there. Move 'ret_from_fork' to be right after
fork/exec helpers.
This is a pure block move, nothing is changed in the function
bodies.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/6446bbfe4094532623a5b83779b7015fec167a9d.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
SYSCALL/SYSRET and SYSENTER/SYSEXIT have weird semantics.
Moreover, they differ in 32- and 64-bit mode.
What is saved? What is not? Is rsp set? Are interrupts disabled?
People tend to not remember these details well enough.
This patch adds comments which explain in detail
what registers are modified by each of these instructions.
The comments are placed immediately before corresponding
entry and exit points.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/a94b98b63527797c871a81402ff5060b18fa880a.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
ARGOFFSET is zero now, removing it changes no code.
A few macros lost "offset" parameter, since it is always zero
now too.
No code changes - verified with objdump.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/8689f937622d9d2db0ab8be82331fa15e4ed4713.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The 64-bit entry code was using six stack slots less by not
saving/restoring registers which are callee-preserved according
to the C ABI, and was not allocating space for them.
Only when syscalls needed a complete "struct pt_regs" was
the complete area allocated and filled in.
As an additional twist, on interrupt entry a "slightly less
truncated pt_regs" trick is used, to make nested interrupt
stacks easier to unwind.
This proved to be a source of significant obfuscation and subtle
bugs. For example, 'stub_fork' had to pop the return address,
extend the struct, save registers, and push return address back.
Ugly. 'ia32_ptregs_common' pops return address and "returns" via
jmp insn, throwing a wrench into CPU return stack cache.
This patch changes the code to always allocate a complete
"struct pt_regs" on the kernel stack. The saving of registers
is still done lazily.
"Partial pt_regs" trick on interrupt stack is retained.
Macros which manipulate "struct pt_regs" on stack are reworked:
- ALLOC_PT_GPREGS_ON_STACK allocates the structure.
- SAVE_C_REGS saves to it those registers which are clobbered
by C code.
- SAVE_EXTRA_REGS saves to it all other registers.
- Corresponding RESTORE_* and REMOVE_PT_GPREGS_FROM_STACK macros
reverse it.
'ia32_ptregs_common', 'stub_fork' and friends lost their ugly dance
with the return pointer.
LOAD_ARGS32 in ia32entry.S now uses symbolic stack offsets
instead of magic numbers.
'error_entry' and 'save_paranoid' now use SAVE_C_REGS +
SAVE_EXTRA_REGS instead of having it open-coded yet again.
Patch was run-tested: 64-bit executables, 32-bit executables,
strace works.
Timing tests did not show measurable difference in 32-bit
and 64-bit syscalls.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1423778052-21038-2-git-send-email-dvlasenk@redhat.com
Link: http://lkml.kernel.org/r/b89763d354aa23e670b9bdf3a40ae320320a7c2e.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since the last fix of this nature, a few more instances have crept
in. Fix them up. No object code changes (constants have the same
value).
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1423778052-21038-1-git-send-email-dvlasenk@redhat.com
Link: http://lkml.kernel.org/r/f5e1c4084319a42e5f14d41e2d638949ce66bc08.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
pad instructions and thus make using the alternatives macros more
straightforward and without having to figure out old and new instruction
sizes but have the toolchain figure that out for us.
Furthermore, it optimizes JMPs used so that fetch and decode can be
relieved with smaller versions of the JMPs, where possible.
Some stats:
x86_64 defconfig:
Alternatives sites total: 2478
Total padding added (in Bytes): 6051
The padding is currently done for:
X86_FEATURE_ALWAYS
X86_FEATURE_ERMS
X86_FEATURE_LFENCE_RDTSC
X86_FEATURE_MFENCE_RDTSC
X86_FEATURE_SMAP
This is with the latest version of the patchset. Of course, on each
machine the alternatives sites actually being patched are a proper
subset of the total number.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJU9ekpAAoJEBLB8Bhh3lVKyjYP/AiHEiHkkjnpwTt49kUtUMI6
GIlGfJVNjp5LLnSRD/fkL/wdkBgQtMzr9O1g8Qi/lbFqxsOFteU9f1OtLx34ZwZw
MhtdiHcrKGMsaIxTJh4FaqPHBT5ussm2yn1jlAX+LgILd3dpqe3oytsO8JihcK9j
t2u9V/Lq92TV7zXxGgWJsPc86WhhgdldlU3X96S++Di18bnDaKbGkzthU6WzZG/H
qtFZ5bfK8TlVHYduft+D9ZPzFYGp1WCOa03qU4+Djaxw02HDB6Ltysend9zg0lB1
RT/BP0PwHD3mOL11qpgtV1ChCbR8FJMN/z5+YdSNJgzDQA0H5Sf0UueTweosfAz+
/iC5t/wkegdYtqtA0nKVypYOJCS+UdfMZXenYgtSUJl6drB6I5BCW4mVft3AuWo+
EilPGpblvmjWRx1HiF4/Q/5zrSWHzmKQDyXuyxI9m0OUxAGAM0+8CY6wOqRA5pX+
/f5MjZ1hXELQGhl5Qdj4nqJacICGevJ8WYdZ53B+uYVxz7fbXk9hSYcZKT94UshD
qSdaV4XJSuC7pDKqiWoNWXp5N1g+D2BgfwoQEr/RnodFZRlfc+cmOv/visak0OLr
E/pp1vJvCi3+T3ImX1MCDiXmflQtFctiL3hNgMXYK2IGhJb2RDC2bFeZkksOHuAE
BGgrn+usQDjVlikEnfI3
=0KXp
-----END PGP SIGNATURE-----
Merge tag 'alternatives_padding' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/asm
Pull alternative instructions framework improvements from Borislav Petkov:
"A more involved rework of the alternatives framework to be able to
pad instructions and thus make using the alternatives macros more
straightforward and without having to figure out old and new instruction
sizes but have the toolchain figure that out for us.
Furthermore, it optimizes JMPs used so that fetch and decode can be
relieved with smaller versions of the JMPs, where possible.
Some stats:
x86_64 defconfig:
Alternatives sites total: 2478
Total padding added (in Bytes): 6051
The padding is currently done for:
X86_FEATURE_ALWAYS
X86_FEATURE_ERMS
X86_FEATURE_LFENCE_RDTSC
X86_FEATURE_MFENCE_RDTSC
X86_FEATURE_SMAP
This is with the latest version of the patchset. Of course, on each
machine the alternatives sites actually being patched are a proper
subset of the total number."
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Combine the 32-bit syscall tables into one file.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1425439896-8322-3-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There's more work to come but let's unload this pile first.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJU9LsjAAoJEBLB8Bhh3lVKVIMP/0xUfRb/wV8P+HtJ6St41G8/
OygkO/D4UJcZiZP2xrxNix5/waExpegtEIqJPbO6Wlyq0N+0imCZxcsPsgmdw2Zo
CQv6eu4p0on/v8EUFTJV9+ZHqs6zhch1tNQMfLuq+nXH7f8okSzbDL25RWoo8QU5
qOOHhHwjyQzivC1KpEodwVte1nT/KLNFio3moRwONKM0/1xCBHyvK4us6QqifWow
hsDnVNdoXqTzqhY43u7zxNcSzo/RMCq/4sc90augdCdZFAVbKzPGM0o0Pq5FQTmv
MbMuGF80LhfMFln0tBv30IGFuEc54BBD/x3d7YYadyeX0jIE4T27Pe4xbVLoTDTM
T8PnaNn/sUyiBUGYO5ff5niMwzpqnuwgKi3wSe4fJ0HIHVE/SAHQogoomP1EKb59
n66RTWV5eE9KkzdZCTdXhm8aalLK8QfbSlElOrbqwr7/qfFslNpnNUzRwhaHoN1k
kk5PJ8PipZR/YmWapIU7K6lEZQoRixAb+StMiOvX5n++d+Z7d2/Mu3UqebpqlvUc
nVFpbPpB9FuH0XQPfICQEUvfWf+MTP9cPw5OkMu+Zo4ok8fIt7MdHCtQJVa9d0R/
3ZMc9daDgwU8bEYoMRn6xHSGaUjtyO6AUeFhDM7b7If9YWDg59H7nufa5ySkM5KB
xhWDhQYiSjhdvToKKF/e
=Jlz9
-----END PGP SIGNATURE-----
Merge tag 'intel_microcode_cleanup_p1' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/microcode
Pull x86 microcode loader code cleanups from Borislav Petkov:
"The first part of the scrubbing of the intel early microcode loader.
There's more work to come but let's unload this pile first."
Signed-off-by: Ingo Molnar <mingo@kernel.org>
previous log level after a newline. (Adrien Schildknecht)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJU9L+WAAoJEBLB8Bhh3lVKJqIQAKizB0nODvJi8YV2PCMRfXoP
ht30vwzqhLdmFQg0vC9ARbb1OB1Eq02Iq4wBmVfXRuNH+cMuLFK0BZjxKuAbRqWi
cOhvvIIdqV+f/yEXtncG7q0JrD4JrtMPVTOeEC7q1yFsXlPyR7oeW6KcrRZIZUiz
Rs+04QJELM1ZkLdKh/oNsA9A8IIPysoZ0elODnMb37RX/+8Rz6Lr/lG26t07xA9n
8Bb7i1oeL34GgvnZICFtON11L3iSB6vFlv3pgshqZNV0VaN0yzlk8oJBAy09srgS
vLfuEW/q2GGeOkeim48tfAvoScMS8qQFRT+U92cOzNOFtULCVH9MRy4Ymq4vVtBv
EmRLv3OgI0IaLBKFLNqJdvQRvMo8Ru4XW8LCbAesLAJsKTD0YSOpWNowG+wJLVv6
DJU8jUnT8zuNYQbe2Sa3XADkwWCohatLOljd6BpkyA2qGczixqYw43iNcAA5U0WH
Q7taSpx2Srmi8NxT/tRbA1DdOsXATMZN1pX7lKpQUprdC3XRTQ0GQtL+TTpo5qPX
7gdNcQOdO1Jz2cf3CLn8dmDujZFNeJo10oZ/1ShBd6YJhqhm6kJv6I8ABbNW7yPj
bh5FScnYPuiikxO56CaquDexEcI9NzxMaifwtTyvtHpamHknV1ciTNPO6PqFxBc/
2K4oIGyt1fTNElgicoLL
=9DIN
-----END PGP SIGNATURE-----
Merge tag 'tip_x86_kernel' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/debug
Pull x86 debugging updates from Borislav Petkov:
"Two small fixes to the stack dumper, a cleanup and sustaining the
previous log level after a newline. (Adrien Schildknecht)"
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When doing
echo 1 > /sys/devices/system/cpu/microcode/reload
in order to reload microcode, I get:
microcode: Total microcode saved: 1
BUG: using smp_processor_id() in preemptible [00000000] code: bash/2606
caller is debug_smp_processor_id+0x17/0x20
CPU: 1 PID: 2606 Comm: bash Not tainted 3.19.0-rc7+ #9
Hardware name: LENOVO 2320CTO/2320CTO, BIOS G2ET86WW (2.06 ) 11/13/2012
ffffffff81a4266d ffff8802131db808 ffffffff81666588 0000000000000007
0000000000000001 ffff8802131db838 ffffffff812e6eef ffff8802131db868
00000000000306a9 0000000000000010 0000000000000015 ffff8802131db848
Call Trace:
dump_stack
check_preemption_disabled
debug_smp_processor_id
show_saved_mc
? save_microcode.constprop.8
save_mc_for_early
? print_context_stack
? dump_trace
? __bfs
? mark_held_locks
? get_page_from_freelist
? trace_hardirqs_on_caller
? trace_hardirqs_on
? __alloc_pages_nodemask
? __get_vm_area_node
? map_vm_area
? __vmalloc_node_range
? generic_load_microcode
generic_load_microcode
? microcode_fini_cpu
request_microcode_fw
reload_store
dev_attr_store
sysfs_kf_write
kernfs_fop_write
vfs_write
? sysret_check
SyS_write
system_call_fastpath
microcode: CPU1: sig=0x306a9, pf=0x10, rev=0x15
microcode: mc_saved[0]: sig=0x306a9, pf=0x12, rev=0x1b, toal size=0x3000, date = 2014-05-29
because we're using smp_processor_id() in preemtible context. And we
don't really need to use it there because the microcode container we're
dumping is global and CPU-specific info is irrelevant.
While at it, make pr_* stuff use "microcode: " prefix for easier
grepping and document how to enable the DEBUG build.
Signed-off-by: Borislav Petkov <bp@suse.de>
... arguments list so that it comes more natural for those functions to
have the signature, processor flags and revision together, before the
rest of the args.
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
* remove state variable and out label
* get rid of completely unused mc_size
* shorten variable names
* get rid of local variables
* don't do assignments in local var declarations for less cluttered code
* finally rename it to the shorter and perfectly fine load_microcode_early()
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
... to the header. Split the family acquiring function into a
main one, doing CPUID and a helper which computes the extended
family and is used in multiple places. Get rid of the locally-grown
get_x86_{family,model}().
While at it, rename local variables to something more descriptive and
vertically align assignments for better readability.
There should be no functionality change resulting from this patch.
Signed-off-by: Borislav Petkov <bp@suse.de>
Shorten local variable names for better readability and flatten loop
indentation levels.
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
... of microcode patches instead of handing in a pointer which is used
for I/O in an otherwise void function.
Signed-off-by: Borislav Petkov <bp@suse.de>
Don't compute start and end from start and size in order to compute size
again down the path in scan_microcode(). So pass size directly instead
and simplify a bunch. Shorten variable names and remove useless ones.
Signed-off-by: Borislav Petkov <bp@suse.de>
... and only then deref it. Also, shorten some variable names and rename
others so as to diminish the ubiquitous presence of the "mc_" prefix
everywhere and make it a bit more readable.
Use kcalloc so that we don't kfree() uninitialized memory on the unwind
path, as suggested by Quentin.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
We should check the return value of the routines fishing out the proper
microcode and not try to apply if we haven't found a suitable blob.
Signed-off-by: Borislav Petkov <bp@suse.de>
Improper pointer arithmetics when calculating the address of the
extended header could lead to an out of bounds memory read and kernel
panic.
Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Link: http://lkml.kernel.org/r/20150225094125.GB30434@chrystal.uk.oracle.com
Signed-off-by: Borislav Petkov <bp@suse.de>