From 1d683af0356098068a8231b988036d56918b542f Mon Sep 17 00:00:00 2001 From: Suzuki K Poulose Date: Tue, 14 Jun 2016 11:17:14 -0600 Subject: [PATCH 01/25] coresight: Fix erroneous memset in tmc_read_unprepare_etr At the end of a trace collection, we try to clear the entire buffer and enable the ETR back if it was already enabled. But, we would have adjusted the drvdata->buf to point to the beginning of the trace data in the trace buffer @drvdata->vaddr. So, the following code which clears the buffer is dangerous and can cause crashes, like below : memset(drvdata->buf, 0, drvdata->size); Unable to handle kernel paging request at virtual address ffffff800a145000 pgd = ffffffc974726000 *pgd=00000009f3e91003, *pud=00000009f3e91003, *pmd=0000000000000000 PREEMPT SMP Modules linked in: CPU: 4 PID: 1692 Comm: dd Not tainted 4.7.0-rc2+ #1721 Hardware name: ARM Juno development board (r0) (DT) task: ffffffc9734a0080 ti: ffffffc974460000 task.ti: ffffffc974460000 PC is at __memset+0x1ac/0x200 LR is at tmc_read_unprepare_etr+0x144/0x1bc pc : [] lr : [] pstate: 200001c5 ... [] __memset+0x1ac/0x200 [] tmc_release+0x90/0x94 [] __fput+0xa8/0x1ec [] ____fput+0xc/0x14 [] task_work_run+0xb0/0xe4 [] do_notify_resume+0x64/0x6c [] work_pending+0x10/0x14 Code: 91010108 54ffff4a 8b040108 cb050042 (d50b7428) Since we clear the buffer anyway in the following call to tmc_etr_enable_hw(), remove the erroneous memset(). Fixes: commit de5461970b3e9e1 ("coresight: tmc: allocating memory when needed") Cc: Mathieu Poirier Signed-off-by: Suzuki K Poulose Signed-off-by: Mathieu Poirier Signed-off-by: Greg Kroah-Hartman (cherry picked from commit f3b8172fe15fbed0d0d33d99780e122213e00684) Signed-off-by: Alex Shi --- drivers/hwtracing/coresight/coresight-tmc-etr.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index 847d1b5f2c13..44eaae6e7d29 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -300,13 +300,10 @@ int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata) if (local_read(&drvdata->mode) == CS_MODE_SYSFS) { /* * The trace run will continue with the same allocated trace - * buffer. As such zero-out the buffer so that we don't end - * up with stale data. - * - * Since the tracer is still enabled drvdata::buf - * can't be NULL. + * buffer. The trace buffer is cleared in tmc_etr_enable_hw(), + * so we don't have to explicitly clear it. Also, since the + * tracer is still enabled drvdata::buf can't be NULL. */ - memset(drvdata->buf, 0, drvdata->size); tmc_etr_enable_hw(drvdata); } else { /* From d3539486bbce11701e012428ce7690000dd34fa6 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Wed, 24 Aug 2016 10:07:14 +0100 Subject: [PATCH 02/25] perf/core: Use this_cpu_ptr() when stopping AUX events When tearing down an AUX buf for an event via perf_mmap_close(), __perf_event_output_stop() is called on the event's CPU to ensure that trace generation is halted before the process of unmapping and freeing the buffer pages begins. The callback is performed via cpu_function_call(), which ensures that it runs with interrupts disabled and is therefore not preemptible. Unfortunately, the current code grabs the per-cpu context pointer using get_cpu_ptr(), which unnecessarily disables preemption and doesn't pair the call with put_cpu_ptr(), leading to a preempt_count() imbalance and a BUG when freeing the AUX buffer later on: WARNING: CPU: 1 PID: 2249 at kernel/events/ring_buffer.c:539 __rb_free_aux+0x10c/0x120 Modules linked in: [...] Call Trace: [] dump_stack+0x4f/0x72 [] __warn+0xc6/0xe0 [] warn_slowpath_null+0x18/0x20 [] __rb_free_aux+0x10c/0x120 [] rb_free_aux+0x13/0x20 [] perf_mmap_close+0x29e/0x2f0 [] ? perf_iterate_ctx+0xe0/0xe0 [] remove_vma+0x25/0x60 [] exit_mmap+0x106/0x140 [] mmput+0x1c/0xd0 [] do_exit+0x253/0xbf0 [] do_group_exit+0x3e/0xb0 [] get_signal+0x249/0x640 [] do_signal+0x23/0x640 [] ? _raw_write_unlock_irq+0x12/0x30 [] ? _raw_spin_unlock_irq+0x9/0x10 [] ? __schedule+0x2c6/0x710 [] exit_to_usermode_loop+0x74/0x90 [] prepare_exit_to_usermode+0x26/0x30 [] retint_user+0x8/0x10 This patch uses this_cpu_ptr() instead of get_cpu_ptr(), since preemption is already disabled by the caller. Signed-off-by: Will Deacon Reviewed-by: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Vince Weaver Fixes: 95ff4ca26c49 ("perf/core: Free AUX pages in unmap path") Link: http://lkml.kernel.org/r/20160824091905.GA16944@arm.com Signed-off-by: Ingo Molnar (cherry picked from commit 8b6a3fe8fab97716990a3abde1a01fb5a34552a3) Signed-off-by: Alex Shi --- kernel/events/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index cda4b292135a..582c607417d0 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -5797,7 +5797,7 @@ static int __perf_pmu_output_stop(void *info) { struct perf_event *event = info; struct pmu *pmu = event->pmu; - struct perf_cpu_context *cpuctx = get_cpu_ptr(pmu->pmu_cpu_context); + struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context); struct remote_output ro = { .rb = event->rb, }; From de10b843a6031739406ab603b5bce2a678ffbf40 Mon Sep 17 00:00:00 2001 From: Suzuki K Poulose Date: Thu, 25 Aug 2016 15:18:54 -0600 Subject: [PATCH 03/25] coresight: Remove erroneous dma_free_coherent in tmc_probe commit de5461970b3e9e194 ("coresight: tmc: allocating memory when needed") removed the static allocation of buffer for the trace data in ETR mode in tmc_probe. However it failed to remove the "devm_free_coherent" in tmc_probe when the probe fails due to other reasons. This patch gets rid of the incorrect dma_free_coherent() call. Fixes: commit de5461970b3e9e194 ("coresight: tmc: allocating memory when needed") Cc: Mathieu Poirier Signed-off-by: Suzuki K Poulose Signed-off-by: Mathieu Poirier Signed-off-by: Greg Kroah-Hartman (cherry picked from commit 481e46fe7a88557b66330cbb047b25cc13eff4b9) Signed-off-by: Alex Shi --- drivers/hwtracing/coresight/coresight-tmc.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/hwtracing/coresight/coresight-tmc.c b/drivers/hwtracing/coresight/coresight-tmc.c index 9e02ac963cd0..3978cbb6b038 100644 --- a/drivers/hwtracing/coresight/coresight-tmc.c +++ b/drivers/hwtracing/coresight/coresight-tmc.c @@ -388,9 +388,6 @@ static int tmc_probe(struct amba_device *adev, const struct amba_id *id) err_misc_register: coresight_unregister(drvdata->csdev); err_devm_kzalloc: - if (drvdata->config_type == TMC_CONFIG_TYPE_ETR) - dma_free_coherent(dev, drvdata->size, - drvdata->vaddr, drvdata->paddr); return ret; } From cac5bc4eaee18bad89a256aa62627508df7cec91 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Fri, 23 Sep 2016 17:55:05 +0100 Subject: [PATCH 04/25] arm64: fix dump_backtrace/unwind_frame with NULL tsk In some places, dump_backtrace() is called with a NULL tsk parameter, e.g. in bug_handler() in arch/arm64, or indirectly via show_stack() in core code. The expectation is that this is treated as if current were passed instead of NULL. Similar is true of unwind_frame(). Commit a80a0eb70c358f8c ("arm64: make irq_stack_ptr more robust") didn't take this into account. In dump_backtrace() it compares tsk against current *before* we check if tsk is NULL, and in unwind_frame() we never set tsk if it is NULL. Due to this, we won't initialise irq_stack_ptr in either function. In dump_backtrace() this results in calling dump_mem() for memory immediately above the IRQ stack range, rather than for the relevant range on the task stack. In unwind_frame we'll reject unwinding frames on the IRQ stack. In either case this results in incomplete or misleading backtrace information, but is not otherwise problematic. The initial percpu areas (including the IRQ stacks) are allocated in the linear map, and dump_mem uses __get_user(), so we shouldn't access anything with side-effects, and will handle holes safely. This patch fixes the issue by having both functions handle the NULL tsk case before doing anything else with tsk. Signed-off-by: Mark Rutland Fixes: a80a0eb70c358f8c ("arm64: make irq_stack_ptr more robust") Acked-by: James Morse Cc: Catalin Marinas Cc: Will Deacon Cc: Yang Shi Signed-off-by: Will Deacon (cherry picked from commit b5e7307d9d5a340d2c9fabbe1cee137d4c682c71) Signed-off-by: Alex Shi --- arch/arm64/kernel/stacktrace.c | 5 ++++- arch/arm64/kernel/traps.c | 10 +++++----- 2 files changed, 9 insertions(+), 6 deletions(-) diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c index cfd46c227c8c..a99eff9afc1f 100644 --- a/arch/arm64/kernel/stacktrace.c +++ b/arch/arm64/kernel/stacktrace.c @@ -43,6 +43,9 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame) unsigned long fp = frame->fp; unsigned long irq_stack_ptr; + if (!tsk) + tsk = current; + /* * Switching between stacks is valid when tracing current and in * non-preemptible context. @@ -67,7 +70,7 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame) frame->pc = *(unsigned long *)(fp + 8); #ifdef CONFIG_FUNCTION_GRAPH_TRACER - if (tsk && tsk->ret_stack && + if (tsk->ret_stack && (frame->pc == (unsigned long)return_to_handler)) { /* * This is a case where function graph tracer has diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c index c5392081b49b..0f35cbb22c0a 100644 --- a/arch/arm64/kernel/traps.c +++ b/arch/arm64/kernel/traps.c @@ -149,6 +149,11 @@ static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk) unsigned long irq_stack_ptr; int skip; + pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk); + + if (!tsk) + tsk = current; + /* * Switching between stacks is valid when tracing current and in * non-preemptible context. @@ -158,11 +163,6 @@ static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk) else irq_stack_ptr = 0; - pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk); - - if (!tsk) - tsk = current; - if (tsk == current) { frame.fp = (unsigned long)__builtin_frame_address(0); frame.sp = current_stack_pointer; From 7841412d4c9d4977e62adaf05e6b55461ee25847 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Thu, 13 Oct 2016 17:42:09 +0100 Subject: [PATCH 05/25] arm64: kaslr: fix breakage with CONFIG_MODVERSIONS=y As it turns out, the KASLR code breaks CONFIG_MODVERSIONS, since the kcrctab has an absolute address field that is relocated at runtime when the kernel offset is randomized. This has been fixed already for PowerPC in the past, so simply wire up the existing code dealing with this issue. Cc: Fixes: f80fb3a3d508 ("arm64: add support for kernel ASLR") Tested-by: Timur Tabi Signed-off-by: Ard Biesheuvel Signed-off-by: Will Deacon (cherry picked from commit 9c0e83c371cf4696926c95f9c8c77cd6ea803426) Signed-off-by: Alex Shi --- arch/arm64/include/asm/module.h | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/arch/arm64/include/asm/module.h b/arch/arm64/include/asm/module.h index e12af6754634..06ff7fd9e81f 100644 --- a/arch/arm64/include/asm/module.h +++ b/arch/arm64/include/asm/module.h @@ -17,6 +17,7 @@ #define __ASM_MODULE_H #include +#include #define MODULE_ARCH_VERMAGIC "aarch64" @@ -32,6 +33,10 @@ u64 module_emit_plt_entry(struct module *mod, const Elf64_Rela *rela, Elf64_Sym *sym); #ifdef CONFIG_RANDOMIZE_BASE +#ifdef CONFIG_MODVERSIONS +#define ARCH_RELOCATES_KCRCTAB +#define reloc_start (kimage_vaddr - KIMAGE_VADDR) +#endif extern u64 module_alloc_base; #else #define module_alloc_base ((u64)_etext - MODULES_VSIZE) From 7f42466b9d485dd9bba16004003e84c61e712391 Mon Sep 17 00:00:00 2001 From: Huang Shijie Date: Tue, 8 Nov 2016 13:44:38 +0800 Subject: [PATCH 06/25] arm64: hugetlb: remove the wrong pmd check in find_num_contig() The find_num_contig() will return 1 when the pmd is not present. It will cause a kernel dead loop in the following scenaro: 1.) pmd entry is not present. 2.) the page fault occurs: ... hugetlb_fault() --> hugetlb_no_page() --> set_huge_pte_at() 3.) set_huge_pte_at() will only set the first PMD entry, since the find_num_contig just return 1 in this case. So the PMD entries are all empty except the first one. 4.) when kernel accesses the address mapped by the second PMD entry, a new page fault occurs: ... hugetlb_fault() --> huge_ptep_set_access_flags() The second PMD entry is still empty now. 5.) When the kernel returns, the access will cause a page fault again. The kernel will run like the "4)" above. We will see a dead loop since here. The dead loop is caught in the 32M hugetlb page (2M PMD + Contiguous bit). This patch removes wrong pmd check, and fixes this dead loop. This patch also removes the redundant checks for PGD/PUD in the find_num_contig(). Acked-by: Steve Capper Signed-off-by: Huang Shijie Reviewed-by: Catalin Marinas Signed-off-by: Catalin Marinas (cherry picked from commit 20156ce2365d61beaa6f5a78a7a789044e0e7acc) Signed-off-by: Alex Shi --- arch/arm64/mm/hugetlbpage.c | 12 ------------ 1 file changed, 12 deletions(-) diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index da30529bb1f6..0a18f81e4910 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -51,20 +51,8 @@ static int find_num_contig(struct mm_struct *mm, unsigned long addr, *pgsize = PAGE_SIZE; if (!pte_cont(pte)) return 1; - if (!pgd_present(*pgd)) { - VM_BUG_ON(!pgd_present(*pgd)); - return 1; - } pud = pud_offset(pgd, addr); - if (!pud_present(*pud)) { - VM_BUG_ON(!pud_present(*pud)); - return 1; - } pmd = pmd_offset(pud, addr); - if (!pmd_present(*pmd)) { - VM_BUG_ON(!pmd_present(*pmd)); - return 1; - } if ((pte_t *)pmd == ptep) { *pgsize = PMD_SIZE; return CONT_PMDS; From 4fc54a30664c50ccca342ec0a81c8c1efce0a0bf Mon Sep 17 00:00:00 2001 From: Huang Shijie Date: Tue, 8 Nov 2016 13:44:39 +0800 Subject: [PATCH 07/25] arm64: hugetlb: fix the wrong address for several functions The libhugetlbfs meets several failures since the following functions do not use the correct address: huge_ptep_get_and_clear() huge_ptep_set_access_flags() huge_ptep_set_wrprotect() huge_ptep_clear_flush() This patch fixes the wrong address for them. Signed-off-by: Huang Shijie Reviewed-by: Catalin Marinas Signed-off-by: Catalin Marinas (cherry picked from commit 0c2f0afe3582c58efeef93bc57bc07d502132618) Signed-off-by: Alex Shi --- arch/arm64/mm/hugetlbpage.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index 0a18f81e4910..1af57cffcdec 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -200,7 +200,7 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, ncontig = find_num_contig(mm, addr, cpte, *cpte, &pgsize); /* save the 1st pte to return */ pte = ptep_get_and_clear(mm, addr, cpte); - for (i = 1; i < ncontig; ++i) { + for (i = 1, addr += pgsize; i < ncontig; ++i, addr += pgsize) { /* * If HW_AFDBM is enabled, then the HW could * turn on the dirty bit for any of the page @@ -238,7 +238,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma, pfn = pte_pfn(*cpte); ncontig = find_num_contig(vma->vm_mm, addr, cpte, *cpte, &pgsize); - for (i = 0; i < ncontig; ++i, ++cpte) { + for (i = 0; i < ncontig; ++i, ++cpte, addr += pgsize) { changed = ptep_set_access_flags(vma, addr, cpte, pfn_pte(pfn, hugeprot), @@ -261,7 +261,7 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm, cpte = huge_pte_offset(mm, addr); ncontig = find_num_contig(mm, addr, cpte, *cpte, &pgsize); - for (i = 0; i < ncontig; ++i, ++cpte) + for (i = 0; i < ncontig; ++i, ++cpte, addr += pgsize) ptep_set_wrprotect(mm, addr, cpte); } else { ptep_set_wrprotect(mm, addr, ptep); @@ -279,7 +279,7 @@ void huge_ptep_clear_flush(struct vm_area_struct *vma, cpte = huge_pte_offset(vma->vm_mm, addr); ncontig = find_num_contig(vma->vm_mm, addr, cpte, *cpte, &pgsize); - for (i = 0; i < ncontig; ++i, ++cpte) + for (i = 0; i < ncontig; ++i, ++cpte, addr += pgsize) ptep_clear_flush(vma, addr, cpte); } else { ptep_clear_flush(vma, addr, ptep); From 5a54b72529e96f3ca594e9ef3bfc9be68700900b Mon Sep 17 00:00:00 2001 From: Huang Shijie Date: Wed, 11 Jan 2017 14:02:00 +0800 Subject: [PATCH 08/25] arm64: hugetlb: fix the wrong return value for huge_ptep_set_access_flags In current code, the @changed always returns the last one's status for the huge page with the contiguous bit set. This is really not what we want. Even one of the PTEs is changed, we should tell it to the caller. This patch fixes this issue. Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit") Cc: # 4.5.x- Signed-off-by: Huang Shijie Signed-off-by: Catalin Marinas (cherry picked from commit 69d012345a1a32d3f03957f14d972efccc106a98) Signed-off-by: Alex Shi --- arch/arm64/mm/hugetlbpage.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index 1af57cffcdec..019f13637fae 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -239,7 +239,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma, ncontig = find_num_contig(vma->vm_mm, addr, cpte, *cpte, &pgsize); for (i = 0; i < ncontig; ++i, ++cpte, addr += pgsize) { - changed = ptep_set_access_flags(vma, addr, cpte, + changed |= ptep_set_access_flags(vma, addr, cpte, pfn_pte(pfn, hugeprot), dirty); From f03ec877840b5fd07db4fb4f76a958de0f38e936 Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Wed, 4 May 2016 18:49:55 +0530 Subject: [PATCH 09/25] PM / OPP: Remove useless check Regulators are optional for devices using OPPs and the OPP core shouldn't be printing any errors for such missing regulators. It was fine before the commit 0c717d0f9cb4, but that failed to update this part of the code to remove an 'always true' check and an extra unwanted print message. Fix that now. Fixes: 0c717d0f9cb4 (PM / OPP: Initialize regulator pointer to an error value) Reported-by: Marc Gonzalez Signed-off-by: Viresh Kumar Reviewed-by: Stephen Boyd Signed-off-by: Rafael J. Wysocki (cherry picked from commit 21f8a99ce61b2d4b74bd425a5bf7e9efbe162788) Signed-off-by: Alex Shi --- drivers/base/power/opp/core.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/base/power/opp/core.c b/drivers/base/power/opp/core.c index 433b60092972..d8f4cc22856c 100644 --- a/drivers/base/power/opp/core.c +++ b/drivers/base/power/opp/core.c @@ -259,9 +259,6 @@ unsigned long dev_pm_opp_get_max_volt_latency(struct device *dev) reg = opp_table->regulator; if (IS_ERR(reg)) { /* Regulator may not be required for device */ - if (reg) - dev_err(dev, "%s: Invalid regulator (%ld)\n", __func__, - PTR_ERR(reg)); rcu_read_unlock(); return 0; } From d6b9c33cfd0fbc5933ca540fb8c0d77c5fcb84f7 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Mon, 13 Jun 2016 11:15:14 +0100 Subject: [PATCH 10/25] arm64: fix dump_instr when PAN and UAO are in use If the kernel is set to show unhandled signals, and a user task does not handle a SIGILL as a result of an instruction abort, we will attempt to log the offending instruction with dump_instr before killing the task. We use dump_instr to log the encoding of the offending userspace instruction. However, dump_instr is also used to dump instructions from kernel space, and internally always switches to KERNEL_DS before dumping the instruction with get_user. When both PAN and UAO are in use, reading a user instruction via get_user while in KERNEL_DS will result in a permission fault, which leads to an Oops. As we have regs corresponding to the context of the original instruction abort, we can inspect this and only flip to KERNEL_DS if the original abort was taken from the kernel, avoiding this issue. At the same time, remove the redundant (and incorrect) comments regarding the order dump_mem and dump_instr are called in. Cc: Catalin Marinas Cc: James Morse Cc: Robin Murphy Cc: #4.6+ Signed-off-by: Mark Rutland Reported-by: Vladimir Murzin Tested-by: Vladimir Murzin Fixes: 57f4959bad0a154a ("arm64: kernel: Add support for User Access Override") Signed-off-by: Will Deacon (cherry picked from commit c5cea06be060f38e5400d796e61cfc8c36e52924) Signed-off-by: Alex Shi --- arch/arm64/kernel/traps.c | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c index 0f35cbb22c0a..fe60ecf8be19 100644 --- a/arch/arm64/kernel/traps.c +++ b/arch/arm64/kernel/traps.c @@ -64,8 +64,7 @@ static void dump_mem(const char *lvl, const char *str, unsigned long bottom, /* * We need to switch to kernel mode so that we can use __get_user - * to safely read from kernel space. Note that we now dump the - * code first, just in case the backtrace kills us. + * to safely read from kernel space. */ fs = get_fs(); set_fs(KERNEL_DS); @@ -111,21 +110,12 @@ static void dump_backtrace_entry(unsigned long where) print_ip_sym(where); } -static void dump_instr(const char *lvl, struct pt_regs *regs) +static void __dump_instr(const char *lvl, struct pt_regs *regs) { unsigned long addr = instruction_pointer(regs); - mm_segment_t fs; char str[sizeof("00000000 ") * 5 + 2 + 1], *p = str; int i; - /* - * We need to switch to kernel mode so that we can use __get_user - * to safely read from kernel space. Note that we now dump the - * code first, just in case the backtrace kills us. - */ - fs = get_fs(); - set_fs(KERNEL_DS); - for (i = -4; i < 1; i++) { unsigned int val, bad; @@ -139,8 +129,18 @@ static void dump_instr(const char *lvl, struct pt_regs *regs) } } printk("%sCode: %s\n", lvl, str); +} - set_fs(fs); +static void dump_instr(const char *lvl, struct pt_regs *regs) +{ + if (!user_mode(regs)) { + mm_segment_t fs = get_fs(); + set_fs(KERNEL_DS); + __dump_instr(lvl, regs); + set_fs(fs); + } else { + __dump_instr(lvl, regs); + } } static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk) From b1c272801015672bac0fb0b4398cd94a47b70f66 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Thu, 2 Jun 2016 15:27:04 +0100 Subject: [PATCH 11/25] arm64: spinlock: order spin_{is_locked,unlock_wait} against local locks spin_is_locked has grown two very different use-cases: (1) [The sane case] API functions may require a certain lock to be held by the caller and can therefore use spin_is_locked as part of an assert statement in order to verify that the lock is indeed held. For example, usage of assert_spin_locked. (2) [The insane case] There are two locks, where a CPU takes one of the locks and then checks whether or not the other one is held before accessing some shared state. For example, the "optimized locking" in ipc/sem.c. In the latter case, the sequence looks like: spin_lock(&sem->lock); if (!spin_is_locked(&sma->sem_perm.lock)) /* Access shared state */ and requires that the spin_is_locked check is ordered after taking the sem->lock. Unfortunately, since our spinlocks are implemented using a LDAXR/STXR sequence, the read of &sma->sem_perm.lock can be speculated before the STXR and consequently return a stale value. Whilst this hasn't been seen to cause issues in practice, PowerPC fixed the same issue in 51d7d5205d33 ("powerpc: Add smp_mb() to arch_spin_is_locked()") and, although we did something similar for spin_unlock_wait in d86b8da04dfa ("arm64: spinlock: serialise spin_unlock_wait against concurrent lockers") that doesn't actually take care of ordering against local acquisition of a different lock. This patch adds an smp_mb() to the start of our arch_spin_is_locked and arch_spin_unlock_wait routines to ensure that the lock value is always loaded after any other locks have been taken by the current CPU. Reported-by: Peter Zijlstra Signed-off-by: Will Deacon (cherry picked from commit 38b850a73034f075c4088e7511b36ebbef9dce00) Signed-off-by: Alex Shi --- arch/arm64/include/asm/spinlock.h | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h index fc9682bfe002..aac64d55cb22 100644 --- a/arch/arm64/include/asm/spinlock.h +++ b/arch/arm64/include/asm/spinlock.h @@ -31,6 +31,12 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t *lock) unsigned int tmp; arch_spinlock_t lockval; + /* + * Ensure prior spin_lock operations to other locks have completed + * on this CPU before we test whether "lock" is locked. + */ + smp_mb(); + asm volatile( " sevl\n" "1: wfe\n" @@ -148,6 +154,7 @@ static inline int arch_spin_value_unlocked(arch_spinlock_t lock) static inline int arch_spin_is_locked(arch_spinlock_t *lock) { + smp_mb(); /* See arch_spin_unlock_wait */ return !arch_spin_value_unlocked(READ_ONCE(*lock)); } From 78732f2e083800c5044e3c4b8161e5fc7a1d4adc Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Wed, 8 Jun 2016 15:10:57 +0100 Subject: [PATCH 12/25] arm64: spinlock: fix spin_unlock_wait for LSE atomics Commit d86b8da04dfa ("arm64: spinlock: serialise spin_unlock_wait against concurrent lockers") fixed spin_unlock_wait for LL/SC-based atomics under the premise that the LSE atomics (in particular, the LDADDA instruction) are indivisible. Unfortunately, these instructions are only indivisible when used with the -AL (full ordering) suffix and, consequently, the same issue can theoretically be observed with LSE atomics, where a later (in program order) load can be speculated before the write portion of the atomic operation. This patch fixes the issue by performing a CAS of the lock once we've established that it's unlocked, in much the same way as the LL/SC code. Fixes: d86b8da04dfa ("arm64: spinlock: serialise spin_unlock_wait against concurrent lockers") Signed-off-by: Will Deacon (cherry picked from commit 3a5facd09da848193f5bcb0dea098a298bc1a29d) Signed-off-by: Alex Shi --- arch/arm64/include/asm/spinlock.h | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h index aac64d55cb22..d5c894253e73 100644 --- a/arch/arm64/include/asm/spinlock.h +++ b/arch/arm64/include/asm/spinlock.h @@ -43,13 +43,17 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t *lock) "2: ldaxr %w0, %2\n" " eor %w1, %w0, %w0, ror #16\n" " cbnz %w1, 1b\n" + /* Serialise against any concurrent lockers */ ARM64_LSE_ATOMIC_INSN( /* LL/SC */ " stxr %w1, %w0, %2\n" -" cbnz %w1, 2b\n", /* Serialise against any concurrent lockers */ - /* LSE atomics */ " nop\n" -" nop\n") +" nop\n", + /* LSE atomics */ +" mov %w1, %w0\n" +" cas %w0, %w0, %2\n" +" eor %w1, %w1, %w0\n") +" cbnz %w1, 2b\n" : "=&r" (lockval), "=&r" (tmp), "+Q" (*lock) : : "memory"); From f76dc9a5db236a11ee3adf86aa0d7c2751582bb5 Mon Sep 17 00:00:00 2001 From: Catalin Marinas Date: Tue, 26 Jul 2016 10:16:55 -0700 Subject: [PATCH 13/25] arm64: Only select ARM64_MODULE_PLTS if MODULES=y MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Selecting CONFIG_RANDOMIZE_BASE=y and CONFIG_MODULES=n fails to build the module PLTs support: CC arch/arm64/kernel/module-plts.o /work/Linux/linux-2.6-aarch64/arch/arm64/kernel/module-plts.c: In function ‘module_emit_plt_entry’: /work/Linux/linux-2.6-aarch64/arch/arm64/kernel/module-plts.c:32:49: error: dereferencing pointer to incomplete type ‘struct module’ This patch selects ARM64_MODULE_PLTS conditionally only if MODULES is enabled. Fixes: f80fb3a3d508 ("arm64: add support for kernel ASLR") Cc: # 4.6+ Reported-by: Jeff Vander Stoep Acked-by: Ard Biesheuvel Acked-by: Mark Rutland Signed-off-by: Catalin Marinas (cherry picked from commit b9c220b589daaf140f5b8ebe502c98745b94e65c) Signed-off-by: Alex Shi --- arch/arm64/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 97583a1878db..2693917d200f 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -750,7 +750,7 @@ config RELOCATABLE config RANDOMIZE_BASE bool "Randomize the address of the kernel image" - select ARM64_MODULE_PLTS + select ARM64_MODULE_PLTS if MODULES select RELOCATABLE help Randomizes the virtual address at which the kernel image is From 94fa0d8ea617ea063c02781da224a0cfa6fe2233 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Wed, 24 Aug 2016 10:07:14 +0100 Subject: [PATCH 14/25] perf/core: Use this_cpu_ptr() when stopping AUX events When tearing down an AUX buf for an event via perf_mmap_close(), __perf_event_output_stop() is called on the event's CPU to ensure that trace generation is halted before the process of unmapping and freeing the buffer pages begins. The callback is performed via cpu_function_call(), which ensures that it runs with interrupts disabled and is therefore not preemptible. Unfortunately, the current code grabs the per-cpu context pointer using get_cpu_ptr(), which unnecessarily disables preemption and doesn't pair the call with put_cpu_ptr(), leading to a preempt_count() imbalance and a BUG when freeing the AUX buffer later on: WARNING: CPU: 1 PID: 2249 at kernel/events/ring_buffer.c:539 __rb_free_aux+0x10c/0x120 Modules linked in: [...] Call Trace: [] dump_stack+0x4f/0x72 [] __warn+0xc6/0xe0 [] warn_slowpath_null+0x18/0x20 [] __rb_free_aux+0x10c/0x120 [] rb_free_aux+0x13/0x20 [] perf_mmap_close+0x29e/0x2f0 [] ? perf_iterate_ctx+0xe0/0xe0 [] remove_vma+0x25/0x60 [] exit_mmap+0x106/0x140 [] mmput+0x1c/0xd0 [] do_exit+0x253/0xbf0 [] do_group_exit+0x3e/0xb0 [] get_signal+0x249/0x640 [] do_signal+0x23/0x640 [] ? _raw_write_unlock_irq+0x12/0x30 [] ? _raw_spin_unlock_irq+0x9/0x10 [] ? __schedule+0x2c6/0x710 [] exit_to_usermode_loop+0x74/0x90 [] prepare_exit_to_usermode+0x26/0x30 [] retint_user+0x8/0x10 This patch uses this_cpu_ptr() instead of get_cpu_ptr(), since preemption is already disabled by the caller. Signed-off-by: Will Deacon Reviewed-by: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Vince Weaver Fixes: 95ff4ca26c49 ("perf/core: Free AUX pages in unmap path") Link: http://lkml.kernel.org/r/20160824091905.GA16944@arm.com Signed-off-by: Ingo Molnar (cherry picked from commit 8b6a3fe8fab97716990a3abde1a01fb5a34552a3) Signed-off-by: Alex Shi --- kernel/events/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index e2998a4444b0..cbaaf6c70f29 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -5797,7 +5797,7 @@ static int __perf_pmu_output_stop(void *info) { struct perf_event *event = info; struct pmu *pmu = event->pmu; - struct perf_cpu_context *cpuctx = get_cpu_ptr(pmu->pmu_cpu_context); + struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context); struct remote_output ro = { .rb = event->rb, }; From a877498ddb55fce5f94674e4d0b39747932ef6b8 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Wed, 24 Aug 2016 18:02:08 +0100 Subject: [PATCH 15/25] arm64: avoid TLB conflict with CONFIG_RANDOMIZE_BASE When CONFIG_RANDOMIZE_BASE is selected, we modify the page tables to remap the kernel at a newly-chosen VA range. We do this with the MMU disabled, but do not invalidate TLBs prior to re-enabling the MMU with the new tables. Thus the old mappings entries may still live in TLBs, and we risk violating Break-Before-Make requirements, leading to TLB conflicts and/or other issues. We invalidate TLBs when we uninsall the idmap in early setup code, but prior to this we are subject to issues relating to the Break-Before-Make violation. Avoid these issues by invalidating the TLBs before the new mappings can be used by the hardware. Fixes: f80fb3a3d508 ("arm64: add support for kernel ASLR") Cc: # 4.6+ Acked-by: Ard Biesheuvel Acked-by: Will Deacon Signed-off-by: Mark Rutland Signed-off-by: Catalin Marinas (cherry picked from commit fd363bd417ddb6103564c69cfcbd92d9a7877431) Signed-off-by: Alex Shi --- arch/arm64/kernel/head.S | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index 9890d04a96cb..488708687fda 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -697,6 +697,9 @@ __enable_mmu: isb bl __create_page_tables // recreate kernel mapping + tlbi vmalle1 // Remove any stale TLB entries + dsb nsh + msr sctlr_el1, x19 // re-enable the MMU isb ic iallu // flush instructions fetched From 5f642b4ad6b579bbdca82f690c2de314eccae507 Mon Sep 17 00:00:00 2001 From: Suzuki K Poulose Date: Thu, 25 Aug 2016 15:18:54 -0600 Subject: [PATCH 16/25] coresight: Remove erroneous dma_free_coherent in tmc_probe commit de5461970b3e9e194 ("coresight: tmc: allocating memory when needed") removed the static allocation of buffer for the trace data in ETR mode in tmc_probe. However it failed to remove the "devm_free_coherent" in tmc_probe when the probe fails due to other reasons. This patch gets rid of the incorrect dma_free_coherent() call. Fixes: commit de5461970b3e9e194 ("coresight: tmc: allocating memory when needed") Cc: Mathieu Poirier Signed-off-by: Suzuki K Poulose Signed-off-by: Mathieu Poirier Signed-off-by: Greg Kroah-Hartman (cherry picked from commit 481e46fe7a88557b66330cbb047b25cc13eff4b9) Signed-off-by: Alex Shi --- drivers/hwtracing/coresight/coresight-tmc.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/hwtracing/coresight/coresight-tmc.c b/drivers/hwtracing/coresight/coresight-tmc.c index 9e02ac963cd0..3978cbb6b038 100644 --- a/drivers/hwtracing/coresight/coresight-tmc.c +++ b/drivers/hwtracing/coresight/coresight-tmc.c @@ -388,9 +388,6 @@ static int tmc_probe(struct amba_device *adev, const struct amba_id *id) err_misc_register: coresight_unregister(drvdata->csdev); err_devm_kzalloc: - if (drvdata->config_type == TMC_CONFIG_TYPE_ETR) - dma_free_coherent(dev, drvdata->size, - drvdata->vaddr, drvdata->paddr); return ret; } From 2780600caa02a0b1d28a09bfe606fca270451430 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Tue, 31 May 2016 15:58:00 +0100 Subject: [PATCH 17/25] arm64: fix alignment when RANDOMIZE_TEXT_OFFSET is enabled With ARM64_64K_PAGES and RANDOMIZE_TEXT_OFFSET enabled, we hit the following issue on the boot: kernel BUG at arch/arm64/mm/mmu.c:480! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 4.6.0 #310 Hardware name: ARM Juno development board (r2) (DT) task: ffff000008d58a80 ti: ffff000008d30000 task.ti: ffff000008d30000 PC is at map_kernel_segment+0x44/0xb0 LR is at paging_init+0x84/0x5b0 pc : [] lr : [] pstate: 600002c5 Call trace: [] map_kernel_segment+0x44/0xb0 [] paging_init+0x84/0x5b0 [] setup_arch+0x198/0x534 [] start_kernel+0x70/0x388 [] __primary_switched+0x30/0x74 Commit 7eb90f2ff7e3 ("arm64: cover the .head.text section in the .text segment mapping") removed the alignment between the .head.text and .text sections, and used the _text rather than the _stext interval for mapping the .text segment. Prior to this commit _stext was always section aligned and didn't cause any issue even when RANDOMIZE_TEXT_OFFSET was enabled. Since that alignment has been removed and _text is used to map the .text segment, we need ensure _text is always page aligned when RANDOMIZE_TEXT_OFFSET is enabled. This patch adds logic to TEXT_OFFSET fuzzing to ensure that the offset is always aligned to the kernel page size. To ensure this, we rely on the PAGE_SHIFT being available via Kconfig. Signed-off-by: Mark Rutland Reported-by: Sudeep Holla Cc: Ard Biesheuvel Cc: Catalin Marinas Cc: Will Deacon Fixes: 7eb90f2ff7e3 ("arm64: cover the .head.text section in the .text segment mapping") Signed-off-by: Will Deacon (cherry picked from commit aed7eb8367939244ba19445292ffdfc398e0d66a) Signed-off-by: Alex Shi --- arch/arm64/Makefile | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile index 0a9bf4500852..ed9a181c4db5 100644 --- a/arch/arm64/Makefile +++ b/arch/arm64/Makefile @@ -60,7 +60,9 @@ head-y := arch/arm64/kernel/head.o # The byte offset of the kernel image in RAM from the start of RAM. ifeq ($(CONFIG_ARM64_RANDOMIZE_TEXT_OFFSET), y) -TEXT_OFFSET := $(shell awk 'BEGIN {srand(); printf "0x%03x000\n", int(512 * rand())}') +TEXT_OFFSET := $(shell awk "BEGIN {srand(); printf \"0x%06x\n\", \ + int(2 * 1024 * 1024 / (2 ^ $(CONFIG_ARM64_PAGE_SHIFT)) * \ + rand()) * (2 ^ $(CONFIG_ARM64_PAGE_SHIFT))}") else TEXT_OFFSET := 0x00080000 endif From 0f0c7c1c0c4e2f479a8c2056ee9cd835b9bd4854 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Thu, 3 Mar 2016 15:10:59 +0100 Subject: [PATCH 18/25] arm64: enable CONFIG_DEBUG_RODATA by default In spite of its name, CONFIG_DEBUG_RODATA is an important hardening feature for production kernels, and distros all enable it by default in their kernel configs. However, since enabling it used to result in more granular, and thus less efficient kernel mappings, it is not enabled by default for performance reasons. However, since commit 2f39b5f91eb4 ("arm64: mm: Mark .rodata as RO"), the various kernel segments (.text, .rodata, .init and .data) are already mapped individually, and the only effect of setting CONFIG_DEBUG_RODATA is that the existing .text and .rodata mappings are updated late in the boot sequence to have their read-only attributes set, which means that any performance concerns related to enabling CONFIG_DEBUG_RODATA are no longer valid. So from now on, make CONFIG_DEBUG_RODATA default to 'y' Signed-off-by: Ard Biesheuvel Acked-by: Mark Rutland Acked-by: Kees Cook Signed-off-by: Catalin Marinas (cherry picked from commit 57efac2f7108e3255d0dfe512290c9896f4ed55f) Signed-off-by: Alex Shi --- arch/arm64/Kconfig.debug | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/arm64/Kconfig.debug b/arch/arm64/Kconfig.debug index ab1cb1fc4e3d..d33041614206 100644 --- a/arch/arm64/Kconfig.debug +++ b/arch/arm64/Kconfig.debug @@ -64,13 +64,13 @@ config DEBUG_SET_MODULE_RONX config DEBUG_RODATA bool "Make kernel text and rodata read-only" + default y help If this is set, kernel text and rodata will be made read-only. This is to help catch accidental or malicious attempts to change the - kernel's executable code. Additionally splits rodata from kernel - text so it can be made explicitly non-executable. + kernel's executable code. - If in doubt, say Y + If in doubt, say Y config DEBUG_ALIGN_RODATA depends on DEBUG_RODATA From 351068f5efa8ae188bb1250eae0aa74060d493c8 Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Thu, 8 Sep 2016 09:57:08 +0200 Subject: [PATCH 19/25] fs/proc/kcore.c: Add bounce buffer for ktext data We hit hardened usercopy feature check for kernel text access by reading kcore file: usercopy: kernel memory exposure attempt detected from ffffffff8179a01f () (4065 bytes) kernel BUG at mm/usercopy.c:75! Bypassing this check for kcore by adding bounce buffer for ktext data. Reported-by: Steve Best Fixes: f5509cc18daa ("mm: Hardened usercopy") Suggested-by: Kees Cook Signed-off-by: Jiri Olsa Acked-by: Kees Cook Signed-off-by: Linus Torvalds (cherry picked from commit df04abfd181acc276ba6762c8206891ae10ae00d) Signed-off-by: Alex Shi --- fs/proc/kcore.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c index 92e6726f6e37..d1e5a66a33ba 100644 --- a/fs/proc/kcore.c +++ b/fs/proc/kcore.c @@ -516,7 +516,12 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) if (kern_addr_valid(start)) { unsigned long n; - n = copy_to_user(buffer, (char *)start, tsz); + /* + * Using bounce buffer to bypass the + * hardened user copy kernel text checks. + */ + memcpy(buf, (char *) start, tsz); + n = copy_to_user(buffer, buf, tsz); /* * We cannot distinguish between fault on source * and fault on destination. When this happens From d425243ac3340984ea43d32d3815eeb502d43bce Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Thu, 17 Mar 2016 12:47:12 +0100 Subject: [PATCH 20/25] s390: add DEBUG_RODATA support git commit d2aa1acad22f ("mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel mappings") adds a bogus warning to the console which states that s390 does not support kernel memory protection. This however is not true. We do support that since a couple of years however in a different way than the author of the above named patch expected. To get rid of the misleading message implement the mark_rodata_ro function and emit a message which states the amount of memory which was write protected already earlier. This is the same what parisc currently does. We currently do not support the kernel parameter "rodata=off" which would allow to write to the rodata section again. However since we have this feature since years without any problems there is no reason to add support for this. Signed-off-by: Heiko Carstens Signed-off-by: Martin Schwidefsky (cherry picked from commit 91d37211769510ae0b4747045d8f81d3b9dd4278) Signed-off-by: Alex Shi --- arch/s390/Kconfig | 3 +++ arch/s390/mm/init.c | 10 +++++++--- 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index 3a55f493c7da..8ddd756faa32 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -62,6 +62,9 @@ config PCI_QUIRKS config ARCH_SUPPORTS_UPROBES def_bool y +config DEBUG_RODATA + def_bool y + config S390 def_bool y select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c index c722400c7697..9c1b3c5e3174 100644 --- a/arch/s390/mm/init.c +++ b/arch/s390/mm/init.c @@ -108,6 +108,13 @@ void __init paging_init(void) free_area_init_nodes(max_zone_pfns); } +void mark_rodata_ro(void) +{ + /* Text and rodata are already protected. Nothing to do here. */ + pr_info("Write protecting the kernel read-only data: %luk\n", + ((unsigned long)&_eshared - (unsigned long)&_stext) >> 10); +} + void __init mem_init(void) { if (MACHINE_HAS_TLB_LC) @@ -126,9 +133,6 @@ void __init mem_init(void) setup_zero_pages(); /* Setup zeroed pages. */ mem_init_print_info(NULL); - printk("Write protected kernel read-only data: %#lx - %#lx\n", - (unsigned long)&_stext, - PFN_ALIGN((unsigned long)&_eshared) - 1); } void free_initmem(void) From 957413466cde02bee0b6540889376d35d6b0b41e Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Tue, 7 Jun 2016 12:20:51 +0200 Subject: [PATCH 21/25] vmlinux.lds.h: allow arch specific handling of ro_after_init data section commit c74ba8b3480d ("arch: Introduce post-init read-only memory") introduced the __ro_after_init attribute which allows to add variables to the ro_after_init data section. This new section was added to rodata, even though it contains writable data. This in turn causes problems on architectures which mark the page table entries read-only that point to rodata very early. This patch allows architectures to implement an own handling of the .data..ro_after_init section. Usually that would be: - mark the rodata section read-only very early - mark the ro_after_init section read-only within mark_rodata_ro Signed-off-by: Heiko Carstens Reviewed-by: Kees Cook Signed-off-by: Martin Schwidefsky (cherry picked from commit 32fb2fc5c357fb99616bbe100dbcb27bc7f5d045) Signed-off-by: Alex Shi --- include/asm-generic/vmlinux.lds.h | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 772c784ba763..33eb2fe23071 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -248,6 +248,14 @@ . = ALIGN(align); \ *(.data..init_task) +/* + * Allow architectures to handle ro_after_init data on their + * own by defining an empty RO_AFTER_INIT_DATA. + */ +#ifndef RO_AFTER_INIT_DATA +#define RO_AFTER_INIT_DATA *(.data..ro_after_init) +#endif + /* * Read only Data */ @@ -256,7 +264,7 @@ .rodata : AT(ADDR(.rodata) - LOAD_OFFSET) { \ VMLINUX_SYMBOL(__start_rodata) = .; \ *(.rodata) *(.rodata.*) \ - *(.data..ro_after_init) /* Read only after init */ \ + RO_AFTER_INIT_DATA /* Read only after init */ \ *(__vermagic) /* Kernel version magic */ \ . = ALIGN(8); \ VMLINUX_SYMBOL(__start___tracepoints_ptrs) = .; \ From a34cb2390f7f28f4a44568cccbc2c65a9bdff721 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Thu, 3 Mar 2016 15:10:59 +0100 Subject: [PATCH 22/25] arm64: enable CONFIG_DEBUG_RODATA by default In spite of its name, CONFIG_DEBUG_RODATA is an important hardening feature for production kernels, and distros all enable it by default in their kernel configs. However, since enabling it used to result in more granular, and thus less efficient kernel mappings, it is not enabled by default for performance reasons. However, since commit 2f39b5f91eb4 ("arm64: mm: Mark .rodata as RO"), the various kernel segments (.text, .rodata, .init and .data) are already mapped individually, and the only effect of setting CONFIG_DEBUG_RODATA is that the existing .text and .rodata mappings are updated late in the boot sequence to have their read-only attributes set, which means that any performance concerns related to enabling CONFIG_DEBUG_RODATA are no longer valid. So from now on, make CONFIG_DEBUG_RODATA default to 'y' Signed-off-by: Ard Biesheuvel Acked-by: Mark Rutland Acked-by: Kees Cook Signed-off-by: Catalin Marinas (cherry picked from commit 57efac2f7108e3255d0dfe512290c9896f4ed55f) Signed-off-by: Alex Shi --- arch/arm64/Kconfig.debug | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/arm64/Kconfig.debug b/arch/arm64/Kconfig.debug index 04fb73b973f1..8b0cd45de394 100644 --- a/arch/arm64/Kconfig.debug +++ b/arch/arm64/Kconfig.debug @@ -64,13 +64,13 @@ config DEBUG_SET_MODULE_RONX config DEBUG_RODATA bool "Make kernel text and rodata read-only" + default y help If this is set, kernel text and rodata will be made read-only. This is to help catch accidental or malicious attempts to change the - kernel's executable code. Additionally splits rodata from kernel - text so it can be made explicitly non-executable. + kernel's executable code. - If in doubt, say Y + If in doubt, say Y config DEBUG_ALIGN_RODATA depends on DEBUG_RODATA && ARM64_4K_PAGES From 1ff46aa097c9d407a905de365ecdfcb1fa27d0cc Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Thu, 11 Aug 2016 14:11:05 +0100 Subject: [PATCH 23/25] arm64: hibernate: avoid potential TLB conflict In create_safe_exec_page we install a set of global mappings in TTBR0, then subsequently invalidate TLBs. While TTBR0 points at the zero page, and the TLBs should be free of stale global entries, we may have stale ASID-tagged entries (e.g. from the EFI runtime services mappings) for the same VAs. Per the ARM ARM these ASID-tagged entries may conflict with newly-allocated global entries, and we must follow a Break-Before-Make approach to avoid issues resulting from this. This patch reworks create_safe_exec_page to invalidate TLBs while the zero page is still in place, ensuring that there are no potential conflicts when the new TTBR0 value is installed. As a single CPU is online while this code executes, we do not need to perform broadcast TLB maintenance, and can call local_flush_tlb_all(), which also subsumes some barriers. The remaining assembly is converted to use write_sysreg() and isb(). Other than this, we safely manipulate TTBRs in the hibernate dance. The code we install as part of the new TTBR0 mapping (the hibernated kernel's swsusp_arch_suspend_exit) installs a zero page into TTBR1, invalidates TLBs, then installs its preferred value. Upon being restored to the middle of swsusp_arch_suspend, the new image will call __cpu_suspend_exit, which will call cpu_uninstall_idmap, installing the zero page in TTBR0 and invalidating all TLB entries. Fixes: 82869ac57b5d ("arm64: kernel: Add support for hibernate/suspend-to-disk") Signed-off-by: Mark Rutland Acked-by: James Morse Tested-by: James Morse Cc: Lorenzo Pieralisi Cc: Will Deacon Cc: # 4.7+ Signed-off-by: Catalin Marinas (cherry picked from commit 0194e760f7d2f42adb5e1db31b27a4331dd89c2f) Signed-off-by: Alex Shi --- arch/arm64/kernel/hibernate.c | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-) diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c index f8df75d740f4..043a07206527 100644 --- a/arch/arm64/kernel/hibernate.c +++ b/arch/arm64/kernel/hibernate.c @@ -34,6 +34,7 @@ #include #include #include +#include #include /* @@ -216,12 +217,22 @@ static int create_safe_exec_page(void *src_start, size_t length, set_pte(pte, __pte(virt_to_phys((void *)dst) | pgprot_val(PAGE_KERNEL_EXEC))); - /* Load our new page tables */ - asm volatile("msr ttbr0_el1, %0;" - "isb;" - "tlbi vmalle1is;" - "dsb ish;" - "isb" : : "r"(virt_to_phys(pgd))); + /* + * Load our new page tables. A strict BBM approach requires that we + * ensure that TLBs are free of any entries that may overlap with the + * global mappings we are about to install. + * + * For a real hibernate/resume cycle TTBR0 currently points to a zero + * page, but TLBs may contain stale ASID-tagged entries (e.g. for EFI + * runtime services), while for a userspace-driven test_resume cycle it + * points to userspace page tables (and we must point it at a zero page + * ourselves). Elsewhere we only (un)install the idmap with preemption + * disabled, so T0SZ should be as required regardless. + */ + cpu_set_reserved_ttbr0(); + local_flush_tlb_all(); + write_sysreg(virt_to_phys(pgd), ttbr0_el1); + isb(); *phys_dst_addr = virt_to_phys((void *)dst); From 18fc694578719686df59e31ae32dd862eb5f0ba8 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Thu, 11 Aug 2016 14:11:06 +0100 Subject: [PATCH 24/25] arm64: hibernate: handle allocation failures In create_safe_exec_page(), we create a copy of the hibernate exit text, along with some page tables to map this via TTBR0. We then install the new tables in TTBR0. In swsusp_arch_resume() we call create_safe_exec_page() before trying a number of operations which may fail (e.g. copying the linear map page tables). If these fail, we bail out of swsusp_arch_resume() and return an error code, but leave TTBR0 as-is. Subsequently, the core hibernate code will call free_basic_memory_bitmaps(), which will free all of the memory allocations we made, including the page tables installed in TTBR0. Thus, we may have TTBR0 pointing at dangling freed memory for some period of time. If the hibernate attempt was triggered by a user requesting a hibernate test via the reboot syscall, we may return to userspace with the clobbered TTBR0 value. Avoid these issues by reorganising swsusp_arch_resume() such that we have no failure paths after create_safe_exec_page(). We also add a check that the zero page allocation succeeded, matching what we have for other allocations. Fixes: 82869ac57b5d ("arm64: kernel: Add support for hibernate/suspend-to-disk") Signed-off-by: Mark Rutland Acked-by: James Morse Cc: Lorenzo Pieralisi Cc: Will Deacon Cc: # 4.7+ Signed-off-by: Catalin Marinas (cherry picked from commit dfbca61af0b654990b9af8297ac574a9986d8275) Signed-off-by: Alex Shi --- arch/arm64/kernel/hibernate.c | 59 +++++++++++++++++++---------------- 1 file changed, 32 insertions(+), 27 deletions(-) diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c index 043a07206527..6dd18140ebb8 100644 --- a/arch/arm64/kernel/hibernate.c +++ b/arch/arm64/kernel/hibernate.c @@ -398,6 +398,38 @@ int swsusp_arch_resume(void) void __noreturn (*hibernate_exit)(phys_addr_t, phys_addr_t, void *, void *, phys_addr_t, phys_addr_t); + /* + * Restoring the memory image will overwrite the ttbr1 page tables. + * Create a second copy of just the linear map, and use this when + * restoring. + */ + tmp_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC); + if (!tmp_pg_dir) { + pr_err("Failed to allocate memory for temporary page tables."); + rc = -ENOMEM; + goto out; + } + rc = copy_page_tables(tmp_pg_dir, PAGE_OFFSET, 0); + if (rc) + goto out; + + /* + * Since we only copied the linear map, we need to find restore_pblist's + * linear map address. + */ + lm_restore_pblist = LMADDR(restore_pblist); + + /* + * We need a zero page that is zero before & after resume in order to + * to break before make on the ttbr1 page tables. + */ + zero_page = (void *)get_safe_page(GFP_ATOMIC); + if (!zero_page) { + pr_err("Failed to allocate zero page."); + rc = -ENOMEM; + goto out; + } + /* * Locate the exit code in the bottom-but-one page, so that *NULL * still has disastrous affects. @@ -423,27 +455,6 @@ int swsusp_arch_resume(void) */ __flush_dcache_area(hibernate_exit, exit_size); - /* - * Restoring the memory image will overwrite the ttbr1 page tables. - * Create a second copy of just the linear map, and use this when - * restoring. - */ - tmp_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC); - if (!tmp_pg_dir) { - pr_err("Failed to allocate memory for temporary page tables."); - rc = -ENOMEM; - goto out; - } - rc = copy_page_tables(tmp_pg_dir, PAGE_OFFSET, 0); - if (rc) - goto out; - - /* - * Since we only copied the linear map, we need to find restore_pblist's - * linear map address. - */ - lm_restore_pblist = LMADDR(restore_pblist); - /* * KASLR will cause the el2 vectors to be in a different location in * the resumed kernel. Load hibernate's temporary copy into el2. @@ -458,12 +469,6 @@ int swsusp_arch_resume(void) __hyp_set_vectors(el2_vectors); } - /* - * We need a zero page that is zero before & after resume in order to - * to break before make on the ttbr1 page tables. - */ - zero_page = (void *)get_safe_page(GFP_ATOMIC); - hibernate_exit(virt_to_phys(tmp_pg_dir), resume_hdr.ttbr1_el1, resume_hdr.reenter_kernel, lm_restore_pblist, resume_hdr.__hyp_stub_vectors, virt_to_phys(zero_page)); From 95bcc3e563748f65c9fc0eb1bbc3f99b76efd4b2 Mon Sep 17 00:00:00 2001 From: James Morse Date: Fri, 26 Aug 2016 16:03:42 +0100 Subject: [PATCH 25/25] arm64: kernel: Fix unmasked debug exceptions when restoring mdscr_el1 Changes to make the resume from cpu_suspend() code behave more like secondary boot caused debug exceptions to be unmasked early by __cpu_setup(). We then go on to restore mdscr_el1 in cpu_do_resume(), potentially taking break or watch points based on uninitialised registers. Mask debug exceptions in cpu_do_resume(), which is specific to resume from cpu_suspend(). Debug exceptions will be restored to their original state by local_dbg_restore() in cpu_suspend(), which runs after hw_breakpoint_restore() has re-initialised the other registers. Reported-by: Lorenzo Pieralisi Fixes: cabe1c81ea5b ("arm64: Change cpu_resume() to enable mmu early then access sleep_sp by va") Cc: # 4.7+ Signed-off-by: James Morse Acked-by: Will Deacon Signed-off-by: Catalin Marinas (cherry picked from commit 744c6c37cc18705d19e179622f927f5b781fe9cc) Signed-off-by: Alex Shi --- arch/arm64/mm/proc.S | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S index 5bb61de23201..9d37e967fa19 100644 --- a/arch/arm64/mm/proc.S +++ b/arch/arm64/mm/proc.S @@ -100,7 +100,16 @@ ENTRY(cpu_do_resume) msr tcr_el1, x8 msr vbar_el1, x9 + + /* + * __cpu_setup() cleared MDSCR_EL1.MDE and friends, before unmasking + * debug exceptions. By restoring MDSCR_EL1 here, we may take a debug + * exception. Mask them until local_dbg_restore() in cpu_suspend() + * resets them. + */ + disable_dbg msr mdscr_el1, x10 + msr sctlr_el1, x12 /* * Restore oslsr_el1 by writing oslar_el1