This has been run through Intel's LKP tests across a wide range
of modern systems and workloads, and it was not shown to make a
measurable performance difference, positive or negative.
Now that we have some shiny new tracepoints, we can actually
figure out what the heck is going on.
During a kernel compile, 60% of the flush_tlb_mm_range() calls
are for a single page. It breaks down like this:
     size    % of flushes   % <= this size
GLOBAL: 2.20% 2.20% avg cycles: 2283
1: 56.92% 59.12% avg cycles: 1276
2: 13.78% 72.90% avg cycles: 1505
3: 8.26% 81.16% avg cycles: 1880
4: 7.41% 88.58% avg cycles: 2447
5: 1.73% 90.31% avg cycles: 2358
6: 1.32% 91.63% avg cycles: 2563
7: 1.14% 92.77% avg cycles: 2862
8: 0.62% 93.39% avg cycles: 3542
9: 0.08% 93.47% avg cycles: 3289
10: 0.43% 93.90% avg cycles: 3570
11: 0.20% 94.10% avg cycles: 3767
12: 0.08% 94.18% avg cycles: 3996
13: 0.03% 94.20% avg cycles: 4077
14: 0.02% 94.23% avg cycles: 4836
15: 0.04% 94.26% avg cycles: 5699
16: 0.06% 94.32% avg cycles: 5041
17: 0.57% 94.89% avg cycles: 5473
18: 0.02% 94.91% avg cycles: 5396
19: 0.03% 94.95% avg cycles: 5296
20: 0.02% 94.96% avg cycles: 6749
21: 0.18% 95.14% avg cycles: 6225
22: 0.01% 95.15% avg cycles: 6393
23: 0.01% 95.16% avg cycles: 6861
24: 0.12% 95.28% avg cycles: 6912
25: 0.05% 95.32% avg cycles: 7190
26: 0.01% 95.33% avg cycles: 7793
27: 0.01% 95.34% avg cycles: 7833
28: 0.01% 95.35% avg cycles: 8253
29: 0.08% 95.42% avg cycles: 8024
30: 0.03% 95.45% avg cycles: 9670
31: 0.01% 95.46% avg cycles: 8949
32: 0.01% 95.46% avg cycles: 9350
33: 3.11% 98.57% avg cycles: 8534
34: 0.02% 98.60% avg cycles: 10977
35: 0.02% 98.62% avg cycles: 11400
We get into diminishing returns pretty quickly. On pre-IvyBridge
CPUs, we used to set the limit at 8 pages, and it was set at 128
on IvyBridge. That 128 number looks pretty silly considering that
less than 0.5% of the flushes are that large.
The previous code tried to size this number based on the size of
the TLB. Good idea, but it's error-prone, needs maintenance
(which it has not gotten up to now), and probably would not
matter much in practice.
Setting it to 33 means that we cover the mallopt
M_TRIM_THRESHOLD, which is the most universally common size at
which we see flushes.
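A quick back-of-the-envelope check of that claim (a sketch, assuming
4kB pages and the 128kB M_TRIM_THRESHOLD mentioned below):

echo $((128 * 1024 / 4096))   # = 32 pages, just under the 33-page spike in the data below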
That's the short version. Here's the long one for why I chose 33:
1. These numbers have a constant bias in the timestamps from the
tracing. It probably accounts for a couple hundred cycles in each
of these tests, but it should be fairly _even_ across all of them.
The smallest delta between the tracepoints I have ever seen is
335 cycles. This is one reason the cycles/page cost goes down in
general as the flushes get larger. The true cost is nearer to
100 cycles.
2. A full flush is more expensive than a single invlpg, but not
by much (single percentages).
3. A dtlb miss is 17.1ns (~45 cycles) and an itlb miss is 13.0ns
(~34 cycles). At those rates, refilling the 512-entry dTLB takes
22,000 cycles.
4. 22,000 cycles is approximately the equivalent of doing 85
invlpg operations (see the arithmetic sketch just after this
list). But the odds are that the TLB can actually be filled up
faster than that because TLB misses that are close in time also
tend to leverage the same caches.
5. ~98% of flushes are <=33 pages. There are a lot of flushes of
33 pages, probably because libc's M_TRIM_THRESHOLD is set to
128k (32 pages).
6. I've found no consistent data to support changing the IvyBridge
vs. SandyBridge tunable by a factor of 16.
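Rough arithmetic behind points 3 and 4 (the ~260 cycles/invlpg figure
is read off the cycles/page column around 33 pages in the table below,
so treat this as an estimate rather than a measurement):

echo $((512 * 45))      # 512-entry dTLB refill at ~45 cycles/miss: ~23,000 cycles
echo $((22000 / 260))   # at ~260 cycles per invlpg, roughly the ~85 invlpg quoted above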
I used the performance counters on this hardware (IvyBridge i5-3320M)
to figure out the tlb miss costs:
ocperf.py stat -e dtlb_load_misses.walk_duration,dtlb_load_misses.walk_completed,dtlb_store_misses.walk_duration,dtlb_store_misses.walk_completed,itlb_misses.walk_duration,itlb_misses.walk_completed,itlb.itlb_flush
7,720,030,970 dtlb_load_misses_walk_duration [57.13%]
169,856,353 dtlb_load_misses_walk_completed [57.15%]
708,832,859 dtlb_store_misses_walk_duration [57.17%]
19,346,823 dtlb_store_misses_walk_completed [57.17%]
2,779,687,402 itlb_misses_walk_duration [57.15%]
82,241,148 itlb_misses_walk_completed [57.13%]
770,717 itlb_itlb_flush [57.11%]
These show that a dtlb miss is 17.1ns (~45 cycles) and an itlb miss
is 13.0ns (~34 cycles). At those rates, refilling the 512-entry dTLB
takes 22,000 cycles. On a SandyBridge system with more cores and
larger caches, those are dtlb=13.4ns and itlb=9.5ns.
cat perf.stat.txt | perl -pe 's/,//g' \
	| awk '/itlb_misses_walk_duration/  { icyc+=$1 }
	       /itlb_misses_walk_completed/ { imiss+=$1 }
	       /dtlb_.*_walk_duration/      { dcyc+=$1 }
	       /dtlb_.*completed/           { dmiss+=$1 }
	       END { print "itlb cyc/miss: ", icyc/imiss, " dtlb cyc/miss: ", dcyc/dmiss, " ----- ", icyc, imiss, dcyc, dmiss }'
On Westmere CPUs, the counters to use are: itlb_flush,itlb_misses.walk_cycles,itlb_misses.any,dtlb_misses.walk_cycles,dtlb_misses.any
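The collection step there would then presumably look something like
this (an untested sketch, assuming the same ocperf.py wrapper applies
on Westmere):

ocperf.py stat -e itlb_flush,itlb_misses.walk_cycles,itlb_misses.any,dtlb_misses.walk_cycles,dtlb_misses.any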
The assumptions under which this code originally went in
(https://lkml.org/lkml/2012/6/12/119) say that a flush and a refill
are about 100ns. Being generous, that is over by a factor of 6 on the
refill side, although it is fairly close on the cost of an invlpg.
An increase of a single invlpg operation seems to lengthen the flush
range operation by about 200 cycles. Here is one example of the data
collected for flushing 10 and 11 pages (full data are below):
10: 0.43% 93.90% avg cycles: 3570 cycles/page: 357 samples: 4714
11: 0.20% 94.10% avg cycles: 3767 cycles/page: 342 samples: 2145
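That ~200-cycle figure falls straight out of the two rows above:

echo $((3767 - 3570))   # 197 cycles for the one extra page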
How to generate this table:
echo 10000 > /sys/kernel/debug/tracing/buffer_size_kb
echo x86-tsc > /sys/kernel/debug/tracing/trace_clock
echo 'reason != 0' > /sys/kernel/debug/tracing/events/tlb/tlb_flush/filter
echo 1 > /sys/kernel/debug/tracing/events/tlb/tlb_flush/enable
Pipe the trace output into this script:
http://sr71.net/~dave/intel/201402-tlb/trace-time-diff-process.pl.txt
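One way to wire that up (a sketch, assuming the script has been saved
locally as trace-time-diff-process.pl; the exact invocation used for
the data below may differ):

cat /sys/kernel/debug/tracing/trace_pipe | perl trace-time-diff-process.pl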
Note that these data were gathered with the invlpg threshold set to
150 pages. Only data points with >= 50 samples were printed:
 flush      % of      % <=
 pages    flushes     this size
------------------------------------------------------------------------------
-1: 2.20% 2.20% avg cycles: 2283 cycles/page: xxxx samples: 23960
1: 56.92% 59.12% avg cycles: 1276 cycles/page: 1276 samples: 620895
2: 13.78% 72.90% avg cycles: 1505 cycles/page: 752 samples: 150335
3: 8.26% 81.16% avg cycles: 1880 cycles/page: 626 samples: 90131
4: 7.41% 88.58% avg cycles: 2447 cycles/page: 611 samples: 80877
5: 1.73% 90.31% avg cycles: 2358 cycles/page: 471 samples: 18885
6: 1.32% 91.63% avg cycles: 2563 cycles/page: 427 samples: 14397
7: 1.14% 92.77% avg cycles: 2862 cycles/page: 408 samples: 12441
8: 0.62% 93.39% avg cycles: 3542 cycles/page: 442 samples: 6721
9: 0.08% 93.47% avg cycles: 3289 cycles/page: 365 samples: 917
10: 0.43% 93.90% avg cycles: 3570 cycles/page: 357 samples: 4714
11: 0.20% 94.10% avg cycles: 3767 cycles/page: 342 samples: 2145
12: 0.08% 94.18% avg cycles: 3996 cycles/page: 333 samples: 864
13: 0.03% 94.20% avg cycles: 4077 cycles/page: 313 samples: 289
14: 0.02% 94.23% avg cycles: 4836 cycles/page: 345 samples: 236
15: 0.04% 94.26% avg cycles: 5699 cycles/page: 379 samples: 390
16: 0.06% 94.32% avg cycles: 5041 cycles/page: 315 samples: 643
17: 0.57% 94.89% avg cycles: 5473 cycles/page: 321 samples: 6229
18: 0.02% 94.91% avg cycles: 5396 cycles/page: 299 samples: 224
19: 0.03% 94.95% avg cycles: 5296 cycles/page: 278 samples: 367
20: 0.02% 94.96% avg cycles: 6749 cycles/page: 337 samples: 185
21: 0.18% 95.14% avg cycles: 6225 cycles/page: 296 samples: 1964
22: 0.01% 95.15% avg cycles: 6393 cycles/page: 290 samples: 83
23: 0.01% 95.16% avg cycles: 6861 cycles/page: 298 samples: 61
24: 0.12% 95.28% avg cycles: 6912 cycles/page: 288 samples: 1307
25: 0.05% 95.32% avg cycles: 7190 cycles/page: 287 samples: 533
26: 0.01% 95.33% avg cycles: 7793 cycles/page: 299 samples: 94
27: 0.01% 95.34% avg cycles: 7833 cycles/page: 290 samples: 66
28: 0.01% 95.35% avg cycles: 8253 cycles/page: 294 samples: 73
29: 0.08% 95.42% avg cycles: 8024 cycles/page: 276 samples: 846
30: 0.03% 95.45% avg cycles: 9670 cycles/page: 322 samples: 296
31: 0.01% 95.46% avg cycles: 8949 cycles/page: 288 samples: 79
32: 0.01% 95.46% avg cycles: 9350 cycles/page: 292 samples: 60
33: 3.11% 98.57% avg cycles: 8534 cycles/page: 258 samples: 33936
34: 0.02% 98.60% avg cycles: 10977 cycles/page: 322 samples: 268
35: 0.02% 98.62% avg cycles: 11400 cycles/page: 325 samples: 177
36: 0.01% 98.63% avg cycles: 11504 cycles/page: 319 samples: 161
37: 0.02% 98.65% avg cycles: 11596 cycles/page: 313 samples: 182
38: 0.02% 98.66% avg cycles: 11850 cycles/page: 311 samples: 195
39: 0.01% 98.68% avg cycles: 12158 cycles/page: 311 samples: 128
40: 0.01% 98.68% avg cycles: 11626 cycles/page: 290 samples: 78
41: 0.04% 98.73% avg cycles: 11435 cycles/page: 278 samples: 477
42: 0.01% 98.73% avg cycles: 12571 cycles/page: 299 samples: 74
43: 0.01% 98.74% avg cycles: 12562 cycles/page: 292 samples: 78
44: 0.01% 98.75% avg cycles: 12991 cycles/page: 295 samples: 108
45: 0.01% 98.76% avg cycles: 13169 cycles/page: 292 samples: 78
46: 0.02% 98.78% avg cycles: 12891 cycles/page: 280 samples: 261
47: 0.01% 98.79% avg cycles: 13099 cycles/page: 278 samples: 67
48: 0.01% 98.80% avg cycles: 13851 cycles/page: 288 samples: 77
49: 0.01% 98.80% avg cycles: 13749 cycles/page: 280 samples: 66
50: 0.01% 98.81% avg cycles: 13949 cycles/page: 278 samples: 73
52: 0.00% 98.82% avg cycles: 14243 cycles/page: 273 samples: 52
54: 0.01% 98.83% avg cycles: 15312 cycles/page: 283 samples: 87
55: 0.01% 98.84% avg cycles: 15197 cycles/page: 276 samples: 109
56: 0.02% 98.86% avg cycles: 15234 cycles/page: 272 samples: 208
57: 0.00% 98.86% avg cycles: 14888 cycles/page: 261 samples: 53
58: 0.01% 98.87% avg cycles: 15037 cycles/page: 259 samples: 59
59: 0.01% 98.87% avg cycles: 15752 cycles/page: 266 samples: 63
62: 0.00% 98.89% avg cycles: 16222 cycles/page: 261 samples: 54
64: 0.02% 98.91% avg cycles: 17179 cycles/page: 268 samples: 248
65: 0.12% 99.03% avg cycles: 18762 cycles/page: 288 samples: 1324
85: 0.00% 99.10% avg cycles: 21649 cycles/page: 254 samples: 50
127: 0.01% 99.18% avg cycles: 32397 cycles/page: 255 samples: 75
128: 0.13% 99.31% avg cycles: 31711 cycles/page: 247 samples: 1466
129: 0.18% 99.49% avg cycles: 33017 cycles/page: 255 samples: 1927
181: 0.33% 99.84% avg cycles: 2489 cycles/page: 13 samples: 3547
256: 0.05% 99.91% avg cycles: 2305 cycles/page: 9 samples: 550
512: 0.03% 99.95% avg cycles: 2133 cycles/page: 4 samples: 304
1512: 0.01% 99.99% avg cycles: 3038 cycles/page: 2 samples: 65
Here are the tlb counters during a 10-second slice of a kernel compile
on a SandyBridge system. The miss costs are lower than on IvyBridge,
but that is probably due to the larger caches, since this was one of
the 'X' extreme parts.
10,873,007,282 dtlb_load_misses_walk_duration
250,711,333 dtlb_load_misses_walk_completed
1,212,395,865 dtlb_store_misses_walk_duration
31,615,772 dtlb_store_misses_walk_completed
5,091,010,274 itlb_misses_walk_duration
163,193,511 itlb_misses_walk_completed
1,321,980 itlb_itlb_flush
10.008045158 seconds time elapsed
# cat perf.stat.1392743721.txt | perl -pe 's/,//g' | awk '/itlb_misses_walk_duration/ { icyc+=$1 } /itlb_misses_walk_completed/ { imiss+=$1 } /dtlb_.*_walk_duration/ { dcyc+=$1 } /dtlb_.*completed/ { dmiss+=$1 } END {print "itlb ns/miss: ", icyc/imiss/3.3, " dtlb ns/miss: ", dcyc/dmiss/3.3, " ----- ", icyc,imiss, dcyc,dmiss }'
itlb ns/miss: 9.45338 dtlb ns/miss: 12.9716
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20140731154103.10C1115E@viggo.jf.intel.com
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
#include <linux/init.h>

#include <linux/mm.h>
#include <linux/spinlock.h>
#include <linux/smp.h>
#include <linux/interrupt.h>
#include <linux/module.h>
#include <linux/cpu.h>

#include <asm/tlbflush.h>
#include <asm/mmu_context.h>
#include <asm/cache.h>
#include <asm/apic.h>
#include <asm/uv/uv.h>
#include <linux/debugfs.h>

DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate)
                        = { &init_mm, 0, };

/*
 *      Smarter SMP flushing macros.
 *              c/o Linus Torvalds.
 *
 *      These mean you can really definitely utterly forget about
 *      writing to user space from interrupts. (Its not allowed anyway).
 *
 *      Optimizations Manfred Spraul <manfred@colorfullife.com>
 *
 *      More scalable flush, from Andi Kleen
 *
 *      Implement flush IPI by CALL_FUNCTION_VECTOR, Alex Shi
 */

struct flush_tlb_info {
        struct mm_struct *flush_mm;
        unsigned long flush_start;
        unsigned long flush_end;
};

/*
 * We cannot call mmdrop() because we are in interrupt context,
 * instead update mm->cpu_vm_mask.
 */
void leave_mm(int cpu)
{
        struct mm_struct *active_mm = this_cpu_read(cpu_tlbstate.active_mm);
        if (this_cpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
                BUG();
        if (cpumask_test_cpu(cpu, mm_cpumask(active_mm))) {
                cpumask_clear_cpu(cpu, mm_cpumask(active_mm));
                load_cr3(swapper_pg_dir);
                trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
        }
}
EXPORT_SYMBOL_GPL(leave_mm);

/*
 * The flush IPI assumes that a thread switch happens in this order:
 * [cpu0: the cpu that switches]
 * 1) switch_mm() either 1a) or 1b)
 * 1a) thread switch to a different mm
 * 1a1) set cpu_tlbstate to TLBSTATE_OK
 *      Now the tlb flush NMI handler flush_tlb_func won't call leave_mm
 *      if cpu0 was in lazy tlb mode.
 * 1a2) update cpu active_mm
 *      Now cpu0 accepts tlb flushes for the new mm.
 * 1a3) cpu_set(cpu, new_mm->cpu_vm_mask);
 *      Now the other cpus will send tlb flush ipis.
 * 1a4) change cr3.
 * 1a5) cpu_clear(cpu, old_mm->cpu_vm_mask);
 *      Stop ipi delivery for the old mm. This is not synchronized with
 *      the other cpus, but flush_tlb_func ignore flush ipis for the wrong
 *      mm, and in the worst case we perform a superfluous tlb flush.
 * 1b) thread switch without mm change
 *      cpu active_mm is correct, cpu0 already handles flush ipis.
 * 1b1) set cpu_tlbstate to TLBSTATE_OK
 * 1b2) test_and_set the cpu bit in cpu_vm_mask.
 *      Atomically set the bit [other cpus will start sending flush ipis],
 *      and test the bit.
 * 1b3) if the bit was 0: leave_mm was called, flush the tlb.
 * 2) switch %%esp, ie current
 *
 * The interrupt must handle 2 special cases:
 * - cr3 is changed before %%esp, ie. it cannot use current->{active_,}mm.
 * - the cpu performs speculative tlb reads, i.e. even if the cpu only
 *   runs in kernel space, the cpu could load tlb entries for user space
 *   pages.
 *
 * The good news is that cpu_tlbstate is local to each cpu, no
 * write/read ordering problems.
 */

/*
 * TLB flush function:
 *  1) Flush the tlb entries if the cpu uses the mm that's being flushed.
 *  2) Leave the mm if we are in the lazy tlb mode.
 */
static void flush_tlb_func(void *info)
{
        struct flush_tlb_info *f = info;

        inc_irq_stat(irq_tlb_count);

        if (f->flush_mm != this_cpu_read(cpu_tlbstate.active_mm))
                return;
        if (!f->flush_end)
                f->flush_end = f->flush_start + PAGE_SIZE;

        count_vm_tlb_event(NR_TLB_REMOTE_FLUSH_RECEIVED);
        if (this_cpu_read(cpu_tlbstate.state) == TLBSTATE_OK) {
                if (f->flush_end == TLB_FLUSH_ALL) {
                        local_flush_tlb();
                        trace_tlb_flush(TLB_REMOTE_SHOOTDOWN, TLB_FLUSH_ALL);
                } else {
                        unsigned long addr;
                        unsigned long nr_pages =
                                (f->flush_end - f->flush_start) / PAGE_SIZE;
                        addr = f->flush_start;
                        while (addr < f->flush_end) {
                                __flush_tlb_single(addr);
                                addr += PAGE_SIZE;
                        }
                        trace_tlb_flush(TLB_REMOTE_SHOOTDOWN, nr_pages);
                }
        } else
                leave_mm(smp_processor_id());

}

void native_flush_tlb_others(const struct cpumask *cpumask,
                                 struct mm_struct *mm, unsigned long start,
                                 unsigned long end)
{
        struct flush_tlb_info info;
        info.flush_mm = mm;
        info.flush_start = start;
        info.flush_end = end;

        count_vm_tlb_event(NR_TLB_REMOTE_FLUSH);
        if (is_uv_system()) {
                unsigned int cpu;

                cpu = smp_processor_id();
                cpumask = uv_flush_tlb_others(cpumask, mm, start, end, cpu);
                if (cpumask)
                        smp_call_function_many(cpumask, flush_tlb_func,
                                                        &info, 1);
                return;
        }
        smp_call_function_many(cpumask, flush_tlb_func, &info, 1);
}

void flush_tlb_current_task(void)
{
        struct mm_struct *mm = current->mm;

        preempt_disable();

        count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
        local_flush_tlb();
        trace_tlb_flush(TLB_LOCAL_SHOOTDOWN, TLB_FLUSH_ALL);
        if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
                flush_tlb_others(mm_cpumask(mm), mm, 0UL, TLB_FLUSH_ALL);
        preempt_enable();
}

/*
 * See Documentation/x86/tlb.txt for details.  We choose 33
 * because it is large enough to cover the vast majority (at
 * least 95%) of allocations, and is small enough that we are
 * confident it will not cause too much overhead.  Each single
 * flush is about 100 ns, so this caps the maximum overhead at
 * _about_ 3,000 ns.
 *
 * This is in units of pages.
 */
unsigned long tlb_single_page_flush_ceiling = 33;

void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
                                unsigned long end, unsigned long vmflag)
{
        unsigned long addr;
        /* do a global flush by default */
        unsigned long base_pages_to_flush = TLB_FLUSH_ALL;

        preempt_disable();
        if (current->active_mm != mm)
                goto out;

        if (!current->mm) {
                leave_mm(smp_processor_id());
                goto out;
        }

        if ((end != TLB_FLUSH_ALL) && !(vmflag & VM_HUGETLB))
                base_pages_to_flush = (end - start) >> PAGE_SHIFT;

        if (base_pages_to_flush > tlb_single_page_flush_ceiling) {
                base_pages_to_flush = TLB_FLUSH_ALL;
                count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
                local_flush_tlb();
        } else {
                /* flush range by one by one 'invlpg' */
                for (addr = start; addr < end; addr += PAGE_SIZE) {
                        count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ONE);
                        __flush_tlb_single(addr);
                }
        }
        trace_tlb_flush(TLB_LOCAL_MM_SHOOTDOWN, base_pages_to_flush);
out:
        if (base_pages_to_flush == TLB_FLUSH_ALL) {
                start = 0UL;
                end = TLB_FLUSH_ALL;
        }
        if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
                flush_tlb_others(mm_cpumask(mm), mm, start, end);
        preempt_enable();
}

void flush_tlb_page(struct vm_area_struct *vma, unsigned long start)
{
        struct mm_struct *mm = vma->vm_mm;

        preempt_disable();

        if (current->active_mm == mm) {
                if (current->mm)
                        __flush_tlb_one(start);
                else
                        leave_mm(smp_processor_id());
        }

        if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
                flush_tlb_others(mm_cpumask(mm), mm, start, 0UL);

        preempt_enable();
}

static void do_flush_tlb_all(void *info)
{
        count_vm_tlb_event(NR_TLB_REMOTE_FLUSH_RECEIVED);
        __flush_tlb_all();
        if (this_cpu_read(cpu_tlbstate.state) == TLBSTATE_LAZY)
                leave_mm(smp_processor_id());
}

void flush_tlb_all(void)
{
        count_vm_tlb_event(NR_TLB_REMOTE_FLUSH);
        on_each_cpu(do_flush_tlb_all, NULL, 1);
}

static void do_kernel_range_flush(void *info)
{
        struct flush_tlb_info *f = info;
        unsigned long addr;

        /* flush range by one by one 'invlpg' */
        for (addr = f->flush_start; addr < f->flush_end; addr += PAGE_SIZE)
                __flush_tlb_single(addr);
}

void flush_tlb_kernel_range(unsigned long start, unsigned long end)
{

        /* Balance as user space task's flush, a bit conservative */
        if (end == TLB_FLUSH_ALL ||
            (end - start) > tlb_single_page_flush_ceiling * PAGE_SIZE) {
                on_each_cpu(do_flush_tlb_all, NULL, 1);
        } else {
                struct flush_tlb_info info;
                info.flush_start = start;
                info.flush_end = end;
                on_each_cpu(do_kernel_range_flush, &info, 1);
        }
}

static ssize_t tlbflush_read_file(struct file *file, char __user *user_buf,
                             size_t count, loff_t *ppos)
{
        char buf[32];
        unsigned int len;

        len = sprintf(buf, "%ld\n", tlb_single_page_flush_ceiling);
        return simple_read_from_buffer(user_buf, count, ppos, buf, len);
}

static ssize_t tlbflush_write_file(struct file *file,
                 const char __user *user_buf, size_t count, loff_t *ppos)
{
        char buf[32];
        ssize_t len;
        int ceiling;

        len = min(count, sizeof(buf) - 1);
        if (copy_from_user(buf, user_buf, len))
                return -EFAULT;

        buf[len] = '\0';
        if (kstrtoint(buf, 0, &ceiling))
                return -EINVAL;

        if (ceiling < 0)
                return -EINVAL;

        tlb_single_page_flush_ceiling = ceiling;
        return count;
}

static const struct file_operations fops_tlbflush = {
        .read = tlbflush_read_file,
        .write = tlbflush_write_file,
        .llseek = default_llseek,
};

static int __init create_tlb_single_page_flush_ceiling(void)
{
        debugfs_create_file("tlb_single_page_flush_ceiling", S_IRUSR | S_IWUSR,
                            arch_debugfs_dir, NULL, &fops_tlbflush);
        return 0;
}
late_initcall(create_tlb_single_page_flush_ceiling);
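For reference, the ceiling above ends up as a runtime tunable under the
arch debugfs directory; with debugfs mounted in the usual place it
should be usable roughly like this (a sketch, not taken from the patch
itself):

cat /sys/kernel/debug/x86/tlb_single_page_flush_ceiling    # read the current ceiling
echo 33 > /sys/kernel/debug/x86/tlb_single_page_flush_ceiling    # write a new one (default shown)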