linux-pinenote/kernel
Rik van Riel 4142c3ebb6 sched/numa: Spread memory according to CPU and memory use
The pseudo-interleaving in NUMA placement has a fundamental problem:
using hard usage thresholds to spread memory equally between nodes
can prevent workloads from converging, or keep memory "trapped" on
nodes where the workload is barely running any more.

In order for workloads to properly converge, the memory migration
should not be stopped when nodes reach parity, but instead be
distributed according to how heavily memory is used from each node.
This way memory migration and task migration reinforce each other,
instead of one putting the brakes on the other.

Remove the hard thresholds from the pseudo-interleaving code, and
instead use a more gradual policy on memory placement. This also
seems to improve convergence of workloads that do not run flat out,
but sleep in between bursts of activity.

We still want to slow down NUMA scanning and migration once a workload
has settled on a few actively used nodes, so keep the 3/4 hysteresis
in place. Keep track of whether a workload is actively running on
multiple nodes, so task_numa_migrate does a full scan of the system
for better task placement.

In the case of running 3 SPECjbb2005 instances on a 4 node system,
this code seems to result in fairer distribution of memory between
nodes, with more memory bandwidth for each instance.

Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: mgorman@suse.de
Link: http://lkml.kernel.org/r/20160125170739.2fc9a641@annuminas.surriel.com
[ Minor readability tweaks. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09 14:47:18 +01:00
..
bpf perf/bpf: Convert perf_event_array to use struct file 2016-01-29 08:35:25 +01:00
configs kconfig: add xenconfig defconfig helper 2015-06-16 11:04:29 +01:00
debug module: use a structure to encapsulate layout. 2015-12-04 22:46:25 +01:00
events Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-01-31 15:38:27 -08:00
gcov gcov: use within_module() helper. 2015-12-04 22:46:25 +01:00
irq Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-01-31 14:48:58 -08:00
livepatch livepatch: Cleanup module page permission changes 2015-12-04 22:51:07 +01:00
locking rtmutex: Make wait_lock irq safe 2016-01-26 11:08:35 +01:00
power PM: APM_EMULATION does not depend on PM 2016-01-27 23:20:14 +01:00
printk kernel: printk: specify alignment for struct printk_log 2016-01-20 17:09:18 -08:00
rcu Merge branches 'doc.2015.12.05a', 'exp.2015.12.07a', 'fixes.2015.12.07a', 'list.2015.12.04b' and 'torture.2015.12.05a' into HEAD 2015-12-07 17:02:54 -08:00
sched sched/numa: Spread memory according to CPU and memory use 2016-02-09 14:47:18 +01:00
time Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-01-31 15:49:06 -08:00
trace A cleanup to the stack tracer broke stack tracing on s390. 2016-02-03 09:31:34 -08:00
.gitignore certs: add .gitignore to stop git nagging about x509_certificate_list 2015-10-21 15:18:35 +01:00
acct.c
async.c async: export current_is_async() 2015-11-19 17:51:48 +01:00
audit.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2016-01-17 19:13:15 -08:00
audit.h security: Make inode argument of inode_getsecid non-const 2015-12-24 11:09:39 -05:00
audit_fsnotify.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
audit_tree.c audit: audit_tree_match can be boolean 2015-11-04 08:23:51 -05:00
audit_watch.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
auditfilter.c audit: fix comment block whitespace 2015-11-04 08:23:51 -05:00
auditsc.c security: Make inode argument of inode_getsecid non-const 2015-12-24 11:09:39 -05:00
backtracetest.c
bounds.c
capability.c
cgroup.c Merge branch 'for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2016-01-12 19:20:32 -08:00
cgroup_freezer.c cgroup: kill cgrp_ss_priv[CGROUP_CANFORK_COUNT] and friends 2015-12-03 10:24:08 -05:00
cgroup_pids.c cgroup_pids: fix a typo. 2015-12-14 14:54:37 -05:00
compat.c compat: cleanup coding in compat_get_bitmap() and compat_put_bitmap() 2015-06-04 23:57:18 +02:00
configs.c
context_tracking.c context_tracking: Switch to new static_branch API 2015-11-24 09:56:43 +01:00
cpu.c kernel/cpu.c: make set_cpu_* static inlines 2016-01-20 17:09:18 -08:00
cpu_pm.c kernel/cpu_pm: fix cpu_cluster_pm_exit comment 2015-09-03 02:42:20 +02:00
cpuset.c Merge branch 'for-4.4-fixes' into for-4.5 2015-12-03 10:22:52 -05:00
crash_dump.c
cred.c kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
delayacct.c kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
dma.c
elfcore.c
exec_domain.c
exit.c exit: remove unneeded declaration of exit_mm() 2016-01-20 17:09:18 -08:00
extable.c kernel/extable.c: remove duplicated include 2015-09-10 13:29:01 -07:00
fork.c mm: rework virtual memory accounting 2016-01-14 16:00:49 -08:00
freezer.c
futex.c rtmutex: Make wait_lock irq safe 2016-01-26 11:08:35 +01:00
futex_compat.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-01-20 17:09:18 -08:00
groups.c
hung_task.c
irq_work.c treewide: Remove old email address 2015-11-23 09:44:58 +01:00
jump_label.c treewide: Remove old email address 2015-11-23 09:44:58 +01:00
kallsyms.c
kcmp.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-01-20 17:09:18 -08:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kexec.c kexec: set KEXEC_TYPE_CRASH before sanity_check_segment_list() 2016-01-20 17:09:18 -08:00
kexec_core.c kernel/kexec_core.c: use list_for_each_entry_safe in kimage_free_page_list 2016-01-20 17:09:18 -08:00
kexec_file.c kexec: move some memembers and definitions within the scope of CONFIG_KEXEC_FILE 2016-01-20 17:09:18 -08:00
kexec_internal.h kexec: move some memembers and definitions within the scope of CONFIG_KEXEC_FILE 2016-01-20 17:09:18 -08:00
kmod.c kmod: don't run async usermode helper as a child of kworker thread 2015-10-23 17:55:10 +09:00
kprobes.c perf/x86/hw_breakpoints: Disallow kernel breakpoints unless kprobe-safe 2015-08-04 10:16:54 +02:00
ksysfs.c rcu: Remove TINY_RCU bloat from pointless boot parameters 2015-12-07 16:59:37 -08:00
kthread.c kernel/kthread.c:kthread_create_on_node(): clarify documentation 2015-09-04 16:54:41 -07:00
latencytop.c sched/debug: Make schedstats a runtime tunable that is disabled by default 2016-02-09 11:54:23 +01:00
Makefile sys_membarrier(): system-wide memory barrier (generic, x86) 2015-09-11 15:21:34 -07:00
membarrier.c sys_membarrier(): system-wide memory barrier (generic, x86) 2015-09-11 15:21:34 -07:00
memremap.c phys_to_pfn_t: use phys_addr_t 2016-01-31 09:10:19 -08:00
module-internal.h
module.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching 2016-01-14 16:38:02 -08:00
module_signing.c KEYS: Merge the type-specific data with the payload data 2015-10-21 15:18:36 +01:00
notifier.c Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-01 08:40:25 -07:00
nsproxy.c
padata.c
panic.c printk: do cond_resched() between lines while outputting to consoles 2016-01-16 11:17:25 -08:00
params.c Nothing exciting, minor tweaks and cleanups. 2015-11-09 15:53:39 -08:00
pid.c Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-01-31 15:44:04 -08:00
pid_namespace.c
profile.c sched/debug: Make schedstats a runtime tunable that is disabled by default 2016-02-09 11:54:23 +01:00
ptrace.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-01-20 17:09:18 -08:00
range.c
reboot.c kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
relay.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
resource.c restrict /dev/mem to idle io memory ranges 2016-01-09 06:30:49 -08:00
seccomp.c seccomp: always propagate NO_NEW_PRIVS on tsync 2016-01-27 07:38:25 -08:00
signal.c kernel/signal.c: unexport sigsuspend() 2015-11-20 16:17:32 -08:00
smp.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
smpboot.c stop_machine: Kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark() 2015-10-20 10:23:55 +02:00
smpboot.h
softirq.c
stacktrace.c
stop_machine.c kernel/stop_machine.c: remove CONFIG_SMP dependencies 2016-01-16 11:17:24 -08:00
sys.c prctl: take mmap sem for writing to protect against others 2016-01-20 17:09:18 -08:00
sys_ni.c vfs: add copy_file_range syscall and vfs helper 2015-12-01 14:00:53 -05:00
sysctl.c sched/debug: Make schedstats a runtime tunable that is disabled by default 2016-02-09 11:54:23 +01:00
sysctl_binary.c
task_work.c task_work: remove fifo ordering guarantee 2015-09-05 13:46:58 -07:00
taskstats.c
test_kprobes.c
torture.c torture: Consolidate cond_resched_rcu_qs() into stutter_wait() 2015-10-06 11:25:01 -07:00
tracepoint.c tracepoint: Give priority to probes of tracepoints 2015-10-25 21:33:54 -04:00
tsacct.c
uid16.c
up.c
user-return-notifier.c
user.c
user_namespace.c kernel/*: switch to memdup_user_nul() 2016-01-04 10:27:55 -05:00
utsname.c
utsname_sysctl.c
watchdog.c Merge branch 'for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2016-01-11 18:53:13 -08:00
workqueue.c workqueue: simplify the apply_workqueue_attrs_locked() 2016-01-07 11:04:34 -05:00
workqueue_internal.h