linux-uconsole/kernel/sched
Dietmar Eggemann a08906f651 sched: EAS & 'single cpu per cluster'/cpu hotplug interoperability
For Energy-Aware Scheduling (EAS) to work properly, even in the
case that there is only one cpu per cluster or that cpus are hot-plugged
out, the Energy Model (EM) data on all energy-aware sched domains (sd)
has to be present for all online cpus.

Mainline sd hierarchy setup code will remove sd's which are not useful
for task scheduling e.g. in the following situations:

1. Only 1 cpu is/remains in one cluster of a multi cluster system.

   This remaining cpu only has DIE and no MC sd.

2. A complete cluster in a two cluster system is hot-plugged out.

   The cpus of the remaining cluster only have MC and no DIE sd.

To make sure that all online cpus keep all their energy-aware sd's,
the sd degenerate functionality has been changed to not free a sd if
its first sched group (sg) contains EM data in case:

1. There is only 1 cpu left in the sd.

2. There have to be at least 2 sg's if certain sd flags are set.

Instead of freeing such a sd it now clears only its SD_LOAD_BALANCE
flag. This will make sure that the EAS functionality will always see
all energy-aware sd's for all online cpus.

It will introduce a tiny performance degradation for operations on
affected cpus since the hot-path macro for_each_domain() has to deal
with sd's not contributing to task scheduling at all now.

In most cases the exisiting code makes sure that task scheduling is not
invoked on a sd with !SD_LOAD_BALANCE.

However, a small change is necessary in update_sd_lb_stats() to make
sure that sd->parent is only initialized to !NULL in case the parent sd
contains more than 1 sg.

The handling of newidle decay values before the SD_LOAD_BALANCE check in
rebalance_domains() stays unchanged.

Test (w/ CONFIG_SCHED_DEBUG):

JUNO r0 default system:

$ cat /proc/cpuinfo | grep "^CPU part"
CPU part        : 0xd03
CPU part        : 0xd07
CPU part        : 0xd07
CPU part        : 0xd03
CPU part        : 0xd03
CPU part        : 0xd03

SD names and flags:

$ cat /proc/sys/kernel/sched_domain/cpu*/domain*/name
MC
DIE
MC
DIE
MC
DIE
MC
DIE
MC
DIE
MC
DIE

$ printf "%x\n" `cat /proc/sys/kernel/sched_domain/cpu*/domain*/flags`
832f
102f
832f
102f
832f
102f
832f
102f
832f
102f
832f
102f

Test 1: Hotplug-out one A57 (CPU part 0xd07) cpu:

$ echo 0 > /sys/devices/system/cpu/cpu1/online

$ cat /proc/cpuinfo | grep "^CPU part"
CPU part        : 0xd03
CPU part        : 0xd07
CPU part        : 0xd03
CPU part        : 0xd03
CPU part        : 0xd03

SD names and flags for remaining A57 (cpu2) cpu:

$ cat /proc/sys/kernel/sched_domain/cpu2/domain*/name
MC
DIE

$ printf "%x\n" `cat /proc/sys/kernel/sched_domain/cpu2/domain*/flags`
832e <-- MC SD with !SD_LOAD_BALANCE
102f

Test 2: Hotplug-out the entire A57 cluster:

$ echo 0 > /sys/devices/system/cpu/cpu1/online
$ echo 0 > /sys/devices/system/cpu/cpu2/online

$ cat /proc/cpuinfo | grep "^CPU part"
CPU part        : 0xd03
CPU part        : 0xd03
CPU part        : 0xd03
CPU part        : 0xd03

SD names and flags for the remaining A53 (CPU part 0xd03) cluster:

$ cat /proc/sys/kernel/sched_domain/cpu*/domain*/name
MC
DIE
MC
DIE
MC
DIE
MC
DIE

$ printf "%x\n" `cat /proc/sys/kernel/sched_domain/cpu*/domain*/flags`
832f
102e <-- DIE SD with !SD_LOAD_BALANCE
832f
102e
832f
102e
832f
102e

Change-Id: If24aa2b2628f334abbf0207d39e2a86168d9d673
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
2017-06-21 16:37:38 +05:30
..
auto_group.c sched, timer: Convert usages of ACCESS_ONCE() in the scheduler to READ_ONCE()/WRITE_ONCE() 2015-05-08 12:11:32 +02:00
auto_group.h sched, timer: Convert usages of ACCESS_ONCE() in the scheduler to READ_ONCE()/WRITE_ONCE() 2015-05-08 12:11:32 +02:00
clock.c treewide: Remove old email address 2015-11-23 09:44:58 +01:00
completion.c sched/completion: Serialize completion_done() with complete() 2015-02-18 14:27:40 +01:00
core.c sched: EAS & 'single cpu per cluster'/cpu hotplug interoperability 2017-06-21 16:37:38 +05:30
cpuacct.c cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes 2014-07-15 11:05:09 -04:00
cpuacct.h
cpudeadline.c sched/deadline: Unify dl_time_before() usage 2015-09-23 09:51:25 +02:00
cpudeadline.h sched/deadline: Unify dl_time_before() usage 2015-09-23 09:51:25 +02:00
cpufreq.c sched: backport cpufreq hooks from 4.9-rc4 2017-06-21 16:34:04 +05:30
cpufreq_sched.c sched/cpufreq: fix tunables for schedfreq governor 2017-06-21 16:34:04 +05:30
cpufreq_schedutil.c FROM-LIST: cpufreq: schedutil: Redefine the rate_limit_us tunable 2017-06-21 16:37:20 +05:30
cpupri.c Merge commit '3cf2f34' into sched/core, to fix build error 2014-06-12 13:46:37 +02:00
cpupri.h sched/cpupri: Remove unnecessary definitions in cpupri.h 2014-11-16 10:58:59 +01:00
cputime.c Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2016-09-20 15:18:54 +08:00
deadline.c sched: backport cpufreq hooks from 4.9-rc4 2017-06-21 16:34:04 +05:30
debug.c sched/fair: Provide runnable_load_avg back to cfs_rq 2015-08-03 12:24:31 +02:00
energy.c sched: Support for extracting EAS energy costs from DT 2016-09-14 14:48:50 +05:30
fair.c sched: EAS & 'single cpu per cluster'/cpu hotplug interoperability 2017-06-21 16:37:38 +05:30
features.h sched: Add Kconfig option DEFAULT_USE_ENERGY_AWARE to set ENERGY_AWARE feature flag 2016-10-12 17:34:22 +05:30
idle.c vmstat: make vmstat_updater deferrable again and shut down on idle 2016-09-14 15:02:22 +05:30
idle_task.c sched: Make sched_class::set_cpus_allowed() unconditional 2015-08-12 12:06:09 +02:00
loadavg.c sched/loadavg: Fix loadavg artifacts on fully idle and on fully loaded systems 2016-06-01 12:15:49 -07:00
Makefile BACKPORT: cpufreq: schedutil: New governor based on scheduler utilization data 2017-06-21 16:34:04 +05:30
rt.c sched: backport cpufreq hooks from 4.9-rc4 2017-06-21 16:34:04 +05:30
sched.h UPSTREAM: sched/fair: Propagate load during synchronous attach/detach 2017-06-21 16:37:38 +05:30
stats.c sched: use %*pb[l] to print bitmaps including cpumasks and nodemasks 2015-02-13 21:21:37 -08:00
stats.h sched/stat: Simplify the sched_info accounting dependency 2015-07-04 10:04:30 +02:00
stop_task.c sched: Introduce Window Assisted Load Tracking (WALT) 2016-09-14 15:02:22 +05:30
tune.c sched: tune: Fix lacking spinlock initialization 2016-12-01 15:18:44 +05:30
tune.h sched/tune: Introducing a new schedtune attribute prefer_idle 2016-09-14 15:02:22 +05:30
wait.c sched/wait: Fix the signal handling fix 2015-12-13 14:30:59 -08:00
walt.c sched/walt: kill {min,max}_capacity 2017-01-02 13:54:08 +05:30
walt.h sched/walt: Accounting for number of irqs pending on each core 2016-09-14 15:02:22 +05:30