linux-uconsole/kernel/sched
Qais Yousef 13685c4a08 sched/uclamp: Add a new sysctl to control RT default boost value
RT tasks by default run at the highest capacity/performance level. When
uclamp is selected this default behavior is retained by enforcing the
requested uclamp.min (p->uclamp_req[UCLAMP_MIN]) of the RT tasks to be
uclamp_none(UCLAMP_MAX), which is SCHED_CAPACITY_SCALE; the maximum
value.

This is also referred to as 'the default boost value of RT tasks'.

See commit 1a00d99997 ("sched/uclamp: Set default clamps for RT tasks").

On battery powered devices, it is desired to control this default
(currently hardcoded) behavior at runtime to reduce energy consumed by
RT tasks.

For example, a mobile device manufacturer where big.LITTLE architecture
is dominant, the performance of the little cores varies across SoCs, and
on high end ones the big cores could be too power hungry.

Given the diversity of SoCs, the new knob allows manufactures to tune
the best performance/power for RT tasks for the particular hardware they
run on.

They could opt to further tune the value when the user selects
a different power saving mode or when the device is actively charging.

The runtime aspect of it further helps in creating a single kernel image
that can be run on multiple devices that require different tuning.

Keep in mind that a lot of RT tasks in the system are created by the
kernel. On Android for instance I can see over 50 RT tasks, only
a handful of which created by the Android framework.

To control the default behavior globally by system admins and device
integrator, introduce the new sysctl_sched_uclamp_util_min_rt_default
to change the default boost value of the RT tasks.

I anticipate this to be mostly in the form of modifying the init script
of a particular device.

To avoid polluting the fast path with unnecessary code, the approach
taken is to synchronously do the update by traversing all the existing
tasks in the system. This could race with a concurrent fork(), which is
dealt with by introducing sched_post_fork() function which will ensure
the racy fork will get the right update applied.

Tested on Juno-r2 in combination with the RT capacity awareness [1].
By default an RT task will go to the highest capacity CPU and run at the
maximum frequency, which is particularly energy inefficient on high end
mobile devices because the biggest core[s] are 'huge' and power hungry.

With this patch the RT task can be controlled to run anywhere by
default, and doesn't cause the frequency to be maximum all the time.
Yet any task that really needs to be boosted can easily escape this
default behavior by modifying its requested uclamp.min value
(p->uclamp_req[UCLAMP_MIN]) via sched_setattr() syscall.

[1] 804d402fb6: ("sched/rt: Make RT capacity-aware")

Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200716110347.19553-2-qais.yousef@arm.com
2020-07-29 13:51:47 +02:00
..
autogroup.c sched/autogroup: Make autogroup_path() always available 2019-06-24 19:23:40 +02:00
autogroup.h sched/headers: Simplify and clean up header usage in the scheduler 2018-03-04 12:39:29 +01:00
clock.c sched/clock: Use static_branch_likely() with sched_clock_running 2019-11-29 08:10:54 +01:00
completion.c completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all() 2020-03-23 18:40:25 +01:00
core.c sched/uclamp: Add a new sysctl to control RT default boost value 2020-07-29 13:51:47 +02:00
cpuacct.c sched/cpuacct: Fix charge cpuacct.usage_sys 2020-05-19 20:34:14 +02:00
cpudeadline.c sched/deadline: Implement fallback mechanism for !fit case 2020-06-15 14:10:05 +02:00
cpudeadline.h sched/headers: Simplify and clean up header usage in the scheduler 2018-03-04 12:39:29 +01:00
cpufreq.c cpufreq: Avoid leaving stale IRQ work items during CPU offline 2019-12-12 17:59:43 +01:00
cpufreq_schedutil.c sched/uclamp: Protect uclamp fast path code with static key 2020-07-08 11:39:01 +02:00
cpupri.c sched/rt: cpupri_find: Trigger a full search as fallback 2020-03-20 13:06:20 +01:00
cpupri.h sched/rt: Optimize cpupri_find() on non-heterogenous systems 2020-03-06 12:57:27 +01:00
cputime.c sched/cputime: Improve cputime_adjust() 2020-06-15 14:10:00 +02:00
deadline.c Merge branch 'sched/urgent' 2020-07-08 11:38:59 +02:00
debug.c sched: Add rq::ttwu_pending 2020-05-28 10:54:16 +02:00
fair.c sched/fair: update_pick_idlest() Select group with lowest group_util when idle_cpus are equal 2020-07-22 10:22:04 +02:00
features.h sched/fair/util_est: Implement faster ramp-up EWMA on utilization increases 2019-10-29 10:01:07 +01:00
idle.c Merge branch 'sched/urgent' 2020-07-08 11:38:59 +02:00
isolation.c isolcpus: Affine unbound kernel threads to housekeeping cpus 2020-06-15 14:10:03 +02:00
loadavg.c sched: nohz: stop passing around unused "ticks" parameter. 2020-07-22 10:22:04 +02:00
Makefile kcsan: Improve various small stylistic details 2019-11-20 10:47:23 +01:00
membarrier.c membarrier: Fix RCU locking bug caused by faulty merge 2019-10-01 21:27:50 +02:00
pelt.c sched: Add a tracepoint to track rq->nr_running 2020-07-08 11:39:02 +02:00
pelt.h sched/pelt: Cleanup PELT divider 2020-06-15 14:10:06 +02:00
psi.c psi: eliminate kthread_worker from psi trigger scheduling mechanism 2020-06-15 14:10:03 +02:00
rt.c sched: Remove struct sched_class::next field 2020-06-25 13:45:44 +02:00
sched-pelt.h sched/fair: Fix "runnable_avg_yN_inv" not used warnings 2019-06-17 12:15:58 +02:00
sched.h sched: Remove duplicated tick_nohz_full_enabled() check 2020-07-28 13:27:54 +02:00
smp.h sched/headers: Split out open-coded prototypes into kernel/sched/smp.h 2020-05-28 11:03:20 +02:00
stats.c proc: introduce proc_create_seq{,_data} 2018-05-16 07:23:35 +02:00
stats.h psi: Move PF_MEMSTALL out of task->flags 2020-03-20 13:06:19 +01:00
stop_task.c sched: Remove struct sched_class::next field 2020-06-25 13:45:44 +02:00
swait.c sched/swait: Prepare usage in completions 2020-03-21 16:00:23 +01:00
topology.c sched: correct SD_flags returned by tl->sd_flags() 2020-06-15 14:10:04 +02:00
wait.c Add wake_up_interruptible_sync_poll_locked() 2019-10-31 15:12:23 +00:00
wait_bit.c sched/wait: fix ___wait_var_event(exclusive) 2019-12-17 13:32:50 +01:00