linux-uconsole/kernel/sched
Xuewei Zhang 742f2319cb sched/fair: Scale bandwidth quota and period without losing quota/period ratio precision
commit 4929a4e6fa upstream.

The quota/period ratio is used to ensure a child task group won't get
more bandwidth than the parent task group, and is calculated as:

  normalized_cfs_quota() = [(quota_us << 20) / period_us]

If the quota/period ratio changes because of precision loss when quota
and period are scaled (as done in sched_cfs_period_timer()), the parent
and child task groups become inconsistent.

Consider the following example:

A userspace container manager (kubelet) does three operations:

 1) Create a parent cgroup, set quota to 1,000us and period to 10,000us.
 2) Create a few children cgroups.
 3) Set quota to 1,000us and period to 10,000us on a child cgroup.

These operations are expected to succeed. However, if the 147/128
scaling happens before step 3, the quota and period of the parent
cgroup are changed to:

  new_quota: 1148437ns,   1148us
 new_period: 11484375ns, 11484us

When step 3 then runs, the ratio of the child cgroup is 104857, which
is larger than the ratio of the parent cgroup (104821), so the
operation fails.
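
The arithmetic in this example can be reproduced with a small userspace
sketch. It is not the kernel code: the helper names quota_period_ratio(),
scale_147_128() and verify_commit_example() are hypothetical, and the
sketch assumes plain truncating integer division throughout.

```c
#include <assert.h>
#include <stdint.h>

/* Formula from the text: (quota_us << 20) / period_us, truncating. */
uint64_t quota_period_ratio(uint64_t quota_us, uint64_t period_us)
{
	return (quota_us << 20) / period_us;
}

/* Hypothetical helper: scale a nanosecond value by 147/128, truncating. */
uint64_t scale_147_128(uint64_t ns)
{
	return ns * 147 / 128;
}

/* Walk through the numbers quoted in the example. */
void verify_commit_example(void)
{
	uint64_t parent_quota_ns  = scale_147_128(1000ULL * 1000);   /* 1,000us  */
	uint64_t parent_period_ns = scale_147_128(10000ULL * 1000);  /* 10,000us */

	assert(parent_quota_ns == 1148437);
	assert(parent_period_ns == 11484375);

	/* The ratio is computed on microseconds, so the ns values are truncated. */
	uint64_t parent = quota_period_ratio(parent_quota_ns / 1000,
					     parent_period_ns / 1000);
	uint64_t child  = quota_period_ratio(1000, 10000);

	assert(parent == 104821);
	assert(child == 104857);
	assert(child > parent);	/* the child would exceed the parent */
}
```

The scaled period happens to be exact, but the scaled quota loses half a
nanosecond, and truncating both to microseconds widens the gap further.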

Scaling quota and period by a factor of 2 instead fixes the problem,
because doubling both values leaves the quota/period ratio unchanged.
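
As a rough illustration of why a factor of 2 is safe, the sketch below
doubles quota and period repeatedly and checks that the ratio from the
formula above stays the same. The names normalized_ratio_us() and
doubling_keeps_ratio() are hypothetical, and the inputs are assumed to be
whole microseconds expressed in nanoseconds; this is not the kernel
implementation.

```c
#include <stdint.h>

/* Convert ns to us (truncating), then apply (quota_us << 20) / period_us. */
uint64_t normalized_ratio_us(uint64_t quota_ns, uint64_t period_ns)
{
	return ((quota_ns / 1000) << 20) / (period_ns / 1000);
}

/*
 * Hypothetical check: double quota and period `steps` times and report
 * whether the ratio ever differs from the starting ratio.
 * Returns 1 if the ratio is preserved at every step, 0 otherwise.
 */
int doubling_keeps_ratio(uint64_t quota_ns, uint64_t period_ns, int steps)
{
	uint64_t start = normalized_ratio_us(quota_ns, period_ns);

	for (int i = 0; i < steps; i++) {
		quota_ns *= 2;
		period_ns *= 2;
		if (normalized_ratio_us(quota_ns, period_ns) != start)
			return 0;
	}
	return 1;
}
```

For the example values (1,000us quota, 10,000us period), a call such as
doubling_keeps_ratio(1000000, 10000000, 8) reports that the ratio stays at
104857 after every doubling.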

Tested-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Xuewei Zhang <xueweiz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Phil Auld <pauld@redhat.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Fixes: 2e8e192263 ("sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup")
Link: https://lkml.kernel.org/r/20191004001243.140897-1-xueweiz@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-13 08:52:35 +01:00
autogroup.c
autogroup.h
clock.c
completion.c
core.c sched/core: Avoid spurious lock dependencies 2019-12-13 08:51:04 +01:00
cpuacct.c
cpudeadline.c
cpudeadline.h
cpufreq.c
cpufreq_schedutil.c sched/cpufreq: Align trace event behavior of fast switching 2019-10-05 13:09:51 +02:00
cpupri.c
cpupri.h
cputime.c sched/vtime: Fix guest/system mis-accounting on task switch 2019-11-06 13:06:01 +01:00
deadline.c sched/deadline: Fix bandwidth accounting at all levels after offline migration 2019-10-05 13:09:36 +02:00
debug.c
fair.c sched/fair: Scale bandwidth quota and period without losing quota/period ratio precision 2019-12-13 08:52:35 +01:00
features.h
idle.c idle: Prevent late-arriving interrupts from disrupting offline 2019-10-05 13:09:40 +02:00
isolation.c
loadavg.c
Makefile
membarrier.c sched/membarrier: Fix private expedited registration check 2019-10-11 18:21:22 +02:00
pelt.c
pelt.h
rt.c
sched-pelt.h sched/fair: Fix "runnable_avg_yN_inv" not used warnings 2019-07-26 09:14:08 +02:00
sched.h sched/debug: Explicitly cast sched_feat() to bool 2019-11-20 18:46:13 +01:00
stats.c
stats.h
stop_task.c
swait.c
topology.c sched/topology: Fix off by one bug 2019-12-01 09:17:16 +01:00
wait.c
wait_bit.c