linux-uconsole/kernel/cgroup
Cong Wang 0505cc4c90 cgroup: fix cgroup_sk_alloc() for sk_clone_lock()
[ Upstream commit ad0f75e5f5 ]

When we clone a socket in sk_clone_lock(), its sk_cgrp_data is
copied, so the cgroup refcnt must be taken too. And, unlike the
sk_alloc() path, sock_update_netprioidx() is not called here.
Therefore, it is safe and necessary to grab the cgroup refcnt
even when cgroup_sk_alloc is disabled.

sk_clone_lock() is in BH context anyway, the in_interrupt()
would terminate this function if called there. And for sk_alloc()
skcd->val is always zero. So it's safe to factor out the code
to make it more readable.

The global variable 'cgroup_sk_alloc_disabled' is used to determine
whether to take these reference counts. It is impossible to make
the reference counting correct unless we save this bit of information
in skcd->val. So, add a new bit there to record whether the socket
has already taken the reference counts. This obviously relies on
kmalloc() to align cgroup pointers to at least 4 bytes,
ARCH_KMALLOC_MINALIGN is certainly larger than that.

This bug seems to be introduced since the beginning, commit
d979a39d72 ("cgroup: duplicate cgroup reference when cloning sockets")
tried to fix it but not compeletely. It seems not easy to trigger until
the recent commit 090e28b229
("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged.

Fixes: bd1060a1d6 ("sock, cgroup: add sock->sk_cgroup")
Reported-by: Cameron Berkenpas <cam@neo-zeon.de>
Reported-by: Peter Geis <pgwipeout@gmail.com>
Reported-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reported-by: Daniël Sonck <dsonck92@gmail.com>
Reported-by: Zhang Qiang <qiang.zhang@windriver.com>
Tested-by: Cameron Berkenpas <cam@neo-zeon.de>
Tested-by: Peter Geis <pgwipeout@gmail.com>
Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Zefan Li <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-22 09:32:00 +02:00
..
cgroup-internal.h cgroup/tracing: Move taking of spin lock out of trace event handlers 2018-07-11 10:48:47 -07:00
cgroup-v1.c cgroup1: don't call release_agent when it is "" 2020-04-02 15:28:14 +02:00
cgroup.c cgroup: fix cgroup_sk_alloc() for sk_clone_lock() 2020-07-22 09:32:00 +02:00
cpuset.c cpuset: restore sanity to cpuset_cpus_allowed_fallback() 2019-07-10 09:53:39 +02:00
debug.c debug cgroup: use task_css_set instead of rcu_dereference 2017-11-27 11:37:33 -08:00
freezer.c cgroup: add @flags to css_task_iter_start() and implement CSS_TASK_ITER_PROCS 2017-07-21 11:14:51 -04:00
Makefile cgroup: Rename kernel/cgroup/stat.c to kernel/cgroup/rstat.c 2018-04-26 14:29:04 -07:00
namespace.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
pids.c cgroup: pids: use atomic64_t for pids->limit 2019-12-17 20:34:56 +01:00
rdma.c rdmacg: Convert to use match_string() helper 2018-05-07 09:27:26 -07:00
rstat.c Revert "cgroup: Add memory barriers to plug cgroup_rstat_updated() race window" 2020-06-07 13:17:53 +02:00