Task Control Groups: basic task cgroup framework

Generic Process Control Groups
------------------------------

There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.

This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.

The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:

- the userspace APIs are (somewhat) normalised

- it's easier to test e.g. the ResGroups CPU controller in
  conjunction with the BeanCounters memory controller, or use either of
  them as the resource-control portion of a virtual server system.

- the additional kernel footprint of any of the competing resource
  management systems is substantially reduced, since it doesn't need
  to provide process grouping/containment, hence improving their
  chances of getting into the kernel

This patch:

Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.

Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
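
To make the idea of grouping tasks and attaching per-subsystem state
concrete, here is a minimal userspace sketch in C.  It is not the code this
patch adds: every name in it (toy_cgroup, toy_task, toy_cgroup_init,
toy_attach, toy_task_state, TOY_SUBSYS_COUNT) is hypothetical, and it
assumes a fixed number of client subsystems and exactly one group per task.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; a userspace toy, not kernel code. */
#define TOY_SUBSYS_COUNT 2      /* assumed: e.g. a "cpu" and a "memory" client */

struct toy_cgroup {
    const char *name;
    struct toy_cgroup *parent;              /* NULL for the root group */
    void *subsys_state[TOY_SUBSYS_COUNT];   /* one opaque object per client */
    int nr_tasks;                           /* membership count */
};

struct toy_task {
    int pid;
    struct toy_cgroup *cgroup;              /* group the task belongs to */
};

/* Initialise a group with no members and no subsystem state attached. */
void toy_cgroup_init(struct toy_cgroup *cg, const char *name,
                     struct toy_cgroup *parent)
{
    assert(cg != NULL);
    cg->name = name;
    cg->parent = parent;
    cg->nr_tasks = 0;
    for (size_t i = 0; i < TOY_SUBSYS_COUNT; i++)
        cg->subsys_state[i] = NULL;
}

/* Move a task into a group, keeping both membership counts consistent. */
void toy_attach(struct toy_task *t, struct toy_cgroup *dst)
{
    assert(t != NULL && dst != NULL);
    if (t->cgroup == dst)
        return;                 /* already a member, nothing to do */
    if (t->cgroup != NULL)
        t->cgroup->nr_tasks--;
    t->cgroup = dst;
    dst->nr_tasks++;
}

/* Look up the state that a given client attached to the task's group. */
void *toy_task_state(const struct toy_task *t, size_t subsys_id)
{
    assert(t != NULL && subsys_id < TOY_SUBSYS_COUNT);
    return t->cgroup != NULL ? t->cgroup->subsys_state[subsys_id] : NULL;
}
```

A caller would initialise a root group and a child, store a pointer to some
controller-specific object in subsys_state[], and call toy_attach() to move
a task.  toy_task_state() then returns whatever that client attached to the
task's current group, which is the sense in which the state applies to the
group as an aggregate.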
											 
										 
										
2007-10-18 23:39:30 -07:00

/*
 *  Generic process-grouping system.
 *
 *  Based originally on the cpuset system, extracted by Paul Menage
 *  Copyright (C) 2006 Google, Inc
 *
 *  Notifications support
 *  Copyright (C) 2009 Nokia Corporation
 *  Author: Kirill A. Shutemov
 *
 *  Copyright notices from the original cpuset code:
 *  --------------------------------------------------
 *  Copyright (C) 2003 BULL SA.
 *  Copyright (C) 2004-2006 Silicon Graphics, Inc.
 *
 *  Portions derived from Patrick Mochel's sysfs code.
 *  sysfs is Copyright (c) 2001-3 Patrick Mochel
 *
 *  2003-10-10 Written by Simon Derr.
 *  2003-10-22 Updates by Stephen Hemminger.
 *  2004 May-July Rework by Paul Jackson.
 *  ---------------------------------------------------
 *
 *  This file is subject to the terms and conditions of the GNU General Public
 *  License.  See the file COPYING in the main directory of the Linux
 *  distribution for more details.
 */

#include <linux/cgroup.h>
#include <linux/cred.h>
#include <linux/ctype.h>
#include <linux/errno.h>
#include <linux/init_task.h>
#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/mm.h>
#include <linux/mutex.h>
#include <linux/mount.h>
#include <linux/pagemap.h>
#include <linux/proc_fs.h>
#include <linux/rcupdate.h>
#include <linux/sched.h>
#include <linux/backing-dev.h>
#include <linux/seq_file.h>
#include <linux/slab.h>
#include <linux/magic.h>
#include <linux/spinlock.h>
#include <linux/string.h>
#include <linux/sort.h>
#include <linux/kmod.h>
#include <linux/module.h>
												Add cgroupstats
This patch is inspired by the discussion at
http://lkml.org/lkml/2007/4/11/187 and implements per cgroup statistics
as suggested by Andrew Morton in http://lkml.org/lkml/2007/4/11/263.  The
patch is on top of 2.6.21-mm1 with Paul's cgroups v9 patches (forward
ported)
This patch implements per cgroup statistics infrastructure and re-uses
code from the taskstats interface.  A new set of cgroup operations are
registered with commands and attributes.  It should be very easy to
*extend* per cgroup statistics, by adding members to the cgroupstats
structure.
The current model for cgroupstats is a pull, a push model (to post
statistics on interesting events), should be very easy to add.  Currently
user space requests for statistics by passing the cgroup file
descriptor.  Statistics about the state of all the tasks in the cgroup
is returned to user space.
TODO's/NOTE:
This patch provides an infrastructure for implementing cgroup statistics.
Based on the needs of each controller, we can incrementally add more statistics,
event based support for notification of statistics, accumulation of taskstats
into cgroup statistics in the future.
Sample output
# ./cgroupstats -C /cgroup/a
sleeping 2, blocked 0, running 1, stopped 0, uninterruptible 0
# ./cgroupstats -C /cgroup/
sleeping 154, blocked 0, running 0, stopped 0, uninterruptible 0
If the approach looks good, I'll enhance and post the user space utility for
the same
Feedback, comments, test results are always welcome!
[akpm@linux-foundation.org: build fix]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
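
The pull model described in this message can be illustrated with a small
userspace sketch in C.  It is a toy built on assumptions, not the
taskstats-based interface the patch implements: the enum, struct and
function names are hypothetical, and task states are supplied as a plain
array instead of being read from the kernel.  The output format copies the
sample output shown in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical state labels, named after the sample output. */
enum toy_state {
    TOY_SLEEPING,
    TOY_BLOCKED,
    TOY_RUNNING,
    TOY_STOPPED,
    TOY_UNINTERRUPTIBLE,
    TOY_NR_STATES
};

/* Hypothetical result record: one counter per task state. */
struct toy_cgroupstats {
    unsigned long count[TOY_NR_STATES];
};

/*
 * "Pull" the statistics: walk every task state in the group on request
 * and tally it.  Out-of-range states are ignored.
 */
void toy_collect(const enum toy_state *tasks, size_t n,
                 struct toy_cgroupstats *out)
{
    assert(out != NULL && (tasks != NULL || n == 0));
    memset(out, 0, sizeof(*out));
    for (size_t i = 0; i < n; i++) {
        if ((unsigned int)tasks[i] < TOY_NR_STATES)
            out->count[tasks[i]]++;
    }
}

/* Render the counters in the same layout as the sample output. */
int toy_format(const struct toy_cgroupstats *s, char *buf, size_t len)
{
    assert(s != NULL && buf != NULL);
    return snprintf(buf, len,
                    "sleeping %lu, blocked %lu, running %lu, "
                    "stopped %lu, uninterruptible %lu",
                    s->count[TOY_SLEEPING], s->count[TOY_BLOCKED],
                    s->count[TOY_RUNNING], s->count[TOY_STOPPED],
                    s->count[TOY_UNINTERRUPTIBLE]);
}
```

Adding a statistic in this sketch means adding an enum value and a format
field, which loosely mirrors the message's point that the statistics can be
extended by adding members to the structure.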
											 
										 
										
2007-10-18 23:39:44 -07:00

#include <linux/delayacct.h>
#include <linux/cgroupstats.h>
#include <linux/hashtable.h>
#include <linux/namei.h>
#include <linux/pid_namespace.h>
#include <linux/idr.h>
#include <linux/vmalloc.h> /* TODO: replace with more sophisticated array */
#include <linux/eventfd.h>
#include <linux/poll.h>
#include <linux/flex_array.h> /* used in cgroup_attach_task */
#include <linux/kthread.h>
#include <linux/file.h>

#include <linux/atomic.h>

/*
 * cgroup_mutex is the master lock.  Any modification to cgroup or its
 * hierarchy must be performed while holding it.
 *
 * cgroup_root_mutex nests inside cgroup_mutex and should be held to modify
 * cgroupfs_root of any cgroup hierarchy - subsys list, flags,
 * release_agent_path and so on.  Modifying requires both cgroup_mutex and
 * cgroup_root_mutex.  Readers can acquire either of the two.  This is to
 * break the following locking order cycle.
 *
 *  A. cgroup_mutex -> cred_guard_mutex -> s_type->i_mutex_key -> namespace_sem
 *  B. namespace_sem -> cgroup_mutex
 *
 * B happens only through cgroup_show_options() and using cgroup_root_mutex
 * breaks it.
 */
							 
						 
					
						
							
								
									
										
										
										
											2013-04-07 09:29:51 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
#ifdef CONFIG_PROVE_RCU
DEFINE_MUTEX(cgroup_mutex);
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:22 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
EXPORT_SYMBOL_GPL(cgroup_mutex);	/* only for lockdep */
						 
					
						
							
								
									
										
										
										
											2013-04-07 09:29:51 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
#else
  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
static DEFINE_MUTEX(cgroup_mutex);
						 
					
						
							
								
									
										
										
										
											2013-04-07 09:29:51 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
#endif
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
static DEFINE_MUTEX(cgroup_root_mutex);
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
cgroup: use a dedicated workqueue for cgroup destruction

Since be44562613851 ("cgroup: remove synchronize_rcu() from
cgroup_diput()"), cgroup destruction path makes use of workqueue.  css
freeing is performed from a work item from that point on and a later
commit, ea15f8ccdb430 ("cgroup: split cgroup destruction into two
steps"), moves css offlining to workqueue too.

As cgroup destruction isn't depended upon for memory reclaim, the
destruction work items were put on the system_wq; unfortunately, some
controller may block in the destruction path for considerable duration
while holding cgroup_mutex.  As large part of destruction path is
synchronized through cgroup_mutex, when combined with high rate of
cgroup removals, this has potential to fill up system_wq's max_active
of 256.

Also, it turns out that memcg's css destruction path ends up queueing
and waiting for work items on system_wq through work_on_cpu().  If
such operation happens while system_wq is fully occupied by cgroup
destruction work items, work_on_cpu() can't make forward progress
because system_wq is full and other destruction work items on
system_wq can't make forward progress because the work item waiting
for work_on_cpu() is holding cgroup_mutex, leading to deadlock.

This can be fixed by queueing destruction work items on a separate
workqueue.  This patch creates a dedicated workqueue -
cgroup_destroy_wq - for this purpose.  As these work items shouldn't
have inter-dependencies and mostly serialized by cgroup_mutex anyway,
giving high concurrency level doesn't buy anything and the workqueue's
@max_active is set to 1 so that destruction work items are executed
one by one on each CPU.

Hugh Dickins: Because cgroup_init() is run before init_workqueues(),
cgroup_destroy_wq can't be allocated from cgroup_init().  Do it from a
separate core_initcall().  In the future, we probably want to reorder
so that workqueue init happens before cgroup_init().

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Hugh Dickins <hughd@google.com>
Reported-by: Shawn Bohrer <shawn.bohrer@gmail.com>
Link: http://lkml.kernel.org/r/20131111220626.GA7509@sbohrermbp13-local.rgmadvisors.com
Link: http://lkml.kernel.org/g/alpine.LNX.2.00.1310301606080.2333@eggly.anvils
Cc: stable@vger.kernel.org # v3.9+
											 
										 
										
											2013-11-22 17:14:39 -05:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
/*
 * cgroup destruction makes heavy use of work items and there can be a lot
 * of concurrent destructions.  Use a separate workqueue so that cgroup
 * destruction work items don't end up filling up max_active of system_wq
 * which may lead to deadlock.
 */
static struct workqueue_struct *cgroup_destroy_wq;
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
cgroups: revamp subsys array

This patch series provides the ability for cgroup subsystems to be
compiled as modules both within and outside the kernel tree.  This is
mainly useful for classifiers and subsystems that hook into components
that are already modules.  cls_cgroup and blkio-cgroup serve as the
example use cases for this feature.

It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
which modular subsystems can use to register and depart during runtime.
The net_cls classifier subsystem serves as the example for a subsystem
which can be converted into a module using these changes.

Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear.  Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section of
the array is protected by cgroup_mutex.

Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.

Patch #3 adds a mechanism for unloading modular subsystems, which includes
a more advanced rework of the rudimentary reference counting introduced in
patch 2.

Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.

Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves.  In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy).  The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys was
the only one in the dummy cgroup (and therefore all css_sets should be
unlinked from it) - however, as all css_sets always stay attached to the
dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
issue should also make sure these cases are addressed in the subsystem
loading and unloading code.

This patch:

Make subsys[] able to be dynamically populated to support modular
subsystems

This patch reworks the way the subsys[] array is used so that subsystems
can register themselves after boot time, and enables the internals of
cgroups to be able to handle when subsystems are not present or may
appear/disappear.

Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
/*
 * Generate an array of cgroup subsystem pointers. At boot time, this is
							 
						 
					
						
							
								
									
										
										
										
											2012-09-13 09:50:55 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
 * populated with the built in subsystems, and modular subsystems are
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
 * registered after that. The mutable section of this array is protected by
 * cgroup_mutex.
 */
							 
						 
					
						
							
								
									
										
										
										
											2012-09-12 16:12:06 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
#define SUBSYS(_x) [_x ## _subsys_id] = &_x ## _subsys,
  
						 
					
						
							
								
									
										
										
										
											2012-09-12 16:12:05 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
#define IS_SUBSYS_ENABLED(option) IS_BUILTIN(option)
  
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:47 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
static struct cgroup_subsys *cgroup_subsys[CGROUP_SUBSYS_COUNT] = {
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
#include <linux/cgroup_subsys.h>
};
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:47 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
 * The dummy hierarchy, reserved for the subsystems that are otherwise
 * unattached - it never has more than a single cgroup, and all tasks are
 * part of that cgroup.
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:47 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
static struct cgroupfs_root cgroup_dummy_root;

/* dummy_top is a shorthand for the dummy hierarchy's top cgroup */
static struct cgroup * const cgroup_dummy_top = &cgroup_dummy_root.top_cgroup;
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
/*
 * cgroupfs file entry, pointed to from leaf dentry->d_fsdata.
 */
struct cfent {
	struct list_head		node;
	struct dentry			*dentry;
	struct cftype			*type;
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 11:01:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	struct cgroup_subsys_state	*css;
							 
						 
					
						
							
								
									
										
										
										
											2013-04-18 23:09:52 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
	/* file xattrs */
	struct simple_xattrs		xattrs;
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
};
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:20 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
									
										
										
										
											2011-03-30 22:57:33 -03:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
 * cgroup_event represents events which userspace wants to receive.
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:20 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
 */
struct cgroup_event {
	/*
 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:26 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	 * css which the event belongs to.
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:20 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:26 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup_subsys_state  * css ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:20 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Control  file  which  the  event  associated . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cftype  * cft ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  eventfd  to  signal  userspace  about  the  event . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  eventfd_ctx  * eventfd ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Each  of  these  stored  in  a  list  by  the  cgroup . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  list_head  list ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  All  fields  below  needed  to  unregister  event  when 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  userspace  closes  eventfd . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									poll_table  pt ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									wait_queue_head_t  * wqh ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									wait_queue_t  wait ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  work_struct  remove ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								} ;  
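The struct above ties together an eventfd and a control file. As a rough userspace illustration, assuming the cgroup v1 convention of registering an event by writing the string "<event_fd> <control_fd> <args>" to cgroup.event_control, the request line could be built as sketched below. The helper name sketch_format_event_request is hypothetical and not part of the kernel or any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the registration line that userspace would
 * write to cgroup.event_control (assumed format: "<event_fd> <control_fd> <args>").
 * Returns the number of characters written, or -1 if the buffer is too small.
 */
static int sketch_format_event_request(char *buf, size_t len, int event_fd,
				       int control_fd, const char *args)
{
	int n;

	assert(buf != NULL && len > 0);

	/* The args part is optional; omit the trailing space when it is empty. */
	if (args && strlen(args) > 0)
		n = snprintf(buf, len, "%d %d %s", event_fd, control_fd, args);
	else
		n = snprintf(buf, len, "%d %d", event_fd, control_fd);

	/* snprintf reports the length it wanted; reject truncated output. */
	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}
```

A caller would obtain event_fd from eventfd(2), open the control file to get control_fd, and then write the formatted line to cgroup.event_control in the same cgroup directory.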
						 
					
						
							
								
									
										
											 
										
											
												cgroup: CSS ID support
Patch for per-CSS (Cgroup Subsys State) ID and private hierarchy code.
This patch attaches a unique ID to each css and provides the following:
 - css_lookup(subsys, id)
   returns a pointer to the struct cgroup_subsys_state with that id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for each css is maintained.
The cgroup framework only prepares the following:
	- the css_id of the root css for the subsys
	- an id is automatically attached at creation of a css.
	- the id is *not* freed automatically, because the cgroup framework
	  doesn't know the lifetime of a cgroup_subsys_state.
	  The free_css_id() function is provided. It must be called by the subsys.
There are several reasons to develop this.
	- Saving space. For example, memcg's swap_cgroup is an array of
	  pointers to cgroup, but it does not need to be very fast.
	  By replacing pointers (8 bytes per entry) with IDs (2 bytes per entry),
	  we can greatly reduce memory usage.
	- Scanning without a lock.
	  CSS_ID provides a "scan id under this ROOT" function. With this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = css_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget(next);
		rcu_read_unlock();
		id = found + 1;
	 } while(...)
Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID vs. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by the following kind of routine:
	  scan -> rest a while (release a lock) -> continue from where it was interrupted
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, the number of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
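The scan-by-ID idea described in this commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the table, the struct and every sketch_* name are hypothetical, the ID range is shrunk to 16, and no RCU or reference counting is modelled. It shows why resuming from found + 1 gives a stable restart point after the scanner has paused.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_ID 16	/* the patch allows 1-65535; kept tiny here */

/* Hypothetical stand-in for a css: only the fields the scan needs. */
struct sketch_css {
	int id;		/* 0 means "slot unused", mirroring the UNUSED ID */
	int root_id;	/* ID of the hierarchy root this css lives under */
};

static struct sketch_css sketch_table[SKETCH_MAX_ID + 1];

/* Register a css at slot @id under @root_id. Returns 0, or -1 if @id is invalid or taken. */
static int sketch_css_add(int id, int root_id)
{
	if (id < 1 || id > SKETCH_MAX_ID || sketch_table[id].id != 0)
		return -1;
	sketch_table[id].id = id;
	sketch_table[id].root_id = root_id;
	return 0;
}

/*
 * Return the first css with ID >= @id that lives under @root_id and store
 * its ID in @found. Returns NULL when no such css remains.
 */
static struct sketch_css *sketch_css_get_next(int id, int root_id, int *found)
{
	assert(found != NULL);
	if (id < 1)
		id = 1;
	for (; id <= SKETCH_MAX_ID; id++) {
		if (sketch_table[id].id != 0 &&
		    sketch_table[id].root_id == root_id) {
			*found = id;
			return &sketch_table[id];
		}
	}
	return NULL;
}

/* Count every css under @root_id, restarting each round from found + 1. */
static int sketch_count_under(int root_id)
{
	int id = 1, found = 0, count = 0;

	while (sketch_css_get_next(id, root_id, &found) != NULL) {
		count++;
		id = found + 1;	/* a plain integer survives any pause between rounds */
	}
	return count;
}
```

Because the cursor is an integer rather than a pointer into a tree, the scanner holds no reference to an entry while it rests, which is the robustness property the message attributes to scan-by-ID.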
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
										
									 
								 
							 
							
								
							 
							
								 
							
							
/* The list of hierarchy roots */

static LIST_HEAD(cgroup_roots);
static int cgroup_root_count;
						 
					
						
							
								
									
										
											 
										
											
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
/*
 * Hierarchy ID allocation and mapping.  It follows the same exclusion
 * rules as other root ops - both cgroup_mutex and cgroup_root_mutex for
 * writes, either for reads.
 */
static DEFINE_IDR(cgroup_hierarchy_idr);

static struct cgroup_name root_cgroup_name = { .name = "/" };

/*
 * Assign a monotonically increasing serial number to cgroups.  It
 * guarantees cgroups with bigger numbers are newer than those with smaller
 * numbers.  Also, as cgroups are always appended to the parent's
 * ->children list, it guarantees that sibling cgroups are always sorted in
 * the ascending serial number order on the list.  Protected by
 * cgroup_mutex.
 */
static u64 cgroup_serial_nr_next = 1;
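The ordering guarantee described in the comment above can be checked with a small userspace model. This is a sketch, not the kernel's code: the sketch_* types and functions are hypothetical, a hand-rolled singly linked list stands in for the ->children list, and the locking that cgroup_mutex provides is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cgroup: a serial number and a sibling link. */
struct sketch_cgroup {
	uint64_t serial_nr;
	struct sketch_cgroup *next_sibling;
};

/* Hypothetical stand-in for a parent's children list. */
struct sketch_parent {
	struct sketch_cgroup *first_child;
	struct sketch_cgroup *last_child;
};

/* Next serial number to hand out; starts at 1 as in the original. */
static uint64_t sketch_serial_nr_next = 1;

/* Assign the next serial number and append the child at the list tail. */
static void sketch_add_child(struct sketch_parent *parent,
			     struct sketch_cgroup *child)
{
	assert(parent != NULL && child != NULL);

	child->serial_nr = sketch_serial_nr_next++;
	child->next_sibling = NULL;
	if (parent->last_child)
		parent->last_child->next_sibling = child;
	else
		parent->first_child = child;
	parent->last_child = child;
}

/* Return 1 if siblings appear in strictly ascending serial order, else 0. */
static int sketch_children_sorted(const struct sketch_parent *parent)
{
	const struct sketch_cgroup *c;

	for (c = parent->first_child; c && c->next_sibling; c = c->next_sibling)
		if (c->serial_nr >= c->next_sibling->serial_nr)
			return 0;
	return 1;
}
```

Since numbers only grow and new children only go to the tail, the list never needs sorting; a walker can use the serial number to tell whether a sibling was created after the one it last visited.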
						 
					
						
							
								
									
										
										
										
											2013-06-18 18:53:53 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
										
									 
								 
							 
							
								
							 
							
								 
							
							
/* This flag indicates whether tasks in the fork and exit paths should
 * check for fork/exit handlers to call. This avoids us having to do
 * extra work in the fork/exit path if none of the subsystems need to
 * be called.
 */
static int need_forkexit_callback __read_mostly;
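The purpose of such a flag can be sketched in userspace as follows. All sketch_* names are hypothetical and only fork handlers are modelled; the point is that the hot path tests a single integer and skips the loop over subsystems entirely when no subsystem registered a handler.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SUBSYS_COUNT 3

/* Hypothetical subsystem descriptor with an optional fork handler. */
struct sketch_subsys {
	void (*fork)(int pid);	/* NULL when the subsystem has no handler */
};

static struct sketch_subsys *sketch_subsys_table[SKETCH_SUBSYS_COUNT];
static int sketch_need_fork_callback;	/* set once any handler exists */
static int sketch_subsys_visited;	/* how many table slots the fork path examined */
static int sketch_fork_calls;		/* how many handlers actually ran */

/* Example handler that only counts its invocations. */
static void sketch_count_fork(int pid)
{
	(void)pid;
	sketch_fork_calls++;
}

/* Register a subsystem and raise the flag if it has a fork handler. */
static void sketch_register(int slot, struct sketch_subsys *ss)
{
	assert(slot >= 0 && slot < SKETCH_SUBSYS_COUNT && ss != NULL);
	sketch_subsys_table[slot] = ss;
	if (ss->fork)
		sketch_need_fork_callback = 1;
}

/* Fork path: one cheap test, then the loop only when it can matter. */
static void sketch_fork_path(int pid)
{
	int i;

	if (!sketch_need_fork_callback)
		return;
	for (i = 0; i < SKETCH_SUBSYS_COUNT; i++) {
		sketch_subsys_visited++;
		if (sketch_subsys_table[i] && sketch_subsys_table[i]->fork)
			sketch_subsys_table[i]->fork(pid);
	}
}
```

The flag is written rarely (at registration) and read on every fork, which matches the __read_mostly annotation on the original variable.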
						 
					
						
							
								
									
										
											 
										
											
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
static struct cftype cgroup_base_files[];

static void cgroup_destroy_css_killed(struct cgroup *cgrp);
static int cgroup_destroy_locked(struct cgroup *cgrp);
static int cgroup_addrm_files(struct cgroup *cgrp, struct cftype cfts[],
			      bool is_add);
static int cgroup_file_release(struct inode *inode, struct file *file);
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
/**
 * cgroup_css - obtain a cgroup's css for the specified subsystem
 * @cgrp: the cgroup of interest
 * @ss: the subsystem of interest (%NULL returns the dummy_css)
 *
 * Return @cgrp's css (cgroup_subsys_state) associated with @ss.  This
 * function must be called either under cgroup_mutex or rcu_read_lock() and
 * the caller is responsible for pinning the returned css if it wants to
 * keep accessing it outside the said locks.  This function may return
 * %NULL if @cgrp doesn't have @ss enabled.
 */
static struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
					      struct cgroup_subsys *ss)
{
	if (ss)
		return rcu_dereference_check(cgrp->subsys[ss->subsys_id],
					     lockdep_is_held(&cgroup_mutex));
	else
		return &cgrp->dummy_css;
}
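The three possible outcomes of this lookup (a real css, NULL for a subsystem that is not enabled, and the dummy css for a NULL subsystem) can be seen in a simplified userspace model. This is a sketch only: the sketch_* types are hypothetical, the array size is arbitrary, and the RCU dereference and lockdep check of the original are replaced by a plain array read.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SUBSYS_COUNT 4

/* Hypothetical stand-ins for the kernel structures. */
struct sketch_css {
	int placeholder;
};

struct sketch_subsys {
	int subsys_id;
};

struct sketch_cgroup {
	struct sketch_css *subsys[SKETCH_SUBSYS_COUNT];	/* NULL = not enabled */
	struct sketch_css dummy_css;			/* used when no subsystem is given */
};

/*
 * Return the css for @ss, NULL if @ss is not enabled on @cgrp, or the
 * cgroup's dummy css when @ss itself is NULL.
 */
static struct sketch_css *sketch_cgroup_css(struct sketch_cgroup *cgrp,
					    const struct sketch_subsys *ss)
{
	assert(cgrp != NULL);
	if (ss) {
		assert(ss->subsys_id >= 0 &&
		       ss->subsys_id < SKETCH_SUBSYS_COUNT);
		return cgrp->subsys[ss->subsys_id];
	}
	return &cgrp->dummy_css;
}
```

Callers therefore have to handle a NULL result whenever they pass a non-NULL subsystem, while passing NULL always yields a valid pointer.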
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
										
									 
								 
							 
							
								
							 
							
								 
							
							
/* convenient tests for these bits */
static inline bool cgroup_is_dead(const struct cgroup *cgrp)
{
	return test_bit(CGRP_DEAD, &cgrp->flags);
}
										
/**
 * cgroup_is_descendant - test ancestry
 * @cgrp: the cgroup to be tested
 * @ancestor: possible ancestor of @cgrp
 *
 * Test whether @cgrp is a descendant of @ancestor.  It also returns %true
 * if @cgrp == @ancestor.  This function is safe to call as long as @cgrp
 * and @ancestor are accessible.
 */
bool cgroup_is_descendant(struct cgroup *cgrp, struct cgroup *ancestor)
{
	while (cgrp) {
		if (cgrp == ancestor)
			return true;
		cgrp = cgrp->parent;
	}
	return false;
}
EXPORT_SYMBOL_GPL(cgroup_is_descendant);
											
static int cgroup_is_releasable(const struct cgroup *cgrp)
{
	const int bits =
		(1 << CGRP_RELEASABLE) |
		(1 << CGRP_NOTIFY_ON_RELEASE);
	return (cgrp->flags & bits) == bits;
}

static int notify_on_release(const struct cgroup *cgrp)
{
	return test_bit(CGRP_NOTIFY_ON_RELEASE, &cgrp->flags);
}
										
/**
 * for_each_subsys - iterate all loaded cgroup subsystems
 * @ss: the iteration cursor
 * @i: the index of @ss, CGROUP_SUBSYS_COUNT after reaching the end
 *
 * Should be called under cgroup_mutex.
 */
#define for_each_subsys(ss, i)						\
	for ((i) = 0; (i) < CGROUP_SUBSYS_COUNT; (i)++)			\
		if (({ lockdep_assert_held(&cgroup_mutex);		\
		       !((ss) = cgroup_subsys[i]); })) { }		\
		else
							
/**
 * for_each_builtin_subsys - iterate all built-in cgroup subsystems
 * @ss: the iteration cursor
 * @i: the index of @ss, CGROUP_BUILTIN_SUBSYS_COUNT after reaching the end
 *
 * Built-in subsystems are always present and iteration itself doesn't
 * require any synchronization.
 */
#define for_each_builtin_subsys(ss, i)					\
	for ((i) = 0; (i) < CGROUP_BUILTIN_SUBSYS_COUNT &&		\
	     (((ss) = cgroup_subsys[i]) || true); (i)++)
										
/* iterate each subsystem attached to a hierarchy */
#define for_each_root_subsys(root, ss)					\
	list_for_each_entry((ss), &(root)->subsys_list, sibling)
											
/* iterate across the active hierarchies */
#define for_each_active_root(root)					\
	list_for_each_entry((root), &cgroup_roots, root_list)
											
static inline struct cgroup *__d_cgrp(struct dentry *dentry)
{
	return dentry->d_fsdata;
}

static inline struct cfent *__d_cfe(struct dentry *dentry)
{
	return dentry->d_fsdata;
}

static inline struct cftype *__d_cft(struct dentry *dentry)
{
	return __d_cfe(dentry)->type;
}
										
											2013-04-07 09:29:51 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_lock_live_group  -  take  cgroup_mutex  and  check  that  cgrp  is  alive . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ cgrp :  the  cgroup  to  be  checked  for  liveness 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
									
										
										
										
											2013-04-07 09:29:51 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  On  success ,  returns  true ;  the  mutex  should  be  later  unlocked .   On 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  failure  returns  false  with  no  lock  held . 
							 
						 
					
						
							
								
									
										
										
										
											2013-04-07 09:29:51 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-04-07 09:29:51 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  bool  cgroup_lock_live_group ( struct  cgroup  * cgrp )  
						 
					
						
							
								
									
										
										
										
											2013-04-07 09:29:51 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_lock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:53 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( cgroup_is_dead ( cgrp ) )  { 
							 
						 
					
						
							
								
									
										
										
										
											2013-04-07 09:29:51 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										mutex_unlock(&cgroup_mutex);
										return false;
									}
									return true;
								}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/* the list of cgroups eligible for automatic release. Protected by
								 * release_list_lock */
								static LIST_HEAD(release_list);
						 
					
						
							
								
									
										
										
										
											2009-07-25 16:47:45 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static DEFINE_RAW_SPINLOCK(release_list_lock);
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static void cgroup_release_agent(struct work_struct *work);
								static DECLARE_WORK(release_agent_work, cgroup_release_agent);
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static void check_for_release(struct cgroup *cgrp);
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
								 * A cgroup can be associated with multiple css_sets as different tasks may
								 * belong to different cgroups on different hierarchies.  In the other
								 * direction, a css_set is naturally associated with multiple cgroups.
								 * This M:N relationship is represented by the following link structure
								 * which exists for each association and allows traversing the associations
								 * from both sides.
								 */
								struct cgrp_cset_link {
									/* the cgroup and css_set this link associates */
									struct cgroup		*cgrp;
									struct css_set		*cset;

									/* list of cgrp_cset_links anchored at cgrp->cset_links */
									struct list_head	cset_link;

									/* list of cgrp_cset_links anchored at css_set->cgrp_links */
									struct list_head	cgrp_link;
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								};
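/*
 * Illustrative aside, not part of kernel/cgroup.c: a minimal userspace
 * sketch of the M:N link structure described above. All demo_* names are
 * hypothetical stand-ins; the list helpers only imitate the kernel's
 * list_head and are not the real implementation.
 */

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct demo_list {
	struct demo_list *next, *prev;
};

static void demo_list_init(struct demo_list *h)
{
	h->next = h->prev = h;
}

/* Insert @n right after the head @h. */
static void demo_list_add(struct demo_list *n, struct demo_list *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

/* Count the entries hanging off the head @h. */
static size_t demo_list_len(const struct demo_list *h)
{
	size_t n = 0;
	const struct demo_list *p;

	for (p = h->next; p != h; p = p->next)
		n++;
	return n;
}

/* Hypothetical stand-ins for struct cgroup and struct css_set. */
struct demo_cgroup {
	struct demo_list cset_links;
};

struct demo_cset {
	struct demo_list cgrp_links;
};

/* One link object per (cgroup, css_set) association. */
struct demo_link {
	struct demo_cgroup *cgrp;
	struct demo_cset *cset;
	struct demo_list cset_link;	/* on cgrp->cset_links */
	struct demo_list cgrp_link;	/* on cset->cgrp_links */
};

/* Hook @link into both owners so either side can walk its associations. */
static void demo_link_init(struct demo_link *link, struct demo_cgroup *cgrp,
			   struct demo_cset *cset)
{
	assert(link && cgrp && cset);
	link->cgrp = cgrp;
	link->cset = cset;
	demo_list_add(&link->cset_link, &cgrp->cset_links);
	demo_list_add(&link->cgrp_link, &cset->cgrp_links);
}
```

/*
 * Because every association owns exactly one link that sits on two lists,
 * removing an association is two list deletions, and neither side has to
 * search the other to find its peers.
 */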
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/* The default css_set - used by init and its children prior to any
								 * hierarchies being mounted. It contains a pointer to the root state
								 * for each subsystem. Also used to anchor the list of css_sets. Not
								 * reference-counted, to improve performance when child cgroups
								 * haven't been created.
								 */

								static struct css_set init_css_set;
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static struct cgrp_cset_link init_cgrp_cset_link;
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:26 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
								 * css_set_lock protects the list of css_set objects, and the chain of
								 * tasks off each css_set.  Nests outside task->alloc_lock due to
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:26 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * css_task_iter_start().
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:26 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static DEFINE_RWLOCK(css_set_lock);
								static int css_set_count;
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
								 * hash table for css_sets. This speeds up finding
								 * an existing css_set. This hash doesn't (currently) take into
								 * account cgroups in empty hierarchies.
								 */
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								#define CSS_SET_HASH_BITS	7
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static DEFINE_HASHTABLE(css_set_table, CSS_SET_HASH_BITS);
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static unsigned long css_set_hash(struct cgroup_subsys_state *css[])
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									unsigned long key = 0UL;
							 
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup_subsys *ss;
									int i;
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									for_each_subsys(ss, i)
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										key += (unsigned long)css[i];
									key = (key >> 16) ^ key;
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return key;
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
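/*
 * Illustrative aside, not part of kernel/cgroup.c: a minimal userspace
 * sketch of the sum-and-fold hash used by css_set_hash() above. The
 * demo_* names are hypothetical, and picking the bucket by masking the
 * low bits is a simplification made for this sketch, not the kernel's
 * bucket selection.
 */

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_HASH_BITS 7	/* mirrors CSS_SET_HASH_BITS above */

/*
 * Sum the pointer values in @ptrs and fold the high bits down, following
 * the same two steps as css_set_hash().
 */
static unsigned long demo_ptr_hash(void *const ptrs[], size_t n)
{
	unsigned long key = 0UL;
	size_t i;

	assert(ptrs != NULL || n == 0);
	for (i = 0; i < n; i++)
		key += (unsigned long)ptrs[i];
	key = (key >> 16) ^ key;

	return key;
}

/* Assumption for this sketch only: bucket = low DEMO_HASH_BITS bits. */
static unsigned int demo_bucket(unsigned long key)
{
	return (unsigned int)(key & ((1UL << DEMO_HASH_BITS) - 1));
}
```

/*
 * Since the key is a plain sum, it does not depend on the order of the
 * pointers, and distinct pointer sets can share a key. A lookup therefore
 * still has to compare candidates in full after hashing.
 */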
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:26 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
								 * We don't maintain the lists running through each css_set to its task
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:26 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * until after the first call to css_task_iter_start().  This reduces the
								 * fork()/exit() overhead for people who have cgroups compiled into their
								 * kernel but not actually in use.
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:26 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static int use_task_css_set_links __read_mostly;
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static void __put_css_set(struct css_set *cset, int taskexit)
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgrp_cset_link *link, *tmp_link;
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:03 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
									 * Ensure that the refcount doesn't hit zero while any readers
									 * can see it. Similar to atomic_dec_and_lock(), but for an
									 * rwlock
									 */
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (atomic_add_unless(&cset->refcount, -1, 1))
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:03 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return;
									write_lock(&css_set_lock);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (!atomic_dec_and_test(&cset->refcount)) {
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:03 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										write_unlock(&css_set_lock);
										return;
									}
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* This css_set is dead. unlink it and release cgroup refcounts */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									hash_del(&cset->hlist);
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									css_set_count--;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									list_for_each_entry_safe(link, tmp_link, &cset->cgrp_links, cgrp_link) {
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct cgroup *cgrp = link->cgrp;
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										list_del(&link->cset_link);
										list_del(&link->cgrp_link);
							 
						 
					
						
							
								
									
										
										
										
											2013-01-24 14:43:28 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:54 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/* @cgrp can't go away while we're holding css_set_lock */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if (list_empty(&cgrp->cset_links) && notify_on_release(cgrp)) {
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											if (taskexit)
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
												set_bit(CGRP_RELEASABLE, &cgrp->flags);
											check_for_release(cgrp);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										kfree(link);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									write_unlock(&css_set_lock);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									kfree_rcu(cset, rcu_head);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
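/*
 * Illustrative aside, not part of kernel/cgroup.c: a minimal userspace
 * sketch of the "decrement, but take the lock before the count can reach
 * zero" pattern used by __put_css_set() above. The demo_* names are
 * hypothetical; C11 atomics stand in for the kernel's atomic_t helpers
 * and a tiny spinlock stands in for css_set_lock, which is an rwlock in
 * the kernel.
 */

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Tiny spinlock standing in for css_set_lock. */
static atomic_flag demo_lock = ATOMIC_FLAG_INIT;

static void demo_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&demo_lock,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void demo_lock_release(void)
{
	atomic_flag_clear_explicit(&demo_lock, memory_order_release);
}

/* Add @a to @v unless it currently equals @u; true if the add happened. */
static bool demo_add_unless(atomic_int *v, int a, int u)
{
	int cur = atomic_load(v);

	while (cur != u) {
		/* On failure, cur is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &cur, cur + a))
			return true;
	}
	return false;
}

/*
 * Drop one reference. Returns true only for the caller that released the
 * last reference; that final decrement happens with the lock held.
 */
static bool demo_put(atomic_int *refcount)
{
	bool last;

	assert(refcount);
	/* Fast path: not the last reference, so no lock is needed. */
	if (demo_add_unless(refcount, -1, 1))
		return false;

	demo_lock_acquire();
	/* atomic_fetch_sub returns the old value; 1 means we reached zero. */
	last = (atomic_fetch_sub(refcount, 1) == 1);
	/* A real user would unlink the dead object here, under the lock. */
	demo_lock_release();
	return last;
}
```

/*
 * The second check under the lock matters: another thread may have taken
 * a new reference between the failed fast path and the lock acquisition,
 * in which case the object must stay alive.
 */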
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
								 * refcounted get/put for css_set objects
								 */
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static inline void get_css_set(struct css_set *cset)
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									atomic_inc(&cset->refcount);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static inline void put_css_set(struct css_set *cset)
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									__put_css_set(cset, 0);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static inline void put_css_set_taskexit(struct css_set *cset)
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									__put_css_set(cset, 1);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:48 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * compare_css_sets - helper function for find_existing_css_set().
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * @cset: candidate css_set being tested
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * @old_cset: existing css_set for a task
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * @new_cgrp: cgroup that's being entered by the task
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * @template: desired set of css pointers in css_set (pre-calculated)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
									
										
										
										
											2013-07-31 16:18:36 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * Returns true if "cset" matches "old_cset" except for the hierarchy
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * which "new_cgrp" belongs to, for which it should match "new_cgrp".
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static bool compare_css_sets(struct css_set *cset,
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											     struct css_set *old_cset,
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											     struct cgroup *new_cgrp,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											     struct cgroup_subsys_state *template[])
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct list_head *l1, *l2;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (memcmp(template, cset->subsys, sizeof(cset->subsys))) {
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/* Not all subsystems matched */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return false;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * Compare cgroup pointers in order to distinguish between
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * different cgroups in hierarchies with no subsystems. We
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * could get by with just this check alone (and skip the
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * memcmp above) but on most setups the memcmp check will
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * avoid the need for this more expensive check on almost all
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * candidates.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									l1 = &cset->cgrp_links;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									l2 = &old_cset->cgrp_links;
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									while (1) {
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct cgrp_cset_link *link1, *link2;
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct cgroup *cgrp1, *cgrp2;
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										l1 = l1->next;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										l2 = l2->next;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/* See if we reached the end - both lists are equal length. */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if (l1 == &cset->cgrp_links) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											BUG_ON(l2 != &old_cset->cgrp_links);
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											break;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} else {
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											BUG_ON(l2 == &old_cset->cgrp_links);
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/* Locate the cgroups associated with these links. */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										link1 = list_entry(l1, struct cgrp_cset_link, cgrp_link);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										link2 = list_entry(l2, struct cgrp_cset_link, cgrp_link);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										cgrp1 = link1->cgrp;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										cgrp2 = link2->cgrp;
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/* Hierarchies should be linked in the same order. */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										BUG_ON(cgrp1->root != cgrp2->root);
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 * If this hierarchy is the hierarchy of the cgroup
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 * that's changing, then we need to check that this
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 * css_set points to the new cgroup; if it's any other
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 * hierarchy, then this css_set should point to the
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 * same cgroup as the old css_set.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if (cgrp1->root == new_cgrp->root) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											if (cgrp1 != new_cgrp)
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
												return false;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} else {
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											if (cgrp1 != cgrp2)
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
												return false;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return true;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
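
The lockstep comparison performed by compare_css_sets() can be illustrated outside the kernel. The following is a minimal userspace sketch, not the kernel implementation: `struct hier`, `struct grp`, and `sets_match()` are hypothetical stand-ins, and plain arrays replace the kernel's linked lists of cgrp_cset_link entries. It assumes both arrays list their hierarchies in the same order, which the kernel code enforces with BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for cgroupfs_root and cgroup. */
struct hier { int id; };
struct grp { const struct hier *root; };

/*
 * cand[] and old[] hold one group pointer per hierarchy, in the same
 * hierarchy order, n entries each. Returns true if cand matches old in
 * every hierarchy except the one new_grp lives in, where cand must
 * point at new_grp itself.
 */
static bool sets_match(const struct grp *const *cand,
		       const struct grp *const *old,
		       size_t n, const struct grp *new_grp)
{
	for (size_t i = 0; i < n; i++) {
		/* Both sets must walk the hierarchies in the same order. */
		assert(cand[i]->root == old[i]->root);

		if (cand[i]->root == new_grp->root) {
			/* Hierarchy being changed: must be the new group. */
			if (cand[i] != new_grp)
				return false;
		} else if (cand[i] != old[i]) {
			/* Any other hierarchy: must be unchanged. */
			return false;
		}
	}
	return true;
}
```

With two hierarchies, a candidate that points at the new group in the first hierarchy and at the old group in the second is accepted; a candidate that differs from the old set in the untouched hierarchy is rejected.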
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:48 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * find_existing_css_set - init css array and find the matching css_set
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * @old_cset: the css_set that we're using before the cgroup transition
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * @cgrp: the cgroup that we're moving into
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * @template: out param for the new set of csses, should be clear on entry
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static struct css_set *find_existing_css_set(struct css_set *old_cset,
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
													struct cgroup *cgrp,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
													struct cgroup_subsys_state *template[])
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroupfs_root *root = cgrp->root;
							 
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup_subsys *ss;
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct css_set *cset;
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									unsigned long key;
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:48 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int i;
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroups: revamp subsys array
This patch series provides the ability for cgroup subsystems to be
compiled as modules both within and outside the kernel tree.  This is
mainly useful for classifiers and subsystems that hook into components
that are already modules.  cls_cgroup and blkio-cgroup serve as the
example use cases for this feature.
It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
which modular subsystems can use to register and depart during runtime.
The net_cls classifier subsystem serves as the example for a subsystem
which can be converted into a module using these changes.
Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear.  Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section of
the array is protected by cgroup_mutex.
Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.
Patch #3 adds a mechanism for unloading modular subsystems, which includes
a more advanced rework of the rudimentary reference counting introduced in
patch 2.
Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.
Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves.  In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy).  The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys was
the only one in the dummy cgroup (and therefore all css_sets should be
unlinked from it) - however, as all css_sets always stay attached to the
dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
issue should also make sure these cases are addressed in the subsystem
loading and unloading code.
This patch:
Make subsys[] able to be dynamically populated to support modular
subsystems
This patch reworks the way the subsys[] array is used so that subsystems
can register themselves after boot time, and enables the internals of
cgroups to be able to handle when subsystems are not present or may
appear/disappear.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * Build the set of subsystem state objects that we want to see in the
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * new css_set. While subsystems can change globally, the entries here
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * won't change, so no need for locking.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									for_each_subsys(ss, i) {
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if (root->subsys_mask & (1UL << i)) {
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											/* Subsystem is in this hierarchy. So we want
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 * the subsystem state from the new
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 * cgroup */
							 
						 
					
						
							
								
									
										
										
										
											2013-08-26 18:40:56 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											template[i] = cgroup_css(cgrp, ss);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										} else {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											/* Subsystem is not in this hierarchy, so we
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 * don't want to change the subsystem state */
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											template[i] = old_cset->subsys[i];
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									key = css_set_hash(template);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									hash_for_each_possible(css_set_table, cset, hlist, key) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if (!compare_css_sets(cset, old_cset, cgrp, template))
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											continue;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/* This css_set matches what we need */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return cset;
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* No existing css_set matched */
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return NULL;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
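
The template-building loop in find_existing_css_set() selects, per subsystem, either the state from the cgroup being entered or the state already held by the old css_set. A minimal userspace sketch of that selection follows. It is illustrative only: `NSUBSYS`, `build_template()`, and the `void *` state pointers are hypothetical stand-ins for the kernel's subsystem count, for_each_subsys() loop, and cgroup_subsys_state pointers.

```c
#include <assert.h>
#include <stddef.h>

#define NSUBSYS 4	/* hypothetical subsystem count for this sketch */

/*
 * For each subsystem i: if bit i is set in subsys_mask, the subsystem is
 * bound to the hierarchy being changed, so take the state from the new
 * group; otherwise keep the state from the old set.
 */
static void build_template(unsigned long subsys_mask,
			   void *const new_css[NSUBSYS],
			   void *const old_css[NSUBSYS],
			   void *template_out[NSUBSYS])
{
	for (size_t i = 0; i < NSUBSYS; i++) {
		if (subsys_mask & (1UL << i))
			template_out[i] = new_css[i];
		else
			template_out[i] = old_css[i];
	}
}
```

With a mask of 0x5, entries 0 and 2 come from the new group and entries 1 and 3 are carried over from the old set. In the kernel, the resulting array is then hashed and used as the lookup key into css_set_table.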
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static void free_cgrp_cset_links(struct list_head *links_to_free)
						 
					
						
							
								
									
										
										
										
											2008-07-29 22:33:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgrp_cset_link *link, *tmp_link;
							 
						 
					
						
							
								
									
										
										
										
											2008-07-29 22:33:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									list_for_each_entry_safe(link, tmp_link, links_to_free, cset_link) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										list_del(&link->cset_link);
							 
						 
					
						
							
								
									
										
										
										
											2008-07-29 22:33:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										kfree(link);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * allocate_cgrp_cset_links - allocate cgrp_cset_links
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * @count: the number of links to allocate
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * @tmp_links: list_head the allocated links are put on
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * Allocate @count cgrp_cset_link structures and chain them on @tmp_links
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * through ->cset_link.  Returns 0 on success or -errno.
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static int allocate_cgrp_cset_links(int count, struct list_head *tmp_links)
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgrp_cset_link *link;
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int i;
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									INIT_LIST_HEAD(tmp_links);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									for (i = 0; i < count; i++) {
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:51 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										link = kzalloc(sizeof(*link), GFP_KERNEL);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if (!link) {
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											free_cgrp_cset_links(tmp_links);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											return -ENOMEM;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										list_add(&link->cset_link, tmp_links);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return 0;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
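
The all-or-nothing allocation pattern used by allocate_cgrp_cset_links() can be illustrated outside the kernel. The following is a minimal userspace sketch, not the kernel implementation: `struct demo_link`, `demo_alloc_links()`, `demo_free_links()` and `demo_count_links()` are hypothetical names, a plain singly linked list stands in for `struct list_head`, and `calloc()` stands in for `kzalloc()`.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's cgrp_cset_link. */
struct demo_link {
	struct demo_link *next;
};

/* Free every link on the list and leave the list empty. */
static void demo_free_links(struct demo_link **head)
{
	while (*head) {
		struct demo_link *link = *head;

		*head = link->next;
		free(link);
	}
}

/*
 * Allocate @count links and chain them on @head.  On failure, release
 * whatever was already allocated so the caller never sees a partial
 * list.  Returns 0 on success or -ENOMEM.
 */
static int demo_alloc_links(int count, struct demo_link **head)
{
	int i;

	*head = NULL;
	for (i = 0; i < count; i++) {
		struct demo_link *link = calloc(1, sizeof(*link));

		if (!link) {
			demo_free_links(head);
			return -ENOMEM;
		}
		/* Head insertion, as list_add() does in the kernel version. */
		link->next = *head;
		*head = link;
	}
	return 0;
}

/* Count the links currently chained on the list. */
static int demo_count_links(const struct demo_link *head)
{
	int n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```

Preallocating every link before taking a lock means the later linking step cannot fail, which is presumably why the kernel code allocates first and only then attaches the links.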
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * link_css_set - a helper function to link a css_set to a cgroup
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * @tmp_links: cgrp_cset_link objects allocated by allocate_cgrp_cset_links()
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * @cset: the css_set to be linked
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * @cgrp: the destination cgroup
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static void link_css_set(struct list_head *tmp_links, struct css_set *cset,
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 struct cgroup *cgrp)
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgrp_cset_link *link;
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									BUG_ON(list_empty(tmp_links));
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									link = list_first_entry(tmp_links, struct cgrp_cset_link, cset_link);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									link->cset = cset;
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									link->cgrp = cgrp;
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									list_move(&link->cset_link, &cgrp->cset_links);
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * Always add links to the tail of the list so that the list
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * is sorted by order of hierarchy creation
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									list_add_tail(&link->cgrp_link, &cset->cgrp_links);
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
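
link_css_set() threads one link object onto two lists at once: the cgroup's list of css_sets and the css_set's list of cgroups. The following is a minimal userspace sketch of that many-to-many linkage, not the kernel code: every `demo_` name is hypothetical, and singly linked chains stand in for `struct list_head`. It mirrors the two insertion orders seen above, head insertion on the group side (as list_move() does) and tail insertion on the set side (as list_add_tail() does).

```c
#include <stddef.h>

struct demo_group;
struct demo_set;

/* One link object per (set, group) pair, threaded on both owners' lists. */
struct demo_m2m_link {
	struct demo_set *set;
	struct demo_group *group;
	struct demo_m2m_link *next_in_group;	/* chain of sets using this group */
	struct demo_m2m_link *next_in_set;	/* chain of groups in this set */
};

struct demo_group {
	struct demo_m2m_link *set_links;	/* head of the per-group chain */
};

struct demo_set {
	struct demo_m2m_link *group_links;	/* head of the per-set chain */
	struct demo_m2m_link **group_tail;	/* slot where the next link is appended */
};

/* Prepare an empty set; the tail slot initially points at the head. */
static void demo_set_init(struct demo_set *set)
{
	set->group_links = NULL;
	set->group_tail = &set->group_links;
}

/*
 * Attach a caller-provided link to both owners: head insertion on the
 * group's chain, tail insertion on the set's chain so that the set's
 * chain keeps the order in which groups were attached.
 */
static void demo_attach(struct demo_m2m_link *link, struct demo_set *set,
			struct demo_group *group)
{
	link->set = set;
	link->group = group;

	link->next_in_group = group->set_links;
	group->set_links = link;

	link->next_in_set = NULL;
	*set->group_tail = link;
	set->group_tail = &link->next_in_set;
}
```

Because the link is supplied by the caller, attaching cannot fail, which matches how link_css_set() consumes links preallocated by allocate_cgrp_cset_links().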
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:48 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * find_css_set - return a new css_set with one cgroup updated
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * @old_cset: the baseline css_set
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * @cgrp: the cgroup to be updated
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * Return a new css_set that's equivalent to @old_cset, but with @cgrp
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * substituted into the appropriate hierarchy.
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static struct css_set *find_css_set(struct css_set *old_cset,
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												    struct cgroup *cgrp)
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:48 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup_subsys_state *template[CGROUP_SUBSYS_COUNT] = { };
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct css_set *cset;
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct list_head tmp_links;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct cgrp_cset_link *link;
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									unsigned long key;
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:48 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									lockdep_assert_held(&cgroup_mutex);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* First see if we already have a cgroup group that matches
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * the desired set */
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:54 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									read_lock(&css_set_lock);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cset = find_existing_css_set(old_cset, cgrp, template);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (cset)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										get_css_set(cset);
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:54 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									read_unlock(&css_set_lock);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (cset)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return cset;
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:51 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cset = kzalloc(sizeof(*cset), GFP_KERNEL);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (!cset)
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return NULL;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* Allocate all the cgrp_cset_link objects that we'll need */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:47 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (allocate_cgrp_cset_links(cgroup_root_count, &tmp_links) < 0) {
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										kfree(cset);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return NULL;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									atomic_set(&cset->refcount, 1);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									INIT_LIST_HEAD(&cset->cgrp_links);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									INIT_LIST_HEAD(&cset->tasks);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									INIT_HLIST_NODE(&cset->hlist);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* Copy the set of subsystem state objects generated in
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * find_existing_css_set() */
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									memcpy(cset->subsys, template, sizeof(cset->subsys));
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									write_lock(&css_set_lock);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* Add reference counts and links from the new css_set. */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									list_for_each_entry(link, &old_cset->cgrp_links, cgrp_link) {
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct cgroup *c = link->cgrp;
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if (c->root == cgrp->root)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											c = cgrp;
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										link_css_set(&tmp_links, cset, c);
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									BUG_ON(!list_empty(&tmp_links));
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									css_set_count++;
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* Add this cgroup group to the hash table */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									key = css_set_hash(cset->subsys);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									hash_add(css_set_table, &cset->hlist, key);
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									write_unlock(&css_set_lock);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return cset;
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * Return the cgroup for "task" from the given hierarchy. Must be
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * called with cgroup_mutex held.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static struct cgroup *task_cgroup_from_root(struct task_struct *task,
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
													    struct cgroupfs_root *root)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct css_set *cset;
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup *res = NULL;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									BUG_ON(!mutex_is_locked(&cgroup_mutex));
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									read_lock(&css_set_lock);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * No need to lock the task - since we hold cgroup_mutex the
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  task  can ' t  change  groups ,  so  the  only  thing  that  can  happen 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  is  that  it  exits  and  its  css  is  set  back  to  init_css_set . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-21 15:52:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cset  =  task_css_set ( task ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( cset  = =  & init_css_set )  { 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										res  =  & root - > top_cgroup ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									}  else  { 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct  cgrp_cset_link  * link ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										list_for_each_entry ( link ,  & cset - > cgrp_links ,  cgrp_link )  { 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											struct  cgroup  * c  =  link - > cgrp ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											if  ( c - > root  = =  root )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												res  =  c ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									read_unlock ( & css_set_lock ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									BUG_ON ( ! res ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  res ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
/*
 * There is one global cgroup mutex. We also require taking
 * task_lock() when dereferencing a task's cgroup subsys pointers.
 * See "The task_lock() exception", at the end of this comment.
 *
 * A task must hold cgroup_mutex to modify cgroups.
 *
 * Any task can increment and decrement the count field without lock.
 * So in general, code holding cgroup_mutex can't rely on the count
 * field not changing.  However, if the count goes to zero, then only
 * cgroup_attach_task() can increment it again.  A count of zero
 * means that no tasks are currently attached, so there is no
 * way a task attached to that cgroup can fork (the other way to
 * increment the count).  So code holding cgroup_mutex can safely
 * assume that if the count is zero, it will stay zero.  Similarly, if
 * a task holds cgroup_mutex on a cgroup with zero count, it
 * knows that the cgroup won't be removed, as cgroup_rmdir()
 * needs that mutex.
 *
 * The fork and exit callbacks, cgroup_fork() and cgroup_exit(), don't
 * (usually) take cgroup_mutex.  These are the two most performance
 * critical pieces of code here.  The exception occurs on cgroup_exit(),
 * when a task in a notify_on_release cgroup exits.  Then cgroup_mutex
 * is taken, and if the cgroup count is zero, a usermode call is made
 * to the release agent with the name of the cgroup (path relative to
 * the root of cgroup file system) as the argument.
 *
 * A cgroup can only be deleted if both its 'count' of using tasks
 * is zero, and its list of 'children' cgroups is empty.  Since all
 * tasks in the system use _some_ cgroup, and since there is always at
 * least one task in the system (init, pid == 1), top_cgroup
 * always has child cgroups and/or using tasks.  So we don't
 * need a special hack to ensure that top_cgroup cannot be deleted.
 *
 *	The task_lock() exception
 *
 * The need for this exception arises from the action of
 * cgroup_attach_task(), which overwrites one task's cgroup pointer with
 * another.  It does so using cgroup_mutex; however, there are
 * several performance critical places that need to reference
 * task->cgroup without the expense of grabbing a system global
 * mutex.  Therefore, except as noted below, when dereferencing or, as
 * in cgroup_attach_task(), modifying a task's cgroup pointer we use
 * task_lock(), which acts on a spinlock (task->alloc_lock) already in
 * the task_struct routinely used for such matters.
 *
 * P.S.  One more locking exception.  RCU is used to guard the
 * update of a task's cgroup pointer by cgroup_attach_task().
 */
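
/*
 * Illustrative sketch only, not part of the original file.  It models, in
 * plain userspace C, the count rule from the locking comment above: fork and
 * exit change the count without the mutex, but only the attach path can take
 * the count from zero to one, and attach needs the mutex.  Every toy_* name
 * is hypothetical, and a pthread mutex stands in for cgroup_mutex as an
 * assumption; the kernel code uses its own primitives and data structures.
 */

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for cgroup_mutex. */
static pthread_mutex_t toy_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for a cgroup: a task count plus a "removed" flag. */
struct toy_cgroup {
	atomic_int count;	/* changed without the mutex by fork/exit */
	bool removed;		/* read and written only under toy_mutex */
};

/* Fork: lock-free; the caller is already attached, so count >= 1 here. */
static void toy_fork(struct toy_cgroup *cg)
{
	atomic_fetch_add(&cg->count, 1);
}

/* Exit: lock-free decrement. */
static void toy_exit(struct toy_cgroup *cg)
{
	atomic_fetch_sub(&cg->count, 1);
}

/* Attach: the only path that can raise the count from zero; needs the mutex. */
static bool toy_attach(struct toy_cgroup *cg)
{
	bool ok;

	pthread_mutex_lock(&toy_mutex);
	ok = !cg->removed;
	if (ok)
		atomic_fetch_add(&cg->count, 1);
	pthread_mutex_unlock(&toy_mutex);
	return ok;
}

/*
 * Remove: with the mutex held, a zero count cannot become non-zero,
 * because attach is locked out and no attached task exists to fork.
 */
static bool toy_rmdir(struct toy_cgroup *cg)
{
	bool ok;

	pthread_mutex_lock(&toy_mutex);
	ok = !cg->removed && atomic_load(&cg->count) == 0;
	if (ok)
		cg->removed = true;
	pthread_mutex_unlock(&toy_mutex);
	return ok;
}

/* Walks through attach, fork, exit and removal; returns 0 on success. */
static int toy_demo(void)
{
	struct toy_cgroup cg = { .removed = false };

	atomic_init(&cg.count, 0);
	if (!toy_attach(&cg))	/* 0 -> 1, under the mutex */
		return 1;
	toy_fork(&cg);		/* 1 -> 2, lock-free */
	if (toy_rmdir(&cg))	/* must fail: tasks are still attached */
		return 2;
	toy_exit(&cg);
	toy_exit(&cg);		/* back to 0 */
	if (!toy_rmdir(&cg))	/* succeeds: count is zero and stays zero */
		return 3;
	if (toy_attach(&cg))	/* must fail: the group has been removed */
		return 4;
	assert(atomic_load(&cg.count) == 0);
	return 0;
}
```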
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  A  couple  of  forward  declarations  required ,  due  to  cyclic  reference  loop : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_mkdir  - >  cgroup_create  - >  cgroup_populate_dir  - > 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  cgroup_add_file  - >  cgroup_create_file  - >  cgroup_dir_inode_operations 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  - >  cgroup_mkdir . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-07-26 01:41:39 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  cgroup_mkdir ( struct  inode  * dir ,  struct  dentry  * dentry ,  umode_t  mode ) ;  
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry);
						 
					
						
							
								
									
										
										
										
											2013-06-28 16:24:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static int cgroup_populate_dir(struct cgroup *cgrp, unsigned long subsys_mask);
						 
					
						
							
								
									
										
										
										
											2009-09-21 17:01:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static const struct inode_operations cgroup_dir_inode_operations;
						 
					
						
							
								
									
										
										
										
											2009-10-01 15:43:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static const struct file_operations proc_cgroupstats_operations;
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static struct backing_dev_info cgroup_backing_dev_info = {
						 
					
						
							
								
									
										
										
										
											2009-06-12 14:45:52 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									.name		= "cgroup",
							 
						 
					
						
							
								
									
										
										
										
											2008-04-30 00:54:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									.capabilities	= BDI_CAP_NO_ACCT_AND_WRITEBACK,
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								};
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-07-26 01:55:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static struct inode *cgroup_new_inode(umode_t mode, struct super_block *sb)
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct inode *inode = new_inode(sb);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (inode) {
							 
						 
					
						
							
								
									
										
										
										
											2010-10-23 11:19:54 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										inode->i_ino = get_next_ino();
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										inode->i_mode = mode;
							 
						 
					
						
							
								
									
										
										
										
											2008-11-14 10:39:12 +11:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										inode->i_uid = current_fsuid();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										inode->i_gid = current_fsgid();
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										inode->i_mapping->backing_dev_info = &cgroup_backing_dev_info;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return inode;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-03-01 15:01:56 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static struct cgroup_name *cgroup_alloc_name(struct dentry *dentry)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct cgroup_name *name;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									name = kmalloc(sizeof(*name) + dentry->d_name.len + 1, GFP_KERNEL);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (!name)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return NULL;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									strcpy(name->name, dentry->d_name.name);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return name;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
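cgroup_alloc_name() above sizes a single allocation to hold both the struct header and the NUL-terminated name that follows it. The userspace sketch below illustrates that pattern only; `struct name_buf`, its `len` field, and `name_buf_alloc()` are hypothetical stand-ins and are not the kernel's `struct cgroup_name` or its API, and `malloc()` stands in for `kmalloc()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a header with a trailing name buffer. */
struct name_buf {
	size_t len;	/* illustrative field, not taken from the kernel struct */
	char name[];	/* flexible array member: the string lives right after the header */
};

/*
 * Allocate header + string + terminating NUL in one block, mirroring the
 * "sizeof(*name) + len + 1" sizing used by cgroup_alloc_name().
 */
static struct name_buf *name_buf_alloc(const char *src)
{
	struct name_buf *nb;
	size_t len;

	assert(src != NULL);
	len = strlen(src);

	nb = malloc(sizeof(*nb) + len + 1);
	if (!nb)
		return NULL;

	nb->len = len;
	memcpy(nb->name, src, len + 1);	/* len + 1 also copies the NUL */
	return nb;
}
```

Because the name shares the header's allocation, one free() (kfree() in the kernel code) releases both, which is presumably why cgroup_free_fn() can drop the name with a single call.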
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-01-24 14:31:42 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static void cgroup_free_fn(struct work_struct *work)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-06-13 19:27:42 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup *cgrp = container_of(work, struct cgroup, destroy_work);
							 
						 
					
						
							
								
									
										
										
										
											2013-01-24 14:31:42 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_lock(&cgroup_mutex);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgrp->root->number_of_cgroups--;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_unlock(&cgroup_mutex);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-04-08 14:35:02 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * We get a ref to the parent's dentry, and put the ref when
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * this cgroup is being freed, so it's guaranteed that the
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * parent won't be destroyed before its children.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									dput(cgrp->parent->dentry);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-01-24 14:31:42 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * Drop the active superblock reference that we took when we
							 
						 
					
						
							
								
									
										
										
										
											2013-04-26 11:58:02 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 * created the cgroup. This will free cgrp->root, if we are
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * holding the last reference to @sb.
							 
						 
					
						
							
								
									
										
										
										
											2013-01-24 14:31:42 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									deactivate_super(cgrp->root->sb);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * if we're getting rid of the cgroup, refcount should ensure
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * that there are no pidlists left.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									BUG_ON(!list_empty(&cgrp->pidlists));
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									simple_xattrs_free(&cgrp->xattrs);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-03-01 15:01:56 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									kfree(rcu_dereference_raw(cgrp->name));
							 
						 
					
						
							
								
									
										
										
										
											2013-01-24 14:31:42 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									kfree(cgrp);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static void cgroup_free_rcu(struct rcu_head *head)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct cgroup *cgrp = container_of(head, struct cgroup, rcu_head);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-13 19:27:42 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									INIT_WORK(&cgrp->destroy_work, cgroup_free_fn);
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: use a dedicated workqueue for cgroup destruction
Since be44562613851 ("cgroup: remove synchronize_rcu() from
cgroup_diput()"), cgroup destruction path makes use of workqueue.  css
freeing is performed from a work item from that point on and a later
commit, ea15f8ccdb430 ("cgroup: split cgroup destruction into two
steps"), moves css offlining to workqueue too.
As cgroup destruction isn't depended upon for memory reclaim, the
destruction work items were put on the system_wq; unfortunately, some
controller may block in the destruction path for considerable duration
while holding cgroup_mutex.  As large part of destruction path is
synchronized through cgroup_mutex, when combined with high rate of
cgroup removals, this has potential to fill up system_wq's max_active
of 256.
Also, it turns out that memcg's css destruction path ends up queueing
and waiting for work items on system_wq through work_on_cpu().  If
such operation happens while system_wq is fully occupied by cgroup
destruction work items, work_on_cpu() can't make forward progress
because system_wq is full and other destruction work items on
system_wq can't make forward progress because the work item waiting
for work_on_cpu() is holding cgroup_mutex, leading to deadlock.
This can be fixed by queueing destruction work items on a separate
workqueue.  This patch creates a dedicated workqueue -
cgroup_destroy_wq - for this purpose.  As these work items shouldn't
have inter-dependencies and mostly serialized by cgroup_mutex anyway,
giving high concurrency level doesn't buy anything and the workqueue's
@max_active is set to 1 so that destruction work items are executed
one by one on each CPU.
Hugh Dickins: Because cgroup_init() is run before init_workqueues(),
cgroup_destroy_wq can't be allocated from cgroup_init().  Do it from a
separate core_initcall().  In the future, we probably want to reorder
so that workqueue init happens before cgroup_init().
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Hugh Dickins <hughd@google.com>
Reported-by: Shawn Bohrer <shawn.bohrer@gmail.com>
Link: http://lkml.kernel.org/r/20131111220626.GA7509@sbohrermbp13-local.rgmadvisors.com
Link: http://lkml.kernel.org/g/alpine.LNX.2.00.1310301606080.2333@eggly.anvils
Cc: stable@vger.kernel.org # v3.9+
											 
										 
										
											2013-11-22 17:14:39 -05:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									queue_work(cgroup_destroy_wq, &cgrp->destroy_work);
							 
						 
					
						
							
								
									
										
										
										
											2013-01-24 14:31:42 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
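cgroup_free_rcu() does no teardown itself: it only initialises a work item and queues it, so that cgroup_free_fn(), which takes cgroup_mutex and may sleep, runs later from a workqueue rather than from the RCU callback. The single-threaded userspace sketch below models that two-stage handoff under loud assumptions: `struct work_item`, `struct obj`, `enqueue_work()`, `obj_free_deferred()` and `run_pending()` are all hypothetical names, and a plain linked list stands in for cgroup_destroy_wq.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical work item; stands in for struct work_struct. */
struct work_item {
	void (*fn)(struct work_item *w);
	struct work_item *next;
};

/* Hypothetical object that embeds its own destruction work item. */
struct obj {
	int id;
	struct work_item destroy_work;
};

static struct work_item *queue_head, *queue_tail;
static int freed_count;

/* Recover the containing object from its embedded work item. */
#define obj_of(w) \
	((struct obj *)((char *)(w) - offsetof(struct obj, destroy_work)))

/* Stage 2: the teardown that is allowed to block; here it just frees. */
static void obj_free_fn(struct work_item *w)
{
	free(obj_of(w));
	freed_count++;
}

/* Append to the tail so items run in submission order. */
static void enqueue_work(struct work_item *w)
{
	w->next = NULL;
	if (queue_tail)
		queue_tail->next = w;
	else
		queue_head = w;
	queue_tail = w;
}

/* Stage 1: called where blocking is not allowed; only queues the work. */
static void obj_free_deferred(struct obj *o)
{
	assert(o != NULL);
	o->destroy_work.fn = obj_free_fn;
	enqueue_work(&o->destroy_work);
}

/* Worker side: drain the queue one item at a time; returns items run. */
static int run_pending(void)
{
	int n = 0;

	while (queue_head) {
		struct work_item *w = queue_head;

		/* Unlink before calling fn, since fn frees the memory holding w. */
		queue_head = w->next;
		if (!queue_head)
			queue_tail = NULL;
		w->fn(w);
		n++;
	}
	return n;
}
```

Draining one item at a time loosely mirrors the max_active of 1 described in the commit message above; the real workqueue is per-CPU and concurrent, which this sketch does not attempt to model.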
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								static void cgroup_diput(struct dentry *dentry, struct inode *inode)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* is dentry a directory ? if so, kfree() associated cgroup */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (S_ISDIR(inode->i_mode)) {
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct cgroup *cgrp = dentry->d_fsdata;
							 
						 
					
						
							
								
									
										
										
										
											2013-01-24 14:31:42 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:53 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										BUG_ON(!(cgroup_is_dead(cgrp)));
							 
						 
					
						
							
								
									
										
										
										
											2013-12-17 11:13:39 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 * XXX: cgrp->id is only used to look up css's.  As cgroup
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 * and css's lifetimes will be decoupled, it should be made
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 * per-subsystem and moved to css->id so that lookups are
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 *  successful  until  the  target  css  is  released . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										idr_remove(&cgrp->root->cgroup_idr, cgrp->id);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										cgrp->id = -1;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-01-24 14:31:42 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										call_rcu(&cgrp->rcu_head, cgroup_free_rcu);
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} else {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										struct cfent *cfe = __d_cfe(dentry);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										struct cgroup *cgrp = dentry->d_parent->d_fsdata;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										WARN_ONCE(!list_empty(&cfe->node) &&
											  cgrp != &cgrp->root->top_cgroup,
											  "cfe still linked for %s\n", cfe->type->name);
							 
						 
					
						
							
								
									
										
										
										
											2013-04-18 23:09:52 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										simple_xattrs_free(&cfe->xattrs);
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										kfree(cfe);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									iput(inode);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static void remove_dir(struct dentry *d)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct dentry *parent = dget(d->d_parent);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									d_delete(d);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									simple_rmdir(parent->d_inode, d);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									dput(parent);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-01-21 18:18:33 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static void cgroup_rm_file(struct cgroup *cgrp, const struct cftype *cft)
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct cfent *cfe;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									lockdep_assert_held(&cgrp->dentry->d_inode->i_mutex);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									lockdep_assert_held(&cgroup_mutex);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-01-21 18:18:33 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
									 * If we're doing cleanup due to failure of cgroup_create(),
									 * the corresponding @cfe may not exist.
									 */
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									list_for_each_entry(cfe, &cgrp->files, node) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										struct dentry *d = cfe->dentry;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if (cft && cfe->type != cft)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											continue;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										dget(d);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										d_delete(d);
							 
						 
					
						
							
								
									
										
										
										
											2012-07-03 10:38:06 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										simple_unlink(cgrp->dentry->d_inode, d);
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										list_del_init(&cfe->node);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										dput(d);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-01-21 18:18:33 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										break;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:29 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
									
										
										
										
											2013-06-28 16:24:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * cgroup_clear_dir - remove subsys files in a cgroup directory
							 
						 
					
						
							
								
									
										
										
										
											2013-06-28 16:24:10 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * @cgrp: target cgroup
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:29 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * @subsys_mask: mask of the subsystem ids whose files should be removed
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-28 16:24:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static void cgroup_clear_dir(struct cgroup *cgrp, unsigned long subsys_mask)
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:29 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup_subsys *ss;
							 
						 
					
						
							
								
									
										
										
										
											2013-07-12 12:34:02 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int i;
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-07-12 12:34:02 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									for_each_subsys(ss, i) {
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:29 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct cftype_set *set;
							 
						 
					
						
							
								
									
										
										
										
											2013-07-12 12:34:02 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if (!test_bit(i, &subsys_mask))
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:29 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											continue;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										list_for_each_entry(set, &ss->cftsets, node)
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											cgroup_addrm_files(cgrp, set->cfts, false);
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:29 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * NOTE: the dentry must have been dget()'ed
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static void cgroup_d_remove_dir(struct dentry *dentry)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2011-01-07 17:49:34 +11:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct dentry *parent;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-01-07 17:49:34 +11:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									parent = dentry->d_parent;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									spin_lock(&parent->d_lock);
							 
						 
					
						
							
								
									
										
										
										
											2011-01-14 11:34:34 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									list_del_init(&dentry->d_u.d_child);
							 
						 
					
						
							
								
									
										
										
										
											2011-01-07 17:49:34 +11:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									spin_unlock(&dentry->d_lock);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									spin_unlock(&parent->d_lock);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									remove_dir(dentry);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroups: revamp subsys array
This patch series provides the ability for cgroup subsystems to be
compiled as modules both within and outside the kernel tree.  This is
mainly useful for classifiers and subsystems that hook into components
that are already modules.  cls_cgroup and blkio-cgroup serve as the
example use cases for this feature.
It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
which modular subsystems can use to register and depart during runtime.
The net_cls classifier subsystem serves as the example for a subsystem
which can be converted into a module using these changes.
Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear.  Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section of
the array is protected by cgroup_mutex.
Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.
Patch #3 adds a mechanism for unloading modular subsystems, which includes
a more advanced rework of the rudimentary reference counting introduced in
patch 2.
Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.
Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves.  In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy).  The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys was
the only one in the dummy cgroup (and therefore all css_sets should be
unlinked from it) - however, as all css_sets always stay attached to the
dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
issue should also make sure these cases are addressed in the subsystem
loading and unloading code.
This patch:
Make subsys[] able to be dynamically populated to support modular
subsystems
This patch reworks the way the subsys[] array is used so that subsystems
can register themselves after boot time, and enables the internals of
cgroups to be able to handle when subsystems are not present or may
appear/disappear.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
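The commit message above describes a subsys[] array whose slots may be empty, with iterations modified to cope with absent subsystems. A minimal userspace sketch of that idea follows. Every name here (sketch_subsys, SKETCH_SUBSYS_COUNT, sketch_load_subsys and so on) is hypothetical and is not the kernel's API, and the locking that the message attributes to cgroup_mutex is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define SKETCH_SUBSYS_COUNT 8

struct sketch_subsys {
	const char *name;
};

/* A NULL slot means no subsystem is currently registered there. */
static struct sketch_subsys *sketch_subsys[SKETCH_SUBSYS_COUNT];

/* Register ss in the first free slot; return its index, or -1 if full. */
static int sketch_load_subsys(struct sketch_subsys *ss)
{
	assert(ss != NULL);
	for (int i = 0; i < SKETCH_SUBSYS_COUNT; i++) {
		if (sketch_subsys[i] == NULL) {
			sketch_subsys[i] = ss;
			return i;
		}
	}
	return -1;
}

/* Clear a slot so later iterations treat the subsystem as absent. */
static void sketch_unload_subsys(int idx)
{
	if (idx >= 0 && idx < SKETCH_SUBSYS_COUNT)
		sketch_subsys[idx] = NULL;
}

/* Walk the whole array, skipping empty slots instead of dereferencing them. */
static int sketch_count_present(void)
{
	int n = 0;

	for (int i = 0; i < SKETCH_SUBSYS_COUNT; i++) {
		if (sketch_subsys[i] == NULL)
			continue;
		n++;
	}
	return n;
}
```

The point of the sketch is only the NULL check inside the loop: once entries can appear and disappear at runtime, every walk over the array has to tolerate holes. Real code would also need to hold a lock around both the registration and the iteration.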
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * Call with cgroup_mutex held. Drops reference counts on modules, including
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * any duplicate ones that parse_cgroupfs_options took. If this function
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * returns an error, no reference counts are touched.
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								static int rebind_subsystems(struct cgroupfs_root *root,
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:47 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											     unsigned long added_mask, unsigned removed_mask)
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup *cgrp = &root->top_cgroup;
							 
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup_subsys *ss;
							 
						 
					
						
							
								
									
										
										
										
											2013-07-12 13:38:17 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									unsigned long pinned = 0;
							 
						 
					
						
							
								
									
										
										
										
											2013-06-28 17:07:30 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int i, ret;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									BUG_ON(!mutex_is_locked(&cgroup_mutex));
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									BUG_ON(!mutex_is_locked(&cgroup_root_mutex));
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									/* Check that any added subsystems are currently free */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									for_each_subsys(ss, i) {
							 
						 
					
						
							
								
									
										
										
										
											2013-07-12 13:38:17 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if (!(added_mask & (1 << i)))
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
											continue;
							 
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-07-12 13:38:17 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/* is the subsystem mounted elsewhere? */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:47 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if (ss->root != &cgroup_dummy_root) {
							 
						 
					
						
							
								
									
										
										
										
											2013-07-12 13:38:17 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											ret = -EBUSY;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											goto out_put;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/* pin the module */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if (!try_module_get(ss->module)) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											ret = -ENOENT;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											goto out_put;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
		}
		pinned |= 1 << i;
	}

	/* subsys could be missing if unloaded between parsing and here */
	if (added_mask != pinned) {
		ret = -ENOENT;
		goto out_put;
	}

	ret = cgroup_populate_dir(cgrp, added_mask);
	if (ret)
		goto out_put;

	/*
	 * Nothing can fail from this point on.  Remove files for the
	 * removed subsystems and rebind each subsystem.
	 */
	cgroup_clear_dir(cgrp, removed_mask);

	for_each_subsys(ss, i) {
		unsigned long bit = 1UL << i;

		if (bit & added_mask) {
			/* We're binding this subsystem to this hierarchy */
			BUG_ON(cgroup_css(cgrp, ss));
			BUG_ON(!cgroup_css(cgroup_dummy_top, ss));
			BUG_ON(cgroup_css(cgroup_dummy_top, ss)->cgroup != cgroup_dummy_top);

			rcu_assign_pointer(cgrp->subsys[i],
					   cgroup_css(cgroup_dummy_top, ss));
			cgroup_css(cgrp, ss)->cgroup = cgrp;

			list_move(&ss->sibling, &root->subsys_list);
			ss->root = root;
			if (ss->bind)
				ss->bind(cgroup_css(cgrp, ss));

			/* refcount was already taken, and we're keeping it */
			root->subsys_mask |= bit;
		} else if (bit & removed_mask) {
			/* We're removing this subsystem */
			BUG_ON(cgroup_css(cgrp, ss) != cgroup_css(cgroup_dummy_top, ss));
			BUG_ON(cgroup_css(cgrp, ss)->cgroup != cgrp);

											if  ( ss - > bind ) 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-26 18:40:56 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
												ss - > bind ( cgroup_css ( cgroup_dummy_top ,  ss ) ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 11:01:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-26 18:40:56 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											cgroup_css ( cgroup_dummy_top ,  ss ) - > cgroup  =  cgroup_dummy_top ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 11:01:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											RCU_INIT_POINTER ( cgrp - > subsys [ i ] ,  NULL ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:47 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											cgroup_subsys [ i ] - > root  =  & cgroup_dummy_root ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											list_move ( & ss - > sibling ,  & cgroup_dummy_root . subsys_list ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:47 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											/* subsystem is now free - drop reference on module */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											module_put(ss->module);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:47 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											root->subsys_mask &= ~bit;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-25 18:04:54 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * Mark @root has finished binding subsystems.  @root->subsys_mask
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * now matches the bound subsystems.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									root->flags |= CGRP_ROOT_SUBSYS_BOUND;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									return 0;
							 
						 
					
						
							
								
									
										
										
										
											2013-07-12 13:38:17 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								out_put:
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									for_each_subsys(ss, i)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if (pinned & (1 << i))
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											module_put(ss->module);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return ret;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-12-08 21:32:45 -05:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static int cgroup_show_options(struct seq_file *seq, struct dentry *dentry)
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2011-12-08 21:32:45 -05:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroupfs_root *root = dentry->d_sb->s_fs_info;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									struct cgroup_subsys *ss;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_lock(&cgroup_root_mutex);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:48 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									for_each_root_subsys(root, ss)
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										seq_printf(seq, ",%s", ss->name);
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: introduce sane_behavior mount option
It's a sad fact that at this point various cgroup controllers are
carrying so many idiosyncrasies and pure insanities that it simply
isn't possible to reach any sort of sane consistent behavior while
staying fully compatible with what has already been
exposed to userland.
As we can't break exposed userland interface, transitioning to sane
behaviors can only be done in steps while maintaining backwards
compatibility.  This patch introduces a new mount option -
__DEVEL__sane_behavior - which disables crazy features and enforces
consistent behaviors in cgroup core proper and various controllers.
As exactly which behaviors it changes are still being determined, the
mount option, at this point, is useful only for development of the new
behaviors.  As such, the mount option is prefixed with __DEVEL__ and
generates a warning message when used.
Eventually, once we get to the point where all controllers' behaviors
are consistent enough to implement unified hierarchy, the __DEVEL__
prefix will be dropped, and more importantly, unified-hierarchy will
enforce sane_behavior by default.  Maybe we'll be able to completely drop
the crazy stuff after a while, maybe not, but we at least have a
strategy to move on to saner behaviors.
This patch introduces the mount option and changes the following
behaviors in cgroup core.
* Mount options "noprefix" and "clone_children" are disallowed.  Also,
  cgroupfs file cgroup.clone_children is not created.
* When mounting an existing superblock, mount options should match.
  This is currently pretty crazy.  If one mounts a cgroup, creates a
  subdirectory, unmounts it and then mounts it again with different
  options, it looks like the new options are applied but they aren't.
* Remount is disallowed.
The behavior changes are documented in the comment above
CGRP_ROOT_SANE_BEHAVIOR enum and will be expanded as different
controllers are converted and planned improvements progress.
v2: Dropped unnecessary explicit file permission setting sane_behavior
    cftype entry as suggested by Li Zefan.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vivek Goyal <vgoyal@redhat.com>
											 
										 
										
											2013-04-14 20:15:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (root->flags & CGRP_ROOT_SANE_BEHAVIOR)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										seq_puts(seq, ",sane_behavior");
							 
						 
					
						
							
								
									
										
										
										
											2013-04-14 20:15:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (root->flags & CGRP_ROOT_NOPREFIX)
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										seq_puts(seq, ",noprefix");
							 
						 
					
						
							
								
									
										
										
										
											2013-04-14 20:15:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (root->flags & CGRP_ROOT_XATTR)
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: add xattr support
This is one of the items in the plumber's wish list.
For use cases:
>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart
v7:
- no changes
v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support
Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										seq_puts(seq, ",xattr");
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (strlen(root->release_agent_path))
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										seq_printf(seq, ",release_agent=%s", root->release_agent_path);
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (test_bit(CGRP_CPUSET_CLONE_CHILDREN, &root->top_cgroup.flags))
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										seq_puts(seq, ",clone_children");
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (strlen(root->name))
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										seq_printf(seq, ",name=%s", root->name);
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_unlock(&cgroup_root_mutex);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
	return 0;
}

struct cgroup_sb_opts {
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	unsigned long subsys_mask;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
	unsigned long flags;
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	char *release_agent;
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	bool cpuset_clone_children;
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	char *name;
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	/* User explicitly requested empty subsystem */
	bool none;
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
	struct cgroupfs_root *new_root;
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
};
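The option structure above is filled in by a comma-separated option parser. As a rough userspace sketch of that parsing pattern (not the kernel code itself), the block below uses `strsep()` to walk the tokens and rejects empty tokens and the combination of "all" with an explicit name. The names `demo_opts` and `demo_parse_options` are hypothetical, the `named` counter stands in for real subsystem lookup, and rejecting a name that follows "all" is an assumption made here for symmetry.

```c
#define _DEFAULT_SOURCE		/* for strsep() on glibc in strict modes */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct cgroup_sb_opts. */
struct demo_opts {
	bool none;		/* "none" was given */
	bool all;		/* "all" was given */
	unsigned int named;	/* number of other tokens, treated as names */
};

/*
 * Parse a writable, comma-separated option string in place.
 * Returns 0 on success, or -EINVAL for an empty token or for
 * "all" combined with an explicit name (in either order).
 */
static int demo_parse_options(char *data, struct demo_opts *opts)
{
	char *token, *o = data;

	assert(opts != NULL);
	memset(opts, 0, sizeof(*opts));

	while ((token = strsep(&o, ",")) != NULL) {
		if (!*token)
			return -EINVAL;
		if (!strcmp(token, "none")) {
			opts->none = true;
			continue;
		}
		if (!strcmp(token, "all")) {
			/* "all" and an explicit name are mutually exclusive */
			if (opts->named)
				return -EINVAL;
			opts->all = true;
			continue;
		}
		/* Assumption: a name after "all" is rejected as well. */
		if (opts->all)
			return -EINVAL;
		opts->named++;
	}
	return 0;
}
```

Because `strsep()` overwrites the separators, the caller has to pass a writable buffer (for example a `char[]` copy), not a string literal.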
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroups: revamp subsys array
This patch series provides the ability for cgroup subsystems to be
compiled as modules both within and outside the kernel tree.  This is
mainly useful for classifiers and subsystems that hook into components
that are already modules.  cls_cgroup and blkio-cgroup serve as the
example use cases for this feature.
It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
which modular subsystems can use to register and depart during runtime.
The net_cls classifier subsystem serves as the example for a subsystem
which can be converted into a module using these changes.
Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear.  Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section of
the array is protected by cgroup_mutex.
Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.
Patch #3 adds a mechanism for unloading modular subsystems, which includes
a more advanced rework of the rudimentary reference counting introduced in
patch 2.
Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.
Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves.  In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy).  The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys was
the only one in the dummy cgroup (and therefore all css_sets should be
unlinked from it) - however, as all css_sets always stay attached to the
dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
issue should also make sure these cases are addressed in the subsystem
loading and unloading code.
This patch:
Make subsys[] able to be dynamically populated to support modular
subsystems
This patch reworks the way the subsys[] array is used so that subsystems
can register themselves after boot time, and enables the internals of
cgroups to be able to handle when subsystems are not present or may
appear/disappear.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
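The idea of an array whose slots may be empty, with every access serialized by one mutex, can be sketched in userspace as follows. This is only an illustration under stated assumptions: `demo_table`, `demo_load_subsys`, `demo_unload_subsys` and `demo_present_mask` are hypothetical names, a pthread mutex stands in for cgroup_mutex, and nothing here reproduces the real cgroup_load_subsys() or cgroup_unload_subsys() behaviour.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define DEMO_SUBSYS_COUNT 8	/* hypothetical table size */

struct demo_subsys {
	const char *name;
};

/* Slots are NULL when no subsystem is registered there. */
static struct demo_subsys *demo_table[DEMO_SUBSYS_COUNT];
static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Register ss in the first free slot; return its index, or -1 if full. */
static int demo_load_subsys(struct demo_subsys *ss)
{
	int i, id = -1;

	assert(ss != NULL);
	pthread_mutex_lock(&demo_mutex);
	for (i = 0; i < DEMO_SUBSYS_COUNT; i++) {
		if (!demo_table[i]) {
			demo_table[i] = ss;
			id = i;
			break;
		}
	}
	pthread_mutex_unlock(&demo_mutex);
	return id;
}

/* Clear a slot so that later iterations see it as absent. */
static void demo_unload_subsys(int id)
{
	assert(id >= 0 && id < DEMO_SUBSYS_COUNT);
	pthread_mutex_lock(&demo_mutex);
	demo_table[id] = NULL;
	pthread_mutex_unlock(&demo_mutex);
}

/* Build a bitmask of occupied slots, skipping absent entries. */
static unsigned long demo_present_mask(void)
{
	unsigned long mask = 0;
	int i;

	pthread_mutex_lock(&demo_mutex);
	for (i = 0; i < DEMO_SUBSYS_COUNT; i++) {
		if (!demo_table[i])
			continue;	/* subsystem not present */
		mask |= 1UL << i;
	}
	pthread_mutex_unlock(&demo_mutex);
	return mask;
}
```

The point of the sketch is that every loop over the table checks for an empty slot before dereferencing it, which is the kind of change the message describes for iterations over subsys[].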
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
/*
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:47 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
 * Convert a hierarchy specifier into a bitmask of subsystems and
 * flags. Call with cgroup_mutex held to protect the cgroup_subsys[]
 * array. This function takes refcounts on subsystems to be used, unless it
 * returns error, in which case no refcounts are taken.
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	char *token, *o = data;
	bool all_ss = false, one_ss = false;
							 
						 
					
						
							
								
									
										
										
										
											2009-06-17 16:26:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	unsigned long mask = (unsigned long)-1;
							 
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	struct cgroup_subsys *ss;
	int i;
							 
						 
					
						
							
								
									
										
										
										
											2009-06-17 16:26:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	BUG_ON(!mutex_is_locked(&cgroup_mutex));
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-06-17 16:26:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
#ifdef CONFIG_CPUSETS
	mask = ~(1UL << cpuset_subsys_id);
#endif
  
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	memset(opts, 0, sizeof(*opts));
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
	while ((token = strsep(&o, ",")) != NULL) {
		if (!*token)
			return -EINVAL;
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		if (!strcmp(token, "none")) {
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
			/* Explicitly have no subsystems */
			opts->none = true;
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
			continue;
		}
		if (!strcmp(token, "all")) {
			/* Mutually exclusive option 'all' + subsystem name */
			if (one_ss)
				return -EINVAL;
			all_ss = true;
			continue;
		}
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: introduce sane_behavior mount option
It's a sad fact that at this point various cgroup controllers are
carrying so many idiosyncrasies and pure insanities that it simply
isn't possible to reach any sort of sane consistent behavior while
staying fully compatible with what has already been
exposed to userland.
As we can't break exposed userland interface, transitioning to sane
behaviors can only be done in steps while maintaining backwards
compatibility.  This patch introduces a new mount option -
__DEVEL__sane_behavior - which disables crazy features and enforces
consistent behaviors in cgroup core proper and various controllers.
As exactly which behaviors it changes are still being determined, the
mount option, at this point, is useful only for development of the new
behaviors.  As such, the mount option is prefixed with __DEVEL__ and
generates a warning message when used.
Eventually, once we get to the point where all controllers' behaviors
are consistent enough to implement unified hierarchy, the __DEVEL__
prefix will be dropped, and more importantly, unified-hierarchy will
enforce sane_behavior by default.  Maybe we'll be able to completely drop
the crazy stuff after a while, maybe not, but we at least have a
strategy to move on to saner behaviors.
This patch introduces the mount option and changes the following
behaviors in cgroup core.
* Mount options "noprefix" and "clone_children" are disallowed.  Also,
  cgroupfs file cgroup.clone_children is not created.
* When mounting an existing superblock, mount options should match.
  This is currently pretty crazy.  If one mounts a cgroup, creates a
  subdirectory, unmounts it and then mounts it again with different
  options, it looks like the new options are applied but they aren't.
* Remount is disallowed.
The behavior changes are documented in the comment above
CGRP_ROOT_SANE_BEHAVIOR enum and will be expanded as different
controllers are converted and planned improvements progress.
v2: Dropped unnecessary explicit file permission setting sane_behavior
    cftype entry as suggested by Li Zefan.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vivek Goyal <vgoyal@redhat.com>
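The core restrictions this commit message lists can be modeled in userspace roughly as follows. This is a minimal sketch, not the kernel code: struct mount_opts, the OPT_* flag values, check_sane_behavior() and check_mount_match() are hypothetical names chosen for illustration, and the -EINVAL return for mismatched options is an assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag values for this sketch; not the kernel's definitions. */
enum {
	OPT_SANE_BEHAVIOR = 1 << 0,
	OPT_NOPREFIX      = 1 << 1,
	OPT_XATTR         = 1 << 2,
};

struct mount_opts {
	unsigned int flags;
	bool clone_children;
};

/* Reject the options that sane_behavior disallows. */
static int check_sane_behavior(const struct mount_opts *o)
{
	if (!(o->flags & OPT_SANE_BEHAVIOR))
		return 0;		/* legacy mounts are not restricted */
	if (o->flags & OPT_NOPREFIX)
		return -EINVAL;
	if (o->clone_children)
		return -EINVAL;
	return 0;
}

/*
 * When an existing superblock is mounted again and either side uses
 * sane_behavior, the requested options must match the existing ones.
 * Assumption: a mismatch is reported as -EINVAL.
 */
static int check_mount_match(const struct mount_opts *existing,
			     const struct mount_opts *requested)
{
	if (!((existing->flags | requested->flags) & OPT_SANE_BEHAVIOR))
		return 0;		/* legacy: mismatch is silently ignored */
	if (existing->flags != requested->flags ||
	    existing->clone_children != requested->clone_children)
		return -EINVAL;
	return 0;
}
```

Without the sane_behavior flag both helpers accept anything, which mirrors the point that the legacy behavior stays untouched for backwards compatibility.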
											 
										 
										
											2013-04-14 20:15:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		if (!strcmp(token, "__DEVEL__sane_behavior")) {
			opts->flags |= CGRP_ROOT_SANE_BEHAVIOR;
			continue;
		}
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		if (!strcmp(token, "noprefix")) {
							 
						 
					
						
							
								
									
										
										
										
											2013-04-14 20:15:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
			opts->flags |= CGRP_ROOT_NOPREFIX;
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
			continue;
		}
		if (!strcmp(token, "clone_children")) {
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
			opts->cpuset_clone_children = true;
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
			continue;
		}
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: add xattr support
This is one of the items in the plumber's wish list.
For use cases:
>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart
v7:
- no changes
v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support
Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
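As an illustration of the use case quoted above (storing a service's main pid on its cgroup), a userspace helper might look like the sketch below. It assumes a Linux system and a cgroup hierarchy mounted with the xattr option; the attribute name trusted.main_pid and the helper tag_cgroup_main_pid() are hypothetical and are not defined by this commit. Writing to the trusted namespace normally requires CAP_SYS_ADMIN.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Hypothetical attribute name; the commit itself defines no names. */
#define MAIN_PID_XATTR "trusted.main_pid"

/*
 * Store a service's main pid as an xattr on its cgroup directory.
 * Returns 0 on success or a negative errno value on failure.
 */
static int tag_cgroup_main_pid(const char *cgroup_dir, pid_t pid)
{
	char value[32];
	int len = snprintf(value, sizeof(value), "%ld", (long)pid);

	if (len < 0 || (size_t)len >= sizeof(value))
		return -EINVAL;
	/* flags == 0: create the attribute or replace an existing one */
	if (setxattr(cgroup_dir, MAIN_PID_XATTR, value, (size_t)len, 0) < 0)
		return -errno;
	return 0;
}
```

A service manager could call this once per service after creating the cgroup directory and read the value back with getxattr(2) after a restart.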
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		if (!strcmp(token, "xattr")) {
							 
						 
					
						
							
								
									
										
										
										
											2013-04-14 20:15:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
			opts->flags |= CGRP_ROOT_XATTR;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
			continue;
		}
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		if (!strncmp(token, "release_agent=", 14)) {
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
			/* Specifying two release agents is forbidden */
			if (opts->release_agent)
				return -EINVAL;
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
			opts->release_agent =
							 
						 
					
						
							
								
									
										
										
										
											2010-08-10 18:02:54 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
				kstrndup(token + 14, PATH_MAX - 1, GFP_KERNEL);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
			if (!opts->release_agent)
				return -ENOMEM;
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
			continue;
		}
		if (!strncmp(token, "name=", 5)) {
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
			const char *name = token + 5;
			/* Can't specify an empty name */
			if (!strlen(name))
				return -EINVAL;
			/* Must match [\w.-]+ */
			for (i = 0; i < strlen(name); i++) {
				char c = name[i];
				if (isalnum(c))
					continue;
				if ((c == '.') || (c == '-') || (c == '_'))
					continue;
				return -EINVAL;
			}
			/* Specifying two names is forbidden */
			if (opts->name)
				return -EINVAL;
			opts->name = kstrndup(name,
							 
						 
					
						
							
								
									
										
										
										
											2010-08-10 18:02:54 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
					      MAX_CGROUP_ROOT_NAMELEN - 1,
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
					      GFP_KERNEL);
			if (!opts->name)
				return -ENOMEM;
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
			continue;
		}
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		for_each_subsys(ss, i) {
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
			if (strcmp(token, ss->name))
				continue;
			if (ss->disabled)
				continue;

			/* Mutually exclusive option 'all' + subsystem name */
			if (all_ss)
				return -EINVAL;
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
			set_bit(i, &opts->subsys_mask);
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
			one_ss = true;

			break;
		}
		if (i == CGROUP_SUBSYS_COUNT)
			return -ENOENT;
	}

	/*
	 * If the 'all' option was specified select all the subsystems,
							 
						 
					
						
							
								
									
										
										
										
											2011-12-27 14:25:55 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	 * otherwise if 'none', 'name=' and a subsystem name options
	 * were not specified, let's default to 'all'
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	 */
							 
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	if (all_ss || (!one_ss && !opts->none && !opts->name))
		for_each_subsys(ss, i)
			if (!ss->disabled)
				set_bit(i, &opts->subsys_mask);
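
The parsing rules up to this point (name validation against [\w.-]+, mutual exclusion of 'all' and an explicit subsystem name, and the default to 'all') can be exercised outside the kernel with a sketch along these lines. It is a simplified userspace model under stated assumptions: the subsystem table, struct parsed_opts, valid_name(), parse_token() and parse_opts() are hypothetical, disabled subsystems and the other mount flags are left out, over-long names are rejected rather than truncated, and the consistency checks that follow are not included.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical subsystem table; the kernel iterates its own list. */
static const char *const subsys_names[] = { "cpuset", "cpu", "memory" };
#define NSUBSYS (sizeof(subsys_names) / sizeof(subsys_names[0]))

struct parsed_opts {
	unsigned long subsys_mask;	/* bit i set: subsys_names[i] selected */
	bool none;
	char name[64];			/* empty: no name= option seen */
};

/* A name must be non-empty and match [\w.-]+ */
static bool valid_name(const char *name)
{
	if (!*name)
		return false;
	for (; *name; name++) {
		unsigned char c = (unsigned char)*name;

		if (!isalnum(c) && c != '.' && c != '-' && c != '_')
			return false;
	}
	return true;
}

/* Handle one token; all_ss and one_ss carry state across tokens. */
static int parse_token(const char *token, struct parsed_opts *o,
		       bool *all_ss, bool *one_ss)
{
	size_t i;

	if (!strcmp(token, "none")) {
		o->none = true;
		return 0;
	}
	if (!strcmp(token, "all")) {
		if (*one_ss)
			return -EINVAL;
		*all_ss = true;
		return 0;
	}
	if (!strncmp(token, "name=", 5)) {
		const char *name = token + 5;

		/* Sketch choice: over-long names are rejected, not truncated. */
		if (!valid_name(name) || o->name[0] ||
		    strlen(name) >= sizeof(o->name))
			return -EINVAL;
		strcpy(o->name, name);
		return 0;
	}
	for (i = 0; i < NSUBSYS; i++) {
		if (strcmp(token, subsys_names[i]))
			continue;
		if (*all_ss)
			return -EINVAL;
		o->subsys_mask |= 1UL << i;
		*one_ss = true;
		return 0;
	}
	return -ENOENT;
}

/* Parse a comma-separated option string such as "cpu,name=demo". */
static int parse_opts(const char *data, struct parsed_opts *o)
{
	bool all_ss = false, one_ss = false;
	char token[128];

	memset(o, 0, sizeof(*o));
	while (*data) {
		size_t len = strcspn(data, ",");
		int ret;

		/* Assumption: empty or oversized tokens are rejected. */
		if (len == 0 || len >= sizeof(token))
			return -EINVAL;
		memcpy(token, data, len);
		token[len] = '\0';
		ret = parse_token(token, o, &all_ss, &one_ss);
		if (ret)
			return ret;
		data += len;
		if (*data == ',')
			data++;
	}
	/* 'all', or nothing that selects a hierarchy: take every subsystem. */
	if (all_ss || (!one_ss && !o->none && !o->name[0]))
		o->subsys_mask = (1UL << NSUBSYS) - 1;
	return 0;
}
```

Keeping the per-token logic in its own function makes the order dependence visible: "all,cpu" and "cpu,all" are both rejected, but by different branches.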
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* Consistency checks */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2013-04-14 20:15:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	if (opts->flags & CGRP_ROOT_SANE_BEHAVIOR) {
		pr_warning("cgroup: sane_behavior: this is still under development and its behaviors will change, proceed at your own risk\n");

		if (opts->flags & CGRP_ROOT_NOPREFIX) {
			pr_err("cgroup: sane_behavior: noprefix is not allowed\n");
			return -EINVAL;
		}

		if (opts->cpuset_clone_children) {
			pr_err("cgroup: sane_behavior: clone_children is not allowed\n");
			return -EINVAL;
		}
	}
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-06-17 16:26:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	/*
	 * Option noprefix was introduced just for backward compatibility
	 * with the old cpuset, so we allow noprefix only if mounting just
	 * the cpuset subsystem.
	 */
							 
						 
					
						
							
								
									
										
										
										
											2013-04-14 20:15:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	if ((opts->flags & CGRP_ROOT_NOPREFIX) && (opts->subsys_mask & mask))
							 
						 
					
						
							
								
									
										
										
										
											2009-06-17 16:26:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		return -EINVAL;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* Can't specify "none" and some subsystems */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (opts->subsys_mask && opts->none)
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return -EINVAL;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * We either have to specify by name or by subsystems. (So all
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * empty hierarchies must have a name).
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (!opts->subsys_mask && !opts->name)
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										return -EINVAL;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return 0;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static int cgroup_remount(struct super_block *sb, int *flags, char *data)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int ret = 0;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct cgroupfs_root *root = sb->s_fs_info;
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup *cgrp = &root->top_cgroup;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									struct cgroup_sb_opts opts;
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									unsigned long added_mask, removed_mask;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: introduce sane_behavior mount option
It's a sad fact that at this point various cgroup controllers are
carrying so many idiosyncrasies and pure insanities that it simply
isn't possible to reach any sort of sane, consistent behavior while
staying fully compatible with what has already been
exposed to userland.
As we can't break the exposed userland interface, transitioning to sane
behaviors can only be done in steps while maintaining backwards
compatibility.  This patch introduces a new mount option -
__DEVEL__sane_behavior - which disables crazy features and enforces
consistent behaviors in cgroup core proper and various controllers.
As exactly which behaviors it changes are still being determined, the
mount option, at this point, is useful only for development of the new
behaviors.  As such, the mount option is prefixed with __DEVEL__ and
generates a warning message when used.
Eventually, once we get to the point where all controllers' behaviors
are consistent enough to implement unified hierarchy, the __DEVEL__
prefix will be dropped, and more importantly, unified-hierarchy will
enforce sane_behavior by default.  Maybe we'll be able to completely drop
the crazy stuff after a while, maybe not, but we at least have a
strategy to move on to saner behaviors.
This patch introduces the mount option and changes the following
behaviors in cgroup core.
* Mount options "noprefix" and "clone_children" are disallowed.  Also,
  cgroupfs file cgroup.clone_children is not created.
* When mounting an existing superblock, mount options should match.
  This is currently pretty crazy.  If one mounts a cgroup, creates a
  subdirectory, unmounts it and then mounts it again with different
  options, it looks like the new options are applied but they aren't.
* Remount is disallowed.
The behavior changes are documented in the comment above the
CGRP_ROOT_SANE_BEHAVIOR enum and will be expanded as different
controllers are converted and planned improvements progress.
v2: Dropped unnecessary explicit file permission setting in the sane_behavior
    cftype entry as suggested by Li Zefan.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vivek Goyal <vgoyal@redhat.com>
											 
										 
										
											2013-04-14 20:15:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (root->flags & CGRP_ROOT_SANE_BEHAVIOR) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										pr_err("cgroup: sane_behavior: remount is not allowed\n");
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return -EINVAL;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_lock(&cgrp->dentry->d_inode->i_mutex);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									mutex_lock(&cgroup_mutex);
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_lock(&cgroup_root_mutex);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* See what subsystems are wanted */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									ret = parse_cgroupfs_options(data, &opts);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (ret)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										goto out_unlock;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:47 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (opts.subsys_mask != root->subsys_mask || opts.release_agent)
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:54 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										pr_warning("cgroup: option changes via remount are deprecated (pid=%d comm=%s)\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											   task_tgid_nr(current), current->comm);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									added_mask = opts.subsys_mask & ~root->subsys_mask;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									removed_mask = root->subsys_mask & ~opts.subsys_mask;
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:29 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* Don't allow flags or name to change at remount */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-27 19:37:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (((opts.flags ^ root->flags) & CGRP_ROOT_OPTION_MASK) ||
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									    (opts.name && strcmp(opts.name, root->name))) {
							 
						 
					
						
							
								
									
										
										
										
											2013-06-27 19:37:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										pr_err("cgroup: option or name mismatch, new: 0x%lx \"%s\", old: 0x%lx \"%s\"\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										       opts.flags & CGRP_ROOT_OPTION_MASK, opts.name ?: "",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										       root->flags & CGRP_ROOT_OPTION_MASK, root->name);
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										ret = -EINVAL;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										goto out_unlock;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-28 17:07:30 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* remounting is not allowed for populated hierarchies */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (root->number_of_cgroups > 1) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										ret = -EBUSY;
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroups: don't change release_agent when remount failed
Remount can fail in either of two cases:
  - wrong mount options are specified, or option 'noprefix' is changed.
  - a to-be-added subsys is already mounted/active.
When using remount to change 'release_agent', in the former failure
case remount returns an errno with release_agent unchanged, but in the
latter case remount returns EBUSY with release_agent changed, which is
unexpected:
 # mount -t cgroup -o cpu xxx /cgrp1
 # mount -t cgroup -o cpuset,release_agent=agent1 yyy /cgrp2
 # cat /cgrp2/release_agent
 agent1
 # mount -t cgroup -o remount,cpuset,noprefix,release_agent=agent2 yyy /cgrp2
 mount: /cgrp2 not mounted already, or bad option
 # cat /cgrp2/release_agent
 agent1     <-- ok
 # mount -t cgroup -o remount,cpu,cpuset,release_agent=agent2 yyy /cgrp2
 mount: /cgrp2 is busy
 # cat /cgrp2/release_agent
 agent2     <-- unexpected!
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
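The ordering problem described in the commit message above can be illustrated outside the kernel. The following is a minimal user-space sketch, not the kernel implementation: `struct toy_root`, `toy_busy_mask`, `toy_rebind` and `toy_remount` are hypothetical names invented for this example. It assumes the fix amounts to running every step that can fail before touching release_agent, so a failed remount leaves the old agent in place.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a mounted hierarchy (not struct cgroupfs_root). */
struct toy_root {
	unsigned long subsys_mask;	/* subsystems currently bound here */
	char release_agent[64];		/* current release agent path */
};

/* Hypothetical: subsystems already bound to some other hierarchy. */
static unsigned long toy_busy_mask;

/* Fallible step: refuse to add a subsystem that is bound elsewhere. */
static int toy_rebind(struct toy_root *root, unsigned long added,
		      unsigned long removed)
{
	if (added & toy_busy_mask)
		return -EBUSY;
	root->subsys_mask = (root->subsys_mask | added) & ~removed;
	return 0;
}

/* Validate and rebind first; commit release_agent only on success. */
static int toy_remount(struct toy_root *root, unsigned long new_mask,
		       const char *new_agent)
{
	unsigned long added, removed;
	int ret;

	assert(root != NULL);

	/* Same bit arithmetic as added_mask/removed_mask in the listing. */
	added = new_mask & ~root->subsys_mask;
	removed = root->subsys_mask & ~new_mask;

	ret = toy_rebind(root, added, removed);
	if (ret)
		return ret;	/* release_agent is left untouched */

	if (new_agent)
		snprintf(root->release_agent, sizeof(root->release_agent),
			 "%s", new_agent);
	return 0;
}
```

With this ordering, a remount that requests a busy subsystem returns -EBUSY and the previous agent string is still in place, which matches the behaviour the commit message says users expect.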
											 
										 
										
											2009-04-02 16:57:30 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										goto out_unlock;
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:47 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									ret = rebind_subsystems(root, added_mask, removed_mask);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-28 17:07:30 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (ret)
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2009-04-02 16:57:30 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										goto out_unlock;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
2007-10-18 23:39:30 -07:00

2007-10-18 23:39:38 -07:00
	if (opts.release_agent)
		strcpy(root->release_agent_path, opts.release_agent);
2007-10-18 23:39:30 -07:00
 out_unlock:
2009-04-02 16:57:27 -07:00
	kfree(opts.release_agent);
2009-09-23 15:56:19 -07:00
	kfree(opts.name);
2011-12-12 18:12:21 -08:00
	mutex_unlock(&cgroup_root_mutex);
2007-10-18 23:39:30 -07:00
	mutex_unlock(&cgroup_mutex);
2007-10-18 23:40:44 -07:00
	mutex_unlock(&cgrp->dentry->d_inode->i_mutex);
2007-10-18 23:39:30 -07:00
	return ret;
}

2009-09-21 17:01:09 -07:00
static const struct super_operations cgroup_ops = {
2007-10-18 23:39:30 -07:00
	.statfs = simple_statfs,
	.drop_inode = generic_delete_inode,
	.show_options = cgroup_show_options,
	.remount_fs = cgroup_remount,
};

2008-10-18 20:28:04 -07:00
static void init_cgroup_housekeeping(struct cgroup *cgrp)
{
	INIT_LIST_HEAD(&cgrp->sibling);
	INIT_LIST_HEAD(&cgrp->children);
2012-04-01 12:09:56 -07:00
	INIT_LIST_HEAD(&cgrp->files);
2013-06-12 21:04:50 -07:00
	INIT_LIST_HEAD(&cgrp->cset_links);
2008-10-18 20:28:04 -07:00
	INIT_LIST_HEAD(&cgrp->release_list);
2009-09-23 15:56:27 -07:00
	INIT_LIST_HEAD(&cgrp->pidlists);
	mutex_init(&cgrp->pidlist_mutex);
2013-08-08 20:11:24 -04:00
	cgrp->dummy_css.cgroup = cgrp;
2010-03-10 15:22:20 -08:00
	INIT_LIST_HEAD(&cgrp->event_list);
	spin_lock_init(&cgrp->event_list_lock);
cgroup: add xattr support
This is one of the items in the plumber's wish list.
For use cases:
>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart
v7:
- no changes
v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support
Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-23 16:53:30 -04:00
	simple_xattrs_init(&cgrp->xattrs);
2008-10-18 20:28:04 -07:00
}
2009-09-23 15:56:19 -07:00

2007-10-18 23:39:30 -07:00
static void init_cgroup_root(struct cgroupfs_root *root)
{
2007-10-18 23:40:44 -07:00
	struct cgroup *cgrp = &root->top_cgroup;
2012-04-01 12:09:54 -07:00

2007-10-18 23:39:30 -07:00
	INIT_LIST_HEAD(&root->subsys_list);
	INIT_LIST_HEAD(&root->root_list);
	root->number_of_cgroups = 1;
2007-10-18 23:40:44 -07:00
	cgrp->root = root;
2013-06-21 15:52:33 -07:00
	RCU_INIT_POINTER(cgrp->name, &root_cgroup_name);
2008-10-18 20:28:04 -07:00
	init_cgroup_housekeeping(cgrp);
2013-07-31 09:50:50 +08:00
	idr_init(&root->cgroup_idr);
2007-10-18 23:39:30 -07:00
}

											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  cgroup_init_root_id ( struct  cgroupfs_root  * root ,  int  start ,  int  end )  
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-04-14 11:36:58 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int  id ; 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-04-14 11:36:57 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									lockdep_assert_held ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									lockdep_assert_held ( & cgroup_root_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									id = idr_alloc_cyclic(&cgroup_hierarchy_idr, root, start, end,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											      GFP_KERNEL);
							 
						 
					
						
							
								
									
										
										
										
											2013-04-14 11:36:58 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (id < 0)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return id;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									root->hierarchy_id = id;
							 
						 
					
						
							
								
									
										
										
										
											2013-04-14 11:36:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return 0;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static void cgroup_exit_root_id(struct cgroupfs_root *root)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-04-14 11:36:57 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									lockdep_assert_held(&cgroup_mutex);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									lockdep_assert_held(&cgroup_root_mutex);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-04-14 11:36:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (root->hierarchy_id) {
							 
						 
					
						
							
								
									
										
										
										
											2013-04-14 11:36:58 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										idr_remove(&cgroup_hierarchy_idr, root->hierarchy_id);
							 
						 
					
						
							
								
									
										
										
										
											2013-04-14 11:36:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										root->hierarchy_id = 0;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								static int cgroup_test_super(struct super_block *sb, void *data)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup_sb_opts *opts = data;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									struct cgroupfs_root *root = sb->s_fs_info;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* If we asked for a name then it must match */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (opts->name && strcmp(opts->name, root->name))
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return 0;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * If we asked for subsystems (or explicitly for no
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * subsystems) then they must match
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if ((opts->subsys_mask || opts->none)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									    && (opts->subsys_mask != root->subsys_mask))
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										return 0;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return 1;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static struct cgroupfs_root *cgroup_root_from_opts(struct cgroup_sb_opts *opts)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct cgroupfs_root *root;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (!opts->subsys_mask && !opts->none)
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return NULL;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									root = kzalloc(sizeof(*root), GFP_KERNEL);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (!root)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return ERR_PTR(-ENOMEM);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									init_cgroup_root(root);
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-25 18:04:54 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * We need to set @root->subsys_mask now so that @root can be
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * matched by cgroup_test_super() before it finishes
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * initialization; otherwise, competing mounts with the same
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * options may try to bind the same subsystems instead of waiting
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * for the first one, leading to unexpected mount errors.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * SUBSYS_BOUND will be set once actual binding is complete.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									root->subsys_mask = opts->subsys_mask;
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									root->flags = opts->flags;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (opts->release_agent)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										strcpy(root->release_agent_path, opts->release_agent);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (opts->name)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										strcpy(root->name, opts->name);
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (opts->cpuset_clone_children)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										set_bit(CGRP_CPUSET_CLONE_CHILDREN, &root->top_cgroup.flags);
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return root;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-04-14 11:36:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static void cgroup_free_root(struct cgroupfs_root *root)
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-04-14 11:36:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (root) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/* hierarchy ID should already have been released */
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										WARN_ON_ONCE(root->hierarchy_id);
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-07-31 09:50:50 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										idr_destroy(&root->cgroup_idr);
							 
						 
					
						
							
								
									
										
										
										
											2013-04-14 11:36:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										kfree(root);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								static int cgroup_set_super(struct super_block *sb, void *data)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int ret;
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup_sb_opts *opts = data;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* If we don't have a new root, we can't set up a new sb */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (!opts->new_root)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return -EINVAL;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:31 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									BUG_ON(!opts->subsys_mask && !opts->none);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									ret = set_anon_super(sb, NULL);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (ret)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return ret;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									sb->s_fs_info = opts->new_root;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									opts->new_root->sb = sb;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									sb->s_blocksize = PAGE_CACHE_SIZE;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									sb->s_blocksize_bits = PAGE_CACHE_SHIFT;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									sb->s_magic = CGROUP_SUPER_MAGIC;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									sb->s_op = &cgroup_ops;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return 0;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static int cgroup_get_rootdir(struct super_block *sb)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2010-12-21 13:29:29 -05:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									static const struct dentry_operations cgroup_dops = {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										.d_iput = cgroup_diput,
							 
						 
					
						
							
								
									
										
										
										
											2013-10-25 18:47:37 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										.d_delete = always_delete_dentry,
							 
						 
					
						
							
								
									
										
										
										
											2010-12-21 13:29:29 -05:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									};
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									struct inode *inode =
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										cgroup_new_inode(S_IFDIR | S_IRUGO | S_IXUGO | S_IWUSR, sb);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (!inode)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return -ENOMEM;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									inode->i_fop = &simple_dir_operations;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									inode->i_op = &cgroup_dir_inode_operations;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* directories start off with i_nlink == 2 (for "." entry) */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									inc_nlink(inode);
							 
						 
					
						
							
								
									
										
										
										
											2012-01-08 22:15:13 -05:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									sb->s_root = d_make_root(inode);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (!sb->s_root)
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										return -ENOMEM;
							 
						 
					
						
							
								
									
										
										
										
											2010-12-21 13:29:29 -05:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* for everything else we want ->d_op set */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									sb->s_d_op = &cgroup_dops;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									return 0;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-07-26 13:23:11 +04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static struct dentry *cgroup_mount(struct file_system_type *fs_type,
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
											 int flags, const char *unused_dev_name,
							 
						 
					
						
							
								
									
										
										
										
											2010-07-26 13:23:11 +04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											 void *data)
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct cgroup_sb_opts opts;
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroupfs_root *root;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									int ret = 0;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct super_block *sb;
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroupfs_root *new_root;
							 
						 
					
						
							
								
									
										
										
										
											2013-06-28 17:07:30 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct list_head tmp_links;
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct inode *inode;
							 
						 
					
						
							
								
									
										
										
										
											2013-06-28 17:07:30 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									const struct cred *cred;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* First find the desired set of subsystems */ 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroups: revamp subsys array
This patch series provides the ability for cgroup subsystems to be
compiled as modules both within and outside the kernel tree.  This is
mainly useful for classifiers and subsystems that hook into components
that are already modules.  cls_cgroup and blkio-cgroup serve as the
example use cases for this feature.
It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
which modular subsystems can use to register and depart during runtime.
The net_cls classifier subsystem serves as the example for a subsystem
which can be converted into a module using these changes.
Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear.  Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section of
the array is protected by cgroup_mutex.
Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.
Patch #3 adds a mechanism for unloading modular subsystems, which includes
a more advanced rework of the rudimentary reference counting introduced in
patch 2.
Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.
Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves.  In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy).  The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys was
the only one in the dummy cgroup (and therefore all css_sets should be
unlinked from it) - however, as all css_sets always stay attached to the
dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
issue should also make sure these cases are addressed in the subsystem
loading and unloading code.
This patch:
Make subsys[] able to be dynamically populated to support modular
subsystems
This patch reworks the way the subsys[] array is used so that subsystems
can register themselves after boot time, and enables the internals of
cgroups to be able to handle when subsystems are not present or may
appear/disappear.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
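Illustration (not part of the patch): a minimal user-space sketch of the
idea described above, assuming a fixed-size array whose first slots are
filled at build time and whose remaining slots may be NULL until a
subsystem registers.  The names SUBSYS_COUNT, BUILTIN_COUNT, struct subsys,
load_subsys() and count_subsys() are hypothetical stand-ins, and a pthread
mutex stands in for cgroup_mutex; the real kernel code differs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SUBSYS_COUNT  4	/* hypothetical: total slots in the array */
#define BUILTIN_COUNT 2	/* hypothetical: slots populated at build time */

struct subsys {		/* hypothetical stand-in for struct cgroup_subsys */
	const char *name;
};

static struct subsys cpu_ss = { "cpu" };
static struct subsys mem_ss = { "memory" };

/* Built-in entries first; the dynamic tail starts out as NULL. */
static struct subsys *subsys[SUBSYS_COUNT] = { &cpu_ss, &mem_ss };

/* Stand-in for cgroup_mutex: guards the dynamic part of the array. */
static pthread_mutex_t subsys_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Place ss in the first free dynamic slot; return its index, or -1 if full. */
static int load_subsys(struct subsys *ss)
{
	int i, ret = -1;

	assert(ss != NULL);
	pthread_mutex_lock(&subsys_mutex);
	for (i = BUILTIN_COUNT; i < SUBSYS_COUNT; i++) {
		if (subsys[i] == NULL) {
			subsys[i] = ss;
			ret = i;
			break;
		}
	}
	pthread_mutex_unlock(&subsys_mutex);
	return ret;
}

/* Iterate over every slot, skipping subsystems that are not present. */
static int count_subsys(void)
{
	int i, n = 0;

	pthread_mutex_lock(&subsys_mutex);
	for (i = 0; i < SUBSYS_COUNT; i++) {
		if (subsys[i] == NULL)
			continue;	/* absent subsystem: nothing to visit */
		n++;
	}
	pthread_mutex_unlock(&subsys_mutex);
	return n;
}
```

The point of the sketch is only that every walk over the array has to
tolerate empty slots, and that slots which can change at runtime are read
and written under the same lock.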
											 
										 
										
2010-03-10 15:22:07 -08:00
	mutex_lock(&cgroup_mutex);
2007-10-18 23:39:30 -07:00
	ret = parse_cgroupfs_options(data, &opts);
2010-03-10 15:22:07 -08:00
	mutex_unlock(&cgroup_mutex);
2009-09-23 15:56:19 -07:00
	if (ret)
		goto out_err;
2007-10-18 23:39:30 -07:00

2009-09-23 15:56:19 -07:00
	/*
	 * Allocate a new cgroup root. We may not need it if we're
	 * reusing an existing hierarchy.
	 */
	new_root = cgroup_root_from_opts(&opts);
	if (IS_ERR(new_root)) {
		ret = PTR_ERR(new_root);
2013-07-12 13:38:17 -07:00
		goto out_err;
2007-10-18 23:39:38 -07:00
	}
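The IS_ERR()/PTR_ERR() check above relies on the kernel convention of encoding a negative errno value in a pointer. The following is a userspace sketch of that convention, not the kernel's own header: the macros are re-implemented here as small functions, and struct root, root_from_opts() and mount_sketch() are hypothetical stand-ins for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095  /* the top 4095 addresses are reserved for errors */

/* Encode a negative errno value as a pointer. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if the pointer falls in the reserved error range. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct root {
	int flags;
};

/* Hypothetical allocator: returns a valid object or an encoded error. */
static struct root *root_from_opts(int valid)
{
	struct root *r;

	if (!valid)
		return ERR_PTR(-EINVAL);
	r = calloc(1, sizeof(*r));
	if (!r)
		return ERR_PTR(-ENOMEM);
	return r;
}

/* Same shape as the check above: test, extract the code, bail out. */
static int mount_sketch(int valid)
{
	struct root *r = root_from_opts(valid);

	if (IS_ERR(r))
		return (int)PTR_ERR(r);
	free(r);
	return 0;
}
```

The benefit of this design is that one return value carries either the object or the reason for failure, so the caller needs no separate out-parameter.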
2009-09-23 15:56:19 -07:00
	opts.new_root = new_root;
2007-10-18 23:39:30 -07:00

2009-09-23 15:56:19 -07:00
	/* Locate an existing or new sb for this hierarchy */
2012-06-25 12:55:37 +01:00
	sb = sget(fs_type, cgroup_test_super, cgroup_set_super, 0, &opts);
2007-10-18 23:39:30 -07:00
	if (IS_ERR(sb)) {
2009-09-23 15:56:19 -07:00
		ret = PTR_ERR(sb);
2013-04-14 11:36:56 -07:00
		cgroup_free_root(opts.new_root);
2013-07-12 13:38:17 -07:00
		goto out_err;
2007-10-18 23:39:30 -07:00
	}

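The superblock lookup above hands sget() a test callback and a set callback. A much simplified userspace sketch of that find-or-create shape follows. It is an assumption-laden illustration rather than the VFS implementation: struct super, find_or_add(), key_test() and key_set() are hypothetical names, the candidate object is supplied by the caller (the real sget() allocates the superblock itself), and there is no locking or reference counting.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct super {
	const char *key;
	struct super *next;
};

/* Hypothetical global list of known objects. */
static struct super *supers;

typedef int (*test_fn)(struct super *s, void *data);
typedef int (*set_fn)(struct super *s, void *data);

/*
 * Return an existing entry for which test() is true; otherwise
 * initialise the candidate with set() and link it in.
 * Returns NULL if set() reports failure.
 */
static struct super *find_or_add(struct super *candidate,
				 test_fn test, set_fn set, void *data)
{
	struct super *s;

	for (s = supers; s; s = s->next)
		if (test(s, data))
			return s;

	if (set(candidate, data))
		return NULL;
	candidate->next = supers;
	supers = candidate;
	return candidate;
}

/* Match on the key string passed as data. */
static int key_test(struct super *s, void *data)
{
	return strcmp(s->key, (const char *)data) == 0;
}

/* Initialise a fresh entry from the key string passed as data. */
static int key_set(struct super *s, void *data)
{
	s->key = (const char *)data;
	return 0;
}
```

Because the function returns either the candidate or a pre-existing entry, the caller can compare the result with the candidate pointer to learn whether a new entry was created.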
2009-09-23 15:56:19 -07:00
	root = sb->s_fs_info;
	BUG_ON(!root);
	if (root == opts.new_root) {
		/* We used the new root structure, so this is a new hierarchy */
2009-01-07 18:07:42 -08:00
		struct cgroup *root_cgrp = &root->top_cgroup;
2009-09-23 15:56:19 -07:00
		struct cgroupfs_root *existing_root;
2008-04-29 01:00:13 -07:00
		int i;
2013-06-12 21:04:49 -07:00
		struct css_set *cset;
2007-10-18 23:39:30 -07:00

		BUG_ON(sb->s_root != NULL);

		ret = cgroup_get_rootdir(sb);
		if (ret)
			goto drop_new_super;
2007-10-18 23:39:36 -07:00
		inode = sb->s_root->d_inode;
2007-10-18 23:39:30 -07:00

2007-10-18 23:39:36 -07:00
		mutex_lock(&inode->i_mutex);
2007-10-18 23:39:30 -07:00
		mutex_lock(&cgroup_mutex);
2011-12-12 18:12:21 -08:00
		mutex_lock(&cgroup_root_mutex);
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-07-31 09:50:50 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		root_cgrp->id = idr_alloc(&root->cgroup_idr, root_cgrp,
					   0, 1, GFP_KERNEL);
		if (root_cgrp->id < 0)
			goto unlock_drop;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		/* Check for name clashes with existing mounts */
		ret = -EBUSY;
		if (strlen(root->name))
			for_each_active_root(existing_root)
				if (!strcmp(existing_root->name, root->name))
					goto unlock_drop;
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		/*
		 * We're accessing css_set_count without locking
		 * css_set_lock here, but that's OK - it can only be
		 * increased by someone holding cgroup_lock, and
		 * that's us. The worst that can happen is that we
		 * have some link structures left over
		 */
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		ret = allocate_cgrp_cset_links(css_set_count, &tmp_links);
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		if (ret)
			goto unlock_drop;
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		/* ID 0 is reserved for dummy root, 1 for unified hierarchy */
		ret = cgroup_init_root_id(root, 2, 0);
							 
						 
					
						
							
								
									
										
										
										
											2013-04-14 11:36:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		if (ret)
			goto unlock_drop;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-28 17:07:30 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		sb->s_root->d_fsdata = root_cgrp;
		root_cgrp->dentry = sb->s_root;

		/*
		 * We're inside get_sb() and will call lookup_one_len() to
		 * create the root files, which doesn't work if SELinux is
		 * in use.  The following cred dancing somehow works around
		 * it.  See 2ce9738ba ("cgroupfs: use init_cred when
		 * populating new cgroupfs mount") for more details.
		 */
		cred = override_creds(&init_cred);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		ret = cgroup_addrm_files(root_cgrp, cgroup_base_files, true);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-28 17:07:30 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		if (ret)
			goto rm_base_files;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:47 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		ret = rebind_subsystems(root, root->subsys_mask, 0);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-28 17:07:30 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		if (ret)
			goto rm_base_files;

		revert_creds(cred);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		/*
		 * There must be no failure case after here, since rebinding
		 * takes care of subsystems' refcounts, which are explicitly
		 * dropped in the failure exit path.
		 */
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:47 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		list_add(&root->root_list, &cgroup_roots);
		cgroup_root_count++;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		/* Link the top cgroup in this hierarchy into all
		 * the css_set objects */
		write_lock(&css_set_lock);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		hash_for_each(css_set_table, i, cset, hlist)
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
			link_css_set(&tmp_links, cset, root_cgrp);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		write_unlock(&css_set_lock);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		free_cgrp_cset_links(&tmp_links);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		BUG_ON(!list_empty(&root_cgrp->children));
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
		BUG_ON(root->number_of_cgroups != 1);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		mutex_unlock(&cgroup_root_mutex);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
		mutex_unlock(&cgroup_mutex);
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:18 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		mutex_unlock(&inode->i_mutex);
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	} else {
		/*
		 * We re-used an existing hierarchy - the new root (if
		 * any) is not needed
		 */
							 
						 
					
						
							
								
									
										
										
										
											2013-04-14 11:36:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		cgroup_free_root(opts.new_root);
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: introduce sane_behavior mount option
It's a sad fact that at this point various cgroup controllers are
carrying so many idiosyncrasies and pure insanities that it simply
isn't possible to reach any sort of sane consistent behavior while
staying fully compatible with what already has been
exposed to userland.
As we can't break exposed userland interface, transitioning to sane
behaviors can only be done in steps while maintaining backwards
compatibility.  This patch introduces a new mount option -
__DEVEL__sane_behavior - which disables crazy features and enforces
consistent behaviors in cgroup core proper and various controllers.
As exactly which behaviors it changes are still being determined, the
mount option, at this point, is useful only for development of the new
behaviors.  As such, the mount option is prefixed with __DEVEL__ and
generates a warning message when used.
Eventually, once we get to the point where all controllers' behaviors
are consistent enough to implement unified hierarchy, the __DEVEL__
prefix will be dropped, and more importantly, unified-hierarchy will
enforce sane_behavior by default.  Maybe we'll be able to completely drop
the crazy stuff after a while, maybe not, but we at least have a
strategy to move on to saner behaviors.
This patch introduces the mount option and changes the following
behaviors in cgroup core.
* Mount options "noprefix" and "clone_children" are disallowed.  Also,
  cgroupfs file cgroup.clone_children is not created.
* When mounting an existing superblock, mount options should match.
  This is currently pretty crazy.  If one mounts a cgroup, creates a
  subdirectory, unmounts it and then mounts it again with different
  options, it looks like the new options are applied but they aren't.
* Remount is disallowed.
The behavior changes are documented in the comment above
CGRP_ROOT_SANE_BEHAVIOR enum and will be expanded as different
controllers are converted and planned improvements progress.
v2: Dropped unnecessary explicit file permission setting sane_behavior
    cftype entry as suggested by Li Zefan.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vivek Goyal <vgoyal@redhat.com>
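
The option-matching rule described in this commit message can be sketched in plain userspace C. This is only an illustration under stated assumptions: the flag names and bit values below (ROOT_SANE_BEHAVIOR, ROOT_NOPREFIX, ROOT_XATTR, ROOT_OPTION_MASK) and the helper check_mount_flags() are hypothetical stand-ins, not the kernel's definitions. It shows the XOR-and-mask test for "do the option sets differ?" followed by the OR test for "did either side ask for sane_behavior?".

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical flag bits for illustration; not the kernel's real values. */
enum {
	ROOT_SANE_BEHAVIOR = 1u << 0,
	ROOT_NOPREFIX      = 1u << 1,
	ROOT_XATTR         = 1u << 2,
};

/* Only these bits count as user-visible mount options. */
#define ROOT_OPTION_MASK (ROOT_SANE_BEHAVIOR | ROOT_NOPREFIX | ROOT_XATTR)

/*
 * Decide whether a new mount may reuse an existing superblock.
 * Returns 0 if reuse is allowed (possibly with a warning) and
 * -EINVAL if the options differ and either side uses sane_behavior.
 */
static int check_mount_flags(unsigned int existing, unsigned int requested)
{
	/* XOR leaves a bit set wherever the two option sets disagree. */
	if ((existing ^ requested) & ROOT_OPTION_MASK) {
		/* OR is non-zero if either side requested sane_behavior. */
		if ((existing | requested) & ROOT_SANE_BEHAVIOR)
			return -EINVAL;
		fprintf(stderr, "new mount options do not match, ignored\n");
	}
	return 0;
}
```

With these assumed values, mismatched legacy options are tolerated with a warning, while any mismatch involving the sane_behavior bit is rejected.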
											 
										 
										
											2013-04-14 20:15:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-29 14:06:10 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		if ((root->flags ^ opts.flags) & CGRP_ROOT_OPTION_MASK) {
							 
						 
					
						
							
								
									
										
										
										
											2013-05-26 21:33:09 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
			if ((root->flags | opts.flags) & CGRP_ROOT_SANE_BEHAVIOR) {
				pr_err("cgroup: sane_behavior: new mount options should match the existing superblock\n");
				ret = -EINVAL;
				goto drop_new_super;
			} else {
				pr_warning("cgroup: new mount options do not match the existing superblock, will be ignored\n");
			}
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: introduce sane_behavior mount option
It's a sad fact that at this point various cgroup controllers are
carrying so many idiosyncrasies and pure insanities that it simply
isn't possible to reach any sort of sane consistent behavior while
maintaining staying fully compatible with what already has been
exposed to userland.
As we can't break exposed userland interface, transitioning to sane
behaviors can only be done in steps while maintaining backwards
compatibility.  This patch introduces a new mount option -
__DEVEL__sane_behavior - which disables crazy features and enforces
consistent behaviors in cgroup core proper and various controllers.
As exactly which behaviors it changes are still being determined, the
mount option, at this point, is useful only for development of the new
behaviors.  As such, the mount option is prefixed with __DEVEL__ and
generates a warning message when used.
Eventually, once we get to the point where all controllers' behaviors
are consistent enough to implement unified hierarchy, the __DEVEL__
prefix will be dropped, and more importantly, unified-hierarchy will
enforce sane_behavior by default.  Maybe we'll be able to completely drop
the crazy stuff after a while, maybe not, but we at least have a
strategy to move on to saner behaviors.
This patch introduces the mount option and changes the following
behaviors in cgroup core.
* Mount options "noprefix" and "clone_children" are disallowed.  Also,
  cgroupfs file cgroup.clone_children is not created.
* When mounting an existing superblock, mount options should match.
  This is currently pretty crazy.  If one mounts a cgroup, creates a
  subdirectory, unmounts it and then mounts it again with different
  options, it looks like the new options are applied but they aren't.
* Remount is disallowed.
The behavior changes are documented in the comment above
CGRP_ROOT_SANE_BEHAVIOR enum and will be expanded as different
controllers are converted and planned improvements progress.
v2: Dropped unnecessary explicit file permission setting sane_behavior
    cftype entry as suggested by Li Zefan.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vivek Goyal <vgoyal@redhat.com>
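
The option-matching rule described in this commit message can be modelled in userspace as a rough sketch. Everything here is an assumption made for illustration: the flag names and values, the helper name check_reused_super(), and the plain inequality test used to detect a mismatch are not taken from the kernel source, and fprintf() stands in for pr_err()/pr_warning().

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's root flags; values are illustrative. */
enum {
	ROOT_NOPREFIX      = 1u << 0,
	ROOT_SANE_BEHAVIOR = 1u << 1,
};

/*
 * Compare the options of a new mount request with those of the
 * superblock being reused.  Returns 0 if the mount may proceed and
 * -EINVAL if it has to be rejected.
 */
static int check_reused_super(unsigned int root_flags, unsigned int new_flags)
{
	if (root_flags == new_flags)
		return 0;	/* options match, nothing to report */

	/* Either side asking for sane behavior makes a mismatch fatal. */
	if ((root_flags | new_flags) & ROOT_SANE_BEHAVIOR) {
		fprintf(stderr, "sane_behavior: new mount options should match the existing superblock\n");
		return -EINVAL;
	}

	/* Legacy behavior: warn, then silently keep the old options. */
	fprintf(stderr, "new mount options do not match the existing superblock, will be ignored\n");
	return 0;
}
```

With these assumptions, a call such as check_reused_super(ROOT_SANE_BEHAVIOR, 0) is rejected, while check_reused_super(ROOT_NOPREFIX, 0) only warns and succeeds, which mirrors the "looks applied but isn't" legacy case the message calls out.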
											 
										 
										
											2013-04-14 20:15:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									kfree(opts.release_agent);
									kfree(opts.name);
							 
						 
					
						
							
								
									
										
										
										
											2010-07-26 13:23:11 +04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return dget(sb->s_root);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-28 17:07:30 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 rm_base_files:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									free_cgrp_cset_links(&tmp_links);
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgroup_addrm_files(&root->top_cgroup, cgroup_base_files, false);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-28 17:07:30 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									revert_creds(cred);
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 unlock_drop:
							 
						 
					
						
							
								
									
										
										
										
											2013-04-14 11:36:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgroup_exit_root_id(root);
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_unlock(&cgroup_root_mutex);
									mutex_unlock(&cgroup_mutex);
									mutex_unlock(&inode->i_mutex);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								 drop_new_super:
							 
						 
					
						
							
								
									
										
										
										
											2009-05-06 01:34:22 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									deactivate_locked_super(sb);
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 out_err:
									kfree(opts.release_agent);
									kfree(opts.name);
							 
						 
					
						
							
								
									
										
										
										
											2010-07-26 13:23:11 +04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return ERR_PTR(ret);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static void cgroup_kill_sb(struct super_block *sb) {
									struct cgroupfs_root *root = sb->s_fs_info;
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup *cgrp = &root->top_cgroup;
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgrp_cset_link *link, *tmp_link;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									int ret;

									BUG_ON(!root);

									BUG_ON(root->number_of_cgroups != 1);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									BUG_ON(!list_empty(&cgrp->children));
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-28 17:07:30 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_lock(&cgrp->dentry->d_inode->i_mutex);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									mutex_lock(&cgroup_mutex);
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_lock(&cgroup_root_mutex);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* Rebind all subsystems back to the default hierarchy */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-25 18:04:54 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (root->flags & CGRP_ROOT_SUBSYS_BOUND) {
										ret = rebind_subsystems(root, 0, root->subsys_mask);
										/* Shouldn't be able to fail ... */
										BUG_ON(ret);
									}
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 * Release all the links from cset_links to this hierarchy's
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 * root cgroup
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									write_lock(&css_set_lock);
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									list_for_each_entry_safe(link, tmp_link, &cgrp->cset_links, cset_link) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										list_del(&link->cset_link);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										list_del(&link->cgrp_link);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										kfree(link);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									write_unlock(&css_set_lock);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-01-29 14:25:22 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (!list_empty(&root->root_list)) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										list_del(&root->root_list);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:47 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										cgroup_root_count--;
							 
						 
					
						
							
								
									
										
										
										
											2009-01-29 14:25:22 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:41 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-04-14 11:36:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgroup_exit_root_id(root);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_unlock(&cgroup_root_mutex);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									mutex_unlock(&cgroup_mutex);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-28 17:07:30 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_unlock(&cgrp->dentry->d_inode->i_mutex);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: add xattr support
This is one of the items in the plumber's wish list.
For use cases:
>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart
v7:
- no changes
v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support
Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
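
The use case quoted above (a service manager recording per-service state on the cgroup directory itself) can be sketched from userspace as follows. This is a minimal illustration, not code from the patch: the helper names `cgroup_xattr_name()` and `cgroup_set_meta()` and the `main_pid` key are hypothetical, and it assumes the hierarchy was mounted with xattr support enabled and that the caller has the privilege the `trusted.` namespace requires (CAP_SYS_ADMIN).

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/*
 * Build "trusted.<key>" into out.  Returns 0 on success, or -1 if the
 * result does not fit in outlen bytes.  (Hypothetical helper.)
 */
int cgroup_xattr_name(const char *key, char *out, size_t outlen)
{
	int n = snprintf(out, outlen, "trusted.%s", key);

	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/*
 * Attach one piece of metadata to a cgroup directory using the
 * standard setxattr(2) call.  Returns 0 on success, -1 with errno
 * set on failure.  (Hypothetical helper.)
 */
int cgroup_set_meta(const char *cgroup_dir, const char *key,
		    const char *value)
{
	char name[256];

	if (cgroup_xattr_name(key, name, sizeof(name)) < 0) {
		errno = ENAMETOOLONG;
		return -1;
	}
	/* flags == 0: create the attribute or replace an existing one */
	return setxattr(cgroup_dir, name, value, strlen(value), 0);
}
```

A caller would pass the service's cgroup directory, a key such as `"main_pid"`, and the value as a string; the value can later be read back with getxattr(2) by any process with sufficient privilege.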
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									simple_xattrs_free(&cgrp->xattrs);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									kill_litter_super(sb);
							 
						 
					
						
							
								
									
										
										
										
											2013-04-14 11:36:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgroup_free_root(root);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static struct file_system_type cgroup_fs_type = {
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									.name = "cgroup",
							 
						 
					
						
							
								
									
										
										
										
											2010-07-26 13:23:11 +04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									.mount = cgroup_mount,
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									.kill_sb = cgroup_kill_sb,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								};
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-08-05 13:53:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static struct kobject *cgroup_kobj;
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * cgroup_path - generate the path of a cgroup
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * @cgrp: the cgroup in question
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * @buf: the buffer to write the path into
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * @buflen: the length of the buffer
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
									
										
										
										
											2013-03-01 15:01:56 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * Writes path of cgroup into buf.  Returns 0 on success, -errno on error.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * We can't generate cgroup path using dentry->d_name, as accessing
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * dentry->name must be protected by irq-unsafe dentry->d_lock or parent
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * inode's i_mutex, while on the other hand cgroup_path() can be called
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * with some irq-safe spinlocks held.
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen)
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-03-01 15:01:56 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int ret = -ENAMETOOLONG;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									char *start;
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:36 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-04-14 10:32:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ! cgrp - > parent )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if  ( strlcpy ( buf ,  " / " ,  buflen )  > =  buflen ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											return  - ENAMETOOLONG ; 
							 
						 
					
						
							
								
									
										
											 
										
											
		return 0;
	}

	start = buf + buflen - 1;
	*start = '\0';

	rcu_read_lock();
	do {
		const char *name = cgroup_name(cgrp);
		int len;

		len = strlen(name);
		if ((start -= len) < buf)
			goto out;
		memcpy(start, name, len);

		if (--start < buf)
			goto out;
		*start = '/';

		cgrp = cgrp->parent;
	} while (cgrp->parent);
	ret = 0;
	memmove(buf, start, buf + buflen - start);
out:
	rcu_read_unlock();
	return ret;
}
EXPORT_SYMBOL_GPL(cgroup_path);
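
The backward path construction used by cgroup_path() above can be tried out in userspace. The following is only a minimal sketch under stated assumptions, not the kernel implementation: `struct node` and `node_path()` are hypothetical stand-ins for `struct cgroup` and `cgroup_path()`, there is no RCU locking, and the bounds check compares the remaining room instead of letting the pointer move below the start of the buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct cgroup: a name plus a parent link. */
struct node {
	const char *name;
	const struct node *parent;	/* NULL for the root */
};

/*
 * Write the path of @leaf into @buf. The path is built from the end of
 * the buffer towards the front, one component per ancestor, and then
 * moved to the start of the buffer.
 * Returns 0 on success, -ENAMETOOLONG if @buflen is too short.
 */
int node_path(const struct node *leaf, char *buf, size_t buflen)
{
	char *start;

	assert(leaf && buf);

	if (!leaf->parent) {
		/* The root is always "/". */
		if (buflen < 2)
			return -ENAMETOOLONG;
		memcpy(buf, "/", 2);
		return 0;
	}

	if (buflen < 1)
		return -ENAMETOOLONG;

	start = buf + buflen - 1;
	*start = '\0';

	do {
		size_t len = strlen(leaf->name);

		/* Need room for the name and the '/' in front of it. */
		if ((size_t)(start - buf) < len + 1)
			return -ENAMETOOLONG;
		start -= len;
		memcpy(start, leaf->name, len);
		*--start = '/';

		leaf = leaf->parent;
	} while (leaf->parent);

	/* Move the finished string, terminator included, to the front. */
	memmove(buf, start, (size_t)(buf + buflen - start));
	return 0;
}
```

Walking from the leaf to the root yields the components in reverse order, so filling the buffer from the back avoids a second pass or recursion; the final memmove() only shifts the finished string to the front.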
						 
					
						
							
								
									
										
											 
										
											
/**
 * task_cgroup_path - cgroup path of a task in the first cgroup hierarchy
 * @task: target task
 * @buf: the buffer to write the path into
 * @buflen: the length of the buffer
 *
 * Determine @task's cgroup on the first (the one with the lowest non-zero
 * hierarchy_id) cgroup hierarchy and copy its path into @buf.  This
 * function grabs cgroup_mutex and shouldn't be used inside locks used by
 * cgroup controller callbacks.
 *
 * Returns 0 on success, fails with -%ENAMETOOLONG if @buflen is too short.
 */
int task_cgroup_path(struct task_struct *task, char *buf, size_t buflen)
{
	struct cgroupfs_root *root;
	struct cgroup *cgrp;
	int hierarchy_id = 1, ret = 0;

	if (buflen < 2)
		return -ENAMETOOLONG;

	mutex_lock(&cgroup_mutex);

	root = idr_get_next(&cgroup_hierarchy_idr, &hierarchy_id);

	if (root) {
		cgrp = task_cgroup_from_root(task, root);
		ret = cgroup_path(cgrp, buf, buflen);
	} else {
		/* if no hierarchy exists, everyone is in "/" */
		memcpy(buf, "/", 2);
	}

	mutex_unlock(&cgroup_mutex);
	return ret;
}
EXPORT_SYMBOL_GPL(task_cgroup_path);

/*
 * Control Group taskset
 */
								struct  task_and_cgroup  {  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  task_struct 	* task ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup 		* cgrp ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-07-31 16:18:36 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  css_set 		* cset ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								struct  cgroup_taskset  {  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  task_and_cgroup 	single ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  flex_array 	* tc_array ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int 			tc_array_len ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int 			idx ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup 		* cur_cgrp ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
/**
 * cgroup_taskset_first - reset taskset and return the first task
 * @tset: taskset of interest
 *
 * @tset iteration is initialized and the first task is returned.
 */
struct task_struct *cgroup_taskset_first(struct cgroup_taskset *tset)
{
	if (tset->tc_array) {
		tset->idx = 0;
		return cgroup_taskset_next(tset);
	} else {
		tset->cur_cgrp = tset->single.cgrp;
		return tset->single.task;
	}
}
EXPORT_SYMBOL_GPL(cgroup_taskset_first);
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
/**
 * cgroup_taskset_next - iterate to the next task in taskset
 * @tset: taskset of interest
 *
 * Return the next task in @tset.  Iteration must have been initialized
 * with cgroup_taskset_first().
 */
struct task_struct *cgroup_taskset_next(struct cgroup_taskset *tset)
{
	struct task_and_cgroup *tc;

	if (!tset->tc_array || tset->idx >= tset->tc_array_len)
		return NULL;

	tc = flex_array_get(tset->tc_array, tset->idx++);
	tset->cur_cgrp = tc->cgrp;
	return tc->task;
}
EXPORT_SYMBOL_GPL(cgroup_taskset_next);
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
/**
 * cgroup_taskset_cur_css - return the matching css for the current task
 * @tset: taskset of interest
 * @subsys_id: the ID of the target subsystem
 *
 * Return the css for the current (last returned) task of @tset for
 * subsystem specified by @subsys_id.  This function must be preceded by
 * either cgroup_taskset_first() or cgroup_taskset_next().
 */
struct cgroup_subsys_state *cgroup_taskset_cur_css(struct cgroup_taskset *tset,
						   int subsys_id)
{
	return cgroup_css(tset->cur_cgrp, cgroup_subsys[subsys_id]);
}
EXPORT_SYMBOL_GPL(cgroup_taskset_cur_css);
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
/**
 * cgroup_taskset_size - return the number of tasks in taskset
 * @tset: taskset of interest
 */
int cgroup_taskset_size(struct cgroup_taskset *tset)
{
	return tset->tc_array ? tset->tc_array_len : 1;
}
EXPORT_SYMBOL_GPL(cgroup_taskset_size);
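
The first/next pair above forms a small restartable iterator: with an array present, "first" rewinds the index and delegates to "next"; with no array, the set holds exactly one task and "next" returns NULL at once. The following is a minimal user-space sketch of that pattern, not kernel code. Every `demo_*` name is hypothetical, and a plain pointer array stands in for the kernel's `flex_array`.

```c
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel types. */
struct demo_task {
	int pid;
};

struct demo_taskset {
	struct demo_task *single;	/* used only when arr == NULL */
	struct demo_task **arr;		/* plain array standing in for flex_array */
	int arr_len;
	int idx;			/* cursor into arr */
};

/* Return the next task, or NULL when the set is exhausted or has no array. */
static struct demo_task *demo_taskset_next(struct demo_taskset *tset)
{
	if (!tset->arr || tset->idx >= tset->arr_len)
		return NULL;
	return tset->arr[tset->idx++];
}

/* Rewind the cursor and return the first task. */
static struct demo_task *demo_taskset_first(struct demo_taskset *tset)
{
	if (tset->arr) {
		tset->idx = 0;
		return demo_taskset_next(tset);
	}
	return tset->single;
}

/* Walk the whole set the way a caller of the first/next pair would. */
int demo_sum_pids(struct demo_taskset *tset)
{
	struct demo_task *t;
	int sum = 0;

	for (t = demo_taskset_first(tset); t; t = demo_taskset_next(tset))
		sum += t->pid;
	return sum;
}
```

Because "first" resets the cursor, the same set can be walked more than once, and the single-task case needs no special handling in the caller's loop.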
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
/*
 * cgroup_task_migrate - move a task from one cgroup to another.
 *
 * Must be called with cgroup_mutex and threadgroup locked.
 */
static void cgroup_task_migrate(struct cgroup *old_cgrp,
				struct task_struct *tsk,
				struct css_set *new_cset)
{
	struct css_set *old_cset;

	/*
	 * We are synchronized through threadgroup_lock() against PF_EXITING
	 * setting such that we can't race against cgroup_exit() changing the
	 * css_set to init_css_set and dropping the old one.
	 */
	WARN_ON_ONCE(tsk->flags & PF_EXITING);
	old_cset = task_css_set(tsk);

	task_lock(tsk);
	rcu_assign_pointer(tsk->cgroups, new_cset);
	task_unlock(tsk);

	/* Update the css_set linked lists if we're using them */
	write_lock(&css_set_lock);
	if (!list_empty(&tsk->cg_list))
		list_move(&tsk->cg_list, &new_cset->tasks);
	write_unlock(&css_set_lock);

	/*
	 * We just gained a reference on old_cset by taking it from the
	 * task. As trading it for new_cset is protected by cgroup_mutex,
	 * we're safe to drop it here; it will be freed under RCU.
	 */
	set_bit(CGRP_RELEASABLE, &old_cgrp->flags);
	put_css_set(old_cset);
}
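
The reference handling in the migrate step can be read as a trade: the task gives up its pointer to the old set and takes the new one, and the reference the task held on the old set is then dropped. The following is a rough user-space sketch of that idea only. All `demo_*` names are hypothetical, a plain integer stands in for the kernel's reference count, and locking and RCU are left out entirely. It also assumes the caller already took a reference on the new set on the task's behalf.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; plain int refcount, no locking, no RCU. */
struct demo_cset {
	int refcount;
};

struct demo_mtask {
	struct demo_cset *cset;
};

/*
 * Move tsk to new_cset.  Assumption: the caller has already taken a
 * reference on new_cset for this task, so only the old set's count changes.
 */
void demo_task_migrate(struct demo_mtask *tsk, struct demo_cset *new_cset)
{
	/* Taking the pointer from the task also takes over its reference. */
	struct demo_cset *old_cset = tsk->cset;

	assert(old_cset->refcount > 0);

	tsk->cset = new_cset;		/* publish the new set */
	old_cset->refcount--;		/* drop the reference the task used to hold */
}
```

In the kernel function above, the pointer update is done with `rcu_assign_pointer()` under `task_lock()`, and the drop goes through `put_css_set()`; the sketch keeps only the ordering of "swap first, release afterwards".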
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
									
										
										
										
											2013-03-13 09:17:09 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  cgroup_attach_task  -  attach  a  task  or  a  whole  threadgroup  to  a  cgroup 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  @ cgrp :  the  cgroup  to  attach  to 
							 
						 
					
						
							
								
									
										
										
										
											2013-03-13 09:17:09 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  @ tsk :  the  task  or  the  leader  of  the  threadgroup  to  be  attached 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  @ threadgroup :  attach  the  whole  threadgroup ? 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  Call  holding  cgroup_mutex  and  the  group_rwsem  of  the  leader .  Will  take 
							 
						 
					
						
							
								
									
										
										
										
											2013-03-13 09:17:09 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  task_lock  of  @ tsk  or  each  thread  in  the  threadgroup  individually  in  turn . 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-04-07 09:29:51 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  cgroup_attach_task ( struct  cgroup  * cgrp ,  struct  task_struct  * tsk ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											      bool  threadgroup ) 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  retval ,  i ,  group_size ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroup_subsys  * ss ,  * failed_ss  =  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  cgroupfs_root  * root  =  cgrp - > root ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* threadgroup list cursor and array */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-03-13 09:17:09 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  task_struct  * leader  =  tsk ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  task_and_cgroup  * tc ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:21 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  flex_array  * group ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup_taskset  tset  =  {  } ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  step  0 :  in  order  to  do  expensive ,  possibly  blocking  operations  for 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  every  thread ,  we  cannot  iterate  the  thread  group  list ,  since  it  needs 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  rcu  or  tasklist  locked .  instead ,  build  an  array  of  all  threads  in  the 
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 *  group  -  group_rwsem  prevents  new  threads  from  appearing ,  and  if 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  threads  exit ,  this  will  just  be  an  over - estimate . 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-03-13 09:17:09 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( threadgroup ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										group_size  =  get_nr_threads ( tsk ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										group_size  =  1 ; 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:21 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	/* flex_array supports very large thread-groups better than kmalloc. */
	group = flex_array_alloc(sizeof(*tc), group_size, GFP_KERNEL);
	if (!group)
		return -ENOMEM;
	/* pre-allocate to guarantee space while iterating in rcu read-side. */
	retval = flex_array_prealloc(group, 0, group_size, GFP_KERNEL);
	if (retval)
		goto out_free_group_list;

	i = 0;
	/*
	 * Prevent freeing of tasks while we take a snapshot. Tasks that are
	 * already PF_EXITING could be freed from underneath us unless we
	 * take an rcu_read_lock.
	 */
	rcu_read_lock();
	do {
		struct task_and_cgroup ent;

		/* @tsk either already exited or can't exit until the end */
		if (tsk->flags & PF_EXITING)
			goto next;

		/* as per above, nr_threads may decrease, but not increase. */
		BUG_ON(i >= group_size);
		ent.task = tsk;
		ent.cgrp = task_cgroup_from_root(tsk, root);
		/* nothing to do if this task is already in the cgroup */
		if (ent.cgrp == cgrp)
			goto next;
		/*
		 * saying GFP_ATOMIC has no effect here because we did prealloc
		 * earlier, but it's good form to communicate our expectations.
		 */
		retval = flex_array_put(group, i, &ent, GFP_ATOMIC);
		BUG_ON(retval != 0);
		i++;
	next:
		if (!threadgroup)
			break;
	} while_each_thread(leader, tsk);
	rcu_read_unlock();
	/* remember the number of threads in the array for later. */
	group_size = i;
	tset.tc_array = group;
	tset.tc_array_len = group_size;

	/* methods shouldn't be called if no task is actually migrating */
	retval = 0;
	if (!group_size)
		goto out_free_group_list;

	/*
	 * step 1: check that we can legitimately attach to the cgroup.
	 */
	for_each_root_subsys(root, ss) {
		struct cgroup_subsys_state *css = cgroup_css(cgrp, ss);

		if (ss->can_attach) {
			retval = ss->can_attach(css, &tset);
			if (retval) {
				failed_ss = ss;
				goto out_cancel_attach;
			}
		}
	}

	/*
	 * step 2: make sure css_sets exist for all threads to be migrated.
	 * we use find_css_set, which allocates a new one if necessary.
	 */
	for (i = 0; i < group_size; i++) {
		struct css_set *old_cset;

		tc = flex_array_get(group, i);
		old_cset = task_css_set(tc->task);
		tc->cset = find_css_set(old_cset, cgrp);
		if (!tc->cset) {
			retval = -ENOMEM;
			goto out_put_css_set_refs;
		}
	}

	/*
	 * step 3: now that we're guaranteed success wrt the css_sets,
	 * proceed to move all tasks to the new cgroup.  There are no
	 * failure cases after here, so this is the commit point.
	 */
	for (i = 0; i < group_size; i++) {
		tc = flex_array_get(group, i);
		cgroup_task_migrate(tc->cgrp, tc->task, tc->cset);
	}
	/* nothing is sensitive to fork() after this point. */

	/*
	 * step 4: do subsystem attach callbacks.
	 */
	for_each_root_subsys(root, ss) {
		struct cgroup_subsys_state *css = cgroup_css(cgrp, ss);

		if (ss->attach)
			ss->attach(css, &tset);
	}

	/*
	 * step 5: success! and cleanup
	 */
	retval = 0;
out_put_css_set_refs:
	if (retval) {
		for (i = 0; i < group_size; i++) {
			tc = flex_array_get(group, i);
			if (!tc->cset)
				break;
			put_css_set(tc->cset);
		}
	}
out_cancel_attach:
	if (retval) {
		for_each_root_subsys(root, ss) {
			struct cgroup_subsys_state *css = cgroup_css(cgrp, ss);

			if (ss == failed_ss)
				break;
			if (ss->cancel_attach)
				ss->cancel_attach(css, &tset);
		}
	}
out_free_group_list:
	flex_array_free(group);
	return retval;
}

/*
 * Find the task_struct of the task to attach by vpid and pass it along to the
 * function to attach either it or all tasks in its threadgroup. Will lock
 * cgroup_mutex and threadgroup; may take task_lock of task.
 */
								static  int  attach_task_by_pid ( struct  cgroup  * cgrp ,  u64  pid ,  bool  threadgroup )  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct task_struct *tsk;
							 
						 
					
						
							
								
									
										
										
										
											2008-11-14 10:39:19 +11:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									const struct cred *cred = current_cred(), *tcred;
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int ret;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (!cgroup_lock_live_group(cgrp))
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return -ENODEV;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-01-03 21:18:30 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								retry_find_task:
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									rcu_read_lock();
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (pid) {
							 
						 
					
						
							
								
									
										
										
										
											2008-02-07 00:14:47 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										tsk = find_task_by_vpid(pid);
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if (!tsk) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											rcu_read_unlock();
							 
						 
					
						
							
								
									
										
										
										
											2012-01-03 21:18:30 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											ret = -ESRCH;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											goto out_unlock_cgroup;
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 * even if we're attaching all tasks in the thread group, we
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 * only need to check permissions on one of them.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 */ 
							 
						 
					
						
							
								
									
										
										
										
											2008-11-14 10:39:19 +11:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										tcred = __task_cred(tsk);
							 
						 
					
						
							
								
									
										
										
										
											2012-03-12 15:44:39 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										    !uid_eq(cred->euid, tcred->uid) &&
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										    !uid_eq(cred->euid, tcred->suid)) {
							 
						 
					
						
							
								
									
										
										
										
											2008-11-14 10:39:19 +11:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											rcu_read_unlock();
							 
						 
					
						
							
								
									
										
										
										
											2012-01-03 21:18:30 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											ret = -EACCES;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											goto out_unlock_cgroup;
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
									
										
										
										
											2012-01-03 21:18:30 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} else
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										tsk = current;
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (threadgroup)
							 
						 
					
						
							
								
									
										
										
										
											2012-01-03 21:18:30 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										tsk = tsk->group_leader;
							 
						 
					
						
							
								
									
										
										
										
											2012-04-21 09:13:46 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
									
										
										
										
											2013-03-19 13:45:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 * Workqueue threads may acquire PF_NO_SETAFFINITY and become
							 
						 
					
						
							
								
									
										
										
										
											2012-04-21 09:13:46 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 * trapped in a cpuset, or RT worker may be born in a cgroup
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * with no rt_runtime allocated.  Just say no.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-03-19 13:45:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (tsk == kthreadd_task || (tsk->flags & PF_NO_SETAFFINITY)) {
							 
						 
					
						
							
								
									
										
										
										
											2012-04-21 09:13:46 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										ret = -EINVAL;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										rcu_read_unlock();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										goto out_unlock_cgroup;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-01-03 21:18:30 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									get_task_struct(tsk);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									rcu_read_unlock();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									threadgroup_lock(tsk);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (threadgroup) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if (!thread_group_leader(tsk)) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 * a race with de_thread from another thread's exec()
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 * may strip us of our leadership, if this happens,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 * there is no choice but to throw this task away and
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 * try again; this is
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 * "double-double-toil-and-trouble-check locking".
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											threadgroup_unlock(tsk);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											put_task_struct(tsk);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											goto retry_find_task;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
									
										
										
										
											2013-03-13 09:17:09 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									ret = cgroup_attach_task(cgrp, tsk, threadgroup);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									threadgroup_unlock(tsk);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									put_task_struct(tsk);
							 
						 
					
						
							
								
									
										
										
										
											2012-01-03 21:18:30 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								out_unlock_cgroup:
						 
					
						
							
								
									
										
										
										
											2013-04-07 09:29:51 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_unlock(&cgroup_mutex);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return ret;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
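
The permission test inside attach_task_by_pid() can be read in isolation. The following is a minimal userspace sketch of that one condition, not the kernel code: `struct demo_cred` and `demo_may_attach()` are hypothetical names, and plain `uid_t` comparisons stand in for the kernel's `uid_eq()` on `kuid_t` values, with 0 assumed to play the role of GLOBAL_ROOT_UID.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical stand-in for the credential fields the check reads. */
struct demo_cred {
	uid_t uid;
	uid_t euid;
	uid_t suid;
};

/*
 * The kernel code returns -EACCES when the caller's euid is not root AND
 * differs from the target's uid AND differs from the target's suid.
 * The caller is therefore allowed when any one of the three matches.
 */
bool demo_may_attach(const struct demo_cred *caller,
		     const struct demo_cred *target)
{
	return caller->euid == 0 ||
	       caller->euid == target->uid ||
	       caller->euid == target->suid;
}
```

Only the caller's effective uid is consulted; the target contributes its real and saved uids. When a whole thread group is being moved, the comment in the function notes that this check is made against one task only.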
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-04-07 09:29:51 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * cgroup_attach_task_all - attach task 'tsk' to all cgroups of task 'from'
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * @from: attach to all cgroups of a given task
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * @tsk: the task to be attached
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct cgroupfs_root *root;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int retval = 0;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-04-07 09:29:51 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_lock(&cgroup_mutex);
							 
						 
					
						
							
								
									
										
										
										
											2013-04-07 09:29:51 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									for_each_active_root(root) {
							 
						 
					
						
							
								
									
										
										
										
											2013-07-31 16:18:36 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct cgroup *from_cgrp = task_cgroup_from_root(from, root);
							 
						 
					
						
							
								
									
										
										
										
											2013-04-07 09:29:51 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-07-31 16:18:36 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										retval = cgroup_attach_task(from_cgrp, tsk, false);
							 
						 
					
						
							
								
									
										
										
										
											2013-04-07 09:29:51 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if (retval)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											break;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2013-04-07 09:29:51 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_unlock(&cgroup_mutex);
							 
						 
					
						
							
								
									
										
										
										
											2013-04-07 09:29:51 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return retval;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								EXPORT_SYMBOL_GPL(cgroup_attach_task_all);
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static int cgroup_tasks_write(struct cgroup_subsys_state *css,
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											      struct cftype *cft, u64 pid)
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return attach_task_by_pid(css->cgroup, pid, false);
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static int cgroup_procs_write(struct cgroup_subsys_state *css,
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											      struct cftype *cft, u64 tgid)
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:47:01 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return attach_task_by_pid(css->cgroup, tgid, true);
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:47:01 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
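
The two write handlers above differ only in the threadgroup flag: cgroup_tasks_write() moves a single task, cgroup_procs_write() moves the whole thread group. From userspace, both are driven by writing a decimal id to a control file. The sketch below shows one way that write might look; `demo_write_id()` is a hypothetical helper, and which path it is given (for example a `tasks` or `cgroup.procs` file under some mount point) is left to the caller as an assumption about the local setup.

```c
#include <stdio.h>

/*
 * Write a decimal id followed by a newline to the file at 'path'.
 * Returns 0 on success and -1 if the file cannot be opened or written.
 * Hypothetical helper: the kernel side only sees the parsed u64 value.
 */
int demo_write_id(const char *path, long id)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%ld\n", id) > 0;
	/* A failed flush on close also counts as a failed write. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

An id of 0 takes the `tsk = current` branch of attach_task_by_pid(), so writing 0 attaches the writing task itself.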
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static int cgroup_release_agent_write(struct cgroup_subsys_state *css,
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												      struct cftype *cft, const char *buffer)
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									BUILD_BUG_ON(sizeof(css->cgroup->root->release_agent_path) < PATH_MAX);
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (strlen(buffer) >= PATH_MAX)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return -EINVAL;
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (!cgroup_lock_live_group(css->cgroup))
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return -ENODEV;
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_lock(&cgroup_root_mutex);
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									strcpy(css->cgroup->root->release_agent_path, buffer);
							 
						 
					
						
							
								
									
										
										
										
											2011-12-12 18:12:21 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_unlock(&cgroup_root_mutex);
							 
						 
					
						
							
								
									
										
										
										
											2013-04-07 09:29:51 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_unlock(&cgroup_mutex);
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return 0;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static int cgroup_release_agent_show(struct cgroup_subsys_state *css,
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												     struct cftype *cft, struct seq_file *seq)
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup *cgrp = css->cgroup;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (!cgroup_lock_live_group(cgrp))
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return -ENODEV;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									seq_puts(seq, cgrp->root->release_agent_path);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									seq_putc(seq, '\n');
							 
						 
					
						
							
								
									
										
										
										
											2013-04-07 09:29:51 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_unlock(&cgroup_mutex);
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return 0;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static int cgroup_sane_behavior_show(struct cgroup_subsys_state *css,
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												     struct cftype *cft, struct seq_file *seq)
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: introduce sane_behavior mount option
It's a sad fact that at this point various cgroup controllers are
carrying so many idiosyncrasies and pure insanities that it simply
isn't possible to reach any sort of sane consistent behavior while
staying fully compatible with what has already been
exposed to userland.
As we can't break exposed userland interface, transitioning to sane
behaviors can only be done in steps while maintaining backwards
compatibility.  This patch introduces a new mount option -
__DEVEL__sane_behavior - which disables crazy features and enforces
consistent behaviors in cgroup core proper and various controllers.
As exactly which behaviors it changes are still being determined, the
mount option, at this point, is useful only for development of the new
behaviors.  As such, the mount option is prefixed with __DEVEL__ and
generates a warning message when used.
Eventually, once we get to the point where all controllers' behaviors
are consistent enough to implement unified hierarchy, the __DEVEL__
prefix will be dropped, and more importantly, unified-hierarchy will
enforce sane_behavior by default.  Maybe we'll be able to completely drop
the crazy stuff after a while, maybe not, but we at least have a
strategy to move on to saner behaviors.
This patch introduces the mount option and changes the following
behaviors in cgroup core.
* Mount options "noprefix" and "clone_children" are disallowed.  Also,
  cgroupfs file cgroup.clone_children is not created.
* When mounting an existing superblock, mount options should match.
  This is currently pretty crazy.  If one mounts a cgroup, creates a
  subdirectory, unmounts it and then mounts it again with different
  options, it looks like the new options are applied but they aren't.
* Remount is disallowed.
The behavior changes are documented in the comment above
CGRP_ROOT_SANE_BEHAVIOR enum and will be expanded as different
controllers are converted and planned improvements progress.
v2: Dropped unnecessary explicit file permission setting in the sane_behavior
    cftype entry as suggested by Li Zefan.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vivek Goyal <vgoyal@redhat.com>
											 
										 
										
											2013-04-14 20:15:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									seq_printf(seq, "%d\n", cgroup_sane_behavior(css->cgroup));
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:47:00 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/* A buffer size big enough for numbers or short strings */  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# define CGROUP_LOCAL_BUFFER_SIZE 64 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  ssize_t  cgroup_write_X64 ( struct  cgroup_subsys_state  * css ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												struct  cftype  * cft ,  struct  file  * file , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												const  char  __user  * userbuf ,  size_t  nbytes , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												loff_t  * unused_ppos ) 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:47:00 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									char  buffer [ CGROUP_LOCAL_BUFFER_SIZE ] ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int  retval  =  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									char  * end ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! nbytes ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - EINVAL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (nbytes >= sizeof(buffer))
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - E2BIG ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( copy_from_user ( buffer ,  userbuf ,  nbytes ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - EFAULT ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									buffer [ nbytes ]  =  0 ;      /* nul-terminate */ 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:06 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (cft->write_u64) {
							 
						 
					
						
							
								
									
										
										
										
											2009-10-26 16:49:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										u64  val  =  simple_strtoull ( strstrip ( buffer ) ,  & end ,  0 ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:06 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( * end ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											return  - EINVAL ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										retval = cft->write_u64(css, cft, val);
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:06 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									}  else  { 
							 
						 
					
						
							
								
									
										
										
										
											2009-10-26 16:49:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										s64  val  =  simple_strtoll ( strstrip ( buffer ) ,  & end ,  0 ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:06 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if  ( * end ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											return  - EINVAL ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										retval = cft->write_s64(css, cft, val);
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:06 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ! retval ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										retval  =  nbytes ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  retval ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
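
The parsing steps in cgroup_write_X64() above (bound the input, copy it, nul-terminate, strip whitespace, parse with base 0, reject trailing characters) can be sketched in userspace as follows. This is only an illustration under stated assumptions: parse_u64() and strip() are hypothetical names, memcpy() stands in for copy_from_user(), and strtoull() is slightly more permissive than the kernel's simple_strtoull() (it accepts a leading sign, for example).

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define LOCAL_BUFFER_SIZE 64	/* mirrors CGROUP_LOCAL_BUFFER_SIZE */

/* Hypothetical stand-in for strstrip(): trim whitespace at both ends in place. */
static char *strip(char *s)
{
	char *end;

	while (isspace((unsigned char)*s))
		s++;
	end = s + strlen(s);
	while (end > s && isspace((unsigned char)end[-1]))
		end--;
	*end = '\0';
	return s;
}

/*
 * Hypothetical userspace analogue of the u64 branch of cgroup_write_X64().
 * Returns 0 and stores the parsed value in *out, or a negative errno.
 */
static int parse_u64(const char *src, size_t nbytes, uint64_t *out)
{
	char buffer[LOCAL_BUFFER_SIZE];
	char *end;

	assert(out != NULL);
	if (!nbytes)
		return -EINVAL;
	if (nbytes >= sizeof(buffer))	/* leave room for the terminator */
		return -E2BIG;
	memcpy(buffer, src, nbytes);	/* stands in for copy_from_user() */
	buffer[nbytes] = '\0';		/* nul-terminate */

	/* Base 0 accepts decimal, 0x-prefixed hex and 0-prefixed octal. */
	*out = strtoull(strip(buffer), &end, 0);
	if (*end)			/* trailing garbage after the number */
		return -EINVAL;
	return 0;
}
```

With this sketch, an input such as "42\n" (what `echo 42 > file` would deliver) parses to 42, while "12ab" is rejected with -EINVAL.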
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  ssize_t  cgroup_write_string ( struct  cgroup_subsys_state  * css ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												   struct  cftype  * cft ,  struct  file  * file , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												   const  char  __user  * userbuf ,  size_t  nbytes , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												   loff_t  * unused_ppos ) 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:58 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:47:00 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									char  local_buffer [ CGROUP_LOCAL_BUFFER_SIZE ] ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:58 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int  retval  =  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									size_t max_bytes = cft->max_write_len;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									char  * buffer  =  local_buffer ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! max_bytes ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										max_bytes  =  sizeof ( local_buffer )  -  1 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (nbytes >= max_bytes)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return  - E2BIG ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* Allocate a dynamic buffer if we need one */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (nbytes >= sizeof(local_buffer)) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										buffer  =  kmalloc ( nbytes  +  1 ,  GFP_KERNEL ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if (buffer == NULL)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											return  - ENOMEM ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-29 22:33:18 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (nbytes && copy_from_user(buffer, userbuf, nbytes)) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										retval  =  - EFAULT ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										goto  out ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:58 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									buffer [ nbytes ]  =  0 ;      /* nul-terminate */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									retval = cft->write_string(css, cft, strstrip(buffer));
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:58 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if  ( ! retval ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										retval  =  nbytes ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-29 22:33:18 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								out :  
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:58 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (buffer != local_buffer)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										kfree ( buffer ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  retval ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
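
cgroup_write_string() above uses a small on-stack buffer for short writes and falls back to a heap allocation only when the input does not fit. A minimal userspace sketch of that pattern is shown below. The names write_string(), string_handler and record_len() are hypothetical, malloc()/free() stand in for kmalloc()/kfree(), memcpy() stands in for copy_from_user(), and the strstrip() step is left out for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define LOCAL_BUFFER_SIZE 64	/* mirrors CGROUP_LOCAL_BUFFER_SIZE */

/* Hypothetical handler type: returns 0 on success or a negative errno. */
typedef int (*string_handler)(const char *s);

/* Hypothetical sample handler that just records the length it was given. */
static size_t last_len;
static int record_len(const char *s)
{
	last_len = strlen(s);
	return 0;
}

/*
 * Sketch of the buffer handling in cgroup_write_string(): max_bytes == 0
 * means "use the default limit", which is what fits in the local buffer.
 */
static long write_string(const char *src, size_t nbytes, size_t max_bytes,
			 string_handler fn)
{
	char local_buffer[LOCAL_BUFFER_SIZE];
	char *buffer = local_buffer;
	long retval;

	assert(fn != NULL);
	if (!max_bytes)
		max_bytes = sizeof(local_buffer) - 1;
	if (nbytes >= max_bytes)
		return -E2BIG;
	/* Allocate a dynamic buffer only if the local one is too small. */
	if (nbytes >= sizeof(local_buffer)) {
		buffer = malloc(nbytes + 1);
		if (buffer == NULL)
			return -ENOMEM;
	}
	memcpy(buffer, src, nbytes);	/* stands in for copy_from_user() */
	buffer[nbytes] = '\0';		/* nul-terminate */

	retval = fn(buffer);
	if (!retval)
		retval = (long)nbytes;	/* success reports bytes consumed */
	if (buffer != local_buffer)
		free(buffer);
	return retval;
}
```

The single exit path after the handler call is what keeps the heap buffer from leaking, which is the role the `out:` label plays in the kernel function.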
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								static  ssize_t  cgroup_file_write ( struct  file  * file ,  const  char  __user  * buf ,  
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
												 size_t  nbytes ,  loff_t  * ppos ) 
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cfent *cfe = __d_cfe(file->f_dentry);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									struct cftype *cft = __d_cft(file->f_dentry);
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 11:01:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup_subsys_state *css = cfe->css;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (cft->write)
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return cft->write(css, cft, file, buf, nbytes, ppos);
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:06 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (cft->write_u64 || cft->write_s64)
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return  cgroup_write_X64 ( css ,  cft ,  file ,  buf ,  nbytes ,  ppos ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:58 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (cft->write_string)
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return  cgroup_write_string ( css ,  cft ,  file ,  buf ,  nbytes ,  ppos ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:08 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (cft->trigger) {
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										int ret = cft->trigger(css, (unsigned int)cft->private);
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:08 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return  ret  ?  ret  :  nbytes ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return  - EINVAL ; 
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static ssize_t cgroup_read_u64(struct cgroup_subsys_state *css,
											       struct cftype *cft, struct file *file,
											       char __user *buf, size_t nbytes, loff_t *ppos)
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:47:00 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									char tmp[CGROUP_LOCAL_BUFFER_SIZE];
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									u64 val = cft->read_u64(css, cft);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									int len = sprintf(tmp, "%llu\n", (unsigned long long) val);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return simple_read_from_buffer(buf, nbytes, ppos, tmp, len);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
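
The read path above formats the value into a local buffer and then serves the slice that the file offset asks for. A minimal userspace sketch of that pattern follows. It is an illustration under assumptions, not the kernel code: `read_from_buffer()` and `read_u64_text()` are hypothetical names, `memcpy()` stands in for the copy to user space, `-1` stands in for `-EINVAL`, and the 64-byte buffer is an arbitrary size rather than `CGROUP_LOCAL_BUFFER_SIZE`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Userspace analogue of an offset-aware buffer read: hand out at most
 * `count` bytes of `from`, starting at *ppos, and advance *ppos.
 */
static long read_from_buffer(char *to, size_t count, long *ppos,
			     const char *from, size_t available)
{
	long pos = *ppos;

	assert(to && from);
	if (pos < 0)
		return -1;	/* stand-in for -EINVAL */
	if ((size_t)pos >= available || count == 0)
		return 0;	/* nothing left to read */
	if (count > available - (size_t)pos)
		count = available - (size_t)pos;
	memcpy(to, from + pos, count);
	*ppos = pos + (long)count;
	return (long)count;
}

/* Format a 64-bit value as decimal text, then serve a slice of it. */
static long read_u64_text(unsigned long long val, char *to, size_t count,
			  long *ppos)
{
	char tmp[64];	/* assumed size, large enough for any u64 plus newline */
	int len = snprintf(tmp, sizeof(tmp), "%llu\n", val);

	return read_from_buffer(to, count, ppos, tmp, (size_t)len);
}
```

Because the text is regenerated on every call, a reader that consumes the file in small pieces only sees a consistent value if the underlying number does not change between calls.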
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static ssize_t cgroup_read_s64(struct cgroup_subsys_state *css,
											       struct cftype *cft, struct file *file,
											       char __user *buf, size_t nbytes, loff_t *ppos)
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:06 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:47:00 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									char tmp[CGROUP_LOCAL_BUFFER_SIZE];
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									s64 val = cft->read_s64(css, cft);
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:06 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int len = sprintf(tmp, "%lld\n", (long long) val);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return simple_read_from_buffer(buf, nbytes, ppos, tmp, len);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								static ssize_t cgroup_file_read(struct file *file, char __user *buf,
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
												size_t nbytes, loff_t *ppos)
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cfent *cfe = __d_cfe(file->f_dentry);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									struct cftype *cft = __d_cft(file->f_dentry);
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 11:01:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup_subsys_state *css = cfe->css;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (cft->read)
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return cft->read(css, cft, file, buf, nbytes, ppos);
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 00:59:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (cft->read_u64)
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return cgroup_read_u64(css, cft, file, buf, nbytes, ppos);
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:06 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (cft->read_s64)
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return cgroup_read_s64(css, cft, file, buf, nbytes, ppos);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									return -EINVAL;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:01 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
								 * seqfile ops/methods for returning structured data. Currently just
								 * supports string->u64 maps, but can be extended in future.
								 */
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static int cgroup_map_add(struct cgroup_map_cb *cb, const char *key, u64 value)
								{
									struct seq_file *sf = cb->state;
									return seq_printf(sf, "%s %llu\n", key, (unsigned long long)value);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
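
The map callback above lets a handler emit key/value pairs without knowing where the text ends up: the handler only calls `fill`, and `state` carries the output sink. A minimal userspace sketch of that shape follows. Everything in it is an assumption made for illustration: `struct map_cb`, `struct text_buf`, `map_add()` and `demo_read_map()` are hypothetical stand-ins, a fixed character buffer replaces the `seq_file`, and the keys `hits` and `misses` are invented.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct cgroup_map_cb. */
struct map_cb {
	int (*fill)(struct map_cb *cb, const char *key,
		    unsigned long long value);
	void *state;
};

/* Hypothetical stand-in for the seq_file: a bounded text buffer. */
struct text_buf {
	char data[128];
	size_t used;
};

/* Plays the role of cgroup_map_add(): append one "key value\n" line. */
static int map_add(struct map_cb *cb, const char *key,
		   unsigned long long value)
{
	struct text_buf *tb = cb->state;
	size_t room = sizeof(tb->data) - tb->used;
	int n = snprintf(tb->data + tb->used, room, "%s %llu\n", key, value);

	if (n < 0 || (size_t)n >= room)
		return -1;	/* the line did not fit */
	tb->used += (size_t)n;
	return 0;
}

/* Hypothetical handler, shaped like a read_map callback. */
static int demo_read_map(struct map_cb *cb)
{
	assert(cb && cb->fill);
	if (cb->fill(cb, "hits", 42))
		return -1;
	return cb->fill(cb, "misses", 7);
}
```

Keeping the sink behind an opaque `state` pointer means the same handler could feed a different output format later by swapping only the `fill` function.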
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static int cgroup_seqfile_show(struct seq_file *m, void *arg)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-07-31 17:36:25 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cfent *cfe = m->private;
									struct cftype *cft = cfe->type;
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 11:01:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup_subsys_state *css = cfe->css;
							 
						 
					
						
							
								
									
										
										
										
											2013-07-31 17:36:25 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:14 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (cft->read_map) {
										struct cgroup_map_cb cb = {
											.fill = cgroup_map_add,
											.state = m,
										};
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return cft->read_map(css, cft, &cb);
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:14 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return cft->read_seq_string(css, cft, m);
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:01 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-10-01 15:43:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static const struct file_operations cgroup_seqfile_operations = {
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:01 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									.read = seq_read,
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									.write = cgroup_file_write,
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:01 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									.llseek = seq_lseek,
							 
						 
					
						
							
								
									
										
										
										
											2013-11-27 18:16:21 -05:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									.release = cgroup_file_release,
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:01 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								};
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								static int cgroup_file_open(struct inode *inode, struct file *file)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cfent *cfe = __d_cfe(file->f_dentry);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct cftype *cft = __d_cft(file->f_dentry);
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 11:01:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup *cgrp = __d_cgrp(cfe->dentry->d_parent);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct cgroup_subsys_state *css;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									int err;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									err = generic_file_open(inode, file);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (err)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return err;
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:33 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * If the file belongs to a subsystem, pin the css.  Will be
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * unpinned either on open failure or release.  This ensures that
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * @css stays alive for all file operations.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 11:01:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									rcu_read_lock();
							 
						 
					
						
							
								
									
										
										
										
											2013-08-26 18:40:56 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									css = cgroup_css(cgrp, cft->ss);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (cft->ss && !css_tryget(css))
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										css = NULL;
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 11:01:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									rcu_read_unlock();
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:51 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-15 11:42:36 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (!css)
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return -ENODEV;
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:33 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-15 11:42:36 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * @cfe->css is used by read/write/close to determine the
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * associated css.  @file->private_data would be a better place but
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * that's already used by seqfile.  Multiple accessors may use it
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * simultaneously which is okay as the association never changes.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									WARN_ON_ONCE(cfe->css && cfe->css != css);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cfe->css = css;
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:51 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:14 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (cft->read_map || cft->read_seq_string) {
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:01 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										file->f_op = &cgroup_seqfile_operations;
							 
						 
					
						
							
								
									
										
										
										
											2013-07-31 17:36:25 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										err = single_open(file, cgroup_seqfile_show, cfe);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} else if (cft->open) {
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										err = cft->open(inode, file);
							 
						 
					
						
							
								
									
										
										
										
											2013-07-31 17:36:25 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (css->ss && err)
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										css_put(css);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									return err;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static int cgroup_file_release(struct inode *inode, struct file *file)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cfent *cfe = __d_cfe(file->f_dentry);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									struct cftype *cft = __d_cft(file->f_dentry);
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 11:01:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup_subsys_state *css = cfe->css;
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int ret = 0;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									if (cft->release)
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										ret = cft->release(inode, file);
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (css->ss)
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										css_put(css);
							 
						 
					
						
							
								
									
										
										
										
											2013-11-27 18:16:21 -05:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (file->f_op == &cgroup_seqfile_operations)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										single_release(inode, file);
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return ret;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
/*
 * cgroup_rename - Only allow simple rename of directories in place.
 */
static int cgroup_rename(struct inode *old_dir, struct dentry *old_dentry,
			    struct inode *new_dir, struct dentry *new_dentry)
{
						 
					
						
							
								
									
										
										
										
											2013-03-01 15:01:56 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	int ret;
	struct cgroup_name *name, *old_name;
	struct cgroup *cgrp;

	/*
	 * It's convenient to use parent dir's i_mutex to protect
	 * cgrp->name.
	 */
	lockdep_assert_held(&old_dir->i_mutex);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
	if (!S_ISDIR(old_dentry->d_inode->i_mode))
		return -ENOTDIR;
	if (new_dentry->d_inode)
		return -EEXIST;
	if (old_dir != new_dir)
		return -EIO;
							 
						 
					
						
							
								
									
										
										
										
											2013-03-01 15:01:56 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
	cgrp = __d_cgrp(old_dentry);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-14 11:18:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	/*
	 * This isn't a proper migration and its usefulness is very
	 * limited.  Disallow if sane_behavior.
	 */
	if (cgroup_sane_behavior(cgrp))
		return -EPERM;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-03-01 15:01:56 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	name = cgroup_alloc_name(new_dentry);
	if (!name)
		return -ENOMEM;

	ret = simple_rename(old_dir, old_dentry, new_dir, new_dentry);
	if (ret) {
		kfree(name);
		return ret;
	}
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-21 15:52:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	old_name = rcu_dereference_protected(cgrp->name, true);
							 
						 
					
						
							
								
									
										
										
										
											2013-03-01 15:01:56 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	rcu_assign_pointer(cgrp->name, name);

	kfree_rcu(old_name, rcu_head);
	return 0;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								}  
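The order of the precondition checks in cgroup_rename() above can be sketched in isolation. This is only an illustrative userspace model, not kernel code: struct rename_request and rename_precheck() are hypothetical names, and the boolean fields stand in for the inode, dentry and cgroup state that the real function inspects.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, flattened view of the facts cgroup_rename() looks at. */
struct rename_request {
	bool source_is_dir;	/* S_ISDIR(old_dentry->d_inode->i_mode) */
	bool target_exists;	/* new_dentry->d_inode != NULL */
	bool same_parent;	/* old_dir == new_dir */
	bool sane_behavior;	/* cgroup_sane_behavior(cgrp) */
};

/*
 * Return 0 if the rename may proceed, or the negative errno that the
 * first failing check would produce, in the same order as above.
 */
static int rename_precheck(const struct rename_request *req)
{
	if (!req->source_is_dir)
		return -ENOTDIR;
	if (req->target_exists)
		return -EEXIST;
	if (!req->same_parent)
		return -EIO;
	if (req->sane_behavior)
		return -EPERM;
	return 0;
}
```

Because the checks run in a fixed order, a request that violates several conditions reports only the first one; for example, a non-directory source with an existing target yields -ENOTDIR.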
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: add xattr support
This is one of the items in the plumber's wish list.
For use cases:
>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart
v7:
- no changes
v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support
Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
static struct simple_xattrs *__d_xattrs(struct dentry *dentry)
{
	if (S_ISDIR(dentry->d_inode->i_mode))
		return &__d_cgrp(dentry)->xattrs;
	else
							 
						 
					
						
							
								
									
										
										
										
											2013-04-18 23:09:52 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		return &__d_cfe(dentry)->xattrs;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
static inline int xattr_enabled(struct dentry *dentry)
{
	struct cgroupfs_root *root = dentry->d_sb->s_fs_info;
							 
						 
					
						
							
								
									
										
										
										
											2013-04-14 20:15:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	return root->flags & CGRP_ROOT_XATTR;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
static bool is_valid_xattr(const char *name)
{
	if (!strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN) ||
	    !strncmp(name, XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN))
		return true;
	return false;
}
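The prefix test performed by is_valid_xattr() can be tried out in userspace with a small sketch. The function name name_has_allowed_prefix() and the two local macros are hypothetical; the macro values are assumed to match the "trusted." and "security." prefixes defined in <linux/xattr.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed to mirror XATTR_TRUSTED_PREFIX and XATTR_SECURITY_PREFIX. */
#define TRUSTED_PREFIX  "trusted."
#define SECURITY_PREFIX "security."

/*
 * Hypothetical userspace analogue of is_valid_xattr(): accept a name
 * only if it begins with one of the two allowed namespace prefixes.
 */
static bool name_has_allowed_prefix(const char *name)
{
	return !strncmp(name, TRUSTED_PREFIX, strlen(TRUSTED_PREFIX)) ||
	       !strncmp(name, SECURITY_PREFIX, strlen(SECURITY_PREFIX));
}
```

Under these assumptions a name such as "trusted.main_pid" is accepted, while "user.icon" is rejected, which matches the v6 note in the commit message about dropping the user namespace.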
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
static int cgroup_setxattr(struct dentry *dentry, const char *name,
			   const void *val, size_t size, int flags)
{
	if (!xattr_enabled(dentry))
		return -EOPNOTSUPP;
	if (!is_valid_xattr(name))
		return -EINVAL;
	return simple_xattr_set(__d_xattrs(dentry), name, val, size, flags);
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
static int cgroup_removexattr(struct dentry *dentry, const char *name)
{
	if (!xattr_enabled(dentry))
		return -EOPNOTSUPP;
	if (!is_valid_xattr(name))
		return -EINVAL;
	return simple_xattr_remove(__d_xattrs(dentry), name);
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
static ssize_t cgroup_getxattr(struct dentry *dentry, const char *name,
			       void *buf, size_t size)
{
	if (!xattr_enabled(dentry))
		return -EOPNOTSUPP;
	if (!is_valid_xattr(name))
		return -EINVAL;
	return simple_xattr_get(__d_xattrs(dentry), name, buf, size);
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
static ssize_t cgroup_listxattr(struct dentry *dentry, char *buf, size_t size)
{
	if (!xattr_enabled(dentry))
		return -EOPNOTSUPP;
	return simple_xattr_list(__d_xattrs(dentry), buf, size);
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-10-01 15:43:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
static const struct file_operations cgroup_file_operations = {
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
	.read = cgroup_file_read,
	.write = cgroup_file_write,
	.llseek = generic_file_llseek,
	.open = cgroup_file_open,
	.release = cgroup_file_release,
};
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: add xattr support
This is one of the items in the plumber's wish list.
For use cases:
>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart
v7:
- no changes
v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support
Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
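
The v6 note above restricts cgroup xattrs to the "trusted." and "security." namespaces. A minimal userspace sketch of that kind of name check follows. The helper name `cgroup_xattr_name_allowed` is hypothetical and chosen for illustration only; it is not the function used in the patch, and only the two namespace prefixes come from the commit message.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): accept only attribute names
 * in the "trusted." or "security." namespaces, as the v6 note describes.
 */
static bool cgroup_xattr_name_allowed(const char *name)
{
	static const char trusted[] = "trusted.";
	static const char security[] = "security.";

	if (name == NULL)
		return false;

	/* sizeof() - 1 drops the terminating NUL so only the prefix is compared. */
	return strncmp(name, trusted, sizeof(trusted) - 1) == 0 ||
	       strncmp(name, security, sizeof(security) - 1) == 0;
}
```

Under this sketch a name such as "trusted.main_pid" would be accepted, while "user.icon" would be rejected because the user namespace is not allowed.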
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
static const struct inode_operations cgroup_file_inode_operations = {
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
	.setxattr = cgroup_setxattr,
	.getxattr = cgroup_getxattr,
	.listxattr = cgroup_listxattr,
	.removexattr = cgroup_removexattr,
};
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-21 17:01:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
static const struct inode_operations cgroup_dir_inode_operations = {
						 
					
						
							
								
									
										
										
										
											2013-07-14 17:50:23 +04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	.lookup = simple_lookup,
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
	.mkdir = cgroup_mkdir,
	.rmdir = cgroup_rmdir,
	.rename = cgroup_rename,
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	.setxattr = cgroup_setxattr,
	.getxattr = cgroup_getxattr,
	.listxattr = cgroup_listxattr,
	.removexattr = cgroup_removexattr,
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
};
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:20 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
/*
 * Check if a file is a control file
 */
static inline struct cftype *__file_cft(struct file *file)
{
						 
					
						
							
								
									
										
										
										
											2013-01-23 17:07:38 -05:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	if (file_inode(file)->i_fop != &cgroup_file_operations)
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:20 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		return ERR_PTR(-EINVAL);
	return __d_cft(file->f_dentry);
}
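
The check above identifies a control file by comparing the file's operations pointer against one known table. A minimal userspace sketch of the same idea follows. Every type and name in it (`file_ops`, `node`, `control_ops`, `other_ops`, `node_control_data`) is a hypothetical stand-in, not a kernel type, and it returns NULL on mismatch where the kernel code returns `ERR_PTR(-EINVAL)`.

```c
#include <stddef.h>

/* Stand-in for struct file_operations: only its address matters here. */
struct file_ops {
	const char *name;
};

/* Stand-in for an inode plus its private data. */
struct node {
	const struct file_ops *fop;
	void *priv;
};

static const struct file_ops control_ops = { "control" };
static const struct file_ops other_ops = { "other" };

/*
 * Return the private payload only when the node was set up with
 * control_ops. Comparing addresses is enough because each ops table
 * is a unique static object.
 */
static void *node_control_data(const struct node *n)
{
	if (n == NULL || n->fop != &control_ops)
		return NULL;
	return n->priv;
}
```

The design choice is that no separate type tag is needed: whichever code installed the ops table has already fixed what kind of object the node is.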
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2011-07-26 01:55:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
static int cgroup_create_file(struct dentry *dentry, umode_t mode,
						 
					
						
							
								
									
										
										
										
											2011-01-07 17:49:20 +11:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
				struct super_block *sb)
{
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
	struct inode *inode;

	if (!dentry)
		return -ENOENT;
	if (dentry->d_inode)
		return -EEXIST;

	inode = cgroup_new_inode(mode, sb);
	if (!inode)
		return -ENOMEM;

	if (S_ISDIR(mode)) {
		inode->i_op = &cgroup_dir_inode_operations;
		inode->i_fop = &simple_dir_operations;

		/* start off with i_nlink == 2 (for "." entry) */
		inc_nlink(inode);
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:36 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		inc_nlink(dentry->d_parent->d_inode);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		/*
		 * Control reaches here with cgroup_mutex held.
		 * @inode->i_mutex should nest outside cgroup_mutex but we
		 * want to populate it immediately without releasing
		 * cgroup_mutex.  As @inode isn't visible to anyone else
		 * yet, trylock will always succeed without affecting
		 * lockdep checks.
		 */
		WARN_ON_ONCE(!mutex_trylock(&inode->i_mutex));
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
	} else if (S_ISREG(mode)) {
		inode->i_size = 0;
		inode->i_fop = &cgroup_file_operations;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		inode->i_op = &cgroup_file_inode_operations;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
	}
	d_instantiate(dentry, inode);
	dget(dentry);	/* Extra count - pin the dentry in core */
	return 0;
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-04-02 16:57:29 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
/**
 * cgroup_file_mode - deduce file mode of a control file
 * @cft: the control file in question
 *
 * returns cft->mode if ->mode is not 0
 * returns S_IRUGO|S_IWUSR if it has both a read and a write handler
 * returns S_IRUGO if it has only a read handler
 * returns S_IWUSR if it has only a write handler
 */
							 
						 
					
						
							
								
									
										
										
										
											2011-07-26 01:55:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
static umode_t cgroup_file_mode(const struct cftype *cft)
						 
					
						
							
								
									
										
										
										
											2009-04-02 16:57:29 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2011-07-26 01:55:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	umode_t mode = 0;
							 
						 
					
						
							
								
									
										
										
										
											2009-04-02 16:57:29 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
	if (cft->mode)
		return cft->mode;

	if (cft->read || cft->read_u64 || cft->read_s64 ||
	    cft->read_map || cft->read_seq_string)
		mode |= S_IRUGO;

	if (cft->write || cft->write_u64 || cft->write_s64 ||
	    cft->write_string || cft->trigger)
		mode |= S_IWUSR;

	return mode;
}
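The rules in the cgroup_file_mode() comment can be checked outside the kernel with a small userspace sketch. The names `sketch_file_mode` and `SKETCH_IRUGO` are hypothetical, and the two flags stand in for "has any read handler" and "has any write handler"; this is an approximation of the logic, not the kernel function itself.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Userspace stand-in for the kernel's S_IRUGO shorthand. */
#define SKETCH_IRUGO (S_IRUSR | S_IRGRP | S_IROTH)

/*
 * Hypothetical mirror of the rules listed in the cgroup_file_mode()
 * comment: an explicit mode wins; otherwise a read handler grants
 * read permission to everyone and a write handler grants write
 * permission to the owner.
 */
static mode_t sketch_file_mode(mode_t explicit_mode, int has_read, int has_write)
{
	mode_t mode = 0;

	if (explicit_mode)
		return explicit_mode;
	if (has_read)
		mode |= SKETCH_IRUGO;
	if (has_write)
		mode |= S_IWUSR;
	return mode;
}
```

With both kinds of handler and no explicit mode, the sketch yields 0644; with only a read handler, 0444; with only a write handler, 0200.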
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
static int cgroup_add_file(struct cgroup *cgrp, struct cftype *cft)
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	struct dentry *dir = cgrp->dentry;
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	struct cgroup *parent = __d_cgrp(dir);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
	struct dentry *dentry;
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	struct cfent *cfe;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
	int error;
							 
						 
					
						
							
								
									
										
										
										
											2011-07-26 01:55:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	umode_t mode;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
	char name[MAX_CGROUP_TYPE_NAMELEN + MAX_CFTYPE_NAME + 2] = { 0 };
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-26 18:40:56 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	if (cft->ss && !(cft->flags & CFTYPE_NO_PREFIX) &&
	    !(cgrp->root->flags & CGRP_ROOT_NOPREFIX)) {
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		strcpy(name, cft->ss->name);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
		strcat(name, ".");
	}
	strcat(name, cft->name);
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
	BUG_ON(!mutex_is_locked(&dir->d_inode->i_mutex));
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
	cfe = kzalloc(sizeof(*cfe), GFP_KERNEL);
	if (!cfe)
		return -ENOMEM;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
	dentry = lookup_one_len(name, dir, strlen(name));
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	if (IS_ERR(dentry)) {
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
		error = PTR_ERR(dentry);
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		goto out;
	}
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-05-14 19:44:20 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cfe - > type  =  ( void  * ) cft ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cfe - > dentry  =  dentry ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									dentry - > d_fsdata  =  cfe ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									simple_xattrs_init ( & cfe - > xattrs ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mode  =  cgroup_file_mode ( cft ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									error  =  cgroup_create_file ( dentry ,  mode  |  S_IFREG ,  cgrp - > root - > sb ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if  ( ! error )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										list_add_tail ( & cfe - > node ,  & parent - > files ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										cfe  =  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									dput ( dentry ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								out :  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									kfree ( cfe ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
	return error;
}

/**
 * cgroup_addrm_files - add or remove files to a cgroup directory
 * @cgrp: the target cgroup
 * @cfts: array of cftypes to be added
 * @is_add: whether to add or remove
 *
 * Depending on @is_add, add or remove files defined by @cfts on @cgrp.
 * For removals, this function never fails.  If addition fails, this
 * function doesn't remove files already added.  The caller is responsible
 * for cleaning up.
 */
static int cgroup_addrm_files(struct cgroup *cgrp, struct cftype cfts[],
			      bool is_add)
{
												cgroup: add xattr support
This is one of the items in the plumber's wish list.
For use cases:
>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart
v7:
- no changes
v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support
Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
											 
										 
										
											2012-08-23 16:53:30 -04:00 
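The use case quoted in the commit message above (storing a service's main pid as an xattr on its cgroup) can be sketched from userspace roughly as follows. This is a minimal sketch, not part of the patch: the helper names cgroup_xattr_name_allowed() and cgroup_tag() are hypothetical, and the namespace check only mirrors the v6 note ("only allow trusted and security") on the caller's side. The only real interface used is setxattr(2).

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/*
 * Hypothetical userspace check: returns 1 if the attribute name lies in
 * one of the two namespaces the v6 changelog entry says are permitted.
 */
int cgroup_xattr_name_allowed(const char *name)
{
	assert(name != NULL);
	return strncmp(name, "trusted.", 8) == 0 ||
	       strncmp(name, "security.", 9) == 0;
}

/*
 * Hypothetical helper: attach a string value to a cgroup directory.
 * Returns 0 on success, -1 with errno set otherwise.  Names outside the
 * permitted namespaces are rejected before any system call is made.
 */
int cgroup_tag(const char *cgroup_dir, const char *name, const char *value)
{
	if (!cgroup_xattr_name_allowed(name)) {
		errno = EINVAL;
		return -1;
	}
	return setxattr(cgroup_dir, name, value, strlen(value), 0);
}
```

A caller might use something like cgroup_tag(dir, "trusted.main_pid", "1234"), where dir is the service's cgroup directory. Whether that succeeds depends on the hierarchy having been mounted with xattr support enabled and on the caller holding the capabilities mentioned in the v5 note.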
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	struct cftype *cft;
	int ret;

	lockdep_assert_held(&cgrp->dentry->d_inode->i_mutex);
	lockdep_assert_held(&cgroup_mutex);

	for (cft = cfts; cft->name[0] != '\0'; cft++) {
		/* does cft->flags tell us to skip this file on @cgrp? */
												cgroup: introduce sane_behavior mount option
It's a sad fact that at this point various cgroup controllers are
carrying so many idiosyncrasies and pure insanities that it simply
isn't possible to reach any sort of sane consistent behavior while
maintaining staying fully compatible with what already has been
exposed to userland.
As we can't break exposed userland interface, transitioning to sane
behaviors can only be done in steps while maintaining backwards
compatibility.  This patch introduces a new mount option -
__DEVEL__sane_behavior - which disables crazy features and enforces
consistent behaviors in cgroup core proper and various controllers.
As exactly which behaviors it changes are still being determined, the
mount option, at this point, is useful only for development of the new
behaviors.  As such, the mount option is prefixed with __DEVEL__ and
generates a warning message when used.
Eventually, once we get to the point where all controller's behaviors
are consistent enough to implement unified hierarchy, the __DEVEL__
prefix will be dropped, and more importantly, unified-hierarchy will
enforce sane_behavior by default.  Maybe we'll able to completely drop
the crazy stuff after a while, maybe not, but we at least have a
strategy to move on to saner behaviors.
This patch introduces the mount option and changes the following
behaviors in cgroup core.
* Mount options "noprefix" and "clone_children" are disallowed.  Also,
  cgroupfs file cgroup.clone_children is not created.
* When mounting an existing superblock, mount options should match.
  This is currently pretty crazy.  If one mounts a cgroup, creates a
  subdirectory, unmounts it and then mount it again with different
  option, it looks like the new options are applied but they aren't.
* Remount is disallowed.
The behaviors changes are documented in the comment above
CGRP_ROOT_SANE_BEHAVIOR enum and will be expanded as different
controllers are converted and planned improvements progress.
v2: Dropped unnecessary explicit file permission setting sane_behavior
    cftype entry as suggested by Li Zefan.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vivek Goyal <vgoyal@redhat.com>
											 
										 
										
											2013-04-14 20:15:26 -07:00 
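The first rule listed in the commit message above (mount options "noprefix" and "clone_children" are disallowed under __DEVEL__sane_behavior) can be illustrated with a small userspace pre-check on a comma-separated option string. This is only a sketch under stated assumptions: has_opt() and sane_opts_ok() are hypothetical names, and the real validation happens in the kernel's mount option parsing, which is not shown in this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if the comma-separated list `opts` contains `opt` as a whole token. */
bool has_opt(const char *opts, const char *opt)
{
	size_t len = strlen(opt);
	const char *p = opts;

	while (*p) {
		const char *end = strchr(p, ',');
		size_t tok = end ? (size_t)(end - p) : strlen(p);

		if (tok == len && strncmp(p, opt, len) == 0)
			return true;
		p += tok;
		if (*p == ',')
			p++;
	}
	return false;
}

/*
 * Hypothetical pre-check: when __DEVEL__sane_behavior is requested,
 * reject the two options the commit message says are disallowed.
 */
bool sane_opts_ok(const char *opts)
{
	assert(opts != NULL);
	if (!has_opt(opts, "__DEVEL__sane_behavior"))
		return true;	/* legacy behaviour: nothing extra to check */
	return !has_opt(opts, "noprefix") && !has_opt(opts, "clone_children");
}
```

An option string that passes this check would typically be handed to mount(2) as its data argument; doing so requires privileges and is left out of the sketch.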
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		if ((cft->flags & CFTYPE_INSANE) && cgroup_sane_behavior(cgrp))
			continue;
		if ((cft->flags & CFTYPE_NOT_ON_ROOT) && !cgrp->parent)
			continue;
		if ((cft->flags & CFTYPE_ONLY_ON_ROOT) && cgrp->parent)
			continue;

		if (is_add) {
			ret = cgroup_add_file(cgrp, cft);
			if (ret) {
				pr_warn("cgroup_addrm_files: failed to add %s, err=%d\n",
					cft->name, ret);
				return ret;
			}
		} else {
			cgroup_rm_file(cgrp, cft);
		}
	}
	return 0;
}
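The loop in cgroup_addrm_files() walks an array terminated by an entry with an empty name and skips entries whose flags do not apply to the target cgroup. A minimal userspace sketch of that pattern, assuming made-up types and flag names (struct demo_cftype, DEMO_NOT_ON_ROOT, DEMO_ONLY_ON_ROOT are illustrative and not the kernel definitions):

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NOT_ON_ROOT	(1U << 0)	/* illustrative flag values */
#define DEMO_ONLY_ON_ROOT	(1U << 1)

struct demo_cftype {
	char name[32];		/* empty name marks the end of the array */
	unsigned int flags;
};

/* Counts the entries that would be created for a root or non-root group. */
size_t demo_count_files(const struct demo_cftype cfts[], int is_root)
{
	const struct demo_cftype *cft;
	size_t n = 0;

	assert(cfts != NULL);
	for (cft = cfts; cft->name[0] != '\0'; cft++) {
		if ((cft->flags & DEMO_NOT_ON_ROOT) && is_root)
			continue;
		if ((cft->flags & DEMO_ONLY_ON_ROOT) && !is_root)
			continue;
		n++;
	}
	return n;
}

/* Sample table: one unconditional file, two root-only, one non-root-only. */
const struct demo_cftype demo_files[] = {
	{ .name = "demo.always" },
	{ .name = "demo.root_only_a", .flags = DEMO_ONLY_ON_ROOT },
	{ .name = "demo.root_only_b", .flags = DEMO_ONLY_ON_ROOT },
	{ .name = "demo.child_only", .flags = DEMO_NOT_ON_ROOT },
	{ .name = "" },		/* terminator */
};
```

The sentinel entry lets callers pass a bare array without a separate length, which is why every table needs the empty-name terminator at the end.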
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
static void cgroup_cfts_prepare(void)
	__acquires(&cgroup_mutex)
{
	/*
	 * Thanks to the entanglement with vfs inode locking, we can't walk
	 * the existing cgroups under cgroup_mutex and create files.
	 * Instead, we use css_for_each_descendant_pre() and drop RCU read
	 * lock before calling cgroup_addrm_files().
	 */
	mutex_lock(&cgroup_mutex);
}
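cgroup_cfts_prepare() returns with cgroup_mutex held (__acquires), and its counterpart cgroup_cfts_commit() is annotated __releases, so the lock is taken in one function and dropped in another. A rough userspace analogue with a pthread mutex, assuming hypothetical names throughout (demo_prepare, demo_commit, demo_lock_is_free, demo_count); it shows only the lock hand-off, not the file creation work:

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static int demo_registered;

/* Takes demo_mutex and returns with it held (compare __acquires). */
void demo_prepare(void)
{
	int err = pthread_mutex_lock(&demo_mutex);

	assert(err == 0);
	(void)err;
}

/*
 * Must be called with demo_mutex held; drops it on every return path
 * (compare __releases).  A non-zero `abort_commit` models the early
 * exit that unlocks and returns 0 without doing any work.
 */
int demo_commit(int abort_commit)
{
	if (abort_commit) {
		pthread_mutex_unlock(&demo_mutex);
		return 0;
	}
	demo_registered++;
	pthread_mutex_unlock(&demo_mutex);
	return 1;
}

/* Returns 1 if the mutex could be taken right now, 0 if it is held. */
int demo_lock_is_free(void)
{
	if (pthread_mutex_trylock(&demo_mutex) != 0)
		return 0;
	pthread_mutex_unlock(&demo_mutex);
	return 1;
}

int demo_count(void)
{
	return demo_registered;
}
```

The point of the split is that every caller of the prepare step must reach the commit step exactly once, including on the abort path, or the mutex stays locked.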
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  cgroup_cfts_commit ( struct  cftype  * cfts ,  bool  is_add )  
						 
					
						
							
								
									
										
										
										
											2013-06-18 18:48:37 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									__releases ( & cgroup_mutex ) 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									LIST_HEAD ( pending ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup_subsys  * ss  =  cfts [ 0 ] . ss ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:25 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup  * root  =  & ss - > root - > top_cgroup ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-18 18:40:19 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  super_block  * sb  =  ss - > root - > sb ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-18 18:48:37 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  dentry  * prev  =  NULL ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct  inode  * inode ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:25 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup_subsys_state  * css ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-18 11:14:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									u64  update_before ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-28 16:24:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int  ret  =  0 ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* %NULL @cfts indicates abort and don't bother if @ss isn't attached */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:47 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (!cfts || ss->root == &cgroup_dummy_root ||
							 
						 
					
						
							
								
									
										
										
										
											2013-06-18 18:48:37 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									    !atomic_inc_not_zero(&sb->s_active)) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										mutex_unlock(&cgroup_mutex);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-28 16:24:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return 0;
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-18 18:48:37 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 * All cgroups which are created after we drop cgroup_mutex will
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * have the updated set of files, so we only need to update the
							 
						 
					
						
							
								
									
										
										
										
											2013-06-18 11:14:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 * cgroups created before the current @cgroup_serial_nr_next.
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-18 11:14:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									update_before = cgroup_serial_nr_next;
							 
						 
					
						
							
								
									
										
										
										
											2013-06-18 18:48:37 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_unlock(&cgroup_mutex);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* add/rm files for all cgroups created before */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									rcu_read_lock();
							 
						 
					
						
							
								
									
										
										
										
											2013-08-26 18:40:56 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									css_for_each_descendant_pre(css, cgroup_css(root, ss)) {
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:25 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct cgroup *cgrp = css->cgroup;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-18 18:48:37 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if (cgroup_is_dead(cgrp))
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											continue;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										inode = cgrp->dentry->d_inode;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										dget(cgrp->dentry);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										rcu_read_unlock();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										dput(prev);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										prev = cgrp->dentry;
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										mutex_lock(&inode->i_mutex);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										mutex_lock(&cgroup_mutex);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-18 11:14:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if (cgrp->serial_nr < update_before && !cgroup_is_dead(cgrp))
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											ret = cgroup_addrm_files(cgrp, cfts, is_add);
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										mutex_unlock(&cgroup_mutex);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										mutex_unlock(&inode->i_mutex);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-18 18:48:37 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										rcu_read_lock();
							 
						 
					
						
							
								
									
										
										
										
											2013-06-28 16:24:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if (ret)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											break;
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-18 18:48:37 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									rcu_read_unlock();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									dput(prev);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									deactivate_super(sb);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-28 16:24:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return ret;
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
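
The update_before cutoff used above can be illustrated with a small userspace sketch. It is only an approximation under stated assumptions: struct demo_cgroup and demo_update_older() are hypothetical names, a plain array stands in for the css descendant walk, and no locking is modelled. The one thing taken from the code above is the test itself: a group is updated only if its serial number is below the snapshot and it is not dead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a cgroup. */
struct demo_cgroup {
	uint64_t serial_nr;	/* assigned from a growing counter at creation */
	int dead;		/* non-zero once the group is being removed */
	int updated;		/* set when the file set has been refreshed */
};

/*
 * Refresh only the groups created before the snapshot @update_before.
 * Groups created later are assumed to have been born with the new
 * file set already, so they are skipped.  Returns how many groups
 * were refreshed.
 */
static size_t demo_update_older(struct demo_cgroup *cgrps, size_t n,
				uint64_t update_before)
{
	size_t i, touched = 0;

	for (i = 0; i < n; i++) {
		if (cgrps[i].serial_nr < update_before && !cgrps[i].dead) {
			cgrps[i].updated = 1;
			touched++;
		}
	}
	return touched;
}
```

Taking the snapshot before the lock is dropped is what makes a single comparison sufficient in this sketch: any group with a serial number at or above the snapshot must have been created afterwards.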
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * cgroup_add_cftypes - add an array of cftypes to a subsystem
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * @ss: target cgroup subsystem
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * @cfts: zero-length name terminated array of cftypes
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * Register @cfts to @ss.  Files described by @cfts are created for all
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * existing cgroups to which @ss is attached and all future cgroups will
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * have them too.  This function can be called anytime whether @ss is
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * attached or not.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * Returns 0 on successful registration, -errno on failure.  Note that this
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * function currently returns 0 as long as @cfts registration is successful
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * even if some file creation attempts on existing cgroups fail.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: add xattr support
This is one of the items in the plumber's wish list.
For use cases:
>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart
v7:
- no changes
v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support
Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
											 
										 
										
											2012-08-23 16:53:30 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								int cgroup_add_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct cftype_set *set;
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cftype *cft;
							 
						 
					
						
							
								
									
										
										
										
											2013-06-28 16:24:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int ret;
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									set = kzalloc(sizeof(*set), GFP_KERNEL);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (!set)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return -ENOMEM;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									for (cft = cfts; cft->name[0] != '\0'; cft++)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										cft->ss = ss;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgroup_cfts_prepare();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									set->cfts = cfts;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									list_add_tail(&set->node, &ss->cftsets);
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									ret = cgroup_cfts_commit(cfts, true);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-28 16:24:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (ret)
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										cgroup_rm_cftypes(cfts);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-28 16:24:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return ret;
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								EXPORT_SYMBOL_GPL(cgroup_add_cftypes);
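
As a rough illustration of the "zero-length name terminated array" convention that @cfts follows, the userspace sketch below walks such an array the same way the loop in cgroup_add_cftypes() does. The names demo_cftype and demo_tag_cftypes are hypothetical stand-ins, not the kernel's definitions, and the 64-byte name buffer is an assumption made for the example; only the name-based termination test is taken from the code above.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct cftype. */
struct demo_cftype {
	char name[64];		/* an empty name marks the end of the array */
	const void *ss;		/* owning subsystem, filled in at registration */
};

/*
 * Tag every entry with its owning subsystem, stopping at the first
 * entry whose name is empty.  Returns the number of entries tagged.
 */
static size_t demo_tag_cftypes(struct demo_cftype *cfts, const void *ss)
{
	struct demo_cftype *cft;
	size_t n = 0;

	for (cft = cfts; cft->name[0] != '\0'; cft++) {
		cft->ss = ss;
		n++;
	}
	return n;
}
```

A caller would end its array with an entry whose name is the empty string. Without that terminator the loop would run past the end of the array, so the sentinel is part of the contract rather than an optional nicety.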
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * cgroup_rm_cftypes - remove an array of cftypes from a subsystem
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * @cfts: zero-length name terminated array of cftypes
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * Unregister @cfts.  Files described by @cfts are removed from all
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * existing cgroups and all future cgroups won't have them either.  This
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * function can be called anytime whether @cfts' subsys is attached or not.
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * Returns 0 on successful unregistration, -ENOENT if @cfts is not
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * registered.
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								int cgroup_rm_cftypes(struct cftype *cfts)
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct cftype_set *set;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (!cfts || !cfts[0].ss)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return -ENOENT;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgroup_cfts_prepare();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									list_for_each_entry(set, &cfts[0].ss->cftsets, node) {
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if (set->cfts == cfts) {
							 
						 
					
						
							
								
									
										
										
										
											2013-06-18 18:41:53 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											list_del(&set->node);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											kfree(set);
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											cgroup_cfts_commit(cfts, false);
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											return 0;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgroup_cfts_commit(NULL, false);
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return -ENOENT;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * cgroup_task_count - count the number of tasks in a cgroup.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * @cgrp: the cgroup in question
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * Return the number of tasks in the cgroup.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								int cgroup_task_count(const struct cgroup *cgrp)
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int count = 0;
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgrp_cset_link *link;
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									read_lock(&css_set_lock);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									list_for_each_entry(link, &cgrp->cset_links, cset_link)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										count += atomic_read(&link->cset->refcount);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									read_unlock(&css_set_lock);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return count;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
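
The counting strategy in cgroup_task_count() can be sketched in userspace as follows. This is a simplified model, not the kernel's data structures: demo_cset, demo_link and demo_task_count are hypothetical names, a singly linked list replaces the kernel list, a plain int replaces the atomic counter, and the read lock is omitted. Like the function above, the sketch treats each css_set's reference count as the number of tasks attached through it.

```c
#include <stddef.h>

/* Hypothetical stand-in for a css_set shared by several tasks. */
struct demo_cset {
	int refcount;		/* read as the number of tasks using this set */
};

/* Hypothetical stand-in for one link from a cgroup to a css_set. */
struct demo_link {
	const struct demo_cset *cset;
	const struct demo_link *next;	/* next link on the same cgroup */
};

/* Sum the per-css_set reference counts reachable from @head. */
static int demo_task_count(const struct demo_link *head)
{
	const struct demo_link *link;
	int count = 0;

	for (link = head; link != NULL; link = link->next)
		count += link->cset->refcount;
	return count;
}
```

Summing per-set counters avoids walking every task, which is why the result is cheap to compute even when the per-task lists described further down have not been enabled yet.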
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:26 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * To reduce the fork() overhead for systems that are not actually using
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * their cgroups capability, we don't maintain the lists running through
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * each css_set to its tasks until we see the list actually used - in other
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:26 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * words after the first call to css_task_iter_start().
							 
						 
					
						
							
								
									
										
										
										
											2008-02-07 00:14:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 00:59:54 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static void cgroup_enable_task_cg_lists(void)
						 
					
						
							
								
									
										
										
										
											2008-02-07 00:14:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct task_struct *p, *g;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									write_lock(&css_set_lock);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									use_task_css_set_links = 1;
							 
						 
					
						
							
								
									
										
										
										
											2012-02-08 03:37:27 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * We need tasklist_lock because RCU is not safe against
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * while_each_thread(). Besides, a forking task that has passed
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * cgroup_post_fork() without seeing use_task_css_set_links = 1
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * is not guaranteed to have its child immediately visible in the
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * tasklist if we walk through it with RCU.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									read_lock(&tasklist_lock);
							 
						 
					
						
							
								
									
										
										
										
											2008-02-07 00:14:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									do_each_thread(g, p) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										task_lock(p);
							 
						 
					
						
							
								
									
										
										
										
											2008-04-17 11:37:15 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 * We should check if the process is exiting, otherwise
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 * it will race with cgroup_exit() in that the list
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 * entry won't be deleted though the process has exited.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if (!(p->flags & PF_EXITING) && list_empty(&p->cg_list))
							 
						 
					
						
							
								
									
										
										
										
											2013-06-21 15:52:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											list_add(&p->cg_list, &task_css_set(p)->tasks);
							 
						 
					
						
							
								
									
										
										
										
											2008-02-07 00:14:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										task_unlock(p);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} while_each_thread(g, p);
							 
						 
					
						
							
								
									
										
										
										
											2012-02-08 03:37:27 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									read_unlock(&tasklist_lock);
							 
						 
					
						
							
								
									
										
										
										
											2008-02-07 00:14:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									write_unlock(&css_set_lock);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: add cgroup->serial_nr and implement cgroup_next_sibling()
Currently, there's no easy way to find out the next sibling cgroup
unless it's known that the current cgroup is accessed from the
parent's children list in a single RCU critical section.  This in turn
forces all iterators to require whole iteration to be enclosed in a
single RCU critical section, which sometimes is too restrictive.  This
patch implements cgroup_next_sibling() which can reliably determine
the next sibling regardless of the state of the current cgroup as long
as it's accessible.
It currently is impossible to determine the next sibling after
dropping RCU read lock because the cgroup being iterated could be
removed anytime and if RCU read lock is dropped, nothing guarantees
its ->sibling.next pointer is accessible.  A removed cgroup would
continue to point to its next sibling for RCU accesses but stop
receiving updates from the sibling.  IOW, the next sibling could be
removed and then complete its grace period while RCU read lock is
dropped, making it unsafe to dereference ->sibling.next after dropping
and re-acquiring RCU read lock.
This can be solved by adding a way to traverse to the next sibling
without dereferencing ->sibling.next.  This patch adds a monotonically
increasing cgroup serial number, cgroup->serial_nr, which guarantees
that all cgroup->children lists are kept in increasing serial_nr
order.  A new function, cgroup_next_sibling(), is implemented, which,
if CGRP_REMOVED is not set on the current cgroup, follows
->sibling.next; otherwise, traverses the parent's ->children list
until it sees a sibling with higher ->serial_nr.
This allows the function to always return the next sibling regardless
of the state of the current cgroup without adding overhead in the fast
path.
Further patches will update the iterators to use cgroup_next_sibling()
so that they allow dropping RCU read lock and blocking while iteration
is in progress which in turn will be used to simplify controllers.
v2: Typo fix as per Serge.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
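
The traversal rule described in the commit message above can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct node, struct parent, add_child(), remove_child() and next_sibling() are hypothetical names invented here, a singly linked list stands in for the kernel's ->children / ->sibling lists, the removed flag stands in for CGRP_REMOVED, and RCU and locking are not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cgroup; not the kernel structure. */
struct node {
	uint64_t serial_nr;	/* taken from a monotonically increasing counter */
	bool removed;		/* plays the role of CGRP_REMOVED */
	struct node *next;	/* plays the role of ->sibling.next */
};

/* Hypothetical stand-in for the parent and its ->children list. */
struct parent {
	struct node *first_child;	/* kept in increasing serial_nr order */
	uint64_t next_serial;
};

/* Appending at the tail keeps the list sorted by serial_nr. */
static void add_child(struct parent *p, struct node *n)
{
	struct node **link = &p->first_child;

	n->serial_nr = p->next_serial++;
	n->removed = false;
	n->next = NULL;
	while (*link)
		link = &(*link)->next;
	*link = n;
}

/*
 * Mark the node removed first, then unlink it.  n->next is left
 * untouched on purpose: from now on it may be stale.
 */
static void remove_child(struct parent *p, struct node *n)
{
	struct node **link = &p->first_child;

	n->removed = true;
	while (*link && *link != n)
		link = &(*link)->next;
	if (*link)
		*link = n->next;
}

/*
 * Fast path: a live node's next pointer is trusted.
 * Slow path: for a removed node, rescan the parent's list and return
 * the first child with a higher serial number.
 */
static struct node *next_sibling(struct parent *p, struct node *pos)
{
	struct node *n;

	assert(p != NULL);
	if (!pos)
		return p->first_child;
	if (!pos->removed)
		return pos->next;
	for (n = p->first_child; n; n = n->next)
		if (n->serial_nr > pos->serial_nr)
			return n;
	return NULL;
}
```

In this sketch the slow path costs a scan of the parent's list, but it is only taken for a node that has already been removed, which matches the "no overhead in the fast path" claim in the message.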
											 
										 
										
											2013-05-24 10:55:38 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:25 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * css_next_child - find the next child of a given css
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * @pos_css: the current position (%NULL to initiate traversal)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * @parent_css: css whose children to walk
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2013-05-24 10:55:38 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:25 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * This function returns the next child of @parent_css and should be called
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * under RCU read lock.  The only requirement is that @parent_css and
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * @pos_css are accessible.  The next sibling is guaranteed to be returned
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * regardless of their states.
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2013-05-24 10:55:38 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:25 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								struct cgroup_subsys_state *
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								css_next_child(struct cgroup_subsys_state *pos_css,
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									       struct cgroup_subsys_state *parent_css)
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2013-05-24 10:55:38 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:25 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup *pos = pos_css ? pos_css->cgroup : NULL;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct cgroup *cgrp = parent_css->cgroup;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2013-05-24 10:55:38 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup *next;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									WARN_ON_ONCE(!rcu_read_lock_held());
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * @pos could already have been removed.  Once a cgroup is removed,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * its ->sibling.next is no longer updated when its next sibling
							 
						 
					
						
							
								
									
										
										
										
											2013-06-13 19:27:42 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 * changes.  As CGRP_DEAD assertion is serialized and happens
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * before the cgroup is taken off the ->sibling list, if we see it
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * unasserted, it's guaranteed that the next sibling hasn't
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * finished its grace period even if it's already removed, and thus
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * safe to dereference from this RCU critical section.  If
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * ->sibling.next is inaccessible, cgroup_is_dead() is guaranteed
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * to be visible as %true here.
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * If @pos is dead, its next pointer can't be dereferenced;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * however, as each cgroup is given a monotonically increasing
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * unique serial number and always appended to the sibling list,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * the next one can be found by walking the parent's children until
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * we see a cgroup with higher serial number than @pos's.  While
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * this path can be slower, it's taken only when either the current
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * cgroup is removed or iteration and removal race.
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2013-05-24 10:55:38 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (!pos) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										next = list_entry_rcu(cgrp->children.next, struct cgroup, sibling);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} else if (likely(!cgroup_is_dead(pos))) {
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
2013-05-24 10:55:38 +09:00

		next = list_entry_rcu(pos->sibling.next, struct cgroup, sibling);
	} else {
		list_for_each_entry_rcu(next, &cgrp->children, sibling)
			if (next->serial_nr > pos->serial_nr)
				break;
	}

	if (&next->sibling == &cgrp->children)
		return NULL;

	return cgroup_css(next, parent_css->ss);
}
EXPORT_SYMBOL_GPL(css_next_child);
							
/**
 * css_next_descendant_pre - find the next descendant for pre-order walk
 * @pos: the current position (%NULL to initiate traversal)
 * @root: css whose descendants to walk
 *
 * To be used by css_for_each_descendant_pre().  Find the next descendant
 * to visit for pre-order traversal of @root's descendants.  @root is
 * included in the iteration and the first node to be visited.
 *
 * While this function requires RCU read locking, it doesn't require the
 * whole traversal to be contained in a single RCU critical section.  This
 * function will return the correct next descendant as long as both @pos
 * and @root are accessible and @pos is a descendant of @root.
 */
struct cgroup_subsys_state *
css_next_descendant_pre(struct cgroup_subsys_state *pos,
			struct cgroup_subsys_state *root)
{
	struct cgroup_subsys_state *next;

	WARN_ON_ONCE(!rcu_read_lock_held());

	/* if first iteration, visit @root */
	if (!pos)
		return root;

	/* visit the first child if exists */
	next = css_next_child(NULL, pos);
	if (next)
		return next;

	/* no child, visit my or the closest ancestor's next sibling */
	while (pos != root) {
		next = css_next_child(pos, css_parent(pos));
		if (next)
			return next;
		pos = css_parent(pos);
	}

	return NULL;
}
EXPORT_SYMBOL_GPL(css_next_descendant_pre);
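
The same stepping logic can be tried outside the kernel on a plain pointer tree. The sketch below is an assumption-laden illustration, not kernel code: `struct tnode`, `tnode_add_child()`, `next_child()` and `next_descendant_pre()` are hypothetical names, and RCU, removal and the css/cgroup indirection are all left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree node; stands in for a css only for illustration. */
struct tnode {
	struct tnode *parent;
	struct tnode *first_child;
	struct tnode *next_sibling;
};

/* Append @child as the last child of @parent. */
static void tnode_add_child(struct tnode *parent, struct tnode *child)
{
	struct tnode **link = &parent->first_child;

	child->parent = parent;
	child->first_child = NULL;
	child->next_sibling = NULL;
	while (*link)
		link = &(*link)->next_sibling;
	*link = child;
}

/* Child of @parent following @pos; NULL @pos asks for the first child. */
static struct tnode *next_child(struct tnode *pos, struct tnode *parent)
{
	assert(parent != NULL);
	return pos ? pos->next_sibling : parent->first_child;
}

/* One pre-order step from @pos within the subtree rooted at @root. */
static struct tnode *next_descendant_pre(struct tnode *pos, struct tnode *root)
{
	struct tnode *next;

	/* first iteration: visit @root itself */
	if (!pos)
		return root;

	/* descend to the first child if there is one */
	next = next_child(NULL, pos);
	if (next)
		return next;

	/* otherwise climb until some ancestor has a next sibling */
	while (pos != root) {
		next = next_child(pos, pos->parent);
		if (next)
			return next;
		pos = pos->parent;
	}

	return NULL;
}
```

A caller would start with a NULL position and feed each returned node back in until NULL comes back, which is roughly what a for-each-descendant macro built on such a function would expand to.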
						 
					
						
							
								
									
										
										
										
/**
 * css_rightmost_descendant - return the rightmost descendant of a css
 * @pos: css of interest
 *
 * Return the rightmost descendant of @pos.  If there's no descendant, @pos
 * is returned.  This can be used during pre-order traversal to skip
 * subtree of @pos.
 *
 * While this function requires RCU read locking, it doesn't require the
 * whole traversal to be contained in a single RCU critical section.  This
 * function will return the correct rightmost descendant as long as @pos is
 * accessible.
 */
struct cgroup_subsys_state *
css_rightmost_descendant(struct cgroup_subsys_state *pos)
{
	struct cgroup_subsys_state *last, *tmp;

	WARN_ON_ONCE(!rcu_read_lock_held());

	do {
		last = pos;
		/* ->prev isn't RCU safe, walk ->next till the end */
		pos = NULL;
		css_for_each_child(tmp, last)
			pos = tmp;
	} while (pos);

	return last;
}
EXPORT_SYMBOL_GPL(css_rightmost_descendant);
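
As a rough userspace illustration of the forward-only walk, the sketch below finds the rightmost descendant on a plain pointer tree. `struct tnode`, `tnode_add_child()` and `rightmost_descendant()` are hypothetical names chosen for this example; RCU and the css types are not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree node; stands in for a css only for illustration. */
struct tnode {
	struct tnode *parent;
	struct tnode *first_child;
	struct tnode *next_sibling;
};

/* Append @child as the last child of @parent. */
static void tnode_add_child(struct tnode *parent, struct tnode *child)
{
	struct tnode **link = &parent->first_child;

	child->parent = parent;
	child->first_child = NULL;
	child->next_sibling = NULL;
	while (*link)
		link = &(*link)->next_sibling;
	*link = child;
}

/* Rightmost descendant of @pos, or @pos itself if it has no children. */
static struct tnode *rightmost_descendant(struct tnode *pos)
{
	struct tnode *last, *tmp;

	assert(pos != NULL);
	do {
		last = pos;
		/* only forward links are followed, mirroring the ->next walk */
		pos = NULL;
		for (tmp = last->first_child; tmp; tmp = tmp->next_sibling)
			pos = tmp;
	} while (pos);

	return last;
}
```

In a pre-order walk, resuming from the rightmost descendant of a node rather than from the node itself would make the next step land on whatever follows that node's whole subtree.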
						 
					
						
							
								
									
										
										
										
static struct cgroup_subsys_state *
css_leftmost_descendant(struct cgroup_subsys_state *pos)
{
	struct cgroup_subsys_state *last;

	do {
		last = pos;
		pos = css_next_child(NULL, pos);
	} while (pos);

	return last;
}
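
The leftmost descendant is where a post-order walk has to begin, since every node is visited after its children. The following userspace sketch shows one way such a walk could be stepped; it is an assumption for illustration only, with hypothetical names (`struct tnode`, `next_child()`, `leftmost_descendant()`, `next_descendant_post()`) and no RCU or removal handling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree node; stands in for a css only for illustration. */
struct tnode {
	struct tnode *parent;
	struct tnode *first_child;
	struct tnode *next_sibling;
};

/* Append @child as the last child of @parent. */
static void tnode_add_child(struct tnode *parent, struct tnode *child)
{
	struct tnode **link = &parent->first_child;

	child->parent = parent;
	child->first_child = NULL;
	child->next_sibling = NULL;
	while (*link)
		link = &(*link)->next_sibling;
	*link = child;
}

/* Child of @parent following @pos; NULL @pos asks for the first child. */
static struct tnode *next_child(struct tnode *pos, struct tnode *parent)
{
	assert(parent != NULL);
	return pos ? pos->next_sibling : parent->first_child;
}

/* Keep taking the first child until a leaf is reached. */
static struct tnode *leftmost_descendant(struct tnode *pos)
{
	struct tnode *last;

	do {
		last = pos;
		pos = next_child(NULL, pos);
	} while (pos);

	return last;
}

/* One post-order step from @pos within the subtree rooted at @root. */
static struct tnode *next_descendant_post(struct tnode *pos, struct tnode *root)
{
	struct tnode *next;

	/* first iteration: start at the leftmost leaf */
	if (!pos)
		return leftmost_descendant(root);

	/* @root is visited last, so the walk ends after it */
	if (pos == root)
		return NULL;

	/* a next sibling means its subtree comes before the parent */
	next = next_child(pos, pos->parent);
	if (next)
		return leftmost_descendant(next);

	/* no sibling left: the parent is next */
	return pos->parent;
}
```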
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
/**
 * css_next_descendant_post - find the next descendant for post-order walk
 * @pos: the current position (%NULL to initiate traversal)
 * @root: css whose descendants to walk
 *
 * To be used by css_for_each_descendant_post().  Find the next descendant
 * to visit for post-order traversal of @root's descendants.  @root is
 * included in the iteration and is the last node to be visited.
 *
 * While this function requires RCU read locking, it doesn't require the
 * whole traversal to be contained in a single RCU critical section.  This
 * function will return the correct next descendant as long as both @pos
 * and @root are accessible and @pos is a descendant of @root.
 */
							 
						 
					
						
							
								
									
										
										
										
struct cgroup_subsys_state *
css_next_descendant_post(struct cgroup_subsys_state *pos,
			 struct cgroup_subsys_state *root)
{
	struct cgroup_subsys_state *next;

	WARN_ON_ONCE(!rcu_read_lock_held());

	/* if first iteration, visit leftmost descendant which may be @root */
	if (!pos)
		return css_leftmost_descendant(root);

	/* if we visited @root, we're done */
	if (pos == root)
		return NULL;

	/* if there's an unvisited sibling, visit its leftmost descendant */
	next = css_next_child(pos, css_parent(pos));
	if (next)
		return css_leftmost_descendant(next);

	/* no sibling left, visit parent */
	return css_parent(pos);
}
EXPORT_SYMBOL_GPL(css_next_descendant_post);
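
The stepwise post-order rule used by css_next_descendant_post() can be tried outside the kernel. The following is a minimal userspace sketch, not kernel code: `struct node`, `node_add_child()`, `leftmost_descendant()`, `next_descendant_post()` and `collect_post_order()` are hypothetical names invented for illustration, and RCU protection is not modelled at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a css: a parent link plus a sibling-linked child list. */
struct node {
	const char *name;
	struct node *parent;
	struct node *first_child;
	struct node *next_sibling;
};

/* Attach @child as the last child of @parent. */
static void node_add_child(struct node *parent, struct node *child)
{
	struct node **slot = &parent->first_child;

	while (*slot)
		slot = &(*slot)->next_sibling;
	child->parent = parent;
	child->next_sibling = NULL;
	*slot = child;
}

/* Descend through first children until a leaf is reached (may return @pos). */
static struct node *leftmost_descendant(struct node *pos)
{
	while (pos->first_child)
		pos = pos->first_child;
	return pos;
}

/* Return the node to visit after @pos in a post-order walk of @root. */
static struct node *next_descendant_post(struct node *pos, struct node *root)
{
	assert(root != NULL);

	/* first iteration: start at the leftmost leaf, which may be @root */
	if (!pos)
		return leftmost_descendant(root);

	/* @root is visited last, so the walk ends after it */
	if (pos == root)
		return NULL;

	/* an unvisited sibling exists: visit its leftmost descendant first */
	if (pos->next_sibling)
		return leftmost_descendant(pos->next_sibling);

	/* no sibling left: all children are done, visit the parent */
	return pos->parent;
}

/* Walk @root in post-order, storing up to @max names; returns the count. */
static size_t collect_post_order(struct node *root, const char **out, size_t max)
{
	size_t n = 0;
	struct node *pos;

	for (pos = next_descendant_post(NULL, root); pos && n < max;
	     pos = next_descendant_post(pos, root))
		out[n++] = pos->name;
	return n;
}
```

Because each step needs only the current position and the root, the walk keeps no stack and can be resumed from any node that is still reachable, which is the property the kernel comment relies on.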
						 
					
						
							
								
									
										
										
										
/**
 * css_advance_task_iter - advance a task iterator to the next css_set
 * @it: the iterator to advance
 *
 * Advance @it to the next non-empty css_set to walk.  If no such css_set
 * remains, @it->cset_link is set to NULL.
 */
							 
						 
					
						
							
								
									
										
										
										
static void css_advance_task_iter(struct css_task_iter *it)
{
	struct list_head *l = it->cset_link;
	struct cgrp_cset_link *link;
	struct css_set *cset;

	/* Advance to the next non-empty css_set */
	do {
		l = l->next;
		if (l == &it->origin_css->cgroup->cset_links) {
			it->cset_link = NULL;
			return;
		}
		link = list_entry(l, struct cgrp_cset_link, cset_link);
		cset = link->cset;
	} while (list_empty(&cset->tasks));
	it->cset_link = l;
	it->task = cset->tasks.next;
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
/**
 * css_task_iter_start - initiate task iteration
 * @css: the css to walk tasks of
 * @it: the task iterator to use
 *
 * Initiate iteration through the tasks of @css.  The caller can call
 * css_task_iter_next() to walk through the tasks until the function
 * returns NULL.  On completion of iteration, css_task_iter_end() must be
 * called.
 *
 * Note that this function takes css_set_lock for reading, and the lock is
 * released only by css_task_iter_end().  The caller can't sleep while
 * iteration is in progress.
 */
							 
						 
					
						
							
								
									
										
										
										
void css_task_iter_start(struct cgroup_subsys_state *css,
			 struct css_task_iter *it)
	__acquires(css_set_lock)
{
	/*
	 * The first time anyone tries to iterate across a css, we need to
	 * enable the list linking each css_set to its tasks, and fix up
	 * all existing tasks.
	 */
	if (!use_task_css_set_links)
		cgroup_enable_task_cg_lists();

	read_lock(&css_set_lock);

	it->origin_css = css;
	it->cset_link = &css->cgroup->cset_links;

	css_advance_task_iter(it);
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
/**
 * css_task_iter_next - return the next task for the iterator
 * @it: the task iterator being iterated
 *
 * The "next" function for task iteration.  @it should have been
 * initialized via css_task_iter_start().  Returns NULL when the iteration
 * reaches the end.
 */
							 
						 
					
						
							
								
									
										
										
										
struct task_struct *css_task_iter_next(struct css_task_iter *it)
{
	struct task_struct *res;
	struct list_head *l = it->task;
	struct cgrp_cset_link *link;

	/* If the iterator's cset_link is NULL, there are no tasks left */
	if (!it->cset_link)
		return NULL;
	res = list_entry(l, struct task_struct, cg_list);
	/* Advance iterator to find next entry */
	l = l->next;
	link = list_entry(it->cset_link, struct cgrp_cset_link, cset_link);
	if (l == &link->cset->tasks) {
		/*
		 * We reached the end of this task list - move on to the
		 * next cgrp_cset_link.
		 */
		css_advance_task_iter(it);
	} else {
		it->task = l;
	}
	return res;
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
/**
 * css_task_iter_end - finish task iteration
 * @it: the task iterator to finish
 *
 * Finish task iteration started by css_task_iter_start().
 */
							 
						 
					
						
							
								
									
										
										
										
void css_task_iter_end(struct css_task_iter *it)
	__releases(css_set_lock)
{
	read_unlock(&css_set_lock);
}
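
The start/next pattern above is a two-level walk: an outer list of css_sets and, inside each, a list of tasks, with empty sets skipped. The following userspace sketch shows only that control flow, under loud assumptions: `struct task`, `struct task_set`, `struct task_iter`, `iter_advance_set()`, `iter_start()` and `iter_next()` are hypothetical names, plain singly linked lists replace the kernel's `list_head`, and css_set_lock is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: a task and a set of tasks (playing the role of a css_set). */
struct task {
	int pid;
	struct task *next;
};

struct task_set {
	struct task *tasks;		/* NULL when the set is empty */
	struct task_set *next;
};

struct task_iter {
	struct task_set *set;		/* current non-empty set, NULL when done */
	struct task *task;		/* next task to hand out */
};

/* Position @it on the first non-empty set at or after @from. */
static void iter_advance_set(struct task_iter *it, struct task_set *from)
{
	while (from && !from->tasks)
		from = from->next;
	it->set = from;
	it->task = from ? from->tasks : NULL;
}

/* Begin iteration at @first; mirrors the role of css_task_iter_start(). */
static void iter_start(struct task_iter *it, struct task_set *first)
{
	assert(it != NULL);
	iter_advance_set(it, first);
}

/* Return the next task, or NULL once every set has been exhausted. */
static struct task *iter_next(struct task_iter *it)
{
	struct task *res;

	if (!it->set)
		return NULL;

	res = it->task;
	if (res->next)
		it->task = res->next;			/* stay in this set */
	else
		iter_advance_set(it, it->set->next);	/* move to next non-empty set */
	return res;
}
```

As in the kernel code, the iterator always points at the task it will return next, so reaching the end of one set immediately advances to the following non-empty set rather than deferring that work to the next call.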
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-02-07 00:14:42 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
static inline int started_after_time(struct task_struct *t1,
				     struct timespec *time,
				     struct task_struct *t2)
{
	int start_diff = timespec_compare(&t1->start_time, time);
	if (start_diff > 0) {
		return 1;
	} else if (start_diff < 0) {
		return 0;
	} else {
		/*
		 * Arbitrarily, if two processes started at the same
		 * time, we'll say that the lower pointer value
		 * started first. Note that t2 may have exited by now
		 * so this may not be a valid pointer any longer, but
		 * that's fine - it still serves to distinguish
		 * between two tasks started (effectively) simultaneously.
		 */
		return t1 > t2;
	}
}

/*
 * This function is a callback from heap_insert() and is used to order
 * the heap.
 * In this case we order the heap in descending task start time.
 */
static inline int started_after(void *p1, void *p2)
{
	struct task_struct *t1 = p1;
	struct task_struct *t2 = p2;
	return started_after_time(t1, &t2->start_time, t2);
}

/**
 * css_scan_tasks - iterate through all the tasks in a css
 * @css: the css to iterate tasks of
 * @test: optional test callback
 * @process: process callback
 * @data: data passed to @test and @process
 * @heap: optional pre-allocated heap used for task iteration
 *
 * Iterate through all the tasks in @css, calling @test for each, and if it
 * returns %true, call @process for it also.
 *
 * @test may be NULL, meaning always true (select all tasks), which
 * effectively duplicates css_task_iter_{start,next,end}() but does not
 * lock css_set_lock for the call to @process.
 *
 * It is guaranteed that @process will act on every task that is a member
 * of @css for the duration of this call.  This function may or may not
 * call @process for tasks that exit or move to a different css during the
 * call, or are forked or move into the css during the call.
 *
 * Note that @test may be called with locks held, and may in some
 * situations be called multiple times for the same task, so it should be
 * cheap.
 *
 * If @heap is non-NULL, a heap has been pre-allocated and will be used for
 * heap operations (and its "gt" member will be overwritten), else a
 * temporary heap will be used (allocation of which may cause this function
 * to fail).
 */
int css_scan_tasks(struct cgroup_subsys_state *css,
		   bool (*test)(struct task_struct *, void *),
		   void (*process)(struct task_struct *, void *),
		   void *data, struct ptr_heap *heap)
{
	int retval, i;
	struct css_task_iter it;
	struct task_struct *p, *dropped;
	/* Never dereference latest_task, since it's not refcounted */
	struct task_struct *latest_task = NULL;
	struct ptr_heap tmp_heap;
	struct timespec latest_time = { 0, 0 };

	if (heap) {
		/* The caller supplied our heap and pre-allocated its memory */
		heap->gt = &started_after;
	} else {
		/* We need to allocate our own heap memory */
		heap = &tmp_heap;
		retval = heap_init(heap, PAGE_SIZE, GFP_KERNEL, &started_after);
		if (retval)
			/* cannot allocate the heap */
			return retval;
	}

 again:
	/*
	 * Scan tasks in the css, using the @test callback to determine
	 * which are of interest, and invoking @process callback on the
	 * ones which need an update.  Since we don't want to hold any
	 * locks during the task updates, gather tasks to be processed in a
	 * heap structure.  The heap is sorted by descending task start
	 * time.  If the statically-sized heap fills up, we overflow tasks
	 * that started later, and in future iterations only consider tasks
	 * that started after the latest task in the previous pass. This
	 * guarantees forward progress and that we don't miss any tasks.
	 */
	heap->size = 0;
	css_task_iter_start(css, &it);
	while ((p = css_task_iter_next(&it))) {
		/*
		 * Only affect tasks that qualify per the caller's callback,
		 * if he provided one
		 */
		if (test && !test(p, data))
			continue;
		/*
		 * Only process tasks that started after the last task
		 * we processed
		 */
		if (!started_after_time(p, &latest_time, latest_task))
			continue;
		dropped = heap_insert(heap, p);
		if (dropped == NULL) {
			/*
			 * The new task was inserted; the heap wasn't
			 * previously full
			 */
			get_task_struct(p);
		} else if (dropped != p) {
			/*
			 * The new task was inserted, and pushed out a
			 * different task
			 */
			get_task_struct(p);
			put_task_struct(dropped);
		}
		/*
		 * Else the new task was newer than anything already in
		 * the heap and wasn't inserted
		 */
	}
	css_task_iter_end(&it);

	if (heap->size) {
		for (i = 0; i < heap->size; i++) {
			struct task_struct *q = heap->ptrs[i];
			if (i == 0) {
				latest_time = q->start_time;
				latest_task = q;
			}
			/* Process the task per the caller's callback */
			process(q, data);
			put_task_struct(q);
		}
		/*
		 * If we had to process any tasks at all, scan again
		 * in case some of them were in the middle of forking
		 * children that didn't get processed.
		 * Not the most efficient way to do it, but it avoids
		 * having to take callback_mutex in the fork path
		 */
		goto again;
	}
	if (heap == &tmp_heap)
		heap_free(&tmp_heap);
	return 0;
}

static void cgroup_transfer_one_task(struct task_struct *task, void *data)
{
	struct cgroup *new_cgroup = data;

	mutex_lock(&cgroup_mutex);
	cgroup_attach_task(new_cgroup, task, false);
	mutex_unlock(&cgroup_mutex);
}

/**
 * cgroup_transfer_tasks - move tasks from one cgroup to another
 * @to: cgroup to which the tasks will be moved
 * @from: cgroup in which the tasks currently reside
 */
int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from)
{
	return css_scan_tasks(&from->dummy_css, NULL, cgroup_transfer_one_task,
			      to, NULL);
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
/*
 * Stuff for reading the 'tasks'/'procs' files.
 *
 * Reading these files can return large amounts of data if a cgroup has
 * *lots* of attached tasks, so it may take several calls to read().
 * However, we cannot guarantee that the information we produce is correct
 * unless we produce it entirely atomically.
 */
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
/* which pidlist file are we talking about? */
enum cgroup_filetype {
	CGROUP_FILE_PROCS,
	CGROUP_FILE_TASKS,
};
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
/*
 * A pidlist is a list of pids that virtually represents the contents of one
 * of the cgroup files ("procs" or "tasks"). We keep a list of such pidlists,
 * a pair (one each for procs, tasks) for each pid namespace that's relevant
 * to the cgroup.
 */
struct cgroup_pidlist {
	/*
	 * Used to find which pidlist is wanted. Doesn't change as long as
	 * this particular pidlist stays in the cgroup's list.
	 */
	struct { enum cgroup_filetype type; struct pid_namespace *ns; } key;
	/* array of xids (tgids for "procs", pids for "tasks") */
	pid_t *list;
	/* how many elements the above list has */
	int length;
	/* how many files are using the current array */
	int use_count;
	/* each of these stored in a list by its cgroup */
	struct list_head links;
	/* pointer to the cgroup we belong to, for list removal purposes */
	struct cgroup *owner;
	/* protects the other fields */
	struct rw_semaphore rwsem;
};
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
/*
 * The following two functions "fix" the issue where there are more pids
 * than kmalloc will give memory for; in such cases, we use vmalloc/vfree.
 * TODO: replace with a kernel-wide solution to this problem
 */
#define PIDLIST_TOO_LARGE(c) ((c) * sizeof(pid_t) > (PAGE_SIZE * 2))
static void *pidlist_allocate(int count)
{
	if (PIDLIST_TOO_LARGE(count))
		return vmalloc(count * sizeof(pid_t));
	else
		return kmalloc(count * sizeof(pid_t), GFP_KERNEL);
}
static void pidlist_free(void *p)
{
	if (is_vmalloc_addr(p))
		vfree(p);
	else
		kfree(p);
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
/*
 * pidlist_uniq - given a kmalloc()ed list, strip out all duplicate entries
 * Only adjacent duplicates are detected, so the list must already be sorted.
 * Returns the number of unique elements.
 */
static int pidlist_uniq(pid_t *list, int length)
{
	int src, dest = 1;

	/*
	 * we presume the 0th element is unique, so src and dest start at 1.
	 * trivial edge cases first; no work needs to be done for either
	 */
	if (length == 0 || length == 1)
		return length;
	/* src and dest walk down the list; dest counts unique elements */
	for (src = 1; src < length; src++) {
		/* find next unique element */
		while (list[src] == list[src-1]) {
			src++;
			if (src == length)
				goto after;
		}
		/* dest always points to where the next unique element goes */
		list[dest] = list[src];
		dest++;
	}
after:
	return dest;
}
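
The two-index walk above can be tried outside the kernel. What follows is a minimal userspace sketch, not the kernel code: `uniq_sorted` is a hypothetical name, `int` stands in for `pid_t`, and the input is assumed to be sorted already so that duplicates sit next to each other.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace sketch of in-place deduplication of a sorted array.
 * uniq_sorted is a hypothetical name; int stands in for pid_t.
 * Precondition: list[0..length) is sorted, so duplicates are adjacent.
 * Returns the number of unique elements, which end up at the front.
 */
static int uniq_sorted(int *list, int length)
{
	int src, dest = 1;

	assert(list != NULL || length == 0);
	/* zero or one element: nothing to strip */
	if (length == 0 || length == 1)
		return length;
	for (src = 1; src < length; src++) {
		/* skip entries equal to their predecessor */
		if (list[src] == list[src - 1])
			continue;
		/* dest is the slot where the next unique element goes */
		list[dest++] = list[src];
	}
	return dest;
}
```

Called on the seven-element array {1, 1, 2, 3, 3, 3, 7}, the sketch returns 4 and leaves 1, 2, 3, 7 in the first four slots. It uses `continue` instead of the inner `while` and `goto` of the original; the set of elements kept is the same.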
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
static int cmppid(const void *a, const void *b)
{
	return *(pid_t *)a - *(pid_t *)b;
}
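
The subtraction in cmppid() cannot overflow as long as both values are non-negative, which is the assumption for ids stored in a pidlist. As an illustration only, here is a userspace sketch of a comparator that does not rely on that assumption, together with a sort helper built on the standard `qsort()`. The names `cmp_id` and `sort_ids` are hypothetical, and `int` stands in for `pid_t`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace comparator; int stands in for pid_t. */
static int cmp_id(const void *a, const void *b)
{
	int x = *(const int *)a;
	int y = *(const int *)b;

	/* (x > y) - (x < y) yields -1, 0 or 1 and cannot overflow */
	return (x > y) - (x < y);
}

/* Sort ids in ascending order so that equal values become adjacent. */
static void sort_ids(int *ids, size_t n)
{
	assert(ids != NULL || n == 0);
	if (n < 2)
		return;	/* nothing to sort */
	qsort(ids, n, sizeof(ids[0]), cmp_id);
}
```

Sorting first is what makes an adjacent-duplicate pass such as pidlist_uniq() sufficient: after `sort_ids()`, every repeated id sits directly next to its copies.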
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
/*
 * find the appropriate pidlist for our purpose (given procs vs tasks)
 * returns with the lock on that pidlist already held, and takes care
 * of the use count, or returns NULL with no locks held if we're out of
 * memory.
 */
static struct cgroup_pidlist *cgroup_pidlist_find(struct cgroup *cgrp,
						  enum cgroup_filetype type)
{
	struct cgroup_pidlist *l;
	/* don't need task_nsproxy() if we're looking at ourself */
	struct pid_namespace *ns = task_active_pid_ns(current);

	/*
	 * We can't drop the pidlist_mutex before taking the l->rwsem in case
	 * the last ref-holder is trying to remove l from the list at the same
	 * time. Holding the pidlist_mutex precludes somebody taking whichever
	 * list we find out from under us - compare release_pid_array().
	 */
	mutex_lock(&cgrp->pidlist_mutex);
	list_for_each_entry(l, &cgrp->pidlists, links) {
		if (l->key.type == type && l->key.ns == ns) {
			/* make sure l doesn't vanish out from under us */
			down_write(&l->rwsem);
			mutex_unlock(&cgrp->pidlist_mutex);
			return l;
		}
	}
	/* entry not found; create a new one */
	l = kzalloc(sizeof(struct cgroup_pidlist), GFP_KERNEL);
	if (!l) {
		mutex_unlock(&cgrp->pidlist_mutex);
		return l;
	}
	init_rwsem(&l->rwsem);
	down_write(&l->rwsem);
	l->key.type = type;
	l->key.ns = get_pid_ns(ns);
	l->owner = cgrp;
	list_add(&l->links, &cgrp->pidlists);
	mutex_unlock(&cgrp->pidlist_mutex);
	return l;
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Load  a  cgroup ' s  pidarray  with  either  procs '  tgids  or  tasks '  pids 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  pidlist_array_load ( struct  cgroup  * cgrp ,  enum  cgroup_filetype  type ,  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											      struct  cgroup_pidlist  * * lp ) 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									pid_t  * array ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  length ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int  pid ,  n  =  0 ;  /* used for populating the array */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:26 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct css_task_iter it;
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct task_struct *tsk;
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup_pidlist *l;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * If cgroup gets more users after we read count, we won't have
									 * enough space - tough.  This race is indistinguishable to the
									 * caller from the case that the additional cgroup users didn't
									 * show up until sometime later on.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									length = cgroup_task_count(cgrp);
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:28 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									array = pidlist_allocate(length);
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (!array)
										return -ENOMEM;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* now, populate the array */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:26 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									css_task_iter_start(&cgrp->dummy_css, &it);
									while ((tsk = css_task_iter_next(&it))) {
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if (unlikely(n == length))
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											break;
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/* get tgid or pid for procs or tasks file respectively */ 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if (type == CGROUP_FILE_PROCS)
											pid = task_tgid_vnr(tsk);
										else
											pid = task_pid_vnr(tsk);
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if (pid > 0) /* make sure to only use valid results */
											array[n++] = pid;
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:26 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									css_task_iter_end(&it);
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									length = n;
									/* now sort & (if procs) strip out duplicates */
									sort(array, length, sizeof(pid_t), cmppid, NULL);
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (type == CGROUP_FILE_PROCS)
							 
						 
					
						
							
								
									
										
										
										
											2013-03-12 15:36:00 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										length = pidlist_uniq(array, length);
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									l = cgroup_pidlist_find(cgrp, type);
									if (!l) {
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:28 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										pidlist_free(array);
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return -ENOMEM;
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* store array, freeing old if necessary - lock already held */ 
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:28 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									pidlist_free(l->list);
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									l->list = array;
									l->length = length;
									l->use_count++;
							 
						 
					
						
							
								
									
										
										
										
											2013-08-01 09:52:15 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									up_write(&l->rwsem);
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:27 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									*lp = l;
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return 0;
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
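
The sort-then-deduplicate step at the end of pidlist_array_load() can be tried outside the kernel. The following is a minimal userspace sketch, not the kernel implementation: it assumes the C library's qsort() in place of the kernel's sort(), and the names cmp_pid, uniq_sorted and sort_and_uniq are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

/* Comparison callback for qsort(): orders pids in ascending order. */
static int cmp_pid(const void *a, const void *b)
{
	pid_t x = *(const pid_t *)a;
	pid_t y = *(const pid_t *)b;

	return (x > y) - (x < y);
}

/*
 * Hypothetical helper: collapse runs of equal values in an already
 * sorted array, in place, and return the new length.
 */
static int uniq_sorted(pid_t *list, int length)
{
	int src, dest = 1;

	assert(list != NULL || length == 0);
	if (length < 2)
		return length;
	for (src = 1; src < length; src++) {
		/* keep an element only if it differs from the last one kept */
		if (list[src] != list[dest - 1])
			list[dest++] = list[src];
	}
	return dest;
}

/*
 * Sort the snapshot; strip duplicates only when asked to, which
 * corresponds to the "procs" (tgid) case in the code above.
 */
static int sort_and_uniq(pid_t *list, int length, int strip_duplicates)
{
	qsort(list, (size_t)length, sizeof(pid_t), cmp_pid);
	return strip_duplicates ? uniq_sorted(list, length) : length;
}
```

Duplicates are only expected when thread group ids are collected, since several threads share one tgid; a list of per-task pids is sorted but otherwise left as collected.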
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Add cgroupstats
This patch is inspired by the discussion at
http://lkml.org/lkml/2007/4/11/187 and implements per cgroup statistics
as suggested by Andrew Morton in http://lkml.org/lkml/2007/4/11/263.  The
patch is on top of 2.6.21-mm1 with Paul's cgroups v9 patches (forward
ported)
This patch implements per cgroup statistics infrastructure and re-uses
code from the taskstats interface.  A new set of cgroup operations is
registered with commands and attributes.  It should be very easy to
*extend* per cgroup statistics by adding members to the cgroupstats
structure.
The current model for cgroupstats is pull; a push model (to post
statistics on interesting events) should be very easy to add.  Currently,
user space requests statistics by passing the cgroup file
descriptor.  Statistics about the state of all the tasks in the cgroup
are returned to user space.
TODO's/NOTE:
This patch provides an infrastructure for implementing cgroup statistics.
Based on the needs of each controller, we can incrementally add more statistics,
event based support for notification of statistics, accumulation of taskstats
into cgroup statistics in the future.
Sample output
# ./cgroupstats -C /cgroup/a
sleeping 2, blocked 0, running 1, stopped 0, uninterruptible 0
# ./cgroupstats -C /cgroup/
sleeping 154, blocked 0, running 0, stopped 0, uninterruptible 0
If the approach looks good, I'll enhance and post the user space utility for
the same
Feedback, comments, test results are always welcome!
[akpm@linux-foundation.org: build fix]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
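
The pull model described above amounts to walking the tasks of a cgroup and tallying them by state. The following is a rough userspace sketch of that tally, assuming a plain array of states in place of the kernel's task iterator; the demo_state enum, struct demo_stats and both functions are hypothetical names for illustration and are not the cgroupstats structure or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical task states, one per column of the sample output. */
enum demo_state {
	DEMO_SLEEPING,
	DEMO_BLOCKED,
	DEMO_RUNNING,
	DEMO_STOPPED,
	DEMO_UNINTERRUPTIBLE,
	DEMO_NR_STATES
};

/* Hypothetical per-group counters, one slot per state. */
struct demo_stats {
	unsigned long count[DEMO_NR_STATES];
};

/* Reset the counters, then count each task under its state. */
static void demo_stats_build(struct demo_stats *stats,
			     const enum demo_state *tasks, size_t ntasks)
{
	size_t i;

	assert(stats != NULL);
	memset(stats, 0, sizeof(*stats));
	for (i = 0; i < ntasks; i++)
		stats->count[tasks[i]]++;
}

/* Render the counters in the same layout as the sample output. */
static int demo_stats_format(const struct demo_stats *stats,
			     char *buf, size_t len)
{
	return snprintf(buf, len,
			"sleeping %lu, blocked %lu, running %lu, "
			"stopped %lu, uninterruptible %lu",
			stats->count[DEMO_SLEEPING],
			stats->count[DEMO_BLOCKED],
			stats->count[DEMO_RUNNING],
			stats->count[DEMO_STOPPED],
			stats->count[DEMO_UNINTERRUPTIBLE]);
}
```

Because the counters are rebuilt from scratch on every request, the result is a snapshot; tasks that change state or move between cgroups during the walk may or may not be reflected.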
											 
										 
										
											2007-10-18 23:39:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * cgroupstats_build - build and fill cgroupstats
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * @stats: cgroupstats to fill information into
								 * @dentry: A dentry entry belonging to the cgroup for which stats have
								 * been requested.
							 
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * Build and fill cgroupstats so that taskstats can export it to user
								 * space.
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								int cgroupstats_build(struct cgroupstats *stats, struct dentry *dentry)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int ret = -EINVAL;
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup *cgrp;
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:26 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct css_task_iter it;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct task_struct *tsk;
							 
						 
					
						
							
								
									
										
										
										
											2008-11-19 15:36:48 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
									
										
										
										
											2008-11-19 15:36:48 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 * Validate dentry by checking the superblock operations,
									 * and make sure it's a directory.
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2008-11-19 15:36:48 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (dentry->d_sb->s_op != &cgroup_ops ||
									    !S_ISDIR(dentry->d_inode->i_mode))
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										goto err;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									ret = 0;
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgrp = dentry->d_fsdata;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:26 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	css_task_iter_start(&cgrp->dummy_css, &it);
	while ((tsk = css_task_iter_next(&it))) {
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		switch (tsk->state) {
		case TASK_RUNNING:
			stats->nr_running++;
			break;
		case TASK_INTERRUPTIBLE:
			stats->nr_sleeping++;
			break;
		case TASK_UNINTERRUPTIBLE:
			stats->nr_uninterruptible++;
			break;
		case TASK_STOPPED:
			stats->nr_stopped++;
			break;
		default:
			if (delayacct_is_task_waiting_on_io(tsk))
				stats->nr_io_wait++;
			break;
		}
	}
	css_task_iter_end(&it);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
err:
	return ret;
}
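The counting loop above is essentially a tally over task states. The following is a minimal user-space sketch of that idea, not kernel code: `enum demo_state`, `struct demo_task`, `struct demo_stats` and `demo_tally()` are hypothetical names invented for illustration, standing in for the kernel task states, `struct task_struct` and `struct cgroupstats`. The `waiting_on_io` flag is an assumed stand-in for the `delayacct_is_task_waiting_on_io()` check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel task states; not kernel definitions. */
enum demo_state {
	DEMO_RUNNING,
	DEMO_INTERRUPTIBLE,
	DEMO_UNINTERRUPTIBLE,
	DEMO_STOPPED,
	DEMO_OTHER,
};

/* Hypothetical task record: a state plus an assumed "waiting on I/O" flag. */
struct demo_task {
	enum demo_state state;
	int waiting_on_io;
};

/* Hypothetical counterpart of the counters filled in by the loop above. */
struct demo_stats {
	unsigned long nr_running;
	unsigned long nr_sleeping;
	unsigned long nr_uninterruptible;
	unsigned long nr_stopped;
	unsigned long nr_io_wait;
};

/*
 * Add one count per task to the matching counter. Tasks in any other
 * state are counted only if they are flagged as waiting on I/O.
 * The caller is expected to zero *stats beforehand.
 */
static void demo_tally(const struct demo_task *tasks, size_t n,
		       struct demo_stats *stats)
{
	assert(stats != NULL);

	for (size_t i = 0; i < n; i++) {
		switch (tasks[i].state) {
		case DEMO_RUNNING:
			stats->nr_running++;
			break;
		case DEMO_INTERRUPTIBLE:
			stats->nr_sleeping++;
			break;
		case DEMO_UNINTERRUPTIBLE:
			stats->nr_uninterruptible++;
			break;
		case DEMO_STOPPED:
			stats->nr_stopped++;
			break;
		default:
			if (tasks[i].waiting_on_io)
				stats->nr_io_wait++;
			break;
		}
	}
}
```

As in the kernel loop, the I/O-wait check only runs in the default branch, so a task in one of the four named states is never also counted as waiting on I/O.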
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
/*
 * seq_file methods for the tasks/procs files. The seq_file position is the
 * next pid to display; the seq_file iterator is a pointer to the pid
 * in the cgroup->l->list array.
 */
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
static void *cgroup_pidlist_start(struct seq_file *s, loff_t *pos)
{
	/*
	 * Initially we receive a position value that corresponds to
	 * one more than the last pid shown (or 0 on the first call or
	 * after a seek to the start). Use a binary-search to find the
	 * next pid to display, if any
	 */
	struct cgroup_pidlist *l = s->private;
	int index = 0, pid = *pos;
	int *iter;

	down_read(&l->rwsem);
	if (pid) {
		int end = l->length;

		while (index < end) {
			int mid = (index + end) / 2;
			if (l->list[mid] == pid) {
				index = mid;
				break;
			} else if (l->list[mid] <= pid)
				index = mid + 1;
			else
				end = mid;
		}
	}
	/* If we're off the end of the array, we're done */
	if (index >= l->length)
		return NULL;
	/* Update the abstract position to be the actual pid that we found */
	iter = l->list + index;
	*pos = *iter;
	return iter;
}
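The lookup in cgroup_pidlist_start() can be tried outside the kernel. The following is a minimal user-space sketch, not the kernel code itself: `find_next_pid_index()` is a hypothetical helper name, and it assumes that the pid array is sorted in ascending order with no duplicates, which the binary search relies on. It returns the index of the first entry greater than or equal to the requested position, or the array length when no such entry exists.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the index of the first element of the
 * ascending array `list` (holding `length` entries) that is >= pid,
 * or `length` if every element is smaller.
 */
static size_t find_next_pid_index(const int *list, size_t length, int pid)
{
	size_t index = 0, end = length;

	assert(list != NULL || length == 0);

	while (index < end) {
		/* Midpoint computed without overflowing index + end. */
		size_t mid = index + (end - index) / 2;

		if (list[mid] == pid)
			return mid;		/* exact match: resume here */
		else if (list[mid] < pid)
			index = mid + 1;	/* answer lies to the right */
		else
			end = mid;		/* answer is mid or to its left */
	}
	return index;
}
```

A return value equal to `length` corresponds to the "off the end of the array" case, where the kernel function returns NULL to end the iteration.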
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
static void cgroup_pidlist_stop(struct seq_file *s, void *v)
{
	struct cgroup_pidlist *l = s->private;
	up_read(&l->rwsem);
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
static void *cgroup_pidlist_next(struct seq_file *s, void *v, loff_t *pos)
{
	struct cgroup_pidlist *l = s->private;
	pid_t *p = v;
	pid_t *end = l->list + l->length;
	/*
	 * Advance to the next pid in the array. If this goes off the
	 * end, we're done
	 */
	p++;
	if (p >= end) {
		return NULL;
	} else {
		*pos = *p;
		return p;
	}
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
static int cgroup_pidlist_show(struct seq_file *s, void *v)
{
	return seq_printf(s, "%d\n", *(int *)v);
}
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
/*
 * seq_operations functions for iterating on pidlists through seq_file -
 * independent of whether it's tasks or procs
 */
static const struct seq_operations cgroup_pidlist_seq_operations = {
	.start = cgroup_pidlist_start,
	.stop = cgroup_pidlist_stop,
	.next = cgroup_pidlist_next,
	.show = cgroup_pidlist_show,
};
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
static void cgroup_release_pid_array(struct cgroup_pidlist *l)
{
	/*
	 * the case where we're the last user of this particular pidlist will
	 * have us remove it from the cgroup's list, which entails taking the
	 * mutex. since in pidlist_find the pidlist->lock depends on cgroup->
	 * pidlist_mutex, we have to take pidlist_mutex first.
	 */
	mutex_lock(&l->owner->pidlist_mutex);
	down_write(&l->rwsem);
	BUG_ON(!l->use_count);
	if (!--l->use_count) {
		/* we're the last user if refcount is 0; remove and free */
		list_del(&l->links);
		mutex_unlock(&l->owner->pidlist_mutex);
		pidlist_free(l->list);
		put_pid_ns(l->key.ns);
		up_write(&l->rwsem);
		kfree(l);
		return;
	}
	mutex_unlock(&l->owner->pidlist_mutex);
	up_write(&l->rwsem);
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
static int cgroup_pidlist_release(struct inode *inode, struct file *file)
{
	struct cgroup_pidlist *l;
	if (!(file->f_mode & FMODE_READ))
		return 0;
	/*
	 * the seq_file will only be initialized if the file was opened for
	 * reading; hence we check if it's not null only in that case.
	 */
	l = ((struct seq_file *)file->private_data)->private;
	cgroup_release_pid_array(l);
	return seq_release(inode, file);
}

static const struct file_operations cgroup_pidlist_operations = {
	.read = seq_read,
	.llseek = seq_lseek,
	.write = cgroup_file_write,
	.release = cgroup_pidlist_release,
};

/*
 * The following functions handle opens on a file that displays a pidlist
 * (tasks or procs). Prepare an array of the process/thread IDs of whoever's
 * in the cgroup.
 */
/* helper function for the two below it */
static int cgroup_pidlist_open(struct file *file, enum cgroup_filetype type)
{
	struct cgroup *cgrp = __d_cgrp(file->f_dentry->d_parent);
	struct cgroup_pidlist *l;
	int retval;

	/* Nothing to do for write-only files */
	if (!(file->f_mode & FMODE_READ))
		return 0;

	/* have the array populated */
	retval = pidlist_array_load(cgrp, type, &l);
	if (retval)
		return retval;
	/* configure file information */
	file->f_op = &cgroup_pidlist_operations;

	retval = seq_open(file, &cgroup_pidlist_seq_operations);
	if (retval) {
		cgroup_release_pid_array(l);
		return retval;
	}
	((struct seq_file *)file->private_data)->private = l;
	return 0;
}
static int cgroup_tasks_open(struct inode *unused, struct file *file)
{
	return cgroup_pidlist_open(file, CGROUP_FILE_TASKS);
}
static int cgroup_procs_open(struct inode *unused, struct file *file)
{
	return cgroup_pidlist_open(file, CGROUP_FILE_PROCS);
}

static u64 cgroup_read_notify_on_release(struct cgroup_subsys_state *css,
					 struct cftype *cft)
{
	return notify_on_release(css->cgroup);
}

static int cgroup_write_notify_on_release(struct cgroup_subsys_state *css,
					  struct cftype *cft, u64 val)
{
	clear_bit(CGRP_RELEASABLE, &css->cgroup->flags);
	if (val)
		set_bit(CGRP_NOTIFY_ON_RELEASE, &css->cgroup->flags);
	else
		clear_bit(CGRP_NOTIFY_ON_RELEASE, &css->cgroup->flags);
	return 0;
}

/*
 * When dput() is called asynchronously, if umount has been done and
 * then deactivate_super() in cgroup_free_fn() kills the superblock,
 * there's a small window in which vfs will see the root dentry with
 * non-zero refcnt and trigger BUG().
 *
 * That's why we hold a reference before dput() and drop it right after.
 */
static void cgroup_dput(struct cgroup *cgrp)
{
	struct super_block *sb = cgrp->root->sb;

	atomic_inc(&sb->s_active);
	dput(cgrp->dentry);
	deactivate_super(sb);
}

/*
 * Unregister event and free resources.
 *
 * Gets called from workqueue.
 */
static void cgroup_event_remove(struct work_struct *work)
{
	struct cgroup_event *event = container_of(work, struct cgroup_event,
			remove);
	struct cgroup_subsys_state *css = event->css;

	remove_wait_queue(event->wqh, &event->wait);

	event->cft->unregister_event(css, event->cft, event->eventfd);

	/* Notify userspace the event is going away. */
	eventfd_signal(event->eventfd, 1);

	eventfd_ctx_put(event->eventfd);
	kfree(event);
	css_put(css);
}

/*
 * Gets called on POLLHUP on eventfd when user closes it.
 *
 * Called with wqh->lock held and interrupts disabled.
 */
static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
		int sync, void *key)
{
	struct cgroup_event *event = container_of(wait,
			struct cgroup_event, wait);
	struct cgroup *cgrp = event->css->cgroup;
	unsigned long flags = (unsigned long)key;

	if (flags & POLLHUP) {
		/*
		 * If the event has been detached at cgroup removal, we
		 * can simply return knowing the other side will clean up
		 * for us.
		 *
		 * We can't race against event freeing since the other
		 * side will require wqh->lock via remove_wait_queue(),
		 * which we hold.
		 */
		spin_lock(&cgrp->event_list_lock);
		if (!list_empty(&event->list)) {
			list_del_init(&event->list);
			/*
			 * We are in atomic context, but cgroup_event_remove()
			 * may sleep, so we have to call it in workqueue.
			 */
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											schedule_work ( & event - > remove ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										spin_unlock ( & cgrp - > event_list_lock ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:20 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
static void cgroup_event_ptable_queue_proc(struct file *file,
		wait_queue_head_t *wqh, poll_table *pt)
{
	struct cgroup_event *event = container_of(pt,
			struct cgroup_event, pt);

	event->wqh = wqh;
	add_wait_queue(wqh, &event->wait);
}

/*
 * Parse input and register new cgroup event handler.
 *
 * Input must be in format '<event_fd> <control_fd> <args>'.
 * Interpretation of args is defined by control file implementation.
 */
static int cgroup_write_event_control(struct cgroup_subsys_state *dummy_css,
				      struct cftype *cft, const char *buffer)
{
	struct cgroup *cgrp = dummy_css->cgroup;
	struct cgroup_event *event;
	struct cgroup_subsys_state *cfile_css;
	unsigned int efd, cfd;
	struct fd efile;
	struct fd cfile;
	char *endp;
	int ret;

	efd = simple_strtoul(buffer, &endp, 10);
	if (*endp != ' ')
		return -EINVAL;
	buffer = endp + 1;

	cfd = simple_strtoul(buffer, &endp, 10);
	if ((*endp != ' ') && (*endp != '\0'))
		return -EINVAL;
	buffer = endp + 1;

	event = kzalloc(sizeof(*event), GFP_KERNEL);
	if (!event)
		return -ENOMEM;

	INIT_LIST_HEAD(&event->list);
	init_poll_funcptr(&event->pt, cgroup_event_ptable_queue_proc);
	init_waitqueue_func_entry(&event->wait, cgroup_event_wake);
	INIT_WORK(&event->remove, cgroup_event_remove);

	efile = fdget(efd);
	if (!efile.file) {
		ret = -EBADF;
		goto out_kfree;
	}

	event->eventfd = eventfd_ctx_fileget(efile.file);
	if (IS_ERR(event->eventfd)) {
		ret = PTR_ERR(event->eventfd);
		goto out_put_efile;
	}

	cfile = fdget(cfd);
	if (!cfile.file) {
		ret = -EBADF;
		goto out_put_eventfd;
	}

	/* the process needs read permission on the control file */
	/* AV: shouldn't we check that it's been opened for read instead? */
	ret = inode_permission(file_inode(cfile.file), MAY_READ);
	if (ret < 0)
		goto out_put_cfile;

	event->cft = __file_cft(cfile.file);
	if (IS_ERR(event->cft)) {
		ret = PTR_ERR(event->cft);
		goto out_put_cfile;
	}

	if (!event->cft->ss) {
		ret = -EBADF;
		goto out_put_cfile;
	}

	/*
	 * Determine the css of @cfile, verify it belongs to the same
	 * cgroup as cgroup.event_control, and associate @event with it.
	 * Remaining events are automatically removed on cgroup destruction
	 * but the removal is asynchronous, so take an extra ref.
	 */
	rcu_read_lock();

	ret = -EINVAL;
	event->css = cgroup_css(cgrp, event->cft->ss);
	cfile_css = css_from_dir(cfile.file->f_dentry->d_parent, event->cft->ss);
	if (event->css && event->css == cfile_css && css_tryget(event->css))
		ret = 0;

	rcu_read_unlock();
	if (ret)
		goto out_put_cfile;

	if (!event->cft->register_event || !event->cft->unregister_event) {
		ret = -EINVAL;
		goto out_put_css;
	}

	ret = event->cft->register_event(event->css, event->cft,
			event->eventfd, buffer);
	if (ret)
		goto out_put_css;

	efile.file->f_op->poll(efile.file, &event->pt);

	spin_lock(&cgrp->event_list_lock);
	list_add(&event->list, &cgrp->event_list);
	spin_unlock(&cgrp->event_list_lock);

	fdput(cfile);
	fdput(efile);

	return 0;

								out_put_css :  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									css_put ( event - > css ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-01 09:51:47 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								out_put_cfile :  
						 
					
						
							
								
									
										
										
										
											2013-08-30 12:29:49 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									fdput ( cfile ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-01 09:51:47 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								out_put_eventfd :  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									eventfd_ctx_put ( event - > eventfd ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								out_put_efile :  
						 
					
						
							
								
									
										
										
										
											2013-08-30 12:29:49 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									fdput ( efile ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-01 09:51:47 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								out_kfree :  
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:20 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									kfree(event);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return ret;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static u64 cgroup_clone_children_read(struct cgroup_subsys_state *css,
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												      struct cftype *cft)
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return test_bit(CGRP_CPUSET_CLONE_CHILDREN, &css->cgroup->flags);
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static int cgroup_clone_children_write(struct cgroup_subsys_state *css,
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												       struct cftype *cft, u64 val)
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (val)
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										set_bit(CGRP_CPUSET_CLONE_CHILDREN, &css->cgroup->flags);
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									else 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										clear_bit(CGRP_CPUSET_CLONE_CHILDREN, &css->cgroup->flags);
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return 0;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-03 19:14:34 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static struct cftype cgroup_base_files[] = {
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									{ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-03 19:14:34 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										.name = "cgroup.procs",
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										.open = cgroup_procs_open,
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										.write_u64 = cgroup_procs_write,
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										.release = cgroup_pidlist_release,
							 
						 
					
						
							
								
									
										
										
										
											2011-05-26 16:25:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										.mode = S_IRUGO | S_IWUSR,
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									},
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									{ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-03 19:14:34 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										.name = "cgroup.event_control",
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:20 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										.write_string = cgroup_write_event_control,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										.mode = S_IWUGO,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									},
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									{ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										.name = "cgroup.clone_children",
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: introduce sane_behavior mount option
It's a sad fact that at this point various cgroup controllers are
carrying so many idiosyncrasies and pure insanities that it simply
isn't possible to reach any sort of sane consistent behavior while
staying fully compatible with what has already been
exposed to userland.
As we can't break exposed userland interface, transitioning to sane
behaviors can only be done in steps while maintaining backwards
compatibility.  This patch introduces a new mount option -
__DEVEL__sane_behavior - which disables crazy features and enforces
consistent behaviors in cgroup core proper and various controllers.
As exactly which behaviors it changes are still being determined, the
mount option, at this point, is useful only for development of the new
behaviors.  As such, the mount option is prefixed with __DEVEL__ and
generates a warning message when used.
Eventually, once we get to the point where all controllers' behaviors
are consistent enough to implement unified hierarchy, the __DEVEL__
prefix will be dropped, and more importantly, unified-hierarchy will
enforce sane_behavior by default.  Maybe we'll be able to completely drop
the crazy stuff after a while, maybe not, but we at least have a
strategy to move on to saner behaviors.
This patch introduces the mount option and changes the following
behaviors in cgroup core.
* Mount options "noprefix" and "clone_children" are disallowed.  Also,
  cgroupfs file cgroup.clone_children is not created.
* When mounting an existing superblock, mount options should match.
  This is currently pretty crazy.  If one mounts a cgroup, creates a
  subdirectory, unmounts it and then mounts it again with different
  options, it looks like the new options are applied but they aren't.
* Remount is disallowed.
The behavior changes are documented in the comment above
CGRP_ROOT_SANE_BEHAVIOR enum and will be expanded as different
controllers are converted and planned improvements progress.
v2: Dropped unnecessary explicit file permission setting sane_behavior
    cftype entry as suggested by Li Zefan.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vivek Goyal <vgoyal@redhat.com>
											 
										 
										
											2013-04-14 20:15:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										.flags = CFTYPE_INSANE,
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										.read_u64 = cgroup_clone_children_read,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										.write_u64 = cgroup_clone_children_write,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									},
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2013-04-14 20:15:26 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									{ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										.name = "cgroup.sane_behavior",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										.flags = CFTYPE_ONLY_ON_ROOT,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										.read_seq_string = cgroup_sane_behavior_show,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									},
							 
						 
					
						
							
								
									
										
										
										
											2013-06-03 19:14:34 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * Historical crazy stuff.  These don't have "cgroup."  prefix and
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * don't exist if sane_behavior.  If you're depending on these, be
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * prepared to be burned.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									{ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										.name = "tasks",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										.flags = CFTYPE_INSANE,		/* use "procs" instead */
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										.open = cgroup_tasks_open,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										.write_u64 = cgroup_tasks_write,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										.release = cgroup_pidlist_release,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										.mode = S_IRUGO | S_IWUSR,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									},
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									{ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										.name = "notify_on_release",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										.flags = CFTYPE_INSANE,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										.read_u64 = cgroup_read_notify_on_release,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										.write_u64 = cgroup_write_notify_on_release,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									},
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									{ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										.name = "release_agent",
							 
						 
					
						
							
								
									
										
										
										
											2013-06-03 19:13:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										.flags = CFTYPE_INSANE | CFTYPE_ONLY_ON_ROOT,
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										.read_seq_string = cgroup_release_agent_show,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										.write_string = cgroup_release_agent_write,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										.max_write_len = PATH_MAX,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									},
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									{ }	/* terminate */
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								};
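
The table above tags entries with CFTYPE_INSANE and CFTYPE_ONLY_ON_ROOT, and the array ends with an empty terminator entry. The userspace sketch below models how such a terminated table could be walked and filtered by those flags. The struct, the flag values and the function name are hypothetical, and the filtering rules are an assumption drawn from the flag names, the comment in the table and the sane_behavior commit message, not a copy of the kernel's file-creation code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this sketch; the kernel's may differ. */
#define MODEL_CFTYPE_ONLY_ON_ROOT	(1U << 0)
#define MODEL_CFTYPE_INSANE		(1U << 1)

/* Hypothetical, reduced stand-in for struct cftype. */
struct cftype_model {
	const char *name;	/* a NULL name terminates the array */
	unsigned int flags;
};

/* Same names and flags as the table above, handlers omitted. */
static const struct cftype_model base_files_model[] = {
	{ "cgroup.procs", 0 },
	{ "cgroup.event_control", 0 },
	{ "cgroup.clone_children", MODEL_CFTYPE_INSANE },
	{ "cgroup.sane_behavior", MODEL_CFTYPE_ONLY_ON_ROOT },
	{ "tasks", MODEL_CFTYPE_INSANE },
	{ "notify_on_release", MODEL_CFTYPE_INSANE },
	{ "release_agent", MODEL_CFTYPE_INSANE | MODEL_CFTYPE_ONLY_ON_ROOT },
	{ NULL, 0 }	/* terminate */
};

/*
 * Count the entries that would be visible in a directory, assuming:
 * - INSANE entries are skipped when sane_behavior is in effect;
 * - ONLY_ON_ROOT entries are skipped in non-root cgroups.
 */
static size_t model_count_visible(const struct cftype_model *cfts,
				  bool is_root, bool sane_behavior)
{
	size_t n = 0;

	assert(cfts != NULL);
	for (; cfts->name != NULL; cfts++) {
		if ((cfts->flags & MODEL_CFTYPE_INSANE) && sane_behavior)
			continue;
		if ((cfts->flags & MODEL_CFTYPE_ONLY_ON_ROOT) && !is_root)
			continue;
		n++;
	}
	return n;
}
```

With these assumptions, a root cgroup without sane_behavior sees all seven entries, while a non-root cgroup with sane_behavior sees only cgroup.procs and cgroup.event_control.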
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:29 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
									
										
										
										
											2013-06-28 16:24:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * cgroup_populate_dir - create subsys files in a cgroup directory
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:29 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * @cgrp: target cgroup
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * @subsys_mask: mask of the subsystem ids whose files should be added
							 
						 
					
						
							
								
									
										
										
										
											2013-06-28 16:24:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * On failure, no file is added.
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:29 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-28 16:24:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static int cgroup_populate_dir(struct cgroup *cgrp, unsigned long subsys_mask)
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct cgroup_subsys *ss;
							 
						 
					
						
							
								
									
										
										
										
											2013-07-12 12:34:02 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int i, ret = 0;
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:32 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* process cftsets of each subsystem */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-07-12 12:34:02 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									for_each_subsys(ss, i) {
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct cftype_set *set;
							 
						 
					
						
							
								
									
										
										
										
											2013-07-12 12:34:02 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if (!test_bit(i, &subsys_mask))
							 
						 
					
						
							
								
									
										
										
										
											2012-08-23 16:53:29 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											continue;
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-28 16:24:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										list_for_each_entry(set, &ss->cftsets, node) {
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											ret = cgroup_addrm_files(cgrp, set->cfts, true);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-28 16:24:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											if (ret < 0)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												goto err;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return 0;
							 
						 
					
						
							
								
									
										
										
										
											2013-06-28 16:24:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								err:
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgroup_clear_dir(cgrp, subsys_mask);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  ret ; 
							 
						 
					
						
							
								
									
										
											 
										
											
										
									 
								 
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: RCU protect each cgroup_subsys_state release
With the planned unified hierarchy, individual css's will be created
and destroyed dynamically across the lifetime of a cgroup.  To enable
such usages, css destruction is being decoupled from cgroup
destruction.  Most of the destruction path has been decoupled but the
actual free of css still depends on cgroup free path.
When all css refs are drained, css_release() kicks off
css_free_work_fn() which puts the cgroup.  When the cgroup refcnt
reaches zero, cgroup_diput() is invoked which in turn schedules RCU
free of the cgroup.  After a grace period, all css's are freed along
with the cgroup itself.
This patch moves the RCU grace period and css freeing from the cgroup
release path to the css release path.  css_release(), instead of kicking
off css_free_work_fn() directly, schedules the RCU callback
css_free_rcu_fn(), which in turn kicks off css_free_work_fn() after an
RCU grace period.  css_free_work_fn() is updated to free the css
directly.
The five-way punting - percpu ref kill confirmation, a work item,
percpu ref release, RCU grace period, and again a work item - is quite
hairy but the work items are there only to provide process context and
the actual sequence is kill confirm -> release -> RCU free, which
isn't simple but not too crazy.
This removes cgroup_css() usage after offline_css() allowing clearing
cgroup->subsys[] from offline_css(), which makes it consistent with
online_css() and brings it closer to proper lifetime management for
individual css's.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
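
The ordering described in this message (kill confirm -> release -> RCU free) can be illustrated with a small userspace model.  This is only a sketch under stated assumptions: every name prefixed with model_ is hypothetical and is not a kernel API, the work items and the real RCU machinery are left out, and the grace period is represented by an explicit function call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the css lifetime; not kernel code. */
enum model_stage {
	MODEL_ONLINE,	/* css is live, model_tryget() succeeds */
	MODEL_KILLED,	/* kill confirmed and css offlined, model_tryget() fails */
	MODEL_RELEASED,	/* last ref dropped, RCU callback would be scheduled */
	MODEL_FREED,	/* grace period over, css would be freed */
};

struct model_css {
	int refs;		/* starts at 1 for the base reference */
	enum model_stage stage;
};

static void model_init(struct model_css *css)
{
	css->refs = 1;
	css->stage = MODEL_ONLINE;
}

/* Take a reference unless the css has already been killed. */
static bool model_tryget(struct model_css *css)
{
	if (css->stage != MODEL_ONLINE)
		return false;
	css->refs++;
	return true;
}

/* Drop a reference; the final put stands in for css_release(). */
static void model_put(struct model_css *css)
{
	assert(css->refs > 0);
	if (--css->refs == 0)
		css->stage = MODEL_RELEASED;
}

/* Kill confirmed on all CPUs: offline the css, then put the base ref. */
static void model_kill_confirmed(struct model_css *css)
{
	assert(css->stage == MODEL_ONLINE);
	css->stage = MODEL_KILLED;
	model_put(css);
}

/* Grace period elapsed: only now may the css be freed. */
static void model_grace_period_elapsed(struct model_css *css)
{
	assert(css->stage == MODEL_RELEASED);
	css->stage = MODEL_FREED;
}
```

In this model a reference taken before the kill keeps the css in the killed stage until it is put, and freeing is only allowed after the release stage has been reached, which mirrors the sequence the message describes.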
											 
										 
										
											2013-08-13 20:22:51 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
/*
 * css destruction is a four-stage process.
 *
 * 1. Destruction starts.  Killing of the percpu_ref is initiated.
 *    Implemented in kill_css().
 *
 * 2. When the percpu_ref is confirmed to be visible as killed on all CPUs
 *    and thus css_tryget() is guaranteed to fail, the css can be offlined
 *    by invoking offline_css().  After offlining, the base ref is put.
 *    Implemented in css_killed_work_fn().
 *
 * 3. When the percpu_ref reaches zero, the only possible remaining
 *    accessors are inside RCU read sections.  css_release() schedules the
 *    RCU callback.
 *
 * 4. After the grace period, the css can be freed.  Implemented in
 *    css_free_work_fn().
 *
 * It is actually hairier because both steps 2 and 4 require process context
 * and thus involve punting to css->destroy_work, adding two additional
 * steps to the already complex sequence.
 */
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 11:01:54 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
static void css_free_work_fn(struct work_struct *work)
						 
					
						
							
								
									
										
											 
										
											
												cgroup: make css->refcnt clearing on cgroup removal optional
Currently, cgroup removal tries to drain all css references.  If there
are active css references, the removal logic waits and retries
->pre_destroy() until either all refs drop to zero or removal is
cancelled.
These semantics are unusual and add non-trivial complexity to cgroup
core and IMHO are fundamentally misguided in that they couple internal
implementation details (references to an internal data structure) with
an externally visible operation (rmdir).  To userland, this is a behavior
peculiarity which is unnecessary and difficult to anticipate (css refs are
otherwise invisible from userland), and, to policy implementations,
this is an unnecessary restriction (e.g. blkcg wants to hold css refs
for caching purposes but can't as that becomes visible as an rmdir hang).
Unfortunately, memcg currently depends on ->pre_destroy() retries and
cgroup removal vetoing and can't be immediately switched to the new
behavior.  This patch introduces the new behavior of not waiting for
css refs to drain and maintains the old behavior for subsystems which
have __DEPRECATED_clear_css_refs set.
Once memcg is updated, we can drop the code paths for the old
behavior as proposed in the following patch.  Note that the following
patch is incorrect in that the dput work item is in cgroup and may lose
some of the dputs when multiple css's are released back-to-back, and
__css_put() triggers check_for_release() when refcnt reaches 0 instead
of 1; however, it shows what part can be removed.
  http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251
Note that, in the not-too-distant future, cgroup core will start emitting
warning messages for subsystems which require the old behavior, so please
get moving.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
											 
										 
										
											2012-04-01 12:09:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
{
	struct cgroup_subsys_state *css =
		container_of(work, struct cgroup_subsys_state, destroy_work);
							 
						 
					
						
							
								
									
										
											 
										
											
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	struct cgroup *cgrp = css->cgroup;
							 
						 
					
						
							
								
									
										
											 
										
											
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 11:01:54 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	if (css->parent)
		css_put(css->parent);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	css->ss->css_free(css);
	cgroup_dput(cgrp);
							 
						 
					
						
							
								
									
										
											 
										
											
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
static void css_free_rcu_fn(struct rcu_head *rcu_head)
						 
					
						
							
								
									
										
											 
										
											
												cgroup: use percpu refcnt for cgroup_subsys_states
A css (cgroup_subsys_state) is how each cgroup is represented to a
controller.  As such, it can be used in hot paths across the various
subsystems different controllers are associated with.
One of the common operations is reference counting, which up until now
has been implemented using a global atomic counter and can have
significant adverse impact on scalability.  For example, css refcnt
can be gotten and put multiple times by blkcg for each IO request.
For high-IOPS configurations which try to do as much per-cpu as
possible, the frequent global refcnting can be very expensive.
In general, given the various and hugely diverse paths css's end up
being used from, we need to make it cheap and highly scalable.  In its
usage, css refcnting isn't very different from module refcnting.
This patch converts css refcnting to use the recently added
percpu_ref.  css_get/tryget/put() map directly to the matching
percpu_ref operations and the deactivation logic is no longer
necessary as percpu_ref already has refcnt killing.
The only complication is that, as the refcnt is per-cpu,
percpu_ref_kill() in itself doesn't ensure that further tryget
operations will fail, which we need to guarantee before invoking
->css_offline()'s.  This is resolved by collecting kill confirmation
using percpu_ref_kill_and_confirm() and initiating the offline phase
of destruction after all css refcnt's are confirmed to be seen as
killed on all CPUs.  The previous patches already split destruction
into two phases, so percpu_ref_kill_and_confirm() can be hooked up
easily.
This patch removes css_refcnt() which is used for rcu dereference
sanity check in css_id().  While we can add a percpu refcnt API to ask
the same question, css_id() itself is scheduled to be removed fairly
soon, so let's not bother with it.  Just drop the sanity check and use
rcu_dereference_raw() instead.
v2: - init_cgroup_css() was calling percpu_ref_init() without checking
      the return value.  This causes two problems - the obvious lack
      of error handling and percpu_ref_init() being called from
      cgroup_init_subsys() before the allocators are up, which
      triggers warnings but doesn't cause actual problems as the
      refcnt isn't used for roots anyway.  Fix both by moving
      percpu_ref_init() to cgroup_create().
    - The base references were put too early by
      percpu_ref_kill_and_confirm() and cgroup_offline_fn() put the
      refs one extra time.  This wasn't noticeable because css's go
      through another RCU grace period before being freed.  Update
      cgroup_destroy_locked() to grab an extra reference before
      killing the refcnts.  This problem was noticed by Kent.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Kent Overstreet <koverstreet@google.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: "Alasdair G. Kergon" <agk@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Glauber Costa <glommer@gmail.com>
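
The idea behind a per-cpu refcount with a kill step can be sketched in plain userspace C.  This is a simplified, single-threaded model and not the kernel's percpu_ref implementation: the model_ names, the fixed CPU count, and the explicit cpu argument are all assumptions made for illustration, and unlike the real primitive the model leaves dropping the base reference to the caller.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4	/* hypothetical CPU count for the model */

/* Hypothetical model of a per-cpu refcount; not the kernel's percpu_ref. */
struct model_percpu_ref {
	long percpu[MODEL_NR_CPUS];	/* per-CPU deltas, may go negative */
	long shared;			/* authoritative count once killed */
	bool killed;
};

static void model_ref_init(struct model_percpu_ref *ref)
{
	for (int i = 0; i < MODEL_NR_CPUS; i++)
		ref->percpu[i] = 0;
	ref->shared = 1;		/* base reference */
	ref->killed = false;
}

/* Fast path touches only the local counter; fails once killed. */
static bool model_ref_tryget(struct model_percpu_ref *ref, int cpu)
{
	if (ref->killed)
		return false;
	ref->percpu[cpu]++;
	return true;
}

/* Returns true when the shared count reaches zero after a kill. */
static bool model_ref_put(struct model_percpu_ref *ref, int cpu)
{
	if (!ref->killed) {
		ref->percpu[cpu]--;
		return false;
	}
	return --ref->shared == 0;
}

/* Fold all per-CPU deltas into the shared count and refuse new trygets. */
static void model_ref_kill_and_confirm(struct model_percpu_ref *ref)
{
	ref->killed = true;
	for (int i = 0; i < MODEL_NR_CPUS; i++) {
		ref->shared += ref->percpu[i];
		ref->percpu[i] = 0;
	}
}
```

A get on one CPU may be paired with a put on another, so individual per-CPU counters are meaningless on their own; only their sum, computed at kill time, reflects the number of outstanding references.  In the model the offline phase would start only after model_ref_kill_and_confirm() returns, since that is the point after which model_ref_tryget() is guaranteed to fail.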
											 
										 
										
											2013-06-13 19:39:16 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
{
	struct cgroup_subsys_state *css =
							 
						 
					
						
							
								
									
										
											 
										
											
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		container_of(rcu_head, struct cgroup_subsys_state, rcu_head);
							 
						 
					
						
							
								
									
										
											 
										
											
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 11:01:54 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  css  holds  an  extra  ref  to  @ cgrp - > dentry  which  is  put  on  the  last 
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: RCU protect each cgroup_subsys_state release
With the planned unified hierarchy, individual css's will be created
and destroyed dynamically across the lifetime of a cgroup.  To enable
such usages, css destruction is being decoupled from cgroup
destruction.  Most of the destruction path has been decoupled but the
actual free of css still depends on cgroup free path.
When all css refs are drained, css_release() kicks off
css_free_work_fn() which puts the cgroup.  When the cgroup refcnt
reaches zero, cgroup_diput() is invoked which in turn schedules RCU
free of the cgroup.  After a grace period, all css's are freed along
with the cgroup itself.
This patch moves the RCU grace period and css freeing from cgroup
release path to css release path.  css_release(), instead of kicking
off css_free_work_fn() directly, schedules RCU callback
css_free_rcu_fn() which in turn kicks off css_free_work_fn() after an
RCU grace period.  css_free_work_fn() is updated to free the css
directly.
The five-way punting - percpu ref kill confirmation, a work item,
percpu ref release, RCU grace period, and again a work item - is quite
hairy but the work items are there only to provide process context and
the actual sequence is kill confirm -> release -> RCU free, which
isn't simple but not too crazy.
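The five punts and the three logical steps they reduce to can be written down as a small table. This is only an illustrative sketch: the enum and the `model_*` helpers are hypothetical names made up for this example and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the five punts listed above, in order. */
enum model_css_step {
	MODEL_KILL_CONFIRM,	/* percpu ref kill confirmation */
	MODEL_OFFLINE_WORK,	/* work item, only there for process context */
	MODEL_REF_RELEASE,	/* percpu ref release */
	MODEL_RCU_GRACE,	/* RCU grace period */
	MODEL_FREE_WORK,	/* work item, only there for process context */
	MODEL_DONE,
};

/* Advance to the following step; MODEL_DONE is terminal. */
static enum model_css_step model_next_step(enum model_css_step step)
{
	return step == MODEL_DONE ? MODEL_DONE
				  : (enum model_css_step)(step + 1);
}

/* Work items exist only to bounce into process context. */
static bool model_step_is_context_bounce(enum model_css_step step)
{
	return step == MODEL_OFFLINE_WORK || step == MODEL_FREE_WORK;
}

/* Count the steps, optionally ignoring the process-context bounces. */
static int model_count_steps(bool skip_context_bounces)
{
	int n = 0;

	for (enum model_css_step s = MODEL_KILL_CONFIRM; s != MODEL_DONE;
	     s = model_next_step(s))
		if (!(skip_context_bounces && model_step_is_context_bounce(s)))
			n++;
	return n;
}
```

Counting with the bounces included gives the five punts; skipping them leaves kill confirm, release, and RCU free.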
This removes cgroup_css() usage after offline_css() allowing clearing
cgroup->subsys[] from offline_css(), which makes it consistent with
online_css() and brings it closer to proper lifetime management for
individual css's.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-13 20:22:51 -04:00
	 * css_put().  dput() requires process context which we don't have.
2013-08-13 11:01:54 -04:00
	 */
	INIT_WORK(&css->destroy_work, css_free_work_fn);

cgroup: use a dedicated workqueue for cgroup destruction
Since be44562613851 ("cgroup: remove synchronize_rcu() from
cgroup_diput()"), cgroup destruction path makes use of workqueue.  css
freeing is performed from a work item from that point on and a later
commit, ea15f8ccdb430 ("cgroup: split cgroup destruction into two
steps"), moves css offlining to workqueue too.
As cgroup destruction isn't depended upon for memory reclaim, the
destruction work items were put on the system_wq; unfortunately, some
controllers may block in the destruction path for a considerable
duration while holding cgroup_mutex.  As a large part of the
destruction path is synchronized through cgroup_mutex, when combined
with a high rate of cgroup removals, this has the potential to fill up
system_wq's max_active of 256.
Also, it turns out that memcg's css destruction path ends up queueing
and waiting for work items on system_wq through work_on_cpu().  If
such operation happens while system_wq is fully occupied by cgroup
destruction work items, work_on_cpu() can't make forward progress
because system_wq is full and other destruction work items on
system_wq can't make forward progress because the work item waiting
for work_on_cpu() is holding cgroup_mutex, leading to deadlock.
This can be fixed by queueing destruction work items on a separate
workqueue.  This patch creates a dedicated workqueue -
cgroup_destroy_wq - for this purpose.  As these work items shouldn't
have inter-dependencies and are mostly serialized by cgroup_mutex
anyway, giving a high concurrency level doesn't buy anything and the
workqueue's @max_active is set to 1 so that destruction work items are
executed one by one on each CPU.
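The deadlock and its fix can be illustrated with a toy userspace model of a concurrency-limited queue. Everything here is an assumption made for illustration: `model_wq` and the `model_*` helpers are hypothetical, the model ignores that real workqueues track max_active per CPU, and it treats every started item as permanently occupying a slot (as an item blocked on cgroup_mutex would). Only the limits of 256 and 1 come from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a workqueue's concurrency limit. */
struct model_wq {
	int max_active;	/* maximum work items executing at once */
	int active;	/* work items currently executing or blocked */
};

/* Try to begin executing one work item; fails when the queue is full. */
static bool model_wq_start(struct model_wq *wq)
{
	if (wq->active >= wq->max_active)
		return false;
	wq->active++;
	return true;
}

/* Start up to @nr items and report how many actually started. */
static int model_wq_start_many(struct model_wq *wq, int nr)
{
	int started = 0;

	while (nr-- > 0 && model_wq_start(wq))
		started++;
	return started;
}

/*
 * Shared queue: destruction items blocked on the mutex can fill every
 * slot, so the helper item the mutex holder waits for cannot start.
 */
static bool model_helper_runs_shared(int nr_destroy_items)
{
	struct model_wq system_wq = { .max_active = 256, .active = 0 };

	model_wq_start_many(&system_wq, nr_destroy_items);
	return model_wq_start(&system_wq);
}

/*
 * Dedicated queue: destruction items wait on their own queue with
 * max_active of 1, leaving the system queue free for the helper.
 */
static bool model_helper_runs_dedicated(int nr_destroy_items)
{
	struct model_wq system_wq = { .max_active = 256, .active = 0 };
	struct model_wq destroy_wq = { .max_active = 1, .active = 0 };

	model_wq_start_many(&destroy_wq, nr_destroy_items);
	return model_wq_start(&system_wq);
}
```

With 256 pending destruction items the helper cannot start on the shared queue, while on the dedicated-queue variant it always can, because destruction items no longer consume slots on the queue the helper needs.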
Hugh Dickins: Because cgroup_init() is run before init_workqueues(),
cgroup_destroy_wq can't be allocated from cgroup_init().  Do it from a
separate core_initcall().  In the future, we probably want to reorder
so that workqueue init happens before cgroup_init().
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Hugh Dickins <hughd@google.com>
Reported-by: Shawn Bohrer <shawn.bohrer@gmail.com>
Link: http://lkml.kernel.org/r/20131111220626.GA7509@sbohrermbp13-local.rgmadvisors.com
Link: http://lkml.kernel.org/g/alpine.LNX.2.00.1310301606080.2333@eggly.anvils
Cc: stable@vger.kernel.org # v3.9+
2013-11-22 17:14:39 -05:00
	queue_work(cgroup_destroy_wq, &css->destroy_work);

cgroup: make css->refcnt clearing on cgroup removal optional
Currently, cgroup removal tries to drain all css references.  If there
are active css references, the removal logic waits and retries
->pre_destroy() until either all refs drop to zero or removal is
cancelled.
This semantics is unusual and adds non-trivial complexity to cgroup
core and IMHO is fundamentally misguided in that it couples internal
implementation details (references to internal data structure) with
externally visible operation (rmdir).  To userland, this is a behavior
peculiarity which is unnecessary and difficult to anticipate (css refs
are otherwise invisible from userland), and, to policy implementations,
this is an unnecessary restriction (e.g. blkcg wants to hold css refs
for caching purposes but can't as that becomes visible as rmdir hang).
Unfortunately, memcg currently depends on ->pre_destroy() retries and
cgroup removal vetoing and can't be immediately switched to the new
behavior.  This patch introduces the new behavior of not waiting for
css refs to drain and maintains the old behavior for subsystems which
have __DEPRECATED_clear_css_refs set.
Once memcg is updated, we can drop the code paths for the old
behavior as proposed in the following patch.  Note that the following
patch is incorrect in that the dput work item is in cgroup and may lose
some dputs when multiple css's are released back-to-back, and
__css_put() triggers check_for_release() when refcnt reaches 0 instead
of 1; however, it shows what part can be removed.
  http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251
Note that, in not-too-distant future, cgroup core will start emitting
warning messages for subsys which require the old behavior, so please
get moving.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
2012-04-01 12:09:56 -07:00
}

cgroup: use percpu refcnt for cgroup_subsys_states
2013-06-13 19:39:16 -07:00
static void css_release(struct percpu_ref *ref)
{
	struct cgroup_subsys_state *css =
		container_of(ref, struct cgroup_subsys_state, refcnt);

2013-12-17 11:13:39 +08:00
	rcu_assign_pointer(css->cgroup->subsys[css->ss->subsys_id], NULL);

cgroup: RCU protect each cgroup_subsys_state release
2013-08-13 20:22:51 -04:00
	call_rcu(&css->rcu_head, css_free_rcu_fn);

cgroup: use percpu refcnt for cgroup_subsys_states
2013-06-13 19:39:16 -07:00
}

2013-08-13 11:01:55 -04:00
static void init_css(struct cgroup_subsys_state *css, struct cgroup_subsys *ss,
		     struct cgroup *cgrp)

Task Control Groups: basic task cgroup framework
2007-10-18 23:39:30 -07:00
{
2007-10-18 23:40:44 -07:00
	css->cgroup = cgrp;
2013-08-08 20:11:22 -04:00
	css->ss = ss;

Task Control Groups: basic task cgroup framework
2007-10-18 23:39:30 -07:00
	css->flags = 0;
2013-08-13 11:01:54 -04:00

	if (cgrp->parent)
2013-08-26 18:40:56 -04:00
		css->parent = cgroup_css(cgrp->parent, ss);
2013-08-13 11:01:54 -04:00
	else
2012-11-19 08:13:36 -08:00
		css->flags |= CSS_ROOT;

cgroup: make css->refcnt clearing on cgroup removal optional
2012-04-01 12:09:56 -07:00

2013-08-26 18:40:56 -04:00
	BUG_ON(cgroup_css(cgrp, ss));

Task Control Groups: basic task cgroup framework
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-07-31 16:16:40 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/* invoke ->css_online() on a new CSS and mark it online if successful */  
						 
					
						
							
								
									
										
										
										
											2013-08-13 11:01:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static  int  online_css ( struct  cgroup_subsys_state  * css )  
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-08-13 11:01:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup_subsys  * ss  =  css - > ss ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int ret = 0;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									lockdep_assert_held(&cgroup_mutex);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (ss->css_online)
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										ret = ss->css_online(css);
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 20:22:50 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (!ret) {
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										css->flags |= CSS_ONLINE;
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 20:22:50 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										css->cgroup->nr_css++;
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 20:22:50 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										rcu_assign_pointer(css->cgroup->subsys[ss->subsys_id], css);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									return ret;
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-07-31 16:16:40 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/* if the CSS is online, invoke ->css_offline() on it and mark it offline */  
						 
					
						
							
								
									
										
										
										
											2013-08-13 11:01:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static void offline_css(struct cgroup_subsys_state *css)
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-08-13 11:01:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup_subsys *ss = css->ss;
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									lockdep_assert_held(&cgroup_mutex);

									if (!(css->flags & CSS_ONLINE))
										return;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-03-12 15:35:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (ss->css_offline)
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										ss->css_offline(css);
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									css->flags &= ~CSS_ONLINE;
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 20:22:50 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									css->cgroup->nr_css--;
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: RCU protect each cgroup_subsys_state release
With the planned unified hierarchy, individual css's will be created
and destroyed dynamically across the lifetime of a cgroup.  To enable
such usages, css destruction is being decoupled from cgroup
destruction.  Most of the destruction path has been decoupled but the
actual free of css still depends on cgroup free path.
When all css refs are drained, css_release() kicks off
css_free_work_fn() which puts the cgroup.  When the cgroup refcnt
reaches zero, cgroup_diput() is invoked which in turn schedules RCU
free of the cgroup.  After a grace period, all css's are freed along
with the cgroup itself.
This patch moves the RCU grace period and css freeing from cgroup
release path to css release path.  css_release(), instead of kicking
off css_free_work_fn() directly, schedules RCU callback
css_free_rcu_fn() which in turn kicks off css_free_work_fn() after a
RCU grace period.  css_free_work_fn() is updated to free the css
directly.
The five-way punting - percpu ref kill confirmation, a work item,
percpu ref release, RCU grace period, and again a work item - is quite
hairy but the work items are there only to provide process context and
the actual sequence is kill confirm -> release -> RCU free, which
isn't simple but not too crazy.
This removes cgroup_css() usage after offline_css() allowing clearing
cgroup->subsys[] from offline_css(), which makes it consistent with
online_css() and brings it closer to proper lifetime management for
individual css's.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
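
The ordering described in this message (kill confirm -> release -> RCU free) can be illustrated with a minimal userspace sketch. Everything named toy_* below is hypothetical and is not kernel API. The sketch also assumes away the deferral: the real code hands off through percpu_ref callbacks, an RCU callback, and work items, while here each stage simply calls the next one directly so that the order of the stages is visible.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a css; not the kernel structure. */
struct toy_css {
	int refcnt;	/* base reference plus any users */
	int freed;	/* set once the final free stage has run */
};

/* Records the order in which the stages ran. */
static char trace[128];

static void note(const char *stage)
{
	if (trace[0])
		strcat(trace, " -> ");
	strcat(trace, stage);
}

/* Final stage: in the kernel this runs from a work item (process context). */
static void toy_css_free_work_fn(struct toy_css *css)
{
	note("work:free");
	css->freed = 1;
}

/* Runs after a grace period in the kernel; called directly here. */
static void toy_css_free_rcu_fn(struct toy_css *css)
{
	note("rcu");
	toy_css_free_work_fn(css);
}

/* Called when the last reference is dropped. */
static void toy_css_release(struct toy_css *css)
{
	note("release");
	toy_css_free_rcu_fn(css);	/* assumption: stands in for an RCU callback */
}

static void toy_css_put(struct toy_css *css)
{
	assert(css->refcnt > 0);
	if (--css->refcnt == 0)
		toy_css_release(css);
}

/* Kill is confirmed: drop the base reference held since creation. */
static void toy_css_kill(struct toy_css *css)
{
	note("kill");
	toy_css_put(css);
}
```

With one outstanding user reference, calling toy_css_kill() alone does not free the object; the release, RCU, and free stages run only when the last toy_css_put() drops the count to zero.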
											 
										 
										
											2013-08-13 20:22:51 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									RCU_INIT_POINTER(css->cgroup->subsys[ss->subsys_id], NULL);
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
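
The online/offline pairing above can be sketched in plain userspace C. This is an illustration under stated assumptions, not the kernel code: the toy_* types and TOY_CSS_ONLINE are hypothetical, locking and the RCU-published subsys[] pointer are left out, and the per-cgroup counter is modelled as a plain int. What it shows is that the callbacks are optional, that a failed online callback leaves the state untouched, and that offlining a css that never came online is a no-op.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CSS_ONLINE 0x1u	/* hypothetical flag bit */

struct toy_css;

/* Hypothetical subsystem with optional callbacks (either may be NULL). */
struct toy_subsys {
	int  (*css_online)(struct toy_css *css);
	void (*css_offline)(struct toy_css *css);
};

struct toy_css {
	const struct toy_subsys *ss;
	unsigned int flags;
	int *nr_css;	/* counter owned by the enclosing group */
};

/* Invoke the optional online callback; mark online only on success. */
static int toy_online_css(struct toy_css *css)
{
	int ret = 0;

	assert(css && css->ss && css->nr_css);

	if (css->ss->css_online)
		ret = css->ss->css_online(css);
	if (!ret) {
		css->flags |= TOY_CSS_ONLINE;
		(*css->nr_css)++;
	}
	return ret;
}

/* Undo toy_online_css(); does nothing if the css is not online. */
static void toy_offline_css(struct toy_css *css)
{
	assert(css && css->ss && css->nr_css);

	if (!(css->flags & TOY_CSS_ONLINE))
		return;

	if (css->ss->css_offline)
		css->ss->css_offline(css);

	css->flags &= ~TOY_CSS_ONLINE;
	(*css->nr_css)--;
}

/* Example callback that always fails, for exercising the error path. */
static int toy_failing_online(struct toy_css *css)
{
	(void)css;
	return -1;
}
```

Checking the flag before touching the counter is what makes a repeated or unmatched offline call harmless in this sketch.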
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * cgroup_create - create a cgroup
								 * @parent: cgroup that will be parent of the new cgroup
								 * @dentry: dentry of the new cgroup
								 * @mode: mode to set on new inode
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * Must be called with the mutex on the parent inode held
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
						 
					
						
							
								
									
										
										
										
											2011-07-26 01:55:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											     umode_t mode)
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-08-13 20:22:50 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup_subsys_state *css_ar[CGROUP_SUBSYS_COUNT] = { };
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup *cgrp;
							 
						 
					
						
							
								
									
										
										
										
											2013-03-01 15:01:56 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup_name *name;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									struct cgroupfs_root *root = parent->root;
									int err = 0;
									struct cgroup_subsys *ss;
									struct super_block *sb = root->sb;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 09:02:12 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* allocate the cgroup and its ID, 0 is reserved for the root */ 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgrp = kzalloc(sizeof(*cgrp), GFP_KERNEL);
									if (!cgrp)
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										return -ENOMEM;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-03-01 15:01:56 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									name = cgroup_alloc_name(dentry);
									if (!name)
										goto err_free_cgrp;
									rcu_assign_pointer(cgrp->name, name);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-07-31 09:50:50 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
									 * Temporarily set the pointer to NULL, so idr_find() won't return
									 * a half-baked cgroup.
									 */
									cgrp->id = idr_alloc(&root->cgroup_idr, NULL, 1, 0, GFP_KERNEL);
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 09:02:12 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (cgrp->id < 0)
							 
						 
					
						
							
								
									
										
										
										
											2013-03-01 15:01:56 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										goto err_free_name;
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 09:02:12 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-05 09:16:59 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  Only  live  parents  can  have  children .   Note  that  the  liveliness 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  check  isn ' t  strictly  necessary  because  cgroup_mkdir ( )  and 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 *  cgroup_rmdir ( )  are  fully  synchronized  by  i_mutex ;  however ,  do  it 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * anyway so that locking is contained inside cgroup proper and we
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * don't get nasty surprises if we ever grow another caller.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (!cgroup_lock_live_group(parent)) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										err = -ENODEV;
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 09:02:12 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										goto err_free_id;
							 
						 
					
						
							
								
									
										
										
										
											2012-11-05 09:16:59 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									/* Grab a reference on the superblock so the hierarchy doesn't
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * get deleted on unmount if there are child cgroups.  This
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * can be done outside cgroup_mutex, since the sb can't
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * disappear while someone has an open control file on the
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * fs */
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									atomic_inc(&sb->s_active);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									init_cgroup_housekeeping(cgrp);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-01-24 14:30:22 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									dentry->d_fsdata = cgrp;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgrp->dentry = dentry;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgrp->parent = parent;
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 11:01:54 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgrp->dummy_css.parent = &parent->dummy_css;
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgrp->root = parent->root;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-03-04 14:28:19 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (notify_on_release(parent))
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										set_bit(CGRP_NOTIFY_ON_RELEASE, &cgrp->flags);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (test_bit(CGRP_CPUSET_CLONE_CHILDREN, &parent->flags))
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										set_bit(CGRP_CPUSET_CLONE_CHILDREN, &cgrp->flags);
							 
						 
					
						
							
								
									
										
										
										
											2010-10-27 15:33:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:48 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									for_each_root_subsys(root, ss) {
							 
						 
					
						
							
								
									
										
										
										
											2012-09-13 12:20:58 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct cgroup_subsys_state *css;
							 
						 
					
						
							
								
									
										
										
										
											2010-02-02 13:44:10 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-26 18:40:56 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										css = ss->css_alloc(cgroup_css(parent, ss));
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										if (IS_ERR(css)) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											err = PTR_ERR(css);
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											goto err_free_all;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 20:22:50 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										css_ar[ss->subsys_id] = css;
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: use percpu refcnt for cgroup_subsys_states
A css (cgroup_subsys_state) is how each cgroup is represented to a
controller.  As such, it can be used in hot paths across the various
subsystems different controllers are associated with.
One of the common operations is reference counting, which up until now
has been implemented using a global atomic counter and can have
significant adverse impact on scalability.  For example, css refcnt
can be gotten and put multiple times by blkcg for each IO request.
For highops configurations which try to do as much per-cpu as
possible, the global frequent refcnting can be very expensive.
In general, given the various and hugely diverse paths css's end up
being used from, we need to make it cheap and highly scalable.  In its
usage, css refcnting isn't very different from module refcnting.
This patch converts css refcnting to use the recently added
percpu_ref.  css_get/tryget/put() directly maps to the matching
percpu_ref operations and the deactivation logic is no longer
necessary as percpu_ref already has refcnt killing.
The only complication is that as the refcnt is per-cpu,
percpu_ref_kill() in itself doesn't ensure that further tryget
operations will fail, which we need to guarantee before invoking
->css_offline()'s.  This is resolved by collecting kill confirmation
using percpu_ref_kill_and_confirm() and initiating the offline phase
of destruction after all css refcnt's are confirmed to be seen as
killed on all CPUs.  The previous patches already split destruction
into two phases, so percpu_ref_kill_and_confirm() can be hooked up
easily.
This patch removes css_refcnt() which is used for rcu dereference
sanity check in css_id().  While we can add a percpu refcnt API to ask
the same question, css_id() itself is scheduled to be removed fairly
soon, so let's not bother with it.  Just drop the sanity check and use
rcu_dereference_raw() instead.
v2: - init_cgroup_css() was calling percpu_ref_init() without checking
      the return value.  This causes two problems - the obvious lack
      of error handling and percpu_ref_init() being called from
      cgroup_init_subsys() before the allocators are up, which
      triggers warnings but doesn't cause actual problems as the
      refcnt isn't used for roots anyway.  Fix both by moving
      percpu_ref_init() to cgroup_create().
    - The base references were put too early by
      percpu_ref_kill_and_confirm() and cgroup_offline_fn() put the
      refs one extra time.  This wasn't noticeable because css's go
      through another RCU grace period before being freed.  Update
      cgroup_destroy_locked() to grab an extra reference before
      killing the refcnts.  This problem was noticed by Kent.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Kent Overstreet <koverstreet@google.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: "Alasdair G. Kergon" <agk@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Glauber Costa <glommer@gmail.com>
											 
										 
										
											2013-06-13 19:39:16 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										err = percpu_ref_init(&css->refcnt, css_release);
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 20:22:50 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if (err)
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2013-06-13 19:39:16 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											goto err_free_all;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 11:01:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										init_css(css, ss, cgrp);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:36 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * Create directory.  cgroup_create_file() returns with the new
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * directory locked on success so that it can be populated without
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * dropping cgroup_mutex.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:36 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									err = cgroup_create_file(dentry, S_IFDIR | mode, sb);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									if (err < 0)
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										goto err_free_all;
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:36 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									lockdep_assert_held(&dentry->d_inode->i_mutex);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-18 11:14:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgrp->serial_nr = cgroup_serial_nr_next++;
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: add cgroup->serial_nr and implement cgroup_next_sibling()
Currently, there's no easy way to find out the next sibling cgroup
unless it's known that the current cgroup is accessed from the
parent's children list in a single RCU critical section.  This in turn
forces all iterators to require whole iteration to be enclosed in a
single RCU critical section, which sometimes is too restrictive.  This
patch implements cgroup_next_sibling() which can reliably determine
the next sibling regardless of the state of the current cgroup as long
as it's accessible.
It currently is impossible to determine the next sibling after
dropping RCU read lock because the cgroup being iterated could be
removed anytime and if RCU read lock is dropped, nothing guarantees
its ->sibling.next pointer is accessible.  A removed cgroup would
continue to point to its next sibling for RCU accesses but stop
receiving updates from the sibling.  IOW, the next sibling could be
removed and then complete its grace period while RCU read lock is
dropped, making it unsafe to dereference ->sibling.next after dropping
and re-acquiring RCU read lock.
This can be solved by adding a way to traverse to the next sibling
without dereferencing ->sibling.next.  This patch adds a monotonically
increasing cgroup serial number, cgroup->serial_nr, which guarantees
that all cgroup->children lists are kept in increasing serial_nr
order.  A new function, cgroup_next_sibling(), is implemented, which,
if CGRP_REMOVED is not set on the current cgroup, follows
->sibling.next; otherwise, traverses the parent's ->children list
until it sees a sibling with higher ->serial_nr.
This allows the function to always return the next sibling regardless
of the state of the current cgroup without adding overhead in the fast
path.
Further patches will update the iterators to use cgroup_next_sibling()
so that they allow dropping RCU read lock and blocking while iteration
is in progress which in turn will be used to simplify controllers.
v2: Typo fix as per Serge.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
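
The sibling lookup described in the commit message above can be sketched in
plain userspace C. This is only an illustration under stated assumptions:
struct node, struct parent, add_child(), remove_child() and next_sibling()
are hypothetical stand-ins, not the kernel's struct cgroup or
cgroup_next_sibling(), and the sketch ignores RCU and locking entirely. It
shows why a removed entry's stale next pointer cannot be trusted and how the
serial number recovers the correct position.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct cgroup; not the kernel type. */
struct node {
	unsigned long serial_nr;	/* monotonically increasing creation stamp */
	bool removed;			/* plays the role of CGRP_REMOVED */
	struct node *next;		/* plays the role of ->sibling.next */
};

struct parent {
	struct node *first_child;	/* kept in increasing serial_nr order */
};

static unsigned long serial_nr_next;

/* Append at the tail so the children list stays sorted by serial_nr. */
static void add_child(struct parent *p, struct node *n)
{
	struct node **pos = &p->first_child;

	n->serial_nr = serial_nr_next++;
	n->removed = false;
	n->next = NULL;
	while (*pos)
		pos = &(*pos)->next;
	*pos = n;
}

/* Unlink a child.  Its own ->next is left alone and becomes stale. */
static void remove_child(struct parent *p, struct node *n)
{
	struct node **pos = &p->first_child;

	while (*pos && *pos != n)
		pos = &(*pos)->next;
	assert(*pos == n);
	*pos = n->next;
	n->removed = true;
}

/* Fast path: follow ->next.  Slow path: rescan the parent by serial_nr. */
static struct node *next_sibling(struct parent *p, struct node *cur)
{
	struct node *n;

	if (!cur->removed)
		return cur->next;
	for (n = p->first_child; n; n = n->next)
		if (n->serial_nr > cur->serial_nr)
			return n;
	return NULL;
}
```

The slow path only runs for entries that have already been removed, which
matches the commit message's point that the fast path gains no overhead.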
											 
										 
										
											2013-05-24 10:55:38 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:36 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* allocation complete, commit to creation */
									list_add_tail_rcu(&cgrp->sibling, &cgrp->parent->children);
									root->number_of_cgroups++;
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:36 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-04-08 14:35:02 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* hold a ref to the parent's dentry */
									dget(parent->dentry);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* creation succeeded, notify subsystems */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:48 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									for_each_root_subsys(root, ss) {
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 20:22:50 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct cgroup_subsys_state *css = css_ar[ss->subsys_id];
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 11:01:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										err = online_css(css);
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if (err)
											goto err_destroy;
							 
						 
					
						
							
								
									
										
										
										
											2012-11-30 17:31:23 +04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-12-06 15:07:32 -05:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/* each css holds a ref to the cgroup's dentry and parent css */
										dget(dentry);
										css_get(css->parent);

										/* mark it consumed for error path */
										css_ar[ss->subsys_id] = NULL;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-30 17:31:23 +04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if (ss->broken_hierarchy && !ss->warned_broken_hierarchy &&
										    parent->parent) {
											pr_warning("cgroup: %s (%d) created nested cgroup for controller \"%s\" which has incomplete hierarchy support. Nested cgroups may change behavior in the future.\n",
												   current->comm, current->pid, ss->name);
											if (!strcmp(ss->name, "memory"))
												pr_warning("cgroup: \"memory\" requires setting use_hierarchy to 1 on the root.\n");
											ss->warned_broken_hierarchy = true;
										}
							 
						 
					
						
							
								
									
										
										
										
											2012-11-09 09:12:29 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-07-31 09:50:50 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									idr_replace(&root->cgroup_idr, cgrp, cgrp->id);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									err = cgroup_addrm_files(cgrp, cgroup_base_files, true);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-28 16:24:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (err)
										goto err_destroy;

									err = cgroup_populate_dir(cgrp, root->subsys_mask);
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (err)
										goto err_destroy;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_unlock(&cgroup_mutex);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_unlock(&cgrp->dentry->d_inode->i_mutex);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return 0;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								err_free_all:
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:48 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									for_each_root_subsys(root, ss) {
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 20:22:50 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct cgroup_subsys_state *css = css_ar[ss->subsys_id];
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: use percpu refcnt for cgroup_subsys_states
A css (cgroup_subsys_state) is how each cgroup is represented to a
controller.  As such, it can be used in hot paths across the various
subsystems different controllers are associated with.
One of the common operations is reference counting, which up until now
has been implemented using a global atomic counter and can have
significant adverse impact on scalability.  For example, css refcnt
can be gotten and put multiple times by blkcg for each IO request.
For highops configurations which try to do as much per-cpu as
possible, the global frequent refcnting can be very expensive.
In general, given the various and hugely diverse paths css's end up
being used from, we need to make it cheap and highly scalable.  In its
usage, css refcnting isn't very different from module refcnting.
This patch converts css refcnting to use the recently added
percpu_ref.  css_get/tryget/put() directly maps to the matching
percpu_ref operations and the deactivation logic is no longer
necessary as percpu_ref already has refcnt killing.
The only complication is that as the refcnt is per-cpu,
percpu_ref_kill() in itself doesn't ensure that further tryget
operations will fail, which we need to guarantee before invoking
->css_offline()'s.  This is resolved by collecting kill confirmation
using percpu_ref_kill_and_confirm() and initiating the offline phase
of destruction after all css refcnt's are confirmed to be seen as
killed on all CPUs.  The previous patches already split destruction
into two phases, so percpu_ref_kill_and_confirm() can be hooked up
easily.
This patch removes css_refcnt() which is used for rcu dereference
sanity check in css_id().  While we can add a percpu refcnt API to ask
the same question, css_id() itself is scheduled to be removed fairly
soon, so let's not bother with it.  Just drop the sanity check and use
rcu_dereference_raw() instead.
v2: - init_cgroup_css() was calling percpu_ref_init() without checking
      the return value.  This causes two problems - the obvious lack
      of error handling and percpu_ref_init() being called from
      cgroup_init_subsys() before the allocators are up, which
      triggers warnings but doesn't cause actual problems as the
      refcnt isn't used for roots anyway.  Fix both by moving
      percpu_ref_init() to cgroup_create().
    - The base references were put too early by
      percpu_ref_kill_and_confirm() and cgroup_offline_fn() put the
      refs one extra time.  This wasn't noticeable because css's go
      through another RCU grace period before being freed.  Update
      cgroup_destroy_locked() to grab an extra reference before
      killing the refcnts.  This problem was noticed by Kent.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Kent Overstreet <koverstreet@google.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: "Alasdair G. Kergon" <agk@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Glauber Costa <glommer@gmail.com>
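
The per-CPU counting idea in the commit message above can be modelled with a
small single-threaded sketch. Everything here is an assumption made for
illustration: struct ref_sketch and the ref_*() helpers are hypothetical and
are not the kernel's percpu_ref API, there are no atomics or RCU, and the
owner drops the base reference explicitly with ref_put() after the kill. The
sketch shows only the bookkeeping: gets and puts touch a per-CPU slot while
the ref is live, a kill folds the slots into one shared count, tryget fails
from then on, and the confirmation callback runs once the kill is visible.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 4	/* hypothetical fixed CPU count */

/* Hypothetical single-threaded model; not the kernel's struct percpu_ref. */
struct ref_sketch {
	long percpu[SKETCH_NR_CPUS];	/* per-CPU deltas; each may go negative */
	long shared;			/* authoritative count once killed */
	bool dead;			/* set by kill; tryget fails afterwards */
};

static int confirm_calls;		/* how often the confirm callback ran */

static void count_confirm(struct ref_sketch *r)
{
	(void)r;
	confirm_calls++;
}

static void ref_init(struct ref_sketch *r)
{
	for (int i = 0; i < SKETCH_NR_CPUS; i++)
		r->percpu[i] = 0;
	r->shared = 1;			/* base reference held by the owner */
	r->dead = false;
}

static bool ref_tryget(struct ref_sketch *r, int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	if (r->dead)
		return false;
	r->percpu[cpu]++;		/* cheap: touches only the local slot */
	return true;
}

/* Returns true when the last reference has been dropped. */
static bool ref_put(struct ref_sketch *r, int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	if (!r->dead) {
		r->percpu[cpu]--;	/* zero cannot be detected in this mode */
		return false;
	}
	return --r->shared == 0;
}

/* Fold the per-CPU deltas into the shared count, then report confirmation. */
static void ref_kill_and_confirm(struct ref_sketch *r,
				 void (*confirm)(struct ref_sketch *))
{
	r->dead = true;
	for (int i = 0; i < SKETCH_NR_CPUS; i++) {
		r->shared += r->percpu[i];
		r->percpu[i] = 0;
	}
	if (confirm)
		confirm(r);		/* a real implementation must wait for all CPUs first */
}
```

In this model a put may land on a different CPU slot than the matching get,
which is why only the folded sum is meaningful and why the offline phase has
to wait for the confirmation.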
											 
										 
										
											2013-06-13 19:39:16 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if (css) {
											percpu_ref_cancel_init(&css->refcnt);
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											ss->css_free(css);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2013-06-13 19:39:16 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_unlock ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* Release the reference count that we took on the superblock */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									deactivate_super ( sb ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 09:02:12 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								err_free_id :  
						 
					
						
							
								
									
										
										
										
											2013-07-31 09:50:50 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									idr_remove ( & root - > cgroup_idr ,  cgrp - > id ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-03-01 15:01:56 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								err_free_name :  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									kfree ( rcu_dereference_raw ( cgrp - > name ) ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								err_free_cgrp :  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									kfree ( cgrp ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									return  err ; 
							 
						 
					
						
							
								
									
										
										
										
err_destroy:
	for_each_root_subsys(root, ss) {
		struct cgroup_subsys_state *css = css_ar[ss->subsys_id];

		if (css) {
			percpu_ref_cancel_init(&css->refcnt);
			ss->css_free(css);
		}
	}
	cgroup_destroy_locked(cgrp);
	mutex_unlock(&cgroup_mutex);
	mutex_unlock(&dentry->d_inode->i_mutex);
	return err;
							 
						 
					
						
							
								
									
										
											 
										
											
}
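
/*
 * Illustration only, not part of this file: a minimal user-space sketch of
 * the label-based unwinding used by the error path above, where each label
 * undoes one setup step and falls through to the labels for earlier steps.
 * struct group, live_ids, group_create() and the fail_at parameter are
 * hypothetical names invented for this sketch; they are not kernel APIs.
 */

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a group object; not the kernel's struct cgroup. */
struct group {
	char *name;
	int id;
};

static int live_ids;	/* number of ids currently handed out */

/*
 * Create a group.  fail_at artificially fails one step so the unwinding
 * can be observed: 1 = name allocation, 2 = id allocation,
 * 3 = final setup, 0 = no failure.
 */
static int group_create(const char *name, int fail_at, struct group **out)
{
	size_t len = strlen(name) + 1;
	struct group *grp;
	int err;

	grp = calloc(1, sizeof(*grp));
	if (!grp)
		return -ENOMEM;

	grp->name = (fail_at == 1) ? NULL : malloc(len);
	if (!grp->name) {
		err = -ENOMEM;
		goto err_free_grp;
	}
	memcpy(grp->name, name, len);

	if (fail_at == 2) {
		err = -ENOSPC;
		goto err_free_name;
	}
	grp->id = ++live_ids;

	if (fail_at == 3) {
		err = -EIO;
		goto err_free_id;
	}

	*out = grp;
	return 0;

	/* Each label undoes one step, then falls through to the earlier ones. */
err_free_id:
	live_ids--;
err_free_name:
	free(grp->name);
err_free_grp:
	free(grp);
	return err;
}
```

/*
 * A failure at any step leaves live_ids and the heap as they were before
 * the call, which is the property the err_free_* labels above provide.
 */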
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
static int cgroup_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
{
	struct cgroup *c_parent = dentry->d_parent->d_fsdata;

	/* the vfs holds inode->i_mutex already */
	return cgroup_create(c_parent, dentry, mode | S_IFDIR);
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
/*
 * This is called when the refcnt of a css is confirmed to be killed.
 * css_tryget() is now guaranteed to fail.
 */
static void css_killed_work_fn(struct work_struct *work)
{
	struct cgroup_subsys_state *css =
		container_of(work, struct cgroup_subsys_state, destroy_work);
	struct cgroup *cgrp = css->cgroup;
							 
						 
					
						
							
								
									
										
											 
										
											
	mutex_lock(&cgroup_mutex);

	/*
	 * css_tryget() is guaranteed to fail now.  Tell subsystems to
	 * initiate destruction.
	 */
	offline_css(css);

	/*
	 * If @cgrp is marked dead, it's waiting for refs of all css's to
	 * be disabled before proceeding to the second phase of cgroup
	 * destruction.  If we are the last one, kick it off.
	 */
	if (!cgrp->nr_css && cgroup_is_dead(cgrp))
		cgroup_destroy_css_killed(cgrp);

	mutex_unlock(&cgroup_mutex);

	/*
	 * Put the css refs from kill_css().  Each css holds an extra
	 * reference to the cgroup's dentry and cgroup removal proceeds
	 * regardless of css refs.  On the last put of each css, whenever
	 * that may be, the extra dentry ref is put so that dentry
	 * destruction happens only after all css's are released.
	 */
	css_put(css);
}
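
/*
 * Illustration only, not part of this file: a single-threaded user-space
 * sketch of the ordering the function above relies on.  An extra reference
 * is taken before the kill, the confirmation callback runs once tryget can
 * no longer succeed, the object is taken offline there, and the final put
 * releases it.  struct ref, struct obj and every function below are
 * hypothetical stand-ins; this is a plain counter, not the kernel's
 * percpu_ref, and confirmation is assumed to happen synchronously.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical killable reference count (plain counter, no per-cpu state). */
struct ref {
	long count;				/* starts at 1: the base reference */
	bool dead;				/* once set, ref_tryget() fails */
	void (*release)(struct ref *ref);	/* runs on the final put */
};

struct obj {
	struct ref ref;
	bool offline;	/* set once kill has been confirmed */
	bool released;	/* set by the final put */
};

#define to_obj(r) ((struct obj *)((char *)(r) - offsetof(struct obj, ref)))

static void ref_init(struct ref *ref, void (*release)(struct ref *))
{
	ref->count = 1;
	ref->dead = false;
	ref->release = release;
}

static bool ref_tryget(struct ref *ref)
{
	if (ref->dead)
		return false;
	ref->count++;
	return true;
}

static void ref_put(struct ref *ref)
{
	if (--ref->count == 0)
		ref->release(ref);
}

/* Mark dead, report confirmation, then drop the base reference. */
static void ref_kill_and_confirm(struct ref *ref, void (*confirm)(struct ref *))
{
	ref->dead = true;
	confirm(ref);
	ref_put(ref);
}

static void obj_release(struct ref *ref)
{
	to_obj(ref)->released = true;
}

/* Confirmation callback: no new references can appear, so go offline. */
static void obj_killed(struct ref *ref)
{
	to_obj(ref)->offline = true;
	ref_put(ref);	/* drop the extra reference taken in obj_kill() */
}

static void obj_kill(struct obj *obj)
{
	obj->ref.count++;	/* extra reference keeps obj alive until confirmed */
	ref_kill_and_confirm(&obj->ref, obj_killed);
}
```

/*
 * With one outstanding user reference, obj_kill() leaves the object offline
 * but not yet released; release happens only when that user calls ref_put().
 */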
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
/* css kill confirmation processing requires process context, bounce */
static void css_killed_ref_fn(struct percpu_ref *ref)
						 
					
						
							
								
									
										
											 
										
											
												cgroup: use percpu refcnt for cgroup_subsys_states
A css (cgroup_subsys_state) is how each cgroup is represented to a
controller.  As such, it can be used in hot paths across the various
subsystems different controllers are associated with.
One of the common operations is reference counting, which up until now
has been implemented using a global atomic counter and can have
significant adverse impact on scalability.  For example, css refcnt
can be gotten and put multiple times by blkcg for each IO request.
For highops configurations which try to do as much per-cpu as
possible, the global frequent refcnting can be very expensive.
In general, given the various and hugely diverse paths css's end up
being used from, we need to make it cheap and highly scalable.  In its
usage, css refcnting isn't very different from module refcnting.
This patch converts css refcnting to use the recently added
percpu_ref.  css_get/tryget/put() map directly to the matching
percpu_ref operations and the deactivation logic is no longer
necessary as percpu_ref already has refcnt killing.
The only complication is that as the refcnt is per-cpu,
percpu_ref_kill() in itself doesn't ensure that further tryget
operations will fail, which we need to guarantee before invoking
->css_offline()'s.  This is resolved by collecting kill confirmation
using percpu_ref_kill_and_confirm() and initiating the offline phase
of destruction after all css refcnt's are confirmed to be seen as
killed on all CPUs.  The previous patches already split destruction
into two phases, so percpu_ref_kill_and_confirm() can be hooked up
easily.
This patch removes css_refcnt() which is used for rcu dereference
sanity check in css_id().  While we can add a percpu refcnt API to ask
the same question, css_id() itself is scheduled to be removed fairly
soon, so let's not bother with it.  Just drop the sanity check and use
rcu_dereference_raw() instead.
v2: - init_cgroup_css() was calling percpu_ref_init() without checking
      the return value.  This causes two problems - the obvious lack
      of error handling and percpu_ref_init() being called from
      cgroup_init_subsys() before the allocators are up, which
      triggers warnings but doesn't cause actual problems as the
      refcnt isn't used for roots anyway.  Fix both by moving
      percpu_ref_init() to cgroup_create().
    - The base references were put too early by
      percpu_ref_kill_and_confirm() and cgroup_offline_fn() put the
      refs one extra time.  This wasn't noticeable because css's go
      through another RCU grace period before being freed.  Update
      cgroup_destroy_locked() to grab an extra reference before
      killing the refcnts.  This problem was noticed by Kent.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Kent Overstreet <koverstreet@google.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: "Alasdair G. Kergon" <agk@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Glauber Costa <glommer@gmail.com>
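The kill-confirmation problem described in the message above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel's percpu_ref implementation: every `toy_` name is hypothetical, the per-CPU "has seen the kill" flags stand in for what the kernel achieves with an RCU grace period, and the `confirmed` flag stands in for the confirmation callback. It shows why offlining must wait for confirmation: a CPU that has not yet observed the kill can still hand out references.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4

/* Toy stand-in for a per-CPU refcount; all names here are hypothetical. */
struct toy_ref {
	long pcpu_count[TOY_NR_CPUS];	/* per-CPU deltas while the ref is live */
	bool seen_dead[TOY_NR_CPUS];	/* has this CPU observed the kill yet? */
	long atomic_count;		/* shared count, starts with the base ref */
	bool killed;			/* kill has been requested */
	bool confirmed;			/* every CPU has observed the kill */
};

static void toy_ref_init(struct toy_ref *ref)
{
	*ref = (struct toy_ref){ .atomic_count = 1 };
}

/* Succeeds as long as the calling CPU has not yet observed the kill. */
static bool toy_ref_tryget(struct toy_ref *ref, int cpu)
{
	if (ref->seen_dead[cpu])
		return false;
	ref->pcpu_count[cpu]++;
	return true;
}

/* Requesting the kill does not, by itself, stop tryget on other CPUs. */
static void toy_ref_kill(struct toy_ref *ref)
{
	ref->killed = true;
}

/*
 * Models one CPU noticing the kill: fold its delta into the shared count.
 * Once all CPUs have done so, the kill is confirmed and tryget can no
 * longer succeed anywhere.
 */
static void toy_ref_cpu_observes_kill(struct toy_ref *ref, int cpu)
{
	int i;

	if (!ref->killed || ref->seen_dead[cpu])
		return;
	ref->seen_dead[cpu] = true;
	ref->atomic_count += ref->pcpu_count[cpu];
	ref->pcpu_count[cpu] = 0;
	for (i = 0; i < TOY_NR_CPUS; i++)
		if (!ref->seen_dead[i])
			return;
	ref->confirmed = true;
}

/* Returns how many trygets succeeded after the kill but before confirmation. */
static int toy_demo_late_trygets(struct toy_ref *ref)
{
	int cpu, late = 0;

	toy_ref_init(ref);
	toy_ref_kill(ref);
	toy_ref_cpu_observes_kill(ref, 0);	/* only CPU 0 has noticed so far */
	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (toy_ref_tryget(ref, cpu))
			late++;
	assert(!ref->confirmed);		/* too early to start offlining */
	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		toy_ref_cpu_observes_kill(ref, cpu);
	return late;
}
```

In this model, calling `toy_demo_late_trygets()` on a fresh `struct toy_ref` reports the trygets that slipped in on CPUs that had not yet seen the kill; only after `confirmed` becomes true do all further `toy_ref_tryget()` calls fail, which is the point at which the offline phase could safely begin.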
											 
										 
										
2013-06-13 19:39:16 -07:00
{
	struct cgroup_subsys_state *css =
		container_of(ref, struct cgroup_subsys_state, refcnt);

2013-08-13 20:22:50 -04:00
	INIT_WORK(&css->destroy_work, css_killed_work_fn);
												cgroup: use a dedicated workqueue for cgroup destruction
Since be44562613851 ("cgroup: remove synchronize_rcu() from
cgroup_diput()"), cgroup destruction path makes use of workqueue.  css
freeing is performed from a work item from that point on and a later
commit, ea15f8ccdb430 ("cgroup: split cgroup destruction into two
steps"), moves css offlining to workqueue too.
As cgroup destruction isn't depended upon for memory reclaim, the
destruction work items were put on the system_wq; unfortunately, some
controllers may block in the destruction path for a considerable duration
while holding cgroup_mutex.  As a large part of the destruction path is
synchronized through cgroup_mutex, when combined with a high rate of
cgroup removals, this has the potential to fill up system_wq's max_active
of 256.
Also, it turns out that memcg's css destruction path ends up queueing
and waiting for work items on system_wq through work_on_cpu().  If
such operation happens while system_wq is fully occupied by cgroup
destruction work items, work_on_cpu() can't make forward progress
because system_wq is full and other destruction work items on
system_wq can't make forward progress because the work item waiting
for work_on_cpu() is holding cgroup_mutex, leading to deadlock.
This can be fixed by queueing destruction work items on a separate
workqueue.  This patch creates a dedicated workqueue -
cgroup_destroy_wq - for this purpose.  As these work items shouldn't
have inter-dependencies and are mostly serialized by cgroup_mutex anyway,
giving a high concurrency level doesn't buy anything and the workqueue's
@max_active is set to 1 so that destruction work items are executed
one by one on each CPU.
Hugh Dickins: Because cgroup_init() is run before init_workqueues(),
cgroup_destroy_wq can't be allocated from cgroup_init().  Do it from a
separate core_initcall().  In the future, we probably want to reorder
so that workqueue init happens before cgroup_init().
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Hugh Dickins <hughd@google.com>
Reported-by: Shawn Bohrer <shawn.bohrer@gmail.com>
Link: http://lkml.kernel.org/r/20131111220626.GA7509@sbohrermbp13-local.rgmadvisors.com
Link: http://lkml.kernel.org/g/alpine.LNX.2.00.1310301606080.2333@eggly.anvils
Cc: stable@vger.kernel.org # v3.9+
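The deadlock argument in the message above can be sketched with a toy model of workqueue execution slots. This is a simplified illustration, not kernel code: the `toy_` names are hypothetical, the model ignores per-CPU pools and cgroup_mutex, and it only tracks whether a nested work item (standing in for the one issued through work_on_cpu()) can obtain an execution slot.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy workqueue: only tracks how many items are executing at once. */
struct toy_wq {
	int max_active;	/* maximum number of concurrently executing items */
	int active;	/* items currently occupying an execution slot */
};

/* Try to start one work item; fails when every execution slot is taken. */
static bool toy_wq_start(struct toy_wq *wq)
{
	if (wq->active >= wq->max_active)
		return false;
	wq->active++;
	return true;
}

/*
 * Queue @nr destruction items on @destroy_wq, then let a running one issue
 * a nested item on @nested_wq.  Returns true if the nested item can start,
 * i.e. the destruction path can make forward progress.
 */
static bool toy_destroy_makes_progress(struct toy_wq *destroy_wq,
					struct toy_wq *nested_wq, int nr)
{
	int i;

	for (i = 0; i < nr; i++)
		if (!toy_wq_start(destroy_wq))
			break;	/* remaining items simply stay queued */
	return toy_wq_start(nested_wq);
}
```

When both arguments point at the same queue with `max_active` of 256 and 256 destruction items are running, the nested item cannot start, which mirrors the reported deadlock. With a separate destruction queue whose `max_active` is 1, the nested item still finds a free slot on the other queue.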
											 
										 
										
2013-11-22 17:14:39 -05:00
	queue_work(cgroup_destroy_wq, &css->destroy_work);
												cgroup: use percpu refcnt for cgroup_subsys_states
2013-06-13 19:39:16 -07:00
}

2013-08-13 20:22:51 -04:00
/**
 * kill_css - destroy a css
 * @css: css to destroy
 *
2013-08-13 20:22:51 -04:00
 * This function initiates destruction of @css by removing cgroup interface
 * files and putting its base reference.  ->css_offline() will be invoked
 * asynchronously once css_tryget() is guaranteed to fail and when the
 * reference count reaches zero, @css will be released.
2013-08-13 20:22:51 -04:00
 */
static void kill_css(struct cgroup_subsys_state *css)
{
2013-08-13 20:22:51 -04:00
	cgroup_clear_dir(css->cgroup, 1 << css->ss->subsys_id);

2013-08-13 20:22:51 -04:00
	/*
	 * Killing would put the base ref, but we need to keep it alive
	 * until after ->css_offline().
	 */
	css_get(css);

	/*
	 * cgroup core guarantees that, by the time ->css_offline() is
	 * invoked, no new css reference will be given out via
	 * css_tryget().  We can't simply call percpu_ref_kill() and
	 * proceed to offlining css's because percpu_ref_kill() doesn't
	 * guarantee that the ref is seen as killed on all CPUs on return.
	 *
	 * Use percpu_ref_kill_and_confirm() to get notifications as each
	 * css is confirmed to be seen as killed on all CPUs.
	 */
	percpu_ref_kill_and_confirm(&css->refcnt, css_killed_ref_fn);
												cgroup: use percpu refcnt for cgroup_subsys_states
2013-06-13 19:39:16 -07:00
}

/**
 * cgroup_destroy_locked - the first stage of cgroup destruction
 * @cgrp: cgroup to be destroyed
 *
 * css's make use of percpu refcnts whose killing latency shouldn't be
 * exposed to userland and are RCU protected.  Also, cgroup core needs to
 * guarantee that css_tryget() won't succeed by the time ->css_offline() is
 * invoked.  To satisfy all the requirements, destruction is implemented in
 * the following two steps.
 *
 * s1. Verify @cgrp can be destroyed and mark it dying.  Remove all
 *     userland visible parts and start killing the percpu refcnts of
 *     css's.  Set up so that the next stage will be kicked off once all
 *     the percpu refcnts are confirmed to be killed.
 *
 * s2. Invoke ->css_offline(), mark the cgroup dead and proceed with the
 *     rest of destruction.  Once all cgroup references are gone, the
 *     cgroup is RCU-freed.
 *
 * This function implements s1.  After this step, @cgrp is gone as far as
 * the userland is concerned and a new cgroup with the same name may be
 * created.  As cgroup doesn't care about the names internally, this
 * doesn't cause any problem.
 */
2012-11-19 08:13:37 -08:00
static int cgroup_destroy_locked(struct cgroup *cgrp)
	__releases(&cgroup_mutex) __acquires(&cgroup_mutex)
												Task Control Groups: basic task cgroup framework
2007-10-18 23:39:30 -07:00
{
2012-11-19 08:13:37 -08:00
	struct dentry *d = cgrp->dentry;
2010-03-10 15:22:34 -08:00
	struct cgroup_event *event, *tmp;
2012-11-05 09:16:58 -08:00
	struct cgroup_subsys *ss;
2013-08-28 16:31:23 -07:00
	struct cgroup *child;
2013-06-12 21:04:54 -07:00
	bool empty;
												Task Control Groups: basic task cgroup framework
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									lockdep_assert_held ( & d - > d_inode - > i_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									lockdep_assert_held ( & cgroup_mutex ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
2013-06-12 21:04:54 -07:00
	/*
2013-06-12 21:04:55 -07:00
	 * css_set_lock synchronizes access to ->cset_links and prevents
	 * @cgrp from being removed while __put_css_set() is in progress.
2013-06-12 21:04:54 -07:00
	 */
	read_lock(&css_set_lock);
2013-08-28 16:31:23 -07:00
	empty = list_empty(&cgrp->cset_links);
2013-06-12 21:04:54 -07:00
	read_unlock(&css_set_lock);
	if (!empty)
2007-10-18 23:39:30 -07:00
		return -EBUSY;
2008-02-23 15:24:09 -08:00

2013-08-28 16:31:23 -07:00
	/*
	 * Make sure there's no live children.  We can't test ->children
	 * emptiness as dead children linger on it while being destroyed;
	 * otherwise, "rmdir parent/child parent" may fail with -EBUSY.
	 */
	empty = true;
	rcu_read_lock();
	list_for_each_entry_rcu(child, &cgrp->children, sibling) {
		empty = cgroup_is_dead(child);
		if (!empty)
			break;
	}
	rcu_read_unlock();
	if (!empty)
		return -EBUSY;

2009-07-29 15:04:06 -07:00
	/*
2013-08-13 20:22:51 -04:00
	 * Initiate massacre of all css's.  cgroup_destroy_css_killed()
	 * will be invoked to perform the rest of destruction once the
	 * percpu refs of all css's are confirmed to be killed.
2009-07-29 15:04:06 -07:00
	 */
2013-12-06 15:07:32 -05:00
	for_each_root_subsys(cgrp->root, ss) {
		struct cgroup_subsys_state *css = cgroup_css(cgrp, ss);

		if (css)
			kill_css(css);
	}
2013-06-13 19:27:41 -07:00

	/*
	 * Mark @cgrp dead.  This prevents further task migration and child
	 * creation by disabling cgroup_lock_live_group().  Note that
2013-08-08 20:11:25 -04:00
	 * CGRP_DEAD assertion is depended upon by css_next_child() to
2013-06-13 19:27:41 -07:00
	 * resume iteration after dropping RCU read lock.  See
2013-08-08 20:11:25 -04:00
	 * css_next_child() for details.
2013-06-13 19:27:41 -07:00
	 */
2013-06-12 21:04:53 -07:00
	set_bit(CGRP_DEAD, &cgrp->flags);
2007-10-18 23:39:30 -07:00

2013-06-13 19:27:41 -07:00
	/* CGRP_DEAD is set, remove from ->release_list for the last time */
	raw_spin_lock(&release_list_lock);
	if (!list_empty(&cgrp->release_list))
		list_del_init(&cgrp->release_list);
	raw_spin_unlock(&release_list_lock);

	/*
2013-08-13 20:22:50 -04:00
	 * If @cgrp has css's attached, the second stage of cgroup
	 * destruction is kicked off from css_killed_work_fn() after the
	 * refs of all attached css's are killed.  If @cgrp doesn't have
	 * any css, we kick it off here.
	 */
	if (!cgrp->nr_css)
		cgroup_destroy_css_killed(cgrp);

2013-06-13 19:27:41 -07:00
	/*
2013-08-13 20:22:51 -04:00
	 * Clear the base files and remove @cgrp directory.  The removal
	 * puts the base ref but we aren't quite done with @cgrp yet, so
	 * hold onto it.
2013-06-13 19:27:41 -07:00
	 */
2013-08-08 20:11:23 -04:00
	cgroup_addrm_files(cgrp, cgroup_base_files, false);
2013-06-13 19:27:41 -07:00
	dget(d);
	cgroup_d_remove_dir(d);

	/*
	 * Unregister events and notify userspace.
	 * Notify userspace about cgroup removing only after rmdir of cgroup
	 * directory to avoid race between userspace and kernelspace.
	 */
	spin_lock(&cgrp->event_list_lock);
	list_for_each_entry_safe(event, tmp, &cgrp->event_list, list) {
		list_del_init(&event->list);
		schedule_work(&event->remove);
	}
	spin_unlock(&cgrp->event_list_lock);

2013-06-13 19:27:42 -07:00
	return 0;
};
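
The live-children check in the function above can be illustrated outside the kernel. What follows is a minimal userspace sketch, not the kernel code: `struct toy_cgroup`, `toy_no_live_children()` and `toy_destroy()` are hypothetical names, a plain singly linked sibling list stands in for the RCU-protected `->children` list, and -1 stands in for -EBUSY. It only shows why a parent whose children are all dead but still linked can be destroyed, while a parent with any live child cannot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct cgroup. */
struct toy_cgroup {
	bool dead;                   /* stands in for the CGRP_DEAD flag */
	struct toy_cgroup *children; /* first child, or NULL */
	struct toy_cgroup *sibling;  /* next sibling, or NULL */
};

/* True if every child still linked on the list is already dead. */
bool toy_no_live_children(const struct toy_cgroup *cgrp)
{
	const struct toy_cgroup *child;

	for (child = cgrp->children; child; child = child->sibling)
		if (!child->dead)
			return false;
	return true;
}

/*
 * Returns 0 on success, or -1 (standing in for -EBUSY) when a live
 * child exists.  Dead children may still be linked; they do not block
 * destruction of the parent.
 */
int toy_destroy(struct toy_cgroup *cgrp)
{
	assert(cgrp != NULL);
	if (!toy_no_live_children(cgrp))
		return -1;
	cgrp->dead = true;
	return 0;
}
```

Destroying the children first and then the parent succeeds even though the children are never unlinked, which mirrors the "rmdir parent/child parent" case named in the comment.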
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: use percpu refcnt for cgroup_subsys_states
A css (cgroup_subsys_state) is how each cgroup is represented to a
controller.  As such, it can be used in hot paths across the various
subsystems different controllers are associated with.
One of the common operations is reference counting, which up until now
has been implemented using a global atomic counter and can have
significant adverse impact on scalability.  For example, css refcnt
can be gotten and put multiple times by blkcg for each IO request.
For highops configurations which try to do as much per-cpu as
possible, the global frequent refcnting can be very expensive.
In general, given the various and hugely diverse paths css's end up
being used from, we need to make it cheap and highly scalable.  In its
usage, css refcnting isn't very different from module refcnting.
This patch converts css refcnting to use the recently added
percpu_ref.  css_get/tryget/put() directly maps to the matching
percpu_ref operations and the deactivation logic is no longer
necessary as percpu_ref already has refcnt killing.
The only complication is that as the refcnt is per-cpu,
percpu_ref_kill() in itself doesn't ensure that further tryget
operations will fail, which we need to guarantee before invoking
->css_offline()'s.  This is resolved by collecting kill confirmation
using percpu_ref_kill_and_confirm() and initiating the offline phase
of destruction after all css refcnt's are confirmed to be seen as
killed on all CPUs.  The previous patches already split destruction
into two phases, so percpu_ref_kill_and_confirm() can be hooked up
easily.
This patch removes css_refcnt() which is used for rcu dereference
sanity check in css_id().  While we can add a percpu refcnt API to ask
the same question, css_id() itself is scheduled to be removed fairly
soon, so let's not bother with it.  Just drop the sanity check and use
rcu_dereference_raw() instead.
v2: - init_cgroup_css() was calling percpu_ref_init() without checking
      the return value.  This causes two problems - the obvious lack
      of error handling and percpu_ref_init() being called from
      cgroup_init_subsys() before the allocators are up, which
      triggers warnings but doesn't cause actual problems as the
      refcnt isn't used for roots anyway.  Fix both by moving
      percpu_ref_init() to cgroup_create().
    - The base references were put too early by
      percpu_ref_kill_and_confirm() and cgroup_offline_fn() put the
      refs one extra time.  This wasn't noticeable because css's go
      through another RCU grace period before being freed.  Update
      cgroup_destroy_locked() to grab an extra reference before
      killing the refcnts.  This problem was noticed by Kent.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Kent Overstreet <koverstreet@google.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: "Alasdair G. Kergon" <agk@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Glauber Costa <glommer@gmail.com>
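
The kill-and-confirm ordering described in the message above can be sketched in userspace. This is a single-threaded toy under stated assumptions, not the kernel's percpu_ref implementation: `struct toy_ref` and every `toy_*` name are hypothetical, per-CPU counters are a plain array, and the confirmation callback runs immediately, whereas in the kernel confirmation only arrives once all CPUs are known to see the ref as killed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS 4

/* Hypothetical stand-in for a per-CPU reference count. */
struct toy_ref {
	long percpu[TOY_NR_CPUS]; /* per-CPU deltas, used only while alive */
	long count;               /* shared count, starts with one base ref */
	bool dead;                /* set by kill; tryget fails afterwards */
	void (*release)(struct toy_ref *ref);
};

/* Counters the example callbacks bump, so callers can observe ordering. */
int toy_confirmed;
int toy_released;

void toy_on_confirm(struct toy_ref *ref) { (void)ref; toy_confirmed++; }
void toy_on_release(struct toy_ref *ref) { (void)ref; toy_released++; }

void toy_ref_init(struct toy_ref *ref, void (*release)(struct toy_ref *))
{
	for (size_t i = 0; i < TOY_NR_CPUS; i++)
		ref->percpu[i] = 0;
	ref->count = 1; /* the base reference */
	ref->dead = false;
	ref->release = release;
}

/* Fast path: only touches the caller's CPU slot; fails once killed. */
bool toy_ref_tryget(struct toy_ref *ref, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	if (ref->dead)
		return false;
	ref->percpu[cpu]++;
	return true;
}

void toy_ref_put(struct toy_ref *ref, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	if (!ref->dead) {
		ref->percpu[cpu]--;
		return;
	}
	if (--ref->count == 0)
		ref->release(ref);
}

/*
 * Kill: mark dead, fold the per-CPU deltas into the shared count,
 * report confirmation, then drop the base reference.
 */
void toy_ref_kill_and_confirm(struct toy_ref *ref,
			      void (*confirm)(struct toy_ref *))
{
	ref->dead = true;
	for (size_t i = 0; i < TOY_NR_CPUS; i++) {
		ref->count += ref->percpu[i];
		ref->percpu[i] = 0;
	}
	if (confirm)
		confirm(ref);
	toy_ref_put(ref, 0);
}
```

In this sketch the offline phase would be started from the confirm callback, since only after it runs is a later `toy_ref_tryget()` guaranteed to fail; the release callback fires later, when the last outstanding reference is put.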
											 
										 
										
2013-06-13 19:39:16 -07:00
/**
2013-08-13 20:22:50 -04:00
 * cgroup_destroy_css_killed - the second step of cgroup destruction
2013-06-13 19:39:16 -07:00
 * @work: cgroup->destroy_free_work
 *
 * This function is invoked from a work item for a cgroup which is being
2013-08-13 20:22:50 -04:00
 * destroyed after all css's are offlined and performs the rest of
 * destruction.  This is the second step of destruction described in the
 * comment above cgroup_destroy_locked().
											2013-06-13 19:39:16 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 20:22:50 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static void cgroup_destroy_css_killed(struct cgroup *cgrp)
						 
					
						
							
								
									
										
										
										
											2013-06-13 19:27:42 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct cgroup *parent = cgrp->parent;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct dentry *d = cgrp->dentry;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 20:22:50 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									lockdep_assert_held(&cgroup_mutex);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-13 19:27:42 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:08:36 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* delete this cgroup from parent->children */ 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-09 09:12:29 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									list_del_rcu(&cgrp->sibling);
							 
						 
					
						
							
								
									
										
										
										
											2012-11-05 09:16:58 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									dput(d);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									set_bit(CGRP_RELEASABLE, &parent->flags);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									check_for_release(parent);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									int ret;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_lock(&cgroup_mutex);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									ret = cgroup_destroy_locked(dentry->d_fsdata);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_unlock(&cgroup_mutex);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return ret;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static void __init_or_module cgroup_init_cftsets(struct cgroup_subsys *ss)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									INIT_LIST_HEAD(&ss->cftsets);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * base_cftset is embedded in subsys itself, no need to worry about
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * deregistration.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (ss->base_cftypes) {
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct cftype *cft;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										for (cft = ss->base_cftypes; cft->name[0] != '\0'; cft++)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											cft->ss = ss;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										ss->base_cftset.cfts = ss->base_cftypes;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										list_add_tail(&ss->base_cftset.node, &ss->cftsets);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:07 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static void __init cgroup_init_subsys(struct cgroup_subsys *ss)
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									struct cgroup_subsys_state *css;
							 
						 
					
						
							
								
									
										
										
										
											2007-11-14 16:58:54 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									printk(KERN_INFO "Initializing cgroup subsys %s\n", ss->name);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:36 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_lock(&cgroup_mutex);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* init base cftset */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgroup_init_cftsets(ss);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									/* Create the top cgroup state for this subsystem */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:47 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									list_add(&ss->sibling, &cgroup_dummy_root.subsys_list);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									ss->root = &cgroup_dummy_root;
							 
						 
					
						
							
								
									
										
										
										
											2013-08-26 18:40:56 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									css = ss->css_alloc(cgroup_css(cgroup_dummy_top, ss));
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									/* We don't handle early failures gracefully */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									BUG_ON(IS_ERR(css));
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 11:01:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									init_css(css, ss, cgroup_dummy_top);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:13 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* Update the init_css_set to contain a subsys
 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 * pointer to this state - since the subsystem is
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:13 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 * newly registered, all tasks and hence the
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									 * init_css_set is in the subsystem's top cgroup. */
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:36 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									init_css_set.subsys[ss->subsys_id] = css;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									need_forkexit_callback |= ss->fork || ss->exit;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:13 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* At system boot, before all subsystems have been
									 * registered, no tasks have been forked, so we don't
									 * need to invoke fork callbacks here. */
									BUG_ON(!list_empty(&init_task.tasks));
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 20:22:50 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									BUG_ON(online_css(css));
							 
						 
					
						
							
								
									
										
										
										
											2012-11-09 09:12:29 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:36 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_unlock(&cgroup_mutex);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* this function shouldn't be used with modular subsystems, since they
									 * need to register a subsys_id, among other things */
									BUG_ON(ss->module);
								}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/**
								 * cgroup_load_subsys: load and register a modular subsystem at runtime
								 * @ss: the subsystem to load
								 *
								 * This function should be called in a modular subsystem's initcall. If the
							 
						 
					
						
							
								
									
										
										
										
											2010-03-16 11:47:56 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * subsystem is built as a module, it will be assigned a new subsys_id and set
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * up for use. If the subsystem is built-in anyway, work is delegated to the
								 * simpler cgroup_init_subsys.
								 */
								int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)
								{
									struct cgroup_subsys_state *css;
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int i, ret;
							 
						 
					
						
							
								
									
										
											 
										
											
												hlist: drop the node parameter from iterators
I'm not sure why, but the hlist for each entry iterators were conceived
        list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
        hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
Besides the semantic patch, there was some manual work required:
 - Fix up the actual hlist iterators in linux/list.h
 - Fix up the declaration of other iterators based on the hlist ones.
 - A very small amount of places were using the 'node' parameter, this
 was modified to use 'obj->member' instead.
 - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
 properly, so those had to be fixed up manually.
The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
type T;
expression a,c,d,e;
identifier b;
statement S;
@@
-T b;
    <+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
    ...+>
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2013-02-27 17:06:00 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct hlist_node *tmp;
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct css_set *cset;
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									unsigned long key;
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* check name and function validity */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (ss->name == NULL || strlen(ss->name) > MAX_CGROUP_TYPE_NAMELEN ||
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									    ss->css_alloc == NULL || ss->css_free == NULL)
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return -EINVAL;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
									 * we don't support callbacks in modular subsystems. this check is
									 * before the ss->module check for consistency; a subsystem that could
									 * be a module should still have no callbacks even if the user isn't
									 * compiling it as one.
									 */
									if (ss->fork || ss->exit)
										return -EINVAL;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
									 * an optionally modular subsystem is built-in: we want to do nothing,
									 * since cgroup_init_subsys will have already taken care of it.
									 */
									if (ss->module == NULL) {
							 
						 
					
						
							
								
									
										
										
										
											2012-09-13 09:50:55 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/* a sanity check */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:47 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										BUG_ON(cgroup_subsys[ss->subsys_id] != ss);
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										return 0;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-04-01 12:09:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* init base cftset */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgroup_init_cftsets(ss);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_lock(&cgroup_mutex);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:47 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgroup_subsys[ss->subsys_id] = ss;
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
 
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 * no ss->css_alloc seems to need anything important in the ss
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:47 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 * struct, so this can happen first (i.e. before the dummy root
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 * attachment).
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									 */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-26 18:40:56 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									css = ss->css_alloc(cgroup_css(cgroup_dummy_top, ss));
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (IS_ERR(css)) {
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:47 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/* failure case - need to deassign the cgroup_subsys[] slot. */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										cgroup_subsys[ss->subsys_id] = NULL;
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										mutex_unlock(&cgroup_mutex);
										return PTR_ERR(css);
									}
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:47 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									list_add(&ss->sibling, &cgroup_dummy_root.subsys_list);
									ss->root = &cgroup_dummy_root;
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/* our new subsystem will be attached to the dummy hierarchy. */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 11:01:55 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									init_css(css, ss, cgroup_dummy_top);
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
									 * Now we need to entangle the css into the existing css_sets. unlike
									 * in cgroup_init_subsys, there are now multiple css_sets, so each one
									 * will need a new pointer to it; done by iterating the css_set_table.
									 * furthermore, modifying the existing css_sets will corrupt the hash
									 * table state, so each changed css_set will need its hash recomputed.
									 * this is all done under the css_set_lock.
									 */
									write_lock(&css_set_lock);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									hash_for_each_safe(css_set_table, i, tmp, cset, hlist) {
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/* skip entries that we already rehashed */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if (cset->subsys[ss->subsys_id])
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											continue;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										/* remove existing entry */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										hash_del(&cset->hlist);
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/* set new value */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										cset->subsys[ss->subsys_id] = css;
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/* recompute hash and restore entry */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										key = css_set_hash(cset->subsys);
										hash_add(css_set_table, &cset->hlist, key);
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									write_unlock(&css_set_lock);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 20:22:50 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									ret = online_css(css);
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (ret)
										goto err_unload;
							 
						 
					
						
							
								
									
										
										
										
											2012-11-09 09:12:29 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* success! */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_unlock(&cgroup_mutex);
									return 0;
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:37 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								err_unload:
									mutex_unlock(&cgroup_mutex);
									/* @ss can't be mounted here as try_module_get() would fail */
									cgroup_unload_subsys(ss);
									return ret;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
									
										
										
										
											2010-03-10 15:22:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								EXPORT_SYMBOL_GPL ( cgroup_load_subsys ) ;  
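/*
 * Illustrative userspace sketch, not kernel code. It models the error path
 * above: err_unload drops cgroup_mutex before calling cgroup_unload_subsys(),
 * which acquires the same mutex itself. All toy_* names, and the simple
 * non-recursive toy lock, are hypothetical stand-ins introduced for this
 * example only.
 */

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a non-recursive mutex such as cgroup_mutex. */
struct toy_mutex { bool held; };

static struct toy_mutex toy_lock;
static int toy_unload_calls;

static void toy_mutex_lock(struct toy_mutex *m)
{
	assert(!m->held);	/* a second acquisition would deadlock */
	m->held = true;
}

static void toy_mutex_unlock(struct toy_mutex *m)
{
	assert(m->held);
	m->held = false;
}

/* Cleanup helper that takes the lock itself, like cgroup_unload_subsys(). */
static void toy_unload(void)
{
	toy_mutex_lock(&toy_lock);
	toy_unload_calls++;
	toy_mutex_unlock(&toy_lock);
}

/* Returns 0 on success, or the supplied error code after unwinding. */
static int toy_load(int fail_with)
{
	int ret = 0;

	toy_mutex_lock(&toy_lock);
	if (fail_with) {
		ret = fail_with;
		goto err_unload;
	}
	/* success! */
	toy_mutex_unlock(&toy_lock);
	return 0;

err_unload:
	/* drop the lock first: toy_unload() acquires it again */
	toy_mutex_unlock(&toy_lock);
	toy_unload();
	return ret;
}
```

/*
 * Calling toy_load(0) takes the success path; any non-zero argument takes
 * the err_unload path and runs the cleanup exactly once with the lock
 * released.
 */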
						 
					
						
							
								
									
										
											 
										
											
/**
 * cgroup_unload_subsys: unload a modular subsystem
 * @ss: the subsystem to unload
 *
 * This function should be called in a modular subsystem's exitcall. When this
 * function is invoked, the refcount on the subsystem's module will be 0, so
 * the subsystem will not be attached to any hierarchy.
 */
void cgroup_unload_subsys(struct cgroup_subsys *ss)
{
	struct cgrp_cset_link *link;

	BUG_ON(ss->module == NULL);

	/*
	 * we shouldn't be called if the subsystem is in use, and the use of
	 * try_module_get() in rebind_subsystems() should ensure that it
	 * doesn't start being used while we're killing it off.
	 */
	BUG_ON(ss->root != &cgroup_dummy_root);

	mutex_lock(&cgroup_mutex);

	offline_css(cgroup_css(cgroup_dummy_top, ss));

	/* deassign the subsys_id */
	cgroup_subsys[ss->subsys_id] = NULL;

	/* remove subsystem from the dummy root's list of subsystems */
	list_del_init(&ss->sibling);

	/*
	 * disentangle the css from all css_sets attached to the dummy
	 * top. as in loading, we need to pay our respects to the hashtable
	 * gods.
	 */
	write_lock(&css_set_lock);
	list_for_each_entry(link, &cgroup_dummy_top->cset_links, cset_link) {
		struct css_set *cset = link->cset;
		unsigned long key;

		hash_del(&cset->hlist);
		cset->subsys[ss->subsys_id] = NULL;
		key = css_set_hash(cset->subsys);
		hash_add(css_set_table, &cset->hlist, key);
	}
	write_unlock(&css_set_lock);

	/*
	 * remove subsystem's css from the cgroup_dummy_top and free it -
	 * need to free before marking as null because ss->css_free needs
	 * the cgrp->subsys pointer to find their state.
	 */
	ss->css_free(cgroup_css(cgroup_dummy_top, ss));
	RCU_INIT_POINTER(cgroup_dummy_top->subsys[ss->subsys_id], NULL);

	mutex_unlock(&cgroup_mutex);
}
EXPORT_SYMBOL_GPL(cgroup_unload_subsys);
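/*
 * Illustrative userspace sketch, not kernel code. The loop in
 * cgroup_unload_subsys() removes each css_set from the hash table, clears
 * one subsys[] slot, recomputes the key, and re-adds the entry, because the
 * key is derived from the subsys[] array. The toy table below shows the same
 * delete / modify / re-insert sequence. Every toy_* name, the bucket count,
 * and the hash function are assumptions made for this example; toy_hash() is
 * not the kernel's css_set_hash().
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_SLOTS   4	/* stand-in for the per-subsystem pointer array */
#define TOY_BUCKETS 8

struct toy_set {
	void *subsys[TOY_SLOTS];
	struct toy_set *next;	/* bucket chain */
};

static struct toy_set *toy_table[TOY_BUCKETS];

/* Hypothetical hash over the pointer array. */
static unsigned long toy_hash(void *const subsys[])
{
	unsigned long key = 0;

	for (int i = 0; i < TOY_SLOTS; i++)
		key += (uintptr_t)subsys[i];
	return (key >> 4) ^ key;
}

static void toy_add(struct toy_set *s)
{
	unsigned long b = toy_hash(s->subsys) % TOY_BUCKETS;

	s->next = toy_table[b];
	toy_table[b] = s;
}

/* Unlink s; this toy chain needs the current key to locate the bucket. */
static void toy_del(struct toy_set *s)
{
	unsigned long b = toy_hash(s->subsys) % TOY_BUCKETS;
	struct toy_set **pp = &toy_table[b];

	while (*pp && *pp != s)
		pp = &(*pp)->next;
	if (*pp)
		*pp = s->next;
	s->next = NULL;
}

/* Look up the set whose pointer array equals the given template. */
static struct toy_set *toy_find(void *const subsys[])
{
	unsigned long b = toy_hash(subsys) % TOY_BUCKETS;

	for (struct toy_set *s = toy_table[b]; s; s = s->next)
		if (memcmp(s->subsys, subsys, sizeof(s->subsys)) == 0)
			return s;
	return NULL;
}

/* Clear one slot while keeping the table consistent with the new key. */
static void toy_clear_slot(struct toy_set *s, int id)
{
	toy_del(s);		/* unlink while the old key is still valid */
	s->subsys[id] = NULL;	/* this changes the key */
	toy_add(s);		/* re-insert under the new key */
}
```

/*
 * If the slot were cleared without the delete and re-insert, the entry could
 * stay in the bucket for its old key, and a later toy_find() with the updated
 * array could miss it.
 */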
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *  cgroup_init_early  -  cgroup  initialization  at  system  boot 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Initialize  cgroups  at  system  boot ,  and  initialize  any 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  subsystems  that  request  early  init . 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								int  __init  cgroup_init_early ( void )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct  cgroup_subsys  * ss ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									int  i ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-10-18 20:28:03 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									atomic_set ( & init_css_set . refcount ,  1 ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									INIT_LIST_HEAD ( & init_css_set . cgrp_links ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									INIT_LIST_HEAD ( & init_css_set . tasks ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:11 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									INIT_HLIST_NODE ( & init_css_set . hlist ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									css_set_count  =  1 ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:47 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									init_cgroup_root ( & cgroup_dummy_root ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									cgroup_root_count  =  1 ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-21 15:52:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									RCU_INIT_POINTER ( init_task . cgroups ,  & init_css_set ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									init_cgrp_cset_link . cset  =  & init_css_set ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:47 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									init_cgrp_cset_link . cgrp  =  cgroup_dummy_top ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									list_add ( & init_cgrp_cset_link . cset_link ,  & cgroup_dummy_top - > cset_links ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:50 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									list_add ( & init_cgrp_cset_link . cgrp_link ,  & init_css_set . cgrp_links ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* at bootup time, we don't worry about modular subsystems */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									for_each_builtin_subsys(ss, i) {
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										BUG_ON(!ss->name);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										BUG_ON(strlen(ss->name) > MAX_CGROUP_TYPE_NAMELEN);
							 
						 
					
						
							
								
									
										
										
										
											2012-11-19 08:13:38 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										BUG_ON(!ss->css_alloc);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										BUG_ON(!ss->css_free);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										if (ss->subsys_id != i) {
							 
						 
					
						
							
								
									
										
										
										
											2007-11-14 16:58:54 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											printk(KERN_ERR "cgroup: Subsys %s id == %d\n",
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
											       ss->name, ss->subsys_id);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											BUG();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										if (ss->early_init)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											cgroup_init_subsys(ss);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return 0;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/**
  
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 * cgroup_init - cgroup initialization
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * Register cgroup filesystem and /proc file, and initialize
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * any subsystems that didn't request early init.
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								int __init cgroup_init(void)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup_subsys *ss;
							 
						 
					
						
							
								
									
										
										
										
											2013-01-10 11:49:27 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									unsigned long key;
							 
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int i, err;
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									err = bdi_init(&cgroup_backing_dev_info);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (err)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										return err;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									for_each_builtin_subsys(ss, i) {
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										if (!ss->early_init)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											cgroup_init_subsys(ss);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-04-14 11:36:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* allocate id for the dummy hierarchy */ 
							 
						 
					
						
							
								
									
										
										
										
											2013-04-14 11:36:57 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_lock(&cgroup_mutex);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_lock(&cgroup_root_mutex);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* Add init_css_set to the hash table */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									key = css_set_hash(init_css_set.subsys);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									hash_add(css_set_table, &init_css_set.hlist, key);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									BUG_ON(cgroup_init_root_id(&cgroup_dummy_root, 0, 1));
							 
						 
					
						
							
								
									
										
										
										
											2010-08-05 13:53:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-07-31 09:50:50 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									err = idr_alloc(&cgroup_dummy_root.cgroup_idr, cgroup_dummy_top,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											0, 1, GFP_KERNEL);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									BUG_ON(err < 0);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-04-14 11:36:57 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_unlock(&cgroup_root_mutex);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									mutex_unlock(&cgroup_mutex);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2010-08-05 13:53:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cgroup_kobj = kobject_create_and_add("cgroup", fs_kobj);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (!cgroup_kobj) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										err = -ENOMEM;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										goto out;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									err = register_filesystem(&cgroup_fs_type);
							 
						 
					
						
							
								
									
										
										
										
											2010-08-05 13:53:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (err < 0) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										kobject_put(cgroup_kobj);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
										goto out;
							 
						 
					
						
							
								
									
										
										
										
											2010-08-05 13:53:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-04-29 01:00:08 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									proc_create("cgroups", 0, NULL, &proc_cgroupstats_operations);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								out:
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (err)
										bdi_destroy(&cgroup_backing_dev_info);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2007-10-18 23:39:30 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
									return err;
								}
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroup: use a dedicated workqueue for cgroup destruction
Since be44562613851 ("cgroup: remove synchronize_rcu() from
cgroup_diput()"), cgroup destruction path makes use of workqueue.  css
freeing is performed from a work item from that point on and a later
commit, ea15f8ccdb430 ("cgroup: split cgroup destruction into two
steps"), moves css offlining to workqueue too.
As cgroup destruction isn't depended upon for memory reclaim, the
destruction work items were put on the system_wq; unfortunately, some
controller may block in the destruction path for considerable duration
while holding cgroup_mutex.  As large part of destruction path is
synchronized through cgroup_mutex, when combined with high rate of
cgroup removals, this has potential to fill up system_wq's max_active
of 256.
Also, it turns out that memcg's css destruction path ends up queueing
and waiting for work items on system_wq through work_on_cpu().  If
such operation happens while system_wq is fully occupied by cgroup
destruction work items, work_on_cpu() can't make forward progress
because system_wq is full and other destruction work items on
system_wq can't make forward progress because the work item waiting
for work_on_cpu() is holding cgroup_mutex, leading to deadlock.
This can be fixed by queueing destruction work items on a separate
workqueue.  This patch creates a dedicated workqueue -
cgroup_destroy_wq - for this purpose.  As these work items shouldn't
have inter-dependencies and mostly serialized by cgroup_mutex anyway,
giving high concurrency level doesn't buy anything and the workqueue's
@max_active is set to 1 so that destruction work items are executed
one by one on each CPU.
Hugh Dickins: Because cgroup_init() is run before init_workqueues(),
cgroup_destroy_wq can't be allocated from cgroup_init().  Do it from a
separate core_initcall().  In the future, we probably want to reorder
so that workqueue init happens before cgroup_init().
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Hugh Dickins <hughd@google.com>
Reported-by: Shawn Bohrer <shawn.bohrer@gmail.com>
Link: http://lkml.kernel.org/r/20131111220626.GA7509@sbohrermbp13-local.rgmadvisors.com
Link: http://lkml.kernel.org/g/alpine.LNX.2.00.1310301606080.2333@eggly.anvils
Cc: stable@vger.kernel.org # v3.9+
											 
										 
										
											2013-11-22 17:14:39 -05:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static int __init cgroup_wq_init(void)
								{
									/*
									 * There isn't much point in executing destruction path in
									 * parallel.  Good chunk is serialized with cgroup_mutex anyway.
									 * Use 1 for @max_active.
									 *
									 * We would prefer to do this in cgroup_init() above, but that
									 * is called before init_workqueues(): so leave this until after.
									 */
									cgroup_destroy_wq = alloc_workqueue("cgroup_destroy", 0, 1);
									BUG_ON(!cgroup_destroy_wq);
									return 0;
								}
								core_initcall(cgroup_wq_init);
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								/*
								 * proc_cgroup_show()
								 *  - Print task's cgroup paths into seq_file, one line for each hierarchy
								 *  - Used for /proc/<pid>/cgroup.
								 *  - No need to task_lock(tsk) on this tsk->cgroup reference, as it
								 *    doesn't really matter if tsk->cgroup changes after we read it,
							 
						 
					
						
							
								
									
										
										
										
											2008-02-07 00:14:43 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *    and we take cgroup_mutex, keeping cgroup_attach_task() from changing it
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								 *    anyway.  No need to check that tsk->cgroup != NULL, thanks to
								 *    the_top_cgroup_hack in cgroup_exit(), which sets an exiting tasks
								 *    cgroup to top_cgroup.
								 */

								/* TODO: Use a proper seq_file iterator */
						 
					
						
							
								
									
										
										
										
											2013-04-19 23:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								int proc_cgroup_show(struct seq_file *m, void *v)
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{
									struct pid *pid;
									struct task_struct *tsk;
									char *buf;
									int retval;
									struct cgroupfs_root *root;

									retval = -ENOMEM;
									buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
									if (!buf)
										goto out;

									retval = -ESRCH;
									pid = m->private;
									tsk = get_pid_task(pid, PIDTYPE_PID);
									if (!tsk)
										goto out_free;

									retval = 0;

									mutex_lock(&cgroup_mutex);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-01-07 18:07:41 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									for_each_active_root(root) {
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct cgroup_subsys *ss;
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct cgroup *cgrp;
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										int count = 0;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										seq_printf(m, "%d:", root->hierarchy_id);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-24 15:21:48 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										for_each_root_subsys(root, ss)
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											seq_printf(m, "%s%s", count++ ? "," : "", ss->name);
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:19 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if (strlen(root->name))
											seq_printf(m, "%sname=%s", count ? "," : "",
												   root->name);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										seq_putc(m, ':');
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:22 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										cgrp = task_cgroup_from_root(tsk, root);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										retval = cgroup_path(cgrp, buf, PAGE_SIZE);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if (retval < 0)
											goto out_unlock;
										seq_puts(m, buf);
										seq_putc(m, '\n');
									}

								out_unlock:
									mutex_unlock(&cgroup_mutex);
									put_task_struct(tsk);
								out_free:
									kfree(buf);
								out:
									return retval;
								}
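
The line format that proc_cgroup_show() emits for each hierarchy can be illustrated outside the kernel. The following is only a userspace sketch, not kernel code: format_cgroup_line() and append() are hypothetical helpers introduced here, and snprintf-style buffer formatting stands in for the seq_file calls. It assumes the layout visible in the function above: the hierarchy ID, a colon, the comma-separated subsystem names, an optional "name=" entry, a colon, and the cgroup path.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append formatted text at *off; -1 if it does not fit. */
static int append(char *out, size_t len, size_t *off, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(out + *off, len - *off, fmt, ap);
	va_end(ap);
	if (n < 0 || (size_t)n >= len - *off)
		return -1;
	*off += (size_t)n;
	return 0;
}

/*
 * Hypothetical userspace stand-in for one loop iteration of
 * proc_cgroup_show(): builds "<id>:<subsys,...>[,name=<name>]:<path>\n".
 * Returns 0 on success, -1 if the buffer is too small.
 */
int format_cgroup_line(char *out, size_t len, int hierarchy_id,
		       const char *const *subsys, int nr_subsys,
		       const char *root_name, const char *path)
{
	size_t off = 0;
	int count = 0;
	int i;

	if (append(out, len, &off, "%d:", hierarchy_id))
		return -1;

	/* Same separator trick as the kernel loop: no comma before the first name. */
	for (i = 0; i < nr_subsys; i++)
		if (append(out, len, &off, "%s%s", count++ ? "," : "", subsys[i]))
			return -1;

	/* A named hierarchy adds a "name=" entry after any subsystem names. */
	if (root_name && strlen(root_name))
		if (append(out, len, &off, "%sname=%s", count ? "," : "", root_name))
			return -1;

	return append(out, len, &off, ":%s\n", path);
}
```

Under these assumptions, a hierarchy with ID 3 that binds the subsystems "cpu" and "cpuacct" and has no name would be rendered for a task in "/grp" as "3:cpu,cpuacct:/grp", while a hierarchy with no subsystems and the name "systemd" would be rendered as "1:name=systemd:/".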
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/* Display information about each subsystem and each hierarchy */  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static int proc_cgroupstats_show(struct seq_file *m, void *v)
								{
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup_subsys *ss;
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int i;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2008-04-04 14:29:57 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									seq_puts(m, "#subsys_name\thierarchy\tnum_cgroups\tenabled\n");
							 
						 
					
						
							
								
									
										
											 
										
											
												cgroups: revamp subsys array
This patch series provides the ability for cgroup subsystems to be
compiled as modules both within and outside the kernel tree.  This is
mainly useful for classifiers and subsystems that hook into components
that are already modules.  cls_cgroup and blkio-cgroup serve as the
example use cases for this feature.
It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
which modular subsystems can use to register and depart during runtime.
The net_cls classifier subsystem serves as the example for a subsystem
which can be converted into a module using these changes.
Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear.  Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section of
the array is protected by cgroup_mutex.
Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.
Patch #3 adds a mechanism for unloading modular subsystems, which includes
a more advanced rework of the rudimentary reference counting introduced in
patch 2.
Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.
Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves.  In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy).  The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys was
the only one in the dummy cgroup (and therefore all css_sets should be
unlinked from it) - however, as all css_sets always stay attached to the
dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
issue should also make sure these cases are addressed in the subsystem
loading and unloading code.
This patch:
Make subsys[] able to be dynamically populated to support modular
subsystems
This patch reworks the way the subsys[] array is used so that subsystems
can register themselves after boot time, and enables the internals of
cgroups to be able to handle when subsystems are not present or may
appear/disappear.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											 
										 
										
											2010-03-10 15:22:07 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	/*
	 * ideally we don't want subsystems moving around while we do this.
	 * cgroup_mutex is also necessary to guarantee an atomic snapshot of
	 * subsys/hierarchy state.
	 */
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	mutex_lock(&cgroup_mutex);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
	for_each_subsys(ss, i)
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:23 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		seq_printf(m, "%s\t%d\t%d\t%d\n",
			   ss->name, ss->root->hierarchy_id,
							 
						 
					
						
							
								
									
										
										
										
											2008-04-04 14:29:57 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
			   ss->root->number_of_cgroups, !ss->disabled);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	mutex_unlock(&cgroup_mutex);
	return 0;
}
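As a rough illustration of the row layout produced by the seq_printf() call above, the following userspace sketch formats one row with the same format string. The struct `subsys_row` and the helper `format_row` are hypothetical stand-ins for the fields read from each subsystem; the last column is the negation of `disabled`, as in the code above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the per-subsystem fields the loop reads. */
struct subsys_row {
	const char *name;
	int hierarchy_id;
	int number_of_cgroups;
	int disabled;
};

/* Format one tab-separated row: name, hierarchy id, cgroup count, enabled flag. */
static int format_row(char *buf, size_t len, const struct subsys_row *row)
{
	return snprintf(buf, len, "%s\t%d\t%d\t%d\n",
			row->name, row->hierarchy_id,
			row->number_of_cgroups, !row->disabled);
}
```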
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
static int cgroupstats_open(struct inode *inode, struct file *file)
{
						 
					
						
							
								
									
										
										
										
											2008-03-29 03:07:28 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	return single_open(file, proc_cgroupstats_show, NULL);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-10-01 15:43:56 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
static const struct file_operations proc_cgroupstats_operations = {
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:35 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	.open = cgroupstats_open,
	.read = seq_read,
	.llseek = seq_lseek,
	.release = single_release,
};
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
/**
 * cgroup_fork - attach newly forked task to its parent's cgroup.
							 
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
 * @child: pointer to task_struct of the newly forked child process.
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
 *
 * Description: A task inherits its parent's cgroup at fork().
 *
 * A pointer to the shared css_set was automatically copied in
 * fork.c by dup_task_struct().  However, we ignore that copy, since
							 
						 
					
						
							
								
									
										
										
										
											2012-10-18 17:52:07 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
 * it was not made under the protection of RCU or cgroup_mutex, so
 * might no longer be a valid cgroup pointer.  cgroup_attach_task() might
 * have already changed current->cgroups, allowing the previously
 * referenced cgroup group to be removed and freed.
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
 *
 * At the point that cgroup_fork() is called, 'current' is the parent
 * task, and the passed argument 'child' points to the child task.
 */
void cgroup_fork(struct task_struct *child)
{
						 
					
						
							
								
									
										
										
										
											2012-10-18 17:52:07 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	task_lock(current);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-21 15:52:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	get_css_set(task_css_set(current));
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	child->cgroups = current->cgroups;
							 
						 
					
						
							
								
									
										
										
										
											2012-10-18 17:52:07 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	task_unlock(current);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	INIT_LIST_HEAD(&child->cg_list);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
}
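The ordering used in cgroup_fork() (take a reference on the parent's css_set, then publish the pointer to the child) can be sketched in userspace as follows. This is a simplified model under stated assumptions: `css_set_sketch`, `task_sketch`, `get_set`, `put_set` and `fork_inherit` are hypothetical names, and the task_lock() that the kernel holds around both steps is only noted in a comment, not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical reference-counted set shared by several tasks. */
struct css_set_sketch {
	atomic_int refcount;
};

struct task_sketch {
	struct css_set_sketch *cgroups;
};

/* Take one reference on the set. */
static void get_set(struct css_set_sketch *set)
{
	atomic_fetch_add(&set->refcount, 1);
}

/* Drop one reference; return the number of references that remain. */
static int put_set(struct css_set_sketch *set)
{
	return atomic_fetch_sub(&set->refcount, 1) - 1;
}

/*
 * Inherit the parent's set.  The kernel code above holds task_lock(current)
 * around both steps so the parent's pointer cannot change in between;
 * that lock is omitted in this single-threaded sketch.
 */
static void fork_inherit(struct task_sketch *parent, struct task_sketch *child)
{
	get_set(parent->cgroups);		/* reference first ... */
	child->cgroups = parent->cgroups;	/* ... then publish the pointer */
}
```

Taking the reference before copying the pointer means the child never holds a pointer to a set that it has not pinned.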
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
/**
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
 * cgroup_post_fork - called on a new task after adding it to the task list
 * @child: the task in question
 *
							 
						 
					
						
							
								
									
										
										
										
											2012-10-16 15:03:14 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
 * Adds the task to the list running through its css_set if necessary and
 * calls the subsystem fork() callbacks.  Has to be after the task is
 * visible on the task list in case we race with the first call to
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:26 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
 * cgroup_task_iter_start() - to guarantee that the new task ends up on its
							 
						 
					
						
							
								
									
										
										
										
											2012-10-16 15:03:14 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
 * list.
							 
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
 */
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
void cgroup_post_fork(struct task_struct *child)
{
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	struct cgroup_subsys *ss;
							 
						 
					
						
							
								
									
										
										
										
											2012-10-16 15:03:14 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	int i;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2012-02-08 03:37:27 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	/*
	 * use_task_css_set_links is set to 1 before we walk the tasklist
	 * under the tasklist_lock and we read it here after we added the child
	 * to the tasklist under the tasklist_lock as well. If the child wasn't
	 * yet in the tasklist when we walked through it from
	 * cgroup_enable_task_cg_lists(), then use_task_css_set_links value
	 * should be visible now due to the paired locking and barriers implied
	 * by LOCK/UNLOCK: it is written before the tasklist_lock unlock
	 * in cgroup_enable_task_cg_lists() and read here after the tasklist_lock
	 * lock on fork.
	 */
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	if (use_task_css_set_links) {
		write_lock(&css_set_lock);
							 
						 
					
						
							
								
									
										
										
										
											2012-10-18 17:40:30 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		task_lock(child);
		if (list_empty(&child->cg_list))
							 
						 
					
						
							
								
									
										
										
										
											2013-06-21 15:52:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
			list_add(&child->cg_list, &task_css_set(child)->tasks);
							 
						 
					
						
							
								
									
										
										
										
											2012-10-18 17:40:30 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		task_unlock(child);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		write_unlock(&css_set_lock);
	}
							 
						 
					
						
							
								
									
										
										
										
											2012-10-16 15:03:14 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
	/*
	 * Call ss->fork().  This must happen after @child is linked on
	 * css_set; otherwise, @child might change state between ->fork()
	 * and addition to css_set.
	 */
	if (need_forkexit_callback) {
							 
						 
					
						
							
								
									
										
										
										
											2013-03-05 10:57:03 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		/*
		 * fork/exit callbacks are supported only for builtin
		 * subsystems, and the builtin section of the subsys
		 * array is immutable, so we don't need to lock the
		 * subsys array here. On the other hand, modular section
		 * of the array can be freed at module unload, so we
		 * can't touch that.
		 */
							 
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
		for_each_builtin_subsys(ss, i)
							 
						 
					
						
							
								
									
										
										
										
											2012-10-16 15:03:14 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
			if (ss->fork)
				ss->fork(child);
	}
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
}
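The callback loop at the end of cgroup_post_fork() walks only the built-in section of the subsystem array and calls fork() where a subsystem provides one. A minimal userspace sketch of that pattern follows; the table `builtin`, the callback `count_fork`, the helper `post_fork_callbacks` and the constant `BUILTIN_COUNT` are all hypothetical, and the fixed-size const array stands in for the "immutable built-in section" mentioned in the comment above.

```c
#include <assert.h>
#include <stddef.h>

#define BUILTIN_COUNT 3	/* hypothetical number of built-in subsystems */

struct task_sketch {
	int pid;
	int notified;	/* how many fork callbacks have seen this task */
};

struct subsys_sketch {
	const char *name;
	void (*fork)(struct task_sketch *task);	/* optional, may be NULL */
};

/* Example callback: just record that the task was seen. */
static void count_fork(struct task_sketch *task)
{
	task->notified++;
}

/* Immutable built-in table, so iterating it needs no lock in this sketch. */
static const struct subsys_sketch builtin[BUILTIN_COUNT] = {
	{ "a", count_fork },
	{ "b", NULL },		/* subsystem without a fork callback */
	{ "c", count_fork },
};

/* Call fork() on every built-in subsystem that defines it. */
static void post_fork_callbacks(struct task_sketch *child, int need_callback)
{
	if (!need_callback)
		return;
	for (int i = 0; i < BUILTIN_COUNT; i++)
		if (builtin[i].fork)
			builtin[i].fork(child);
}
```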
						 
					
						
							
								
									
										
										
										
											2012-10-16 15:03:14 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
/**
 * cgroup_exit - detach cgroup from exiting task
 * @tsk: pointer to task_struct of exiting process
							 
						 
					
						
							
								
									
										
										
										
											2008-02-23 15:24:09 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
 * @run_callbacks: run exit callbacks?
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
 *
 * Description: Detach cgroup from @tsk and release it.
 *
 * Note that cgroups marked notify_on_release force every task in
 * them to take the global cgroup_mutex when exiting.
 * This could impact scaling on very large systems.  Be reluctant to
 * use notify_on_release cgroups where very high task exit scaling
 * is required on large systems.
 *
 * the_top_cgroup_hack:
 *
 *    Set the exiting task's cgroup to the root cgroup (top_cgroup).
 *
 *    We call cgroup_exit() while the task is still competent to
 *    handle notify_on_release(), then leave the task attached to the
 *    root cgroup in each hierarchy for the remainder of its exit.
 *
 *    To do this properly, we would increment the reference count on
 *    top_cgroup, and near the very end of the kernel/exit.c do_exit()
 *    code we would add a second cgroup function call, to drop that
 *    reference.  This would just create an unnecessary hot spot on
 *    the top_cgroup reference count, to no avail.
 *
 *    Normally, holding a reference to a cgroup without bumping its
 *    count is unsafe.   The cgroup could go away, or someone could
 *    attach us to a different cgroup, decrementing the count on
 *    the first cgroup that we never incremented.  But in this case,
 *    top_cgroup isn't going away, and either task has PF_EXITING set,
							 
						 
					
						
							
								
									
										
										
										
											2008-02-07 00:14:43 -08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
 *    which wards off any cgroup_attach_task() attempts, or task is a failed
 *    fork, never visible to cgroup_attach_task.
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
 */
void cgroup_exit(struct task_struct *tsk, int run_callbacks)
{
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	struct cgroup_subsys *ss;
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	struct css_set *cset;
							 
						 
					
						
							
								
									
										
										
										
											2011-02-07 17:02:20 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
	int i;
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									/*
									 * Unlink from the css_set task list if necessary.
									 * Optimistically check cg_list before taking
									 * css_set_lock
									 */
									if (!list_empty(&tsk->cg_list)) {
										write_lock(&css_set_lock);
										if (!list_empty(&tsk->cg_list))
							 
						 
					
						
							
								
									
										
										
										
											2011-03-22 16:30:13 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											list_del_init(&tsk->cg_list);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:36 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										write_unlock(&css_set_lock);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									/* Reassign the task to the init_css_set. */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									task_lock(tsk);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-21 15:52:04 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									cset = task_css_set(tsk);
									RCU_INIT_POINTER(tsk->cgroups, &init_css_set);
							 
						 
					
						
							
								
									
										
										
										
											2011-02-07 17:02:20 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									if (run_callbacks && need_forkexit_callback) {
							 
						 
					
						
							
								
									
										
										
										
											2013-03-05 10:57:03 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/*
										 * fork/exit callbacks are supported only for builtin
										 * subsystems, see cgroup_post_fork() for details.
										 */
							 
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										for_each_builtin_subsys(ss, i) {
							 
						 
					
						
							
								
									
										
										
										
											2011-02-07 17:02:20 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											if (ss->exit) {
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
												struct cgroup_subsys_state *old_css = cset->subsys[i];
												struct cgroup_subsys_state *css = task_css(tsk, i);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
												ss->exit(css, old_css, tsk);
							 
						 
					
						
							
								
									
										
										
										
											2011-02-07 17:02:20 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									task_unlock(tsk);
							 
						 
					
						
							
								
									
										
										
										
											2011-02-07 17:02:20 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:49 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									put_css_set_taskexit(cset);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:34 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								static void check_for_release(struct cgroup *cgrp)
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-03-01 15:06:07 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									if (cgroup_is_releasable(cgrp) &&
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:55 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									    list_empty(&cgrp->cset_links) && list_empty(&cgrp->children)) {
							 
						 
					
						
							
								
									
										
										
										
											2013-03-01 15:06:07 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/*
										 * Control Group is currently removable. If it's not
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										 * already queued for a userspace notification, queue
							 
						 
					
						
							
								
									
										
										
										
											2013-03-01 15:06:07 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										 * it now
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										 */ 
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										int need_schedule_work = 0;
							 
						 
					
						
							
								
									
										
										
										
											2013-03-01 15:06:07 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-07-25 16:47:45 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										raw_spin_lock(&release_list_lock);
							 
						 
					
						
							
								
									
										
										
										
											2013-06-12 21:04:53 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if (!cgroup_is_dead(cgrp) &&
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										    list_empty(&cgrp->release_list)) {
											list_add(&cgrp->release_list, &release_list);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											need_schedule_work = 1;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
									
										
										
										
											2009-07-25 16:47:45 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										raw_spin_unlock(&release_list_lock);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if (need_schedule_work)
											schedule_work(&release_agent_work);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								/*
								 * Notify userspace when a cgroup is released, by running the
								 * configured release agent with the name of the cgroup (path
								 * relative to the root of cgroup file system) as the argument.
								 *
								 * Most likely, this user command will try to rmdir this cgroup.
								 *
								 * This races with the possibility that some other task will be
								 * attached to this cgroup before it is removed, or that some other
								 * user task will 'mkdir' a child cgroup of this cgroup.  That's ok.
								 * The presumed 'rmdir' will fail quietly if this cgroup is no longer
								 * unused, and this cgroup will be reprieved from its death sentence,
								 * to continue to serve a useful existence.  Next time it's released,
								 * we will get notified again, if it still has 'notify_on_release' set.
								 *
								 * The final arg to call_usermodehelper() is UMH_WAIT_EXEC, which
								 * means only wait until the task is successfully execve()'d.  The
								 * separate release agent task is forked by call_usermodehelper(),
								 * then control in this thread returns here, without waiting for the
								 * release agent task.  We don't bother to wait because the caller of
								 * this routine has no use for the exit status of the release agent
								 * task, so no sense holding our caller up for that.
								 */
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static void cgroup_release_agent(struct work_struct *work)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									BUG_ON(work != &release_agent_work);
									mutex_lock(&cgroup_mutex);
							 
						 
					
						
							
								
									
										
										
										
											2009-07-25 16:47:45 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									raw_spin_lock(&release_list_lock);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									while (!list_empty(&release_list)) {
										char *argv[3], *envp[3];
										int i;
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										char *pathbuf = NULL, *agentbuf = NULL;
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										struct cgroup *cgrp = list_entry(release_list.next,
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
														    struct cgroup,
														    release_list);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:40:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										list_del_init(&cgrp->release_list);
							 
						 
					
						
							
								
									
										
										
										
											2009-07-25 16:47:45 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										raw_spin_unlock(&release_list_lock);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										pathbuf = kmalloc(PAGE_SIZE, GFP_KERNEL);
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										if (!pathbuf)
											goto continue_free;
										if (cgroup_path(cgrp, pathbuf, PAGE_SIZE) < 0)
											goto continue_free;
										agentbuf = kstrdup(cgrp->root->release_agent_path, GFP_KERNEL);
										if (!agentbuf)
											goto continue_free;
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										i = 0;
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										argv[i++] = agentbuf;
										argv[i++] = pathbuf;
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										argv[i] = NULL;

										i = 0;
										/* minimal command environment */
										envp[i++] = "HOME=/";
										envp[i++] = "PATH=/sbin:/bin:/usr/sbin:/usr/bin";
										envp[i] = NULL;

										/* Drop the lock while we invoke the usermode helper,
										 * since the exec could involve hitting disk and hence
										 * be a slow process */
										mutex_unlock(&cgroup_mutex);
										call_usermodehelper(argv[0], argv, envp, UMH_WAIT_EXEC);
										mutex_lock(&cgroup_mutex);
							 
						 
					
						
							
								
									
										
										
										
											2008-07-25 01:46:59 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								continue_free:
										kfree(pathbuf);
										kfree(agentbuf);
							 
						 
					
						
							
								
									
										
										
										
											2009-07-25 16:47:45 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										raw_spin_lock(&release_list_lock);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
									
										
										
										
											2009-07-25 16:47:45 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									raw_spin_unlock(&release_list_lock);
							 
						 
					
						
							
								
									
										
										
										
											2007-10-18 23:39:38 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									mutex_unlock(&cgroup_mutex);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
									
										
										
										
											2008-04-04 14:29:57 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static int __init cgroup_disable(char *str)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									struct cgroup_subsys *ss;
							 
						 
					
						
							
								
									
										
										
										
											2008-04-04 14:29:57 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									char *token;
							 
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
									int i;
							 
						 
					
						
							
								
									
										
										
										
											2008-04-04 14:29:57 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									while ((token = strsep(&str, ",")) != NULL) {
										if (!*token)
											continue;
							 
						 
					
						
							
								
									
										
										
										
											2012-09-13 09:50:55 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-06-25 11:53:37 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
										/*
										 * cgroup_disable, being at boot time, can't know about
										 * module subsystems, so we don't worry about them.
										 */
										for_each_builtin_subsys(ss, i) {
							 
						 
					
						
							
								
									
										
										
										
											2008-04-04 14:29:57 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
											if  ( ! strcmp ( token ,  ss - > name ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												ss - > disabled  =  1 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												printk ( KERN_INFO  " Disabling %s control group " 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
													"  subsystem \n " ,  ss - > name ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
												break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
											} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
										} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									} 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
									return  1 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								__setup ( " cgroup_disable= " ,  cgroup_disable ) ;  
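The token loop above can be tried outside the kernel. The following is a minimal userspace sketch, not the kernel implementation: `struct demo_subsys`, `demo_strsep()` and `demo_disable()` are hypothetical names introduced here, and `demo_strsep()` is a small local stand-in with the same contract as `strsep()` so the example needs only standard C.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct cgroup_subsys; not the kernel type. */
struct demo_subsys {
	const char *name;
	int disabled;
};

/*
 * Local equivalent of strsep(): cut the next token off *stringp at the
 * first delimiter and advance *stringp past it. Returns NULL once the
 * input is exhausted. Empty tokens are returned as "".
 */
static char *demo_strsep(char **stringp, const char *delim)
{
	char *start = *stringp;
	char *end;

	if (!start)
		return NULL;
	end = strpbrk(start, delim);
	if (end)
		*end++ = '\0';
	*stringp = end;
	return start;
}

/*
 * Walk the comma-separated list in @str (modified in place) and mark
 * every entry of @subsys whose name matches a token. Returns how many
 * entries were newly disabled by this call.
 */
static int demo_disable(char *str, struct demo_subsys *subsys, size_t n)
{
	char *token;
	int hits = 0;

	while ((token = demo_strsep(&str, ",")) != NULL) {
		size_t i;

		if (!*token)	/* empty token, e.g. the middle of "a,,b" */
			continue;

		for (i = 0; i < n; i++) {
			if (!strcmp(token, subsys[i].name)) {
				if (!subsys[i].disabled)
					hits++;
				subsys[i].disabled = 1;
				break;
			}
		}
	}
	return hits;
}
```

Unknown names are silently ignored in this sketch, matching the loop above, which only reacts when a token equals the name of a built-in subsystem.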
						 
					
						
							
								
									
										
											 
										
											
cgroup: CSS ID support
Patch for Per-CSS (Cgroup Subsys State) ID and private hierarchy code.
This patch attaches a unique ID to each css and provides the following:
 - css_lookup(subsys, id)
   returns a pointer to the struct cgroup_subsys_state for id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for each css is maintained.
The cgroup framework only prepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically, because the cgroup framework
	  doesn't know the lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is an array of
	  pointers to cgroup. But it does not need to be very fast.
	  By replacing pointers (8 bytes per entry) with IDs (2 bytes per
	  entry), we can greatly reduce memory usage.
	- Scanning without lock.
	  CSS_ID provides a "scan id under this ROOT" function. With this,
	  scanning css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1;
	 } while(...)
Characteristics:
	- Each css has a unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
	- scan-by-ID vs. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by the following kind of routine:
	  scan -> rest a while (release a lock) -> continue from where it
	  was interrupted.
	  memcg's hierarchical reclaim does this.
	- When subsys->use_id is set, # of css in the system is limited to
	  65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
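The "scan -> rest -> continue" pattern described in this commit message can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: `struct demo_css`, `demo_get_next()` and `demo_scan()` are hypothetical names, a flat array stands in for the real ID lookup structure, and no locking or RCU is modelled.

```c
#include <stddef.h>

#define DEMO_MAX_ID 16	/* the commit message allows 1-65535; kept small here */

/* Hypothetical object slot; index 0 is never used because ID 0 is unused. */
struct demo_css {
	int live;
};

/*
 * Return the first live entry whose ID is >= @id and store that ID in
 * @found, or return NULL when no such entry exists. @table must hold
 * DEMO_MAX_ID + 1 slots.
 */
static struct demo_css *demo_get_next(struct demo_css *table, int id,
				      int *found)
{
	int i;

	for (i = id < 1 ? 1 : id; i <= DEMO_MAX_ID; i++) {
		if (table[i].live) {
			*found = i;
			return &table[i];
		}
	}
	return NULL;
}

/*
 * Visit live entries in ID order, recording up to @max IDs in @visited.
 * The only state carried between steps is the integer "found + 1", so
 * a scanner could pause after any step and resume later even if the
 * entry it last saw has been removed in the meantime.
 */
static int demo_scan(struct demo_css *table, int *visited, int max)
{
	int id = 1, found, n = 0;

	while (n < max && demo_get_next(table, id, &found)) {
		visited[n++] = found;
		id = found + 1;
	}
	return n;
}
```

Because the cursor is a plain integer rather than a pointer into a tree, removing an entry between two steps cannot leave the scanner holding a dangling reference; the next call simply finds the next live ID.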
											 
										 
										
											2009-04-02 16:57:25 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-13 11:01:54 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
/**
 * css_from_dir - get corresponding css from the dentry of a cgroup dir
 * @dentry: directory dentry of interest
 * @ss: subsystem of interest
 *
 * Must be called under RCU read lock.  The caller is responsible for
 * pinning the returned css if it needs to be accessed outside the RCU
 * critical section.
 */
struct cgroup_subsys_state *css_from_dir(struct dentry *dentry,
					 struct cgroup_subsys *ss)
{
	struct cgroup *cgrp;

	WARN_ON_ONCE(!rcu_read_lock_held());

	/* is @dentry a cgroup dir? */
	if (!dentry->d_inode ||
	    dentry->d_inode->i_op != &cgroup_dir_inode_operations)
		return ERR_PTR(-EBADF);

	cgrp = __d_cgrp(dentry);
	return cgroup_css(cgrp, ss) ?: ERR_PTR(-ENOENT);
}
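css_from_dir() reports failure by encoding a negative errno value in the returned pointer instead of returning NULL. The sketch below shows that convention in userspace under stated assumptions: `demo_err_ptr()`, `demo_ptr_err()` and `demo_is_err()` are simplified re-implementations written for this example, not the kernel's helpers, and `struct demo_dir` is a hypothetical stand-in for the dentry checks. The GNU `a ?: b` shorthand used above is spelled out as a full conditional.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095	/* assumed size of the reserved error range */

/* Encode a negative errno value as a pointer. */
static void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Recover the errno value from an error pointer. */
static long demo_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* True when @p lies in the top DEMO_MAX_ERRNO values of the address range. */
static int demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Hypothetical stand-in for a directory entry and its attached state. */
struct demo_dir {
	int is_cgroup_dir;
	void *css;
};

/*
 * Same shape as the lookup above: -EBADF when the directory is of the
 * wrong kind, -ENOENT when it is the right kind but has no state.
 */
static void *demo_css_from_dir(const struct demo_dir *dir)
{
	if (!dir->is_cgroup_dir)
		return demo_err_ptr(-EBADF);
	return dir->css ? dir->css : demo_err_ptr(-ENOENT);
}
```

A caller tests the result with `demo_is_err()` before dereferencing it. A plain NULL check would not catch these failures, since the error values are non-NULL.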
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-19 10:05:24 +08:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
/**
 * css_from_id - lookup css by id
 * @id: the cgroup id
 * @ss: cgroup subsys to be looked into
 *
 * Returns the css if there's valid one with @id, otherwise returns NULL.
 * Should be called under rcu_read_lock().
 */
struct cgroup_subsys_state *css_from_id(int id, struct cgroup_subsys *ss)
{
	struct cgroup *cgrp;

	rcu_lockdep_assert(rcu_read_lock_held() ||
			   lockdep_is_held(&cgroup_mutex),
			   "css_from_id() needs proper protection");

	cgrp = idr_find(&ss->root->cgroup_idr, id);
	if (cgrp)
		return cgroup_css(cgrp, ss);
	return NULL;
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2009-09-23 15:56:20 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
#ifdef CONFIG_CGROUP_DEBUG
static struct cgroup_subsys_state *
debug_css_alloc(struct cgroup_subsys_state *parent_css)
{
	struct cgroup_subsys_state *css = kzalloc(sizeof(*css), GFP_KERNEL);

	if (!css)
		return ERR_PTR(-ENOMEM);

	return css;
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
static void debug_css_free(struct cgroup_subsys_state *css)
{
	kfree(css);
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
static u64 debug_taskcount_read(struct cgroup_subsys_state *css,
				struct cftype *cft)
{
	return cgroup_task_count(css->cgroup);
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
static u64 current_css_set_read(struct cgroup_subsys_state *css,
				struct cftype *cft)
{
	return (u64)(unsigned long)current->cgroups;
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
static u64 current_css_set_refcount_read(struct cgroup_subsys_state *css,
					 struct cftype *cft)
{
	u64 count;

	rcu_read_lock();
	count = atomic_read(&task_css_set(current)->refcount);
	rcu_read_unlock();
	return count;
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
static int current_css_set_cg_links_read(struct cgroup_subsys_state *css,
					 struct cftype *cft,
					 struct seq_file *seq)
{
	struct cgrp_cset_link *link;
	struct css_set *cset;

	read_lock(&css_set_lock);
	rcu_read_lock();
	cset = rcu_dereference(current->cgroups);
	list_for_each_entry(link, &cset->cgrp_links, cgrp_link) {
		struct cgroup *c = link->cgrp;
		const char *name;

		if (c->dentry)
			name = c->dentry->d_name.name;
		else
			name = "?";
		seq_printf(seq, "Root %d group %s\n",
			   c->root->hierarchy_id, name);
	}
	rcu_read_unlock();
	read_unlock(&css_set_lock);
	return 0;
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
#define MAX_TASKS_SHOWN_PER_CSS 25
static int cgroup_css_links_read(struct cgroup_subsys_state *css,
				 struct cftype *cft, struct seq_file *seq)
{
	struct cgrp_cset_link *link;

	read_lock(&css_set_lock);
	list_for_each_entry(link, &css->cgroup->cset_links, cset_link) {
		struct css_set *cset = link->cset;
		struct task_struct *task;
		int count = 0;
		seq_printf(seq, "css_set %p\n", cset);
		list_for_each_entry(task, &cset->tasks, cg_list) {
			if (count++ > MAX_TASKS_SHOWN_PER_CSS) {
				seq_puts(seq, "  ...\n");
				break;
			} else {
				seq_printf(seq, "  task %d\n",
					   task_pid_vnr(task));
			}
		}
	}
	read_unlock(&css_set_lock);
	return 0;
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2013-08-08 20:11:24 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
static u64 releasable_read(struct cgroup_subsys_state *css, struct cftype *cft)
{
	return test_bit(CGRP_RELEASABLE, &css->cgroup->flags);
}

static struct cftype debug_files[] =  {
	{
		.name = "taskcount",
		.read_u64 = debug_taskcount_read,
	},

	{
		.name = "current_css_set",
		.read_u64 = current_css_set_read,
	},

	{
		.name = "current_css_set_refcount",
		.read_u64 = current_css_set_refcount_read,
	},

	{
		.name = "current_css_set_cg_links",
		.read_seq_string = current_css_set_cg_links_read,
	},

	{
		.name = "cgroup_css_links",
		.read_seq_string = cgroup_css_links_read,
	},

	{
		.name = "releasable",
		.read_u64 = releasable_read,
	},

	{ }	/* terminate */
};

struct cgroup_subsys debug_subsys = {
	.name = "debug",
	.css_alloc = debug_css_alloc,
	.css_free = debug_css_free,
	.subsys_id = debug_subsys_id,
	.base_cftypes = debug_files,
};
#endif /* CONFIG_CGROUP_DEBUG */
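
The debug_files table above ends with an empty entry marked "terminate". As a rough illustration of how such a sentinel-terminated table can be walked, here is a minimal userspace sketch. It is not kernel code: struct demo_file, demo_files, demo_count() and demo_find() are hypothetical names invented for this example, the read handler is simplified to take no arguments, and the sketch assumes a consumer stops at the first entry whose name is empty.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct cftype; not the kernel definition. */
struct demo_file {
	char name[64];				/* empty name marks the end of the table */
	unsigned long long (*read_u64)(void);	/* simplified read handler */
};

/* Placeholder handlers returning fixed values, for illustration only. */
unsigned long long demo_taskcount_read(void)
{
	return 3;
}

unsigned long long demo_releasable_read(void)
{
	return 0;
}

/* Table laid out like debug_files: real entries first, then a terminator. */
struct demo_file demo_files[] = {
	{ .name = "taskcount", .read_u64 = demo_taskcount_read },
	{ .name = "releasable", .read_u64 = demo_releasable_read },
	{ .name = "" }	/* terminate */
};

/* Count the entries that precede the terminator. */
size_t demo_count(const struct demo_file *files)
{
	size_t n = 0;

	while (files[n].name[0] != '\0')
		n++;
	return n;
}

/* Return the entry called name, or NULL if the table has none. */
const struct demo_file *demo_find(const struct demo_file *files,
				  const char *name)
{
	const struct demo_file *f;

	for (f = files; f->name[0] != '\0'; f++)
		if (strcmp(f->name, name) == 0)
			return f;
	return NULL;
}
```

With this layout the table needs no separate length field: a caller can pass demo_files to demo_find() and invoke the returned entry's read_u64 handler, and adding a file means adding one entry before the terminator.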